Teng Wang 王腾
Hi there! I'm Teng Wang. I am a researcher at Tencent ARC Lab, focusing on advancing multimodal foundation models and video understanding systems. Prior to this, I earned my Ph.D. in Computer Science from the University of Hong Kong (HKU) in 2024, where I was fortunate to be advised by Prof. Ping Luo and Prof. Feng Zheng. Before my doctoral studies, I completed my B.E. and M.E. degrees at Sun Yat-sen University (SYSU) under the supervision of Prof. Huicheng Zheng.

Prospective Collaborators: We are actively seeking motivated research interns and collaborators to join our Multimodal Foundation Model team at Tencent ARC Lab. If you share an interest in vision-language-audio tasks, video understanding, or multimodal reasoning, feel free to reach out via email!

News
Research

My research interests include multimodal foundation models, vision-language-audio understanding, video understanding, and multimodal reasoning.
Selected Publications
* equal contribution
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
Video understanding with large language models: A survey
UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization
Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
Caption anything: Interactive image description with diverse multimodal controls
Transferable decoding with visual entities for zero-shot image captioning
Knowledge-aware prompt tuning for generalizable vision-language models
Set-level guidance attack: Boosting adversarial transferability of vision-language pre-training models
Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
Accelerating Vision-Language Pretraining with Free Language Modeling
VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix
End-to-end dense video captioning with parallel decoding
Event-centric hierarchical representation for dense video captioning

Academic service
Journal reviewer for IJCV, IEEE TNNLS, IEEE TIP, IEEE TMM, IEEE TCSVT

Experience
Competitions & Awards
Rank 1 in Make-up Temporal Video Grounding Track of the PIC Challenge at ACM MM 2022
Rank 1 in Make-up Dense Video Captioning Track of the PIC Challenge at ACM MM 2022
Rank 2 in Generic Event Boundary Captioning Track of the LOVEU Challenge at CVPR 2022
Rank 2 in Event Dense-Captioning Track of the ActivityNet Challenge at CVPR 2020, CVPR 2021, and CVPR 2022
Rank 3 in TinyAction Challenge at CVPR 2021