Teng Wang 王腾

I am a Ph.D. candidate (2020-present) in the Department of Computer Science at The University of Hong Kong (HKU), where I am fortunate to be supervised by Prof. Ping Luo and Prof. Feng Zheng. Before that, I obtained my B.E. and M.E. degrees from Sun Yat-sen University (SYSU) under the supervision of Prof. Huicheng Zheng. I was a research intern at Tencent AI Lab and Tencent Data Platform.

My recent research interests lie in vision-language multimodal learning and video understanding.

Email / CV / Google Scholar / GitHub

Selected Research (Full list)

(* indicates equal contribution)

Transferable Decoding with Visual Entities for Zero-Shot Image Captioning
Junjie Fei*, Teng Wang*, Jinrui Zhang, Zhenyu He, Chengjie Wang, Feng Zheng
IEEE/CVF International Conference on Computer Vision (ICCV) 2023
[paper] [code]

Improving generalization by overcoming the hallucination problem in LLM-based visual models.

Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models
Baoshuo Kan*, Teng Wang*, Wenpeng Lu, Xiantong Zhen, Weili Guan, Feng Zheng
IEEE/CVF International Conference on Computer Vision (ICCV) 2023
[paper]

Prompt tuning enhanced with Wikipedia knowledge for CLIP-based few-shot classification.

Caption Anything: Interactive Image Description with Diverse Multimodal Controls
Teng Wang*, Jinrui Zhang*, Junjie Fei* et al.
arXiv 2023
[paper] [code] [demo]

Caption Anything generates descriptive captions for any object within an image, offering a range of language styles to accommodate diverse user preferences. It supports visual controls (mouse click) and language controls (length, sentiment, factuality, and language).

Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos
Teng Wang*, Jinrui Zhang*, Feng Zheng, Wenhao Jiang, Ran Cheng, Ping Luo
arXiv 2023
[paper] [code]

Learning video-language representation by solving a bidirectional set prediction problem. We won first place in both the video grounding and video captioning tracks of the PIC Challenge 2022.

Accelerating Vision-Language Pretraining with Free Language Modeling
Teng Wang, Yixiao Ge, Feng Zheng, Ran Cheng, Ying Shan, Xiaohu Qie, Ping Luo
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023
[paper] [code]

We accelerate the convergence of language modeling by predicting 100% of tokens, as in auto-regressive (AR) modeling, while achieving performance competitive with masked language modeling (MLM).

VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix
Teng Wang, Wenhao Jiang, Zhichao Lu, Feng Zheng, Ran Cheng, Chengguo Yin, Ping Luo
International Conference on Machine Learning (ICML) 2022 (spotlight)
[paper] [code]

Vision-language learning from stand-alone image and text corpora by cross-modal augmentation.

End-to-End Dense Video Captioning With Parallel Decoding
Teng Wang, Ruimao Zhang, Zhichao Lu, Feng Zheng, Ran Cheng, Ping Luo
IEEE/CVF International Conference on Computer Vision (ICCV) 2021
[paper] [code]

The first DETR-style parallel decoding paradigm for dense video captioning.

Event-Centric Hierarchical Representation for Dense Video Captioning
Teng Wang, Huicheng Zheng, Mingjing Yu, Qian Tian, Haifeng Hu
IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) 2020
[paper] [code]

Hierarchical visual representation across scene, event, and frame levels in untrimmed videos.

Awards
Rank 1 in Generic Event Boundary Captioning Track of LOVEU Challenge at CVPR 2023
Rank 1 in Make-up Temporal Video Grounding Track of PIC Challenge at ACM MM 2022
Rank 1 in Make-up Dense Video Captioning Track of PIC Challenge at ACM MM 2022
Rank 2 in Generic Event Boundary Captioning Track of LOVEU Challenge at CVPR 2022
Rank 2 in Event Dense-Captioning Track of ActivityNet Challenge at CVPR 2020, CVPR 2021, and CVPR 2022
Rank 3 in TinyAction Challenge at CVPR 2021

Academic Service
Conference Reviewer:
International Conference on Machine Learning (ICML)
Annual Conference on Neural Information Processing Systems (NeurIPS)
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
IEEE/CVF International Conference on Computer Vision (ICCV)
European Conference on Computer Vision (ECCV)

Journal Reviewer:
International Journal of Computer Vision (IJCV)
IEEE Transactions on Multimedia (TMM)
IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
IEEE Transactions on Artificial Intelligence (TAI)