Publications

* denotes equal contribution; † denotes corresponding author.
See also my Google Scholar profile for a complete list.

Visual-RFT: Visual Reinforcement Fine-Tuning
Ziyu Liu, Zeyi Sun, Yuhang Zang, Xiaoyi Dong, Yuhang Cao, Haodong Duan, Dahua Lin, Jiaqi Wang
IEEE/CVF International Conference on Computer Vision (ICCV), 2025
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
Jingming Zhuo, Songyang Zhang, Xinyu Fang, Haodong Duan, Dahua Lin, Kai Chen
Conference on Empirical Methods in Natural Language Processing (EMNLP), Findings, 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, Linke Ouyang, Songyang Zhang, Haodong Duan, et al.
Conference on Neural Information Processing Systems (NeurIPS), 2024
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI
Pengcheng Chen*, Jin Ye*, Guoan Wang*, Yanjun Li*, Zhongying Deng*, Wei Li*, Tianbin Li*, Haodong Duan, et al.
Conference on Neural Information Processing Systems (NeurIPS), Datasets & Benchmarks Track, 2024
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Lin Chen, Xilin Wei, Jinsong Li, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Bin Lin, Zhenyu Tang, et al.
Conference on Neural Information Processing Systems (NeurIPS), Datasets & Benchmarks Track, 2024
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs
Yuxuan Qiao*, Haodong Duan*†, Xinyu Fang, Junming Yang, Lin Chen, et al.
Conference on Neural Information Processing Systems (NeurIPS), 2024
Are We on the Right Way for Evaluating Large Vision-Language Models?
Lin Chen, Jinsong Li, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Jiaqi Wang, Yu Qiao, Dahua Lin, Feng Zhao
Conference on Neural Information Processing Systems (NeurIPS), 2024
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding
Xinyu Fang*, Kangrui Mao*, Haodong Duan*†, Xiangyu Zhao, Yining Li, Dahua Lin, Kai Chen
Conference on Neural Information Processing Systems (NeurIPS), Datasets & Benchmarks Track, 2024
Co-first author & Corresponding author.
MMBench: Is Your Multi-modal Model an All-around Player?
Yuan Liu*, Haodong Duan*†, Yuanhan Zhang*, Bo Li*, Songyang Zhang*, Wangbo Zhao*, et al.
European Conference on Computer Vision (ECCV) (Oral), 2024
Co-first author & Corresponding author. A systematic benchmark for multi-modal models.
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
Haodong Duan†, Junming Yang, Yuxuan Qiao, Xinyu Fang, Lin Chen, Yuan Liu, Xiaoyi Dong, Yuhang Zang, Pan Zhang, Jiaqi Wang, Dahua Lin, Kai Chen
ACM International Conference on Multimedia (MM), 2024
First author & Corresponding author. An all-in-one toolkit supporting 220+ LMMs and 80+ benchmarks.
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark
Hongwei Liu, Zilong Zheng, Yuxuan Qiao, Haodong Duan, Zhiwei Fei, Fengzhe Zhou, Wenwei Zhang, Songyang Zhang, Dahua Lin, Kai Chen
Annual Meeting of the Association for Computational Linguistics (ACL), Findings, 2024
Ada-LEval: Evaluating Long-Context LLMs with Length-Adaptable Benchmarks
Chonghua Wang, Haodong Duan, Songyang Zhang, Dahua Lin, Kai Chen
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2024
BotChat: Evaluating LLMs’ Capabilities of Having Multi-Turn Dialogues
Haodong Duan, Jueqi Wei, Chonghua Wang, Hongwei Liu, Yixiao Fang, Songyang Zhang, Dahua Lin, Kai Chen
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2024
JourneyDB: A Benchmark for Generative Image Understanding
Keqiang Sun, Junting Pan, Yuying Ge, Hao Li, Haodong Duan, Xiaoshi Wu, et al.
Conference on Neural Information Processing Systems (NeurIPS), Datasets & Benchmarks Track, 2023
SkeleTR: Towards Skeleton-based Action Recognition in the Wild
Jiaqi Chen, Haodong Duan, Mingze Xu, Cheng-i Jeff Pai, Shuangrui Ding, Dahua Lin
IEEE/CVF International Conference on Computer Vision (ICCV), 2023
Self-supervised Action Representation Learning from Partial Spatio-Temporal Skeleton Sequences
Yujie Zhou, Haodong Duan, Anyi Rao, Bing Su, Jiaqi Wang
AAAI Conference on Artificial Intelligence (AAAI), 2023
PYSKL: Towards Good Practices for Skeleton Action Recognition
Haodong Duan, Jiaqi Wang, Kai Chen, Dahua Lin
ACM International Conference on Multimedia (MM), 2022
TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition
Haodong Duan, Nanxuan Zhao, Kai Chen, Dahua Lin
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (Oral), 2022
OCSampler: Compressing Videos to One Clip with Single-step Sampling
Jintao Lin, Haodong Duan, Kai Chen, Dahua Lin, Limin Wang
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
Revisiting Skeleton-based Action Recognition
Haodong Duan, Yue Zhao, Kai Chen, Dahua Lin, Bo Dai
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (Oral), 2022
Omni-sourced Webly-supervised Learning for Video Recognition
Haodong Duan, Yue Zhao, Yuanjun Xiong, Wentao Liu, Dahua Lin
European Conference on Computer Vision (ECCV), 2020
TRB: A Novel Triplet Representation for Understanding 2D Human Body
Haodong Duan, Kwan-Yee Lin, Sheng Jin, Wentao Liu, Chen Qian, Wanli Ouyang
IEEE/CVF International Conference on Computer Vision (ICCV) (Oral), 2019