Haodong Duan (段浩东)

I am a researcher at ByteDance Seed, working on the evaluation and development of large language models (LLMs) and large multi-modality models (LMMs). I received my Ph.D. from the Multimedia Laboratory (MMLab) at The Chinese University of Hong Kong in 2023, supervised by Professor Dahua Lin. Before that, I received my B.S. in Data Science from Peking University in 2019.

My research interests span multi-modal learning, LLM/LMM evaluation, and video understanding. I have led the development of several widely used evaluation toolkits and benchmarks, including VLMEvalKit (4k+ stars), MMBench, and OpenCompass (6.8k+ stars).

I am open to academic collaborations. Feel free to reach out via email.

News

2025.06 Visual-RFT is accepted by ICCV 2025.
2025.05 InternVL3.5 is released, achieving state-of-the-art performance across multimodal benchmarks.
2024.09 Three papers accepted by NeurIPS 2024 main conference: InternLM-XComposer2-4KHD, MMStar, Prism.
2024.09 Three papers accepted by NeurIPS 2024 Datasets & Benchmarks: ShareGPT4Video, GMAI-MMBench, MMBench-Video.
2024.08 MMBench is accepted by ECCV 2024 as an Oral presentation.
2024.05 MathBench is accepted by ACL 2024 Findings.
2024.03 Two papers accepted by NAACL 2024: BotChat, Ada-LEval.
2023.12 Released VLMEvalKit, an all-in-one toolkit for evaluating LMMs. Accepted by ACM MM 2024.
2023.08 Received my Ph.D. degree from MMLab @ CUHK.
2022.05 Released PYSKL, a codebase for skeleton action recognition. Accepted by ACM MM 2022.
2022.03 Three papers accepted by CVPR 2022: PoseC3D (Oral), TransRank (Oral), OCSampler.

Selected Publications

* denotes equal contribution, † denotes corresponding author. See the full list on my Publications page or Google Scholar.

MMBench: Is Your Multi-modal Model an All-around Player?

Yuan Liu*, Haodong Duan*†, Yuanhan Zhang*, Bo Li*, et al.

European Conference on Computer Vision (ECCV), 2024 — Oral Presentation

Paper Code Project

VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

Haodong Duan†, Junming Yang, Yuxuan Qiao, et al.

ACM International Conference on Multimedia (MM), 2024

Paper Code

MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding

Xinyu Fang*, Kangrui Mao*, Haodong Duan*†, et al.

NeurIPS 2024 — Datasets & Benchmarks Track

Paper Code

Revisiting Skeleton-based Action Recognition (PoseC3D)

Haodong Duan, Yue Zhao, Kai Chen, Dahua Lin, Bo Dai

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022 — Oral

Paper Code

PYSKL: Towards Good Practices for Skeleton Action Recognition

Haodong Duan, Jiaqi Wang, Kai Chen, Dahua Lin

ACM International Conference on Multimedia (MM), 2022

Paper Code

Professional Activities

Conference Reviewer: ICCV (2021–2025), CVPR (2022–2025), NeurIPS (2022–2024), ECCV (2022–2024), AAAI (2022–2025), ICML (2023–2024), ICLR (2023–2025), WACV 2023, Eurographics 2023
Journal Reviewer: IEEE TPAMI, IJCV, IEEE TIP, Pattern Recognition, IEEE TMM

Open-Source Contributions

VLMEvalKit (4k+ ★)
OpenCompass (6.8k+ ★)
PYSKL (1.2k+ ★)
MMAction2 (5k+ ★)