Haodong Duan (段浩东)
I am a researcher at ByteDance Seed, working on the evaluation and development of large language models and large multi-modality models (LMMs). I received my Ph.D. degree from the Multimedia Laboratory (MMLab) at The Chinese University of Hong Kong in 2023, supervised by Professor Dahua Lin. Before that, I received my B.S. degree in Data Science from Peking University in 2019.
My research interests span multi-modal learning, LLM/LMM evaluation, and video understanding. I have led the development of several widely used evaluation toolkits and benchmarks, including VLMEvalKit (4k+ stars), MMBench, and OpenCompass (6.8k+ stars).
I am open to academic collaborations. Feel free to reach out via email.
News
- 2025.06 Visual-RFT is accepted by ICCV 2025.
- 2025.05 InternVL3.5 is released, achieving state-of-the-art performance across multimodal benchmarks.
- 2024.09 Three papers accepted by the NeurIPS 2024 main conference: InternLM-XComposer2-4KHD, MMStar, Prism.
- 2024.09 Three papers accepted by the NeurIPS 2024 Datasets & Benchmarks Track: ShareGPT4Video, GMAI-MMBench, MMBench-Video.
- 2024.08 MMBench is accepted by ECCV 2024 as an Oral presentation.
- 2024.05 MathBench is accepted by Findings of ACL 2024.
- 2024.03 Two papers accepted by NAACL 2024: BotChat, Ada-LEval.
- 2023.12 Released VLMEvalKit, an all-in-one toolkit for evaluating LMMs. Accepted by ACM MM 2024.
- 2023.08 Received my Ph.D. degree from MMLab @ CUHK.
- 2022.05 Released PYSKL, a codebase for skeleton action recognition. Accepted by ACM MM 2022.
- 2022.03 Three papers accepted by CVPR 2022: PoseC3D (Oral), TransRank (Oral), OCSampler.
Selected Publications
* denotes equal contribution, † denotes corresponding author. See the full list on my Publications page or Google Scholar.
MMBench: Is Your Multi-modal Model an All-around Player?
European Conference on Computer Vision (ECCV), 2024 — Oral Presentation
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
ACM International Conference on Multimedia (MM), 2024
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding
Conference on Neural Information Processing Systems (NeurIPS), 2024, Datasets & Benchmarks Track
Revisiting Skeleton-based Action Recognition (PoseC3D)
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022 — Oral
PYSKL: Towards Good Practices for Skeleton Action Recognition
ACM International Conference on Multimedia (MM), 2022
Professional Activities
- Conference Reviewer: ICCV (2021–2025), CVPR (2022–2025), NeurIPS (2022–2024), ECCV (2022–2024), AAAI (2022–2025), ICML (2023–2024), ICLR (2023–2025), WACV 2023, Eurographics 2023
- Journal Reviewer: IEEE TPAMI, IJCV, IEEE TIP, Pattern Recognition, IEEE TMM
