Publications
* denotes equal contribution, † denotes corresponding author.
See also my Google Scholar profile for a complete list.
Visual-RFT: Visual Reinforcement Fine-Tuning
IEEE/CVF International Conference on Computer Vision (ICCV), 2025
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
Conference on Empirical Methods in Natural Language Processing (EMNLP), Findings, 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
Conference on Neural Information Processing Systems (NeurIPS), 2024
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI
Conference on Neural Information Processing Systems (NeurIPS), Datasets & Benchmarks Track, 2024
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Conference on Neural Information Processing Systems (NeurIPS), Datasets & Benchmarks Track, 2024
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs
Conference on Neural Information Processing Systems (NeurIPS), 2024
Are We on the Right Way for Evaluating Large Vision-Language Models?
Conference on Neural Information Processing Systems (NeurIPS), 2024
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding
Conference on Neural Information Processing Systems (NeurIPS), Datasets & Benchmarks Track, 2024
Co-first author & Corresponding author.
MMBench: Is Your Multi-modal Model an All-around Player?
European Conference on Computer Vision (ECCV) (Oral), 2024
Co-first author & Corresponding author. A systematic benchmark for multi-modal models.
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
ACM International Conference on Multimedia (MM), 2024
First author & Corresponding author. An all-in-one toolkit supporting 220+ LMMs and 80+ benchmarks.
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark
Annual Meeting of the Association for Computational Linguistics (ACL), Findings, 2024
Ada-LEval: Evaluating Long-Context LLMs with Length-Adaptable Benchmarks
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2024
BotChat: Evaluating LLMs’ Capabilities of Having Multi-Turn Dialogues
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2024
JourneyDB: A Benchmark for Generative Image Understanding
Conference on Neural Information Processing Systems (NeurIPS), Datasets & Benchmarks Track, 2023
SkeleTR: Towards Skeleton-based Action Recognition in the Wild
IEEE/CVF International Conference on Computer Vision (ICCV), 2023
Self-supervised Action Representation Learning from Partial Spatio-Temporal Skeleton Sequences
AAAI Conference on Artificial Intelligence (AAAI), 2023
PYSKL: Towards Good Practices for Skeleton Action Recognition
ACM International Conference on Multimedia (MM), 2022
TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (Oral), 2022
OCSampler: Compressing Videos to One Clip with Single-step Sampling
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
Revisiting Skeleton-based Action Recognition
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (Oral), 2022
Omni-sourced Webly-supervised Learning for Video Recognition
European Conference on Computer Vision (ECCV), 2020
