
About Me
My name is Zhe Chen (陈喆). I am a third-year PhD candidate in the School of Computer Science at Nanjing University (NJU), supervised by Prof. Tong Lu.
I started my studies in 2020 through a combined Master's and PhD program, which includes 2 years for the master's degree and 4 years for the PhD, and I expect to graduate in 2026.
My research interests include multimodal large language models (MLLMs), vision foundation models (VFMs), and visual perception (e.g., detection & segmentation).
News
- [2025-01] 🎉🎉 Vision-RWKV and OmniCorpus are accepted as ICLR 2025 spotlight papers.
- [2024-12] 🎉 InternVL 1.5 is accepted by Science China Information Sciences.
- [2024-12] Our team released InternVL 2.5.
- [2024-07] Our team released InternVL 2.0.
- [2024-04] Our team released InternVL 1.5.
- [2024-02] InternVL (oral) is accepted by CVPR 2024.
- [2024-01] GeoDiffusion, All-Seeing, and BoS (spotlight) are accepted by ICLR 2024.
- [2023-10] AVSegFormer is accepted by AAAI 2024.
- [2023-10] InternImage is selected as one of the CVPR 2023 Top-10 Influential Papers.
- [2023-09] VisionLLM is accepted by NeurIPS 2023.
- [2023-07] DDP is accepted by ICCV 2023.
- [2023-04] GPTrans is accepted by IJCAI 2023.
- [2023-02] InternImage (highlight) is accepted by CVPR 2023.
- [2023-01] ViT-Adapter (spotlight) is accepted by ICLR 2023.
- [2021-12] URST is accepted by AAAI 2022.
More News
- [2023-05] We release InternGPT, which allows you to interact with ChatGPT by clicking, dragging and drawing using a pointing device.
- [2023-01] Our team won the championship of the WSDM Cup 2023 Toloka VQA Challenge.
- [2022-11] Our InternImage-H set a new record of 65.4 box AP on COCO test-dev!
- [2022-09] Our team won first place in 7 tracks of the Ego4D ECCV 2022 Challenge.
- [2020-12] Our team won the championship of the NAIC 2020 Remote Sensing Semantic Segmentation Task (1,000,000 RMB bonus).
- [2020-05] SiameseCCR is accepted by IET Image Processing.
Education & Experiences
Nanjing University, Nanjing, China
Sept 2020 - Present
Zhejiang University of Science and Technology, Hangzhou, China
Sept 2016 - June 2020
Selected Publications
* denotes co-first authors. The full paper list can be found on Google Scholar.
Multimodal Large Language Model

InternVL 2.5: Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
Zhe Chen*, Weiyun Wang*, Yue Cao*, Yangzhou Liu*, Zhangwei Gao*, Erfei Cui*, Jinguo Zhu*, Shenglong Ye*, Hao Tian*, Zhaoyang Liu*, Lixin Gu, Xuehui Wang, Qingyun Li, Yimin Ren, Zixuan Chen, Jiapeng Luo, Jiahao Wang, Tan Jiang, Bo Wang, Conghui He, Botian Shi, Xingcheng Zhang, Han Lv, Yi Wang, Wenqi Shao, Pei Chu, Zhongying Tu, Tong He, Zhiyong Wu, Huipeng Deng, Jiaye Ge, Kai Chen, Kaipeng Zhang, Limin Wang, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang
Technical Report, 2024

InternVL 1.5: How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Zhe Chen*, Weiyun Wang*, Hao Tian*, Shenglong Ye*, Zhangwei Gao, Erfei Cui, ..., Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang#
Science China Information Sciences (CCF-A), 2024
Introduction: A Pioneering Open-Source Alternative to GPT-4V.

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Zhong Muyan, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai#
CVPR Oral, 2024 | Most Influential CVPR 2024 Papers (Rank 12)
Introduction: InternVL scales up the ViT to 6B parameters and aligns it with LLMs.

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
Wenhai Wang*, Zhe Chen*, Xiaokang Chen*, Jiannan Wu*, Xizhou Zhu, Gang Zeng, Ping Luo, Tong Lu, Jie Zhou, Yu Qiao, Jifeng Dai#
NeurIPS, 2023
Introduction: We present an LLM-based framework for vision-centric tasks, termed VisionLLM.
Vision Foundation Model

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Wenhai Wang*, Jifeng Dai*, Zhe Chen*, Zhenhang Huang*, Zhiqi Li*, Xizhou Zhu*, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao#
CVPR Highlight, 2023 | Most Influential CVPR 2023 Papers (Rank 10)
Introduction: This work presents a new large-scale CNN-based foundation model, termed InternImage.

Vision Transformer Adapter for Dense Predictions
Zhe Chen*, Yuchen Duan*, Wenhai Wang#, Junjun He, Tong Lu#, Jifeng Dai, Yu Qiao
ICLR Spotlight, 2023
Introduction: This work presents a simple yet powerful adapter for pure ViT, which can remedy the defects of ViT and achieve comparable performance to vision-specific models in dense prediction tasks.
Visual Perception

Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments
Yang Yang, Wenhai Wang, Zhe Chen, Jifeng Dai, Liang Zheng#
ICLR Spotlight, 2024
Introduction: A brand-new data-centric problem of estimating the detector performance in an unlabeled test domain.

GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation
Kai Chen*, Enze Xie*, Zhe Chen, Lanqing Hong#, Zhenguo Li, Dit-Yan Yeung
ICLR, 2024
Introduction: GeoDiffusion translates geometric conditions into text prompts, enhancing T2I models for generating detection data, and improves object detector performance.


DDP: Diffusion Model for Dense Visual Prediction
Yuanfeng Ji*, Zhe Chen*, Enze Xie#, Lanqing Hong, Xihui Liu, Zhaoqiang Liu, Tong Lu, Zhenguo Li, Ping Luo
ICCV, 2023
Introduction: We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline.
Other Papers


Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization
Zhe Chen, Wenhai Wang#, Enze Xie, Tong Lu#, Ping Luo
AAAI, 2022
Introduction: URST is a versatile framework for ultra-high resolution style transfer under limited GPU memory resources.
Awards & Honors
Contests
- Toloka Visual Question Answering Challenge, WSDM Cup 2023, 2023, 1st Place.
- The 2nd Ego4D Challenge, ECCV Workshop, 2022, 7 Top-1 Rankings.
- The 2nd National Artificial Intelligence Challenge (NAIC), Remote Sensing Semantic Segmentation Track, 2020, 1st Place (1,000,000 RMB Bonus).
- The 2nd China Gaofen Cup Beautiful Countryside Competition, Remote Sensing Crop Classification Track, 2019, 3rd Prize (5,000 RMB Bonus).
- The 9th National Undergraduate E-commerce "Innovation, Creativity and Entrepreneurship" Challenge, Zhejiang Division, 2019, 1st Prize.
- The 9th National Undergraduate Service Outsourcing Competition, Captcha Recognition Task, 2018, 2nd Prize.
Honors
- Youth PhD Student Research Project under the National Natural Science Foundation
- Nanjing University Egret Scholarship
- Outstanding Graduate of Zhejiang Province
- Zhejiang Provincial Government Scholarship
Some of My Friends
- Guo Chen, Zhiqi Li, Yuanfeng Ji, Yang Yang, Zhanhao Liang