Zhe Chen (陈喆)

PhD Candidate at Nanjing University

GitHub
Email
Google Scholar

About Me

My name is Zhe Chen (陈喆). I am a third-year PhD candidate in the School of Computer Science at Nanjing University (NJU), supervised by Prof. Tong Lu.

I started my studies in 2020 through a combined Master's and PhD program (2 years for the master's degree and 4 years for the PhD), and I expect to graduate in 2026.

My research interests include multimodal large language models (MLLMs), vision foundation models (VFMs), and visual perception (e.g., detection & segmentation).

News

More News

Education & Experiences

  • Nanjing University, Nanjing, China
    Sept 2020 - Present

  • Zhejiang University of Science and Technology, Hangzhou, China
    Sept 2016 - June 2020

Selected Publications

* denotes co-first authors. The full paper list can be found on Google Scholar.

Multimodal Large Language Model

InternVL 2.5: Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Zhe Chen*, Weiyun Wang*, Yue Cao*, Yangzhou Liu*, Zhangwei Gao*, Erfei Cui*, Jinguo Zhu*, Shenglong Ye*, Hao Tian*, Zhaoyang Liu*, Lixin Gu, Xuehui Wang, Qingyun Li, Yimin Ren, Zixuan Chen, Jiapeng Luo, Jiahao Wang, Tan Jiang, Bo Wang, Conghui He, Botian Shi, Xingcheng Zhang, Han Lv, Yi Wang, Wenqi Shao, Pei Chu, Zhongying Tu, Tong He, Zhiyong Wu, Huipeng Deng, Jiaye Ge, Kai Chen, Kaipeng Zhang, Limin Wang, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang

Technical Report, 2024

[Paper] [BibTex] [Code]

InternVL 1.5: How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

Zhe Chen*, Weiyun Wang*, Hao Tian*, Shenglong Ye*, Zhangwei Gao, Erfei Cui, ..., Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang#

Science China Information Sciences (CCF-A), 2024

Introduction: A Pioneering Open-Source Alternative to GPT-4V.

[Paper] [BibTex] [Code] [中文解读]

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Zhong Muyan, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai#

CVPR Oral, 2024 | Most Influential CVPR 2024 Papers (Rank 12)

Introduction: InternVL scales up the ViT to 6B parameters and aligns it with LLMs.

[Paper] [BibTex] [Code] [Poster] [中文解读]

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

Wenhai Wang*, Zhe Chen*, Xiaokang Chen*, Jiannan Wu*, Xizhou Zhu, Gang Zeng, Ping Luo, Tong Lu, Jie Zhou, Yu Qiao, Jifeng Dai#

NeurIPS, 2023

Introduction: We present an LLM-based framework for vision-centric tasks, termed VisionLLM.

[Paper] [BibTex] [Code] [Poster]

Vision Foundation Model

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Wenhai Wang*, Jifeng Dai*, Zhe Chen*, Zhenhang Huang*, Zhiqi Li*, Xizhou Zhu*, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao#

CVPR Highlight, 2023 | Most Influential CVPR 2023 Papers (Rank 10)

Introduction: This work presents a new large-scale CNN-based foundation model, termed InternImage.

[Paper] [BibTex] [Code] [Poster] [中文解读]

Vision Transformer Adapter for Dense Predictions

Zhe Chen*, Yuchen Duan*, Wenhai Wang#, Junjun He, Tong Lu#, Jifeng Dai, Yu Qiao

ICLR Spotlight, 2023

Introduction: This work presents a simple yet powerful adapter for pure ViT, which remedies the defects of ViT and achieves performance comparable to vision-specific models on dense prediction tasks.

[Paper] [BibTex] [Code] [Poster] [Slides] [中文解读]

Visual Perception

Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments

Yang Yang, Wenhai Wang, Zhe Chen, Jifeng Dai, Liang Zheng#

ICLR Spotlight, 2024

Introduction: This work introduces a new data-centric problem: estimating detector performance in an unlabeled test domain.

[Paper] [BibTex] [Code] [Poster]

GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation

Kai Chen*, Enze Xie*, Zhe Chen, Lanqing Hong#, Zhenguo Li, Dit-Yan Yeung

ICLR, 2024

Introduction: GeoDiffusion translates geometric conditions into text prompts, enabling T2I models to generate detection data that improves object detector performance.

[Paper] [BibTex] [Code] [Poster]

AVSegFormer: Audio-Visual Segmentation with Transformer

Shengyi Gao, Zhe Chen, Guo Chen, Wenhai Wang, Tong Lu#

AAAI, 2024

Introduction: This work presents a new framework for AVS tasks that leverages the transformer architecture.

[Paper] [BibTex] [Code] [Poster]

DDP: Diffusion Model for Dense Visual Prediction

Yuanfeng Ji*, Zhe Chen*, Enze Xie#, Lanqing Hong, Xihui Liu, Zhaoqiang Liu, Tong Lu, Zhenguo Li, Ping Luo

ICCV, 2023

Introduction: We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline.

[Paper] [BibTex] [Code] [Poster]

Other Papers

Graph Propagation Transformer for Graph Representation Learning

Zhe Chen, Hao Tan, Tao Wang, Tianrun Shen, Tong Lu#, Qiuying Peng, Cheng Cheng, Yue Qi

IJCAI, 2023

Introduction: This work presents a novel transformer architecture for graph representation learning.

[Paper] [BibTex] [Code]

Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization

Zhe Chen, Wenhai Wang#, Enze Xie, Tong Lu#, Ping Luo

AAAI, 2022

Introduction: URST is a versatile framework for ultra-high resolution style transfer under limited GPU memory resources.

[Paper] [BibTex] [Code] [Poster] [中文解读]

Awards & Honors

Contests

  • Toloka Visual Question Answering Challenge, WSDM Cup 2023, 2023, 1st Place.
  • The 2nd Ego4D Challenge, ECCV Workshop, 2022, 7 Top-1 Rankings.
  • The 2nd National Artificial Intelligence Challenge (NAIC), Remote Sensing Semantic Segmentation Track, 2020, 1st Place (1,000,000 RMB Bonus).
  • The 2nd China Gaofen Cup Beautiful Countryside Competition, Remote Sensing Crop Classification Track, 2019, 3rd Prize (5,000 RMB Bonus).
  • The 9th National Undergraduate E-commerce "Innovation, Creativity and Entrepreneurship" Challenge, Zhejiang Division, 2019, 1st Prize.
  • The 9th National Undergraduate Service Outsourcing Competition, Captcha Recognition Task, 2018, 2nd Prize.

Honors

  • Youth PhD Student Research Project under the National Natural Science Foundation
  • Nanjing University Egret Scholarship
  • Outstanding Graduate of Zhejiang Province
  • Zhejiang Provincial Government Scholarship

Some of My Friends

Last Updated: 2/7/2025, 2:07:36 PM