About Me
I am a second-year PhD candidate in the Department of Computer Science and Technology at Nanjing University (NJU), supervised by Prof. Tong Lu. I started my studies in 2020 through a combined Master's and PhD program, which includes two years for the master's degree and four years for the PhD.
My research interests include vision foundation models, vision-language models, and detection and segmentation.
News
- [2024-02-27] InternVL (oral) is accepted by CVPR 2024.
- [2024-01-16] GeoDiffusion, All-Seeing, and BoS (spotlight) are accepted by ICLR 2024.
- [2023-10-10] AVSegFormer is accepted by AAAI 2024.
- [2023-10-24] InternImage is selected as one of the CVPR 2023 Top-10 Influential Papers.
- [2023-09-22] VisionLLM is accepted by NeurIPS 2023.
- [2023-07-14] DDP is accepted by ICCV 2023.
- [2023-05-10] We release InternGPT, which allows you to interact with ChatGPT by clicking, dragging and drawing using a pointing device.
- [2023-04-20] GPTrans is accepted by IJCAI 2023.
- [2023-02-28] InternImage (highlight) is accepted by CVPR 2023.
- [2023-01-21] ViT-Adapter (spotlight) is accepted by ICLR 2023.
- [2023-01-17] Our team wins the championship of the WSDM Cup 2023 Toloka VQA Challenge.
- [2022-11-11] Our InternImage-H set a new record of 65.4 box AP on COCO test-dev!
- [2022-09-19] Our team wins championships in 7 tracks of the Ego4D ECCV 2022 Challenge.
- [2021-12-01] URST is accepted by AAAI 2022.
- [2020-12-21] Our team wins the championship of the NAIC 2020 Remote Sensing Semantic Segmentation Task (1,000,000 RMB bonus).
- [2020-05-12] SiameseCCR is accepted by IET Image Processing.
Education & Experiences
Nanjing University, Nanjing, China
Sept 2020 - Present
Zhejiang University of Science and Technology, Zhejiang, China
Sept 2016 - June 2020
Publications
* Equal Contribution # Corresponding Author
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Zhong Muyan, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai#
CVPR oral, 2024
Introduction: InternVL scales up the ViT to 6B parameters and aligns it with LLM.
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Wenhai Wang*, Jifeng Dai*, Zhe Chen*, Zhenhang Huang*, Zhiqi Li*, Xizhou Zhu*, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao#
CVPR highlight, 2023 | CVPR 2023 Top-10 Influential Papers
Introduction: This work presents a new large-scale CNN-based foundation model, termed InternImage.
Vision Transformer Adapter for Dense Predictions
Zhe Chen*, Yuchen Duan*, Wenhai Wang#, Junjun He, Tong Lu#, Jifeng Dai, Yu Qiao
ICLR spotlight, 2023
Introduction: This work presents a simple yet powerful adapter for pure ViT, which can remedy the defects of ViT and achieve performance comparable to vision-specific models in dense prediction tasks.
Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments
Yang Yang, Wenhai Wang, Zhe Chen, Jifeng Dai, Liang Zheng
ICLR spotlight, 2024
Introduction: This work studies a new data-centric problem: estimating detector performance in an unlabeled test domain.
GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation
Kai Chen, Enze Xie, Zhe Chen, Lanqing Hong, Zhenguo Li, Dit-Yan Yeung
ICLR, 2024
Introduction: GeoDiffusion translates geometric conditions into text prompts, enhancing T2I models for generating detection data, and improves object detector performance.
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
Wenhai Wang*, Zhe Chen*, Xiaokang Chen*, Jiannan Wu*, Xizhou Zhu, Gang Zeng, Ping Luo, Tong Lu, Jie Zhou, Yu Qiao, Jifeng Dai#
NeurIPS, 2023
Introduction: We present an LLM-based framework for vision-centric tasks, termed VisionLLM.
DDP: Diffusion Model for Dense Visual Prediction
Yuanfeng Ji*, Zhe Chen*, Enze Xie#, Lanqing Hong, Xihui Liu, Zhaoqiang Liu, Tong Lu, Zhenguo Li, Ping Luo
ICCV, 2023
Introduction: We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline.
Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization
Zhe Chen, Wenhai Wang#, Enze Xie, Tong Lu#, Ping Luo
AAAI, 2022
Introduction: URST is a versatile framework for ultra-high resolution style transfer under limited GPU memory resources.
SiameseCCR: A Novel Method for One-shot and Few-shot Chinese CAPTCHA Recognition using Deep Siamese Network
Zhe Chen, Weifeng Ma#, Nanfan Xu, Caoting Ji, Yulai Zhang
IET Image Processing, 2020 (SCI Impact Factor: 2.373)
Introduction: We propose a Siamese network-based method for one-shot and few-shot Chinese CAPTCHA recognition.
Preprints
FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation
Zhe Chen, Jiahao Wang, Wenhai Wang, Guo Chen, Enze Xie, Ping Luo, Tong Lu#
arXiv, 2021
Introduction: We propose an accurate and efficient scene text detection framework, termed FAST (i.e., faster arbitrarily-shaped text detector).
Technical Reports
InternGPT: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language
Zhaoyang Liu, Yinan He, Wenhai Wang, Weiyun Wang, Yi Wang, Shoufa Chen, Qinglong Zhang, Yang Yang, Qingyun Li, Jiashuo Yu, Kunchang Li, Zhe Chen, Xue Yang, Xizhou Zhu, Yali Wang, Limin Wang, Ping Luo, Jifeng Dai, Yu Qiao
Technical Report, 2023
Introduction: InternGPT allows you to interact with ChatGPT by clicking, dragging and drawing using a pointing device.
InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges
Guo Chen*, Sen Xing*, Zhe Chen*, Yi Wang*, Kunchang Li, Yizhuo Li, Yi Liu, Jiahao Wang, Yin-Dong Zheng, Bingkun Huang, Zhiyu Zhao, Junting Pan, Yifei Huang, Zun Wang, Jiashuo Yu, Yinan He, Hongjie Zhang, Tong Lu, Yali Wang, Limin Wang, Yu Qiao#
Technical Report, 2022
Introduction: This work presents our champion solutions to five tracks of the Ego4D Challenge.
Awards & Honors
Contests
- Toloka Visual Question Answering Challenge, WSDM Cup 2023, 2023, 1st Place.
- The 2nd Ego4D Challenge, ECCV Workshop, 2022, 7 Top-1 Rankings.
- The 2nd National Artificial Intelligence Challenge (NAIC), Remote Sensing Semantic Segmentation Track, 2020, 1st Place (1,000,000 RMB Bonus).
- The 2nd China Gaofen Cup Beautiful Countryside Competition, Remote Sensing Crop Classification Track, 2019, 3rd Prize (5,000 RMB Bonus).
- The 9th National Undergraduate E-commerce "Innovation, Creativity and Entrepreneurship" Challenge, Zhejiang Division, 2019, 1st Prize.
- The 9th National Undergraduate Service Outsourcing Competition, CAPTCHA Recognition Task, 2018, 2nd Prize.
Honors
- Outstanding Graduate of Zhejiang Province
- Zhejiang Provincial Government Scholarship
Some of My Friends
- Guo Chen, Zhiqi Li, Yuanfeng Ji, Yang Yang, Zhanhao Liang