Zhe Chen (陈喆)

PhD Candidate at Nanjing University

github
email

About Me

I am a second-year PhD candidate in the Department of Computer Science and Technology at Nanjing University (NJU), supervised by Prof. Tong Lu. I started my studies in 2020 through a combined Master's and PhD program, which includes two years for the master's degree and four years for the PhD.

My research interests include vision foundation models, vision-language models, and detection and segmentation.

News

Education & Experience

  • Nanjing University, Nanjing, China
    Sept 2020 - Present

  • Zhejiang University of Science and Technology, Zhejiang, China
    Sept 2016 - June 2020

Publications

→ Full list

* Equal Contribution # Corresponding Author

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Zhong Muyan, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai#

CVPR oral, 2024

Introduction: InternVL scales up the ViT to 6B parameters and aligns it with LLMs.

[Paper] [BibTex] [Code] [Poster] [Explanation in Chinese]

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Wenhai Wang*, Jifeng Dai*, Zhe Chen*, Zhenhang Huang*, Zhiqi Li*, Xizhou Zhu*, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao#

CVPR highlight, 2023 | CVPR 2023 Top-10 Influential Papers

Introduction: This work presents a new large-scale CNN-based foundation model, termed InternImage.

[Paper] [BibTex] [Code] [Poster] [Explanation in Chinese]

Vision Transformer Adapter for Dense Predictions

Zhe Chen*, Yuchen Duan*, Wenhai Wang#, Junjun He, Tong Lu#, Jifeng Dai, Yu Qiao

ICLR spotlight, 2023

Introduction: This work presents a simple yet powerful adapter for pure ViT, which remedies the defects of ViT and achieves performance comparable to vision-specific models on dense prediction tasks.

[Paper] [BibTex] [Code] [Poster] [Slides] [Explanation in Chinese]

Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments

Yang Yang, Wenhai Wang, Zhe Chen, Jifeng Dai, Liang Zheng

ICLR spotlight, 2024

Introduction: This work introduces a new data-centric problem: estimating detector performance in an unlabeled test domain.

[Paper] [BibTex] [Code]

GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation

Kai Chen, Enze Xie, Zhe Chen, Lanqing Hong, Zhenguo Li, Dit-Yan Yeung

ICLR, 2024

Introduction: GeoDiffusion translates geometric conditions into text prompts, enabling text-to-image (T2I) models to generate detection data that improves object detector performance.

[Paper] [BibTex] [Code]

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

Wenhai Wang*, Zhe Chen*, Xiaokang Chen*, Jiannan Wu*, Xizhou Zhu, Gang Zeng, Ping Luo, Tong Lu, Jie Zhou, Yu Qiao, Jifeng Dai#

NeurIPS, 2023

Introduction: We present an LLM-based framework for vision-centric tasks, termed VisionLLM.

[Paper] [BibTex] [Code] [Poster]

DDP: Diffusion Model for Dense Visual Prediction

Yuanfeng Ji*, Zhe Chen*, Enze Xie#, Lanqing Hong, Xihui Liu, Zhaoqiang Liu, Tong Lu, Zhenguo Li, Ping Luo

ICCV, 2023

Introduction: We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline.

[Paper] [BibTex] [Code] [Poster]

AVSegFormer: Audio-Visual Segmentation with Transformer

Shengyi Gao, Zhe Chen, Guo Chen, Wenhai Wang, Tong Lu#

AAAI, 2024

Introduction: This work presents a new framework for AVS tasks that leverages the transformer architecture.

[Paper] [BibTex] [Code] [Poster]

Graph Propagation Transformer for Graph Representation Learning

Zhe Chen*, Hao Tan*, Tao Wang, Tianrun Shen, Tong Lu#, Qiuying Peng, Cheng Cheng, Yue Qi

IJCAI, 2023

Introduction: This work presents a novel transformer architecture for graph representation learning.

[Paper] [BibTex] [Code]

Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization

Zhe Chen, Wenhai Wang#, Enze Xie, Tong Lu#, Ping Luo

AAAI, 2022

Introduction: URST is a versatile framework for ultra-high resolution style transfer under limited GPU memory resources.

[Paper] [BibTex] [Code] [Poster] [Explanation in Chinese]

SiameseCCR: A Novel Method for One-shot and Few-shot Chinese CAPTCHA Recognition using Deep Siamese Network

Zhe Chen, Weifeng Ma#, Nanfan Xu, Caoting Ji, Yulai Zhang

IET Image Processing, 2020 (SCI Impact Factor: 2.373)

Introduction: We propose a Siamese network-based method for one-shot and few-shot Chinese CAPTCHA recognition.

[Paper] [BibTex] [Code]

Preprints

FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation

Zhe Chen, Jiahao Wang, Wenhai Wang, Guo Chen, Enze Xie, Ping Luo, Tong Lu#

arXiv, 2021

Introduction: We propose an accurate and efficient scene text detection framework, termed FAST (i.e., faster arbitrarily-shaped text detector).

[Paper] [BibTex] [Code]

Technical Reports

InternGPT: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language

Zhaoyang Liu, Yinan He, Wenhai Wang, Weiyun Wang, Yi Wang, Shoufa Chen, Qinglong Zhang, Yang Yang, Qingyun Li, Jiashuo Yu, Kunchang Li, Zhe Chen, Xue Yang, Xizhou Zhu, Yali Wang, Limin Wang, Ping Luo, Jifeng Dai, Yu Qiao

Technical Report, 2023

Introduction: InternGPT allows you to interact with ChatGPT by clicking, dragging, and drawing using a pointing device.

[Paper] [BibTex] [Code] [Explanation in Chinese]

Champion Solution for the WSDM2023 Toloka VQA Challenge

Shengyi Gao, Zhe Chen, Guo Chen, Wenhai Wang, Tong Lu#

Technical Report, 2023

Introduction: In this report, we present our champion solution to the WSDM2023 Toloka Visual Question Answering (VQA) Challenge.

[Paper] [BibTex] [Code] [Explanation in Chinese]

InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges

Guo Chen*, Sen Xing*, Zhe Chen*, Yi Wang*, Kunchang Li, Yizhuo Li, Yi Liu, Jiahao Wang, Yin-Dong Zheng, Bingkun Huang, Zhiyu Zhao, Junting Pan, Yifei Huang, Zun Wang, Jiashuo Yu, Yinan He, Hongjie Zhang, Tong Lu, Yali Wang, Limin Wang, Yu Qiao#

Technical Report, 2022

Introduction: This work presents our champion solutions to five tracks of the Ego4D Challenge.

[Paper] [BibTex] [Code] [Explanation in Chinese]

Awards & Honors

Contests

  • Toloka Visual Question Answering Challenge, WSDM Cup 2023, 2023, 1st Place.
  • The 2nd Ego4D Challenge, ECCV Workshop, 2022, 7 Top-1 Rankings.
  • The 2nd National Artificial Intelligence Challenge (NAIC), Remote Sensing Semantic Segmentation Track, 2020, 1st Place (1,000,000 RMB Bonus).
  • The 2nd China Gaofen Cup Beautiful Countryside Competition, Remote Sensing Crop Classification Track, 2019, 3rd Prize (5,000 RMB Bonus).
  • The 9th National Undergraduate E-commerce "Innovation, Creativity and Entrepreneurship" Challenge, Zhejiang Division, 2019, 1st Prize.
  • The 9th National Undergraduate Service Outsourcing Competition, CAPTCHA Recognition Task, 2018, 2nd Prize.

Honors

  • Outstanding Graduate of Zhejiang Province
  • Zhejiang Provincial Government Scholarship

Some of My Friends

Last Updated: 5/30/2022, 3:24:51 PM