Work

Here is some of my work 📚

Publications

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Wenhai Wang*, Jifeng Dai*, Zhe Chen*, Zhenhang Huang*, Zhiqi Li*, Xizhou Zhu*, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao#

CVPR highlight, 2023

Introduction: This work presents a new large-scale CNN-based foundation model, termed InternImage.

[Paper] [BibTex] [Code]

Vision Transformer Adapter for Dense Predictions

Zhe Chen*, Yuchen Duan*, Wenhai Wang#, Junjun He, Tong Lu#, Jifeng Dai, Yu Qiao

ICLR spotlight, 2023

Introduction: This work presents a simple yet powerful adapter for plain ViT, which remedies the defects of ViT and achieves performance comparable to vision-specific models on dense prediction tasks.

[Paper] [BibTex] [Code]

Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization

Zhe Chen, Wenhai Wang#, Enze Xie, Tong Lu#, Ping Luo

AAAI, 2022

Introduction: URST is a versatile framework for ultra-high resolution style transfer under limited memory resources.

[Paper] [BibTex] [Code]

SiameseCCR: A Novel Method for One-shot and Few-shot Chinese CAPTCHA Recognition using Deep Siamese Network

Zhe Chen, Weifeng Ma#, Nanfan Xu, Caoting Ji, Yulai Zhang

IET Image Processing, 2020 (SCI Impact Factor: 2.373)

Introduction: We propose a Siamese-network-based method for one-shot and few-shot Chinese CAPTCHA recognition.

[Paper] [BibTex] [Code]

Technical Reports

InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges

Guo Chen*, Sen Xing*, Zhe Chen*, Yi Wang*, Kunchang Li, Yizhuo Li, Yi Liu, Jiahao Wang, Yin-Dong Zheng, Bingkun Huang, Zhiyu Zhao, Junting Pan, Yifei Huang, Zun Wang, Jiashuo Yu, Yinan He, Hongjie Zhang, Tong Lu, Yali Wang, Limin Wang, Yu Qiao#

arXiv, 2022

Introduction: This work presents our champion solutions to five tracks of the Ego4D challenge.

[Paper] [BibTex] [Code]

FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation

Zhe Chen, Jiahao Wang, Wenhai Wang, Guo Chen, Enze Xie, Ping Luo, Tong Lu#

arXiv, 2021

Introduction: We propose an accurate and efficient scene text detection framework, termed FAST (i.e., faster arbitrarily-shaped text detector).

[Paper] [BibTex] [Code]

Projects

MusesArt: A Fast Style Transfer Application Based on Xiaomi's MACE Framework

Introduction: MusesArt is an Android application for high-resolution neural style transfer on mobile devices. It is built on OpenCV and Xiaomi's MACE. You can download the source code and compile it with Android Studio, or install the package we released.

[Apk] [Code]

English and Chinese Captcha Recognition via TensorFlow

Introduction: This is our solution for the Captcha Recognition Task of the 9th National Undergraduate Service Outsourcing Competition. The goal of this challenge is to recognize captcha images at multiple levels of difficulty, including numeric, English, and Chinese captchas.

[Code]