Chenghao Li

Deep Learning & Computer Vision

Intro

I am currently a Master's student in a joint program between Chongqing University of Technology (CQUT) and the Korea Advanced Institute of Science & Technology (KAIST), specializing in Computer Vision. During my internship at KAIST's RCVLab, I worked alongside Professor Chaoning Zhang. Prior to this, I obtained my bachelor's degree in Computer Science through a joint program between Hebei University of Technology (HEBUT) and Massey University. I have a strong research interest in Artificial Intelligence and Computer Vision.

News

Our preprint has been submitted to arXiv.
January 11, 2024

We have just submitted the preprint of our recent research, 'LKCA: Large Kernel Convolutional Attention', to arXiv.

We achieved seventh place in the Mammoth Cup finals.
November 06, 2023

After rigorous evaluation and selection through the preliminary and semi-final rounds, our team, along with 21 others, emerged from over 400 participating teams to enter the final showdown. Following intense competition, we achieved an impressive seventh place in the DNA track! Details can be found here.

Our preprint has been submitted to arXiv.
September 11, 2023

We have just submitted the preprint 'CNN or ViT? Revisiting Vision Transformers Through the Lens of Convolution' to arXiv.

I have created my own personal homepage.
September 7, 2023

I created my first personal homepage today. I'm excited to see where this personal homepage will take me and how it will evolve over time. Thank you for joining me on this journey, and if you have any suggestions or feedback, please feel free to share. Here's to new beginnings and endless possibilities!

I have achieved a top-ten ranking in an algorithm competition.
August 31, 2023

Congratulations! Our team has successfully made it to the finals of the DNA-Byte Information Encoding and Decoding Competition, achieving a top-ten ranking. I appreciate the support and collaboration of our team members, and we look forward to demonstrating our skills and knowledge in the finals.

I have concluded my internship at RCVLab, KAIST.
August 23, 2023

I have concluded my internship at RCVLab, KAIST, and it has been a great honor to conduct research on generative AI and foundation models alongside Professor Chaoning Zhang and other scholars. I look forward to carrying the knowledge and experiences gained during my time at RCVLab into my future research pursuits.

Our preprint has been featured by @AK.
May 11, 2023

Our preprint 'Generative AI meets 3D: A Survey on Text-to-3D in AIGC Era', published on arXiv, has been featured in the daily papers roundup by @AK. We are thrilled and honored to see our work receiving recognition and attention in the research community.

I have achieved the top score in all three of my major courses.
January 11, 2022

I have achieved the top score in all three of my major courses: Digital Image Processing, Engineering Stochastic Processes, and Linear Systems. As I move forward in my academic journey, I am eager to apply the knowledge and skills gained from these courses to my future research and projects.

Works

LKCA: Large Kernel Convolutional Attention

Chenghao Li, Boheng Zeng, Yi Lu, Pengbo Shi, Qingzi Chen, Jirui Liu, Lingyun Zhu

We revisit the relationship between attention mechanisms and large kernel ConvNets in vision transformers and propose a new spatial attention named Large Kernel Convolutional Attention (LKCA). It simplifies the attention operation by replacing it with a single large kernel convolution. LKCA combines the advantages of convolutional neural networks and vision transformers, possessing a large receptive field, locality, and parameter sharing. We explain the superiority of LKCA from both the convolution and attention perspectives, providing equivalent code implementations for each view. Experiments confirm that LKCA implemented from the convolutional and attention perspectives exhibits equivalent performance. We extensively evaluate the LKCA variant of ViT on both classification and segmentation tasks, and the experiments demonstrate that LKCA achieves competitive performance in visual tasks.
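
To give a flavor of the idea, here is a minimal PyTorch sketch, not the released code: spatial attention realized as a single large-kernel depthwise convolution over the ViT token grid. The module and variable names are illustrative, and the kernel size is an assumed default.

```python
import torch
import torch.nn as nn

class LKCASketch(nn.Module):
    """Illustrative stand-in for LKCA: one large-kernel depthwise
    convolution replaces the query-key-value attention computation."""
    def __init__(self, dim: int, kernel_size: int = 13):
        super().__init__()
        # Depthwise convolution: locality and parameter sharing come from
        # the convolution itself; the large kernel supplies the large
        # receptive field that attention would otherwise provide.
        self.conv = nn.Conv2d(dim, dim, kernel_size,
                              padding=kernel_size // 2, groups=dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, dim) token sequence from a ViT block
        b, n, c = x.shape
        h = w = int(n ** 0.5)  # assumes a square token grid, no class token
        x = x.transpose(1, 2).reshape(b, c, h, w)
        x = self.conv(x)       # spatial mixing in a single operation
        return x.reshape(b, c, n).transpose(1, 2)

tokens = torch.randn(2, 196, 384)      # e.g. 14x14 patches, embed dim 384
print(LKCASketch(384)(tokens).shape)   # torch.Size([2, 196, 384])
```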

CNN or ViT? Revisiting Vision Transformers Through the Lens of Convolution

Chenghao Li, Chaoning Zhang*

The success of Vision Transformer (ViT) has been widely reported on a wide range of image recognition tasks. The merit of ViT over CNN has been largely attributed to large training datasets or auxiliary pre-training. Without pre-training, the performance of ViT on small datasets is limited because the global self-attention has limited capacity in local modeling. Towards boosting ViT on small datasets without pre-training, this work improves its local modeling by applying a weight mask on the original self-attention matrix. A straightforward way to locally adapt the self-attention matrix is an element-wise learnable weight mask (ELM), for which our preliminary experiments show promising results. However, the element-wise learnable weight mask not only induces a non-trivial additional parameter overhead but also increases the optimization complexity. To this end, this work proposes a novel Gaussian mixture mask (GMM), in which one mask has only two learnable parameters and can be conveniently used in any ViT variant whose attention mechanism allows the use of masks. Experimental results on multiple small datasets demonstrate the effectiveness of our proposed Gaussian mask for boosting ViTs for free (almost zero additional parameter or computation cost).
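
As an illustration of how such a mask can bias attention toward nearby tokens, here is a hedged sketch, not the authors' code: a distance-based Gaussian mask applied as an additive bias on the pre-softmax attention scores. The additive formulation, the component count, and the names are my simplifications; the key property, that each Gaussian component carries exactly two learnable scalars (an amplitude and a width), is preserved.

```python
import torch
import torch.nn as nn

class GaussianMixtureMaskSketch(nn.Module):
    """Illustrative distance-based mask: each Gaussian component has
    exactly two learnable scalars (amplitude and width)."""
    def __init__(self, grid: int = 14, num_components: int = 3):
        super().__init__()
        self.amp = nn.Parameter(torch.ones(num_components))    # amplitudes
        self.sigma = nn.Parameter(torch.ones(num_components))  # widths
        # Precompute squared spatial distances between all token pairs
        # (assumes a square token grid with no class token).
        ys, xs = torch.meshgrid(torch.arange(grid), torch.arange(grid),
                                indexing="ij")
        pos = torch.stack([ys.flatten(), xs.flatten()], dim=1).float()
        self.register_buffer("dist2", torch.cdist(pos, pos) ** 2)

    def forward(self, attn_scores: torch.Tensor) -> torch.Tensor:
        # attn_scores: (batch, heads, N, N) pre-softmax attention logits
        mask = sum(a * torch.exp(-self.dist2 / (2 * s ** 2 + 1e-6))
                   for a, s in zip(self.amp, self.sigma))
        return attn_scores + mask  # favors spatially nearby tokens

scores = torch.randn(2, 6, 196, 196)   # 6 heads over a 14x14 token grid
print(GaussianMixtureMaskSketch()(scores).shape)
```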

Generative AI meets 3D: A Survey on Text-to-3D in AIGC Era

Chenghao Li, Chaoning Zhang*, Atish Waghwase, Lik-Hang Lee, Francois Rameau, Yang Yang, Sung-Ho Bae, Choong Seon Hong

In recent years, generative AI, often referred to as AI-generated content (AIGC), has seen significant advancements. One of the most practical applications of this technology is text-guided content generation, which allows for interaction between human instructions and AIGC. Furthermore, with the development of text-to-image and 3D modeling technologies such as NeRF, the field of text-to-3D has emerged as a vibrant and active research area.

A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?

Chaoning Zhang, Chenshuang Zhang, Sheng Zheng, Yu Qiao, Chenghao Li, Mengchun Zhang, Sumit Kumar Dam, Chu Myaet Thwal, Ye Lin Tun, Le Luang Huy, Sung-Ho Bae, Lik-Hang Lee, Yang Yang, Heng Tao Shen, In So Kweon, Choong Seon Hong

In light of the viral popularity of ChatGPT and the widespread attention garnered by generative AI (AIGC), which encompasses text, images, and more, it becomes imperative to take a closer look at this evolving landscape. As AI transitions from pure analysis to creative content generation, it is crucial to recognize that ChatGPT, represented by the latest model GPT-4, is just one facet of the expansive realm of AIGC tasks.

One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era

Chaoning Zhang, Chenshuang Zhang, Chenghao Li, Yu Qiao, Sheng Zheng, Sumit Kumar Dam, Mengchun Zhang, Jung Uk Kim, Seong Tae Kim, Jinwoo Choi, Gyeong-Moon Park, Sung-Ho Bae, Lik-Hang Lee, Pan Hui, In So Kweon, Choong Seon Hong

OpenAI has recently released GPT-4 (a.k.a. ChatGPT Plus), which is demonstrated to be one small step for generative AI (GAI), but one giant leap for artificial general intelligence (AGI). Since its official release in November 2022, ChatGPT has quickly attracted numerous users, with extensive media coverage. Such unprecedented attention has also motivated numerous researchers to investigate ChatGPT from various aspects. According to Google Scholar, there are more than 500 articles with ChatGPT in their titles or mentioning it in their abstracts. Considering this, a review is urgently needed, and our work fills this gap. Overall, this work is the first to survey ChatGPT with a comprehensive review of its underlying technology, applications, and challenges. Moreover, we present an outlook on how ChatGPT might evolve to realize general-purpose AIGC (a.k.a. AI-generated content), which will be a significant milestone for the development of AGI.

Understanding Segment Anything Model: SAM is Biased Towards Texture Rather than Shape

Chaoning Zhang, Yu Qiao, Shehbaz Tariq, Sheng Zheng, Chenshuang Zhang, Chenghao Li, Hyundong Shin, Choong Seon Hong

In contrast to human vision, which mainly depends on shape for recognizing objects, deep image recognition models are widely known to be biased toward texture. Recently, Meta's research team released the first foundation model for image segmentation, termed the Segment Anything Model (SAM), which has attracted significant attention. In this work, we understand SAM from the perspective of texture vs. shape. Unlike label-oriented recognition tasks, SAM is trained to predict a mask covering the object shape based on a prompt. With this said, it seems self-evident that SAM is biased towards shape. In this work, however, we reveal an interesting finding: SAM is strongly biased towards texture-like dense features rather than shape. This intriguing finding is supported by a novel setup in which we disentangle texture and shape cues and design a texture-shape cue conflict for mask prediction.

A Survey On Segment Anything Model (SAM): Vision Foundation Model Meets Prompt Engineering

Chaoning Zhang, Sheng Zheng, Chenghao Li, Yu Qiao, Taegoo Kang, Xinru Shan, Chenshuang Zhang, Caiyan Qin, Francois Rameau, Sung-Ho Bae, Choong Seon Hong

Segment Anything Model (SAM), developed by Meta AI Research, has recently attracted significant attention. Trained on a large segmentation dataset of over 1 billion masks, SAM is capable of segmenting any object in a given image. In the original SAM work, the authors turned to zero-shot transfer tasks (like edge detection) to evaluate the performance of SAM. Recently, numerous works have attempted to investigate the performance of SAM in various scenarios to recognize and segment objects. Moreover, numerous projects have emerged to show the versatility of SAM as a foundation model by combining it with other models, like Grounding DINO, Stable Diffusion, ChatGPT, etc. With the relevant papers and projects increasing exponentially, it is challenging for readers to keep up with the development of SAM. To this end, this work conducts the first yet comprehensive survey on SAM. This is an ongoing project, and we intend to update the manuscript on a regular basis; readers are welcome to contact us if they complete new works related to SAM so that we can include them in the next version.
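
For readers unfamiliar with SAM's promptable interface, the snippet below sketches a point-prompted prediction using Meta's public segment-anything package. The checkpoint path, the placeholder image, and the example point are stand-ins, not values from the survey.

```python
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load a SAM backbone from a locally downloaded checkpoint (path is a
# placeholder; official checkpoints are released by Meta AI Research).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

# Stand-in RGB image (HxWx3, uint8); replace with a real photo.
image = np.zeros((480, 640, 3), dtype=np.uint8)
predictor.set_image(image)

# A single foreground point prompt at (x, y); label 1 marks foreground.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),
    multimask_output=True,  # SAM proposes several candidate masks
)
print(masks.shape)  # (3, 480, 640): boolean masks ranked by score
```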

Experience

CQUT & KAIST Joint Program

Master of Artificial Intelligence

HEBUT & Massey University Joint Program

Bachelor of Computer Science

Feel free to reach out to me if you have any questions, require further information, or wish to discuss any related topics. Your inquiries and feedback are highly valued, and I look forward to connecting with you soon. Thank you for your interest and engagement.
