GhostFormer: Efficiently amalgamated CNN-transformer architecture for object detection

Abstract

Lightweight network models have gradually become an important research direction in object detection. Lightweight network design covers a variety of approaches, such as quantization, knowledge distillation, and neural architecture search. However, these methods either fail to break through the performance bottleneck of the model itself or require massive training costs. To address these problems, a new object detection model based on a CNN-Transformer hybrid feature extraction network, called GhostFormer, is proposed from the perspective of lightweight network structure design. GhostFormer makes full use of the local modeling ability of CNNs and the global modeling ability of Transformers, which not only effectively reduces the complexity of the convolutional model but also compensates for the Transformer's lack of inductive bias, yielding better transfer results on downstream tasks. Experiments show that the model requires less than half the computation of YOLOv7 on the Pascal VOC dataset with only about a 3% loss in mAP@0.5, and achieves a 9.7% improvement in mAP@0.5:0.95 on the MS COCO dataset compared with GhostNet.
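To make the hybrid idea concrete, below is a minimal PyTorch sketch of what a CNN-Transformer hybrid block could look like: a GhostNet-style convolution (the published Ghost module design: a small primary convolution plus cheap depthwise "ghost" maps) for local modeling, followed by multi-head self-attention over flattened spatial positions for global modeling. This is only an illustration of the general principle described in the abstract, not GhostFormer's actual implementation; the `HybridBlock` name, channel sizes, and the way the two parts are combined are assumptions made for the example.

```python
import math
import torch
import torch.nn as nn


class GhostModule(nn.Module):
    """GhostNet-style convolution: a small primary conv plus cheap depthwise
    ops generate 'ghost' feature maps at a fraction of the FLOPs."""

    def __init__(self, in_ch, out_ch, ratio=2, dw_size=3):
        super().__init__()
        init_ch = math.ceil(out_ch / ratio)          # intrinsic feature maps
        cheap_ch = init_ch * (ratio - 1)             # ghost feature maps
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, init_ch, 1, bias=False),
            nn.BatchNorm2d(init_ch),
            nn.ReLU(inplace=True),
        )
        self.cheap = nn.Sequential(
            nn.Conv2d(init_ch, cheap_ch, dw_size, padding=dw_size // 2,
                      groups=init_ch, bias=False),   # depthwise: very cheap
            nn.BatchNorm2d(cheap_ch),
            nn.ReLU(inplace=True),
        )
        self.out_ch = out_ch

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)[:, :self.out_ch]


class HybridBlock(nn.Module):
    """Hypothetical hybrid block: local modeling with a Ghost convolution,
    then global modeling with self-attention over spatial positions."""

    def __init__(self, in_ch, out_ch, num_heads=4):
        super().__init__()
        self.local = GhostModule(in_ch, out_ch)
        self.norm = nn.LayerNorm(out_ch)
        self.attn = nn.MultiheadAttention(out_ch, num_heads, batch_first=True)

    def forward(self, x):
        x = self.local(x)                            # (B, C, H, W) local features
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)        # (B, H*W, C) token sequence
        normed = self.norm(tokens)
        attn_out, _ = self.attn(normed, normed, normed)
        tokens = tokens + attn_out                   # residual global context
        return tokens.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    block = HybridBlock(64, 128)
    feat = torch.randn(2, 64, 32, 32)
    print(block(feat).shape)                         # torch.Size([2, 128, 32, 32])
```

The Ghost module keeps the convolutional stage cheap, while the attention stage supplies the global receptive field that a purely convolutional backbone lacks; stacking such blocks is one plausible way to realize the CNN-Transformer amalgamation the paper describes.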

Publication
Pattern Recognition
Mingye Xie
PhD Candidate
