Application of a High-Resolution Image Classification Method to Live Vehicle Type Recognition

Wen Zhonghui (温中卉); Wang Hansheng (王汉生)

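Abstract

Facing declining sales in a weak automobile market, rising customer acquisition costs as the customer structure changes, and scarce offline traffic, traditional car companies urgently need digital reform, and car live streaming has gradually become an important way for manufacturers to attract traffic. To capture sales leads accurately and analyze product feedback, live streaming platforms and vehicle sales teams need to match the vehicle type currently being streamed with each viewer's feedback and the stream data, so effectively identifying the vehicle type on stream has become a pressing problem. During a live stream the vehicle type is usually displayed at the license plate position. This is a strong-signal scene in a high-resolution image, and traditional image processing methods train poorly on such scenes. Drawing on target retrieval methods, this paper adopts a relatively simple image classification model for strong-signal scenes with limited annotated data, applies transfer learning to cope with few annotated samples and limited resources, and builds a bounding-box regression based on the IoUB Loss function to locate the license plate region more precisely. The model is then applied to the car live streaming scenario. Trained on vehicle images from Audi live streams on the Douyin live streaming platform, the model reaches a prediction accuracy of 47.4% on the test-set images. Compared with traditional image processing models and the classic Faster RCNN model, the proposed model outperforms Faster RCNN and has practical value: it can help live streaming platforms and vehicle sales teams handle live vehicle type recognition more simply and efficiently.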

Abstract: Facing declining sales in a weak automobile market, rising customer acquisition costs as the customer structure changes, and scarce offline traffic, traditional car companies urgently need digital reform, and car live streaming has gradually become an important way for manufacturers to attract traffic. To capture sales leads accurately and analyze product feedback, live streaming platforms and vehicle sales teams need to match the vehicle type currently being streamed with each viewer's feedback and the stream data. In practice, however, many live streams indicate only the brand without identifying the vehicle type on screen, or a single stream presents several types of the same brand. The vehicle type currently being streamed therefore cannot be obtained directly and matched to the data for further analysis.

This paper identifies this practical business problem and proposes live vehicle type identification as a business scenario, addressing a pain point of live streaming platforms and vehicle sales teams. During a live stream, the vehicle type is usually displayed at the license plate position. This is a strong-signal scene in a high-resolution image: the signal occupies a small part of the whole picture but largely determines how the picture is classified. Moreover, the number of targets in the image is known in advance, and the scene is relatively simple and well targeted. Traditional image processing methods, however, are relatively complex and not effective for such scenes. Drawing on target retrieval methods, this paper proposes a relatively simple image classification model for strong-signal scenes with limited annotated data. It applies transfer learning to cope with few annotated samples and limited resources, and builds a bounding-box regression based on the IoUB Loss function to locate the license plate region more accurately. Specifically, the model has two stages. The first stage scans the image, trains a binary classification model through transfer learning based on the VGG16 network, and uses the bounding-box regression based on the IoUB Loss function to predict the license plate position. The second stage trains a classification model, again through transfer learning based on the VGG16 network, to classify the predicted license plate region. Finally, the model is applied to the live vehicle type identification scenario.
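The two-stage flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: `plate_score` and `classify_plate` are hypothetical callables standing in for the two VGG16-based models, the window size and stride are arbitrary, the bounding-box regression step is omitted, and `iou` is the standard intersection over union, since the abstract does not define the IoUB Loss.

```python
from typing import Callable, Iterator, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Standard intersection over union of two axis-aligned boxes.

    This is plain IoU, not the paper's IoUB Loss (undefined in the abstract).
    """
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def sliding_windows(width: int, height: int,
                    win_w: int, win_h: int, stride: int) -> Iterator[Box]:
    """Yield candidate windows that scan the image left to right, top to bottom."""
    for y in range(0, height - win_h + 1, stride):
        for x in range(0, width - win_w + 1, stride):
            yield (x, y, x + win_w, y + win_h)


def two_stage_predict(width: int, height: int,
                      plate_score: Callable[[Box], float],
                      classify_plate: Callable[[Box], str],
                      win_w: int = 64, win_h: int = 32,
                      stride: int = 16) -> Tuple[Box, str]:
    """Stage 1: keep the window the (hypothetical) plate scorer rates highest.
    Stage 2: classify that window with the (hypothetical) type classifier."""
    best = max(sliding_windows(width, height, win_w, win_h, stride),
               key=plate_score)
    return best, classify_plate(best)


# Toy usage: the scorer is simulated by overlap with a known plate box.
truth: Box = (96, 64, 160, 96)
box, label = two_stage_predict(
    320, 192,
    plate_score=lambda w: iou(w, truth),
    classify_plate=lambda w: "type-A",  # placeholder label
)
```

In a real system the two callables would wrap trained networks that crop the window from the image and return a score or a class label. The sketch only shows how a small, decisive region is first located and then classified on its own.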

Trained on vehicle images from Audi live streams on Douyin (TikTok), the model reaches a prediction accuracy of 47.4% on the test-set images. Because the contribution of this paper lies not in a new high-precision image recognition method but in identifying the live vehicle type to solve a business problem for live streaming platforms and vehicle sales teams, the model is compared with traditional image processing models and a classic object detection model (Faster RCNN). The findings are: ① traditional image processing models are not suitable for such strong-signal scenes, because dimensionality reduction loses signal information and significantly degrades performance, whereas the proposed model improves on them; ② in the second stage, training on the predicted license plate region images is more effective than training directly on the annotated boxes; ③ with limited annotated data, the relatively simple model proposed here achieves higher prediction accuracy than the Faster RCNN model and has practical value. It can help live streaming platforms and vehicle sales teams handle live vehicle type identification more simply and efficiently. The model is also suitable for other image classification problems that involve high resolution, limited annotated data, and strong signals, and users can adapt it to their own scenarios.
