Optimization of video coding for object detection and perceptual quality
Abstract: To reduce the impact of video coding distortion on object detection performance, a video coding optimization method for object detection and perceptual quality is proposed. First, the quantization parameter of I frames is adjusted to improve rate versus coding-distortion performance. Second, an object detection algorithm is introduced into the video encoder to extract the object regions of the frame being coded. Third, a deep network model extracts the features of the current coding unit, and the feature distortion is computed with the cosine distance. Then, a modified VGG network model predicts the quantization parameter of the coding unit. Finally, the feature distortion is introduced into the rate-distortion optimization problem, and the optimal coding parameters of each coding unit are selected by evaluating a rate, coding-distortion, and feature-distortion cost function. Experimental results show that, compared with VTM-23.0, the reference software of the latest video coding standard, the proposed method achieves an average BD-rate saving of 10.5% for object detection performance and an average BD-rate saving of 2.2% for human vision, i.e., in terms of compression distortion.
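The feature-distortion term and the joint cost described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the cost function, so the additive form J = D_c + w * D_f + lambda * R used here, the weight `w`, and all names (`Candidate`, `cosine_distance`, `joint_cost`, `select_best`) are assumptions made for illustration only, not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance 1 - cos(a, b) between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: treat a zero vector as maximally distant.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


@dataclass
class Candidate:
    """One hypothetical coding option for a coding unit."""
    qp: int                    # quantization parameter of this option
    rate_bits: float           # bits needed to code the unit
    coding_distortion: float   # e.g. sum of squared errors in the pixel domain
    feature_distortion: float  # cosine distance between original and reconstructed features


def joint_cost(c: Candidate, lam: float, w: float) -> float:
    """Assumed joint cost: coding distortion + w * feature distortion + lambda * rate."""
    return c.coding_distortion + w * c.feature_distortion + lam * c.rate_bits


def select_best(candidates: Sequence[Candidate], lam: float, w: float) -> Candidate:
    """Pick the candidate with the smallest joint cost."""
    return min(candidates, key=lambda c: joint_cost(c, lam, w))


if __name__ == "__main__":
    options = [
        Candidate(qp=30, rate_bits=100.0, coding_distortion=50.0, feature_distortion=0.4),
        Candidate(qp=32, rate_bits=80.0, coding_distortion=60.0, feature_distortion=0.1),
    ]
    # With w = 0 the decision is ordinary rate-distortion optimization.
    print(select_best(options, lam=0.2, w=0.0).qp)
    # With a feature weight, the option that better preserves features can win.
    print(select_best(options, lam=0.2, w=100.0).qp)
```

Setting `w = 0` reduces the sketch to conventional rate-distortion optimization, which makes it easy to see how the feature term alone can change the selected parameters for a coding unit.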