Deepfake Video Detection Model Based on Consistency of Spatial-Temporal Features

Authors: ZHAO Lei (Key Lab. of Aerospace Info. Security and Trusted Computing, Ministry of Education, Wuhan Univ., Wuhan 430000, China; School of Cyber Sci. and Eng., Wuhan Univ., Wuhan 430000, China); GE Wanfeng (Key Lab. of Aerospace Info. Security and Trusted Computing, Ministry of Education, Wuhan Univ., Wuhan 430000, China; School of Cyber Sci. and Eng., Wuhan Univ., Wuhan 430000, China); MAO Yuzhu (School of Cyber Sci. and Eng., Wuhan Univ., Wuhan 430000, China); HAN Meng (School of Cyber Sci. and Eng., Wuhan Univ., Wuhan 430000, China); LI Wenxin (School of Cyber Sci. and Eng., Wuhan Univ., Wuhan 430000, China); LI Xue (School of Cyber Sci. and Eng., Wuhan Univ., Wuhan 430000, China)

Received: 2019-09-17; Year, Volume (Issue), Pages: 2020, 52(4): 243-250

Journal Name: Advanced Engineering Sciences (工程科学与技术)

Key words: fake images; Deepfake detection; temporal features; spatial features

Funding: National Natural Science Foundation of China (61672394); Fundamental Research Funds for the Central Universities (2042019kf0017)

Abstract

Most existing Deepfake detection models rely only on the spatial-domain features of individual images. To make fuller use of the information in the video under test, a Deepfake video detection model based on the consistency of spatial-temporal features is proposed, motivated by the observation that the facial expression changes of people in Deepfake videos show slight inconsistencies and discontinuities. In the model, a convolutional neural network (CNN) extracts spatial features from the frames under test, and an optical flow method captures temporal features between consecutive frames. A second CNN then extracts deeper, more abstract features from the optical flow map. After neural networks transform the temporal and spatial features from their original representation space into a new feature space, a fully connected neural network classifies the combined spatial-temporal feature space to perform detection. The proposed model was trained and evaluated on FaceForensics++, an open-source Deepfake dataset. The experimental results indicate that the model reaches a detection accuracy of 98.1% and an AUC of 0.9981. Compared with existing Deepfake detection models, the proposed model achieves higher detection accuracy and a higher AUC, which verifies its effectiveness.
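The abstract does not say which optical flow algorithm the model uses, so the following is only a minimal sketch of the general idea of capturing motion between two consecutive frames. It assumes a single-window Lucas-Kanade estimate (one displacement for the whole patch) in place of whatever dense flow method the paper actually applies. The function names `gaussian_blob` and `lucas_kanade_global` and the synthetic frames are hypothetical and serve illustration only.

```python
import numpy as np


def gaussian_blob(size, cx, cy, sigma=6.0):
    """Synthetic grayscale frame: a smooth blob centred at (cx, cy)."""
    ys, xs = np.mgrid[0:size, 0:size].astype(float)
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


def lucas_kanade_global(prev_frame, next_frame):
    """Estimate one (u, v) displacement in pixels for the whole patch.

    Solves the brightness-constancy equation Ix*u + Iy*v + It = 0
    in the least-squares sense over all pixels (illustrative assumption,
    not the method confirmed by the paper).
    """
    # Spatial gradients of the averaged frame; axis 0 is y, axis 1 is x.
    mean_frame = 0.5 * (prev_frame + next_frame)
    iy, ix = np.gradient(mean_frame)
    # Temporal derivative between the two consecutive frames.
    it = next_frame - prev_frame
    # Stack one equation per pixel: [Ix Iy] @ [u v]^T = -It.
    a = np.stack([ix.ravel(), iy.ravel()], axis=1)
    b = -it.ravel()
    (u, v), *_ = np.linalg.lstsq(a, b, rcond=None)
    return float(u), float(v)


if __name__ == "__main__":
    # The blob moves by +0.3 px in x and -0.2 px in y between the frames.
    frame_a = gaussian_blob(64, cx=32.0, cy=32.0)
    frame_b = gaussian_blob(64, cx=32.3, cy=31.8)
    u, v = lucas_kanade_global(frame_a, frame_b)
    print(f"estimated flow: u={u:.3f}, v={v:.3f}")
```

A dense method would return one such vector per pixel, giving a two-channel flow map that a second CNN could consume as the abstract describes. The global estimate here only shows how the temporal signal arises from frame differences and image gradients.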
