Please use this persistent URL to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/158714
Title: | Enhancing Deepfake Detection: Statistical Analysis of Frame Features with Extension to Video
Author: | GAO, CHONG-ZHE (高崇哲)
Contributors: | YU, QING-XIANG (余清祥); GAO, CHONG-ZHE (高崇哲)
Keywords: | Deepfake videos; Dimensionality reduction; Data leakage; Data structuring; Inter-frame homogeneity
Date: | 2025
Upload time: | 2025-08-04 15:11:34 (UTC+8)
Abstract: | The rapid advancement of artificial intelligence and deep learning has brought significant benefits and innovations, but these technologies are also increasingly misused, most notably to create deepfake media that undermines the credibility of visual information. Most existing detection methods rely on deep learning models that achieve high accuracy yet offer limited interpretability and carry substantial computational cost. From a statistical perspective, this study presents a lightweight and interpretable detection method based on feature dimensionality reduction, achieving competitive performance with fewer than 1% of the features typically used by deep learning models.

Beyond interpretability and efficiency, the study makes two further contributions: it avoids data leakage, and it extends the detection unit from individual frames to whole videos to match practical needs. Prior methods typically split data at the frame level, so frames from the same video can appear in both the training and test sets; this data leakage inflates test results relative to real-world performance. We instead split the data at the video level, improving generalization and the credibility of the reported results. Building upon Chen (2023), we replace gradient intensities computed from single-scale blocks with low-quantile (outlier-suppressed) statistics computed from large-scale blocks and first-order differences, so that the features describe both global and local texture. To handle the angular discontinuity of the hue channel in the HSV (Hue, Saturation, Value) color space, hue is transformed by sine and cosine decomposition (sin H and cos H), improving both detection performance and interpretability. Beyond texture variation, we find that the distribution of texture types is also discriminative for deepfakes, and therefore add two families of statistical features: (1) the Angular Second Moment (ASM) computed from gray-level co-occurrence matrices, and (2) summary statistics extracted from Histograms of Oriented Gradients (HOG). These features serve as inputs to statistical and machine learning classifiers. Experiments on the Celeb-DF-v2 dataset, evaluated with 500 repetitions of cross-validation, show that the method reaches a detection accuracy of 69.55% with only 31 features, a 4.91% improvement over the baseline. Finally, frame-level predictions are aggregated to the video level by majority voting and median aggregation, better reflecting practical deployment scenarios.
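The sine and cosine decomposition of hue can be illustrated with a short sketch. This is a minimal illustration, not the thesis code: it assumes OpenCV's HSV convention (hue stored in [0, 180) for 8-bit images) and uses a synthetic random image in place of a real face crop.

```python
import numpy as np
import cv2

# Synthetic stand-in for a face crop (the thesis works on real video frames).
rng = np.random.default_rng(0)
bgr = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)

hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
# OpenCV stores hue in [0, 180) for uint8 images, i.e. degrees / 2.
hue_rad = np.deg2rad(hsv[:, :, 0].astype(np.float64) * 2.0)

# Decompose the circular hue into two continuous channels, so that
# hues near 0 and near 360 degrees are no longer treated as far apart.
sin_h, cos_h = np.sin(hue_rad), np.cos(hue_rad)
print(sin_h.mean(), cos_h.mean())
```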
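The "low-outlier statistics from large-scale blocks with first-order differencing" can be sketched as follows. The block size, the use of absolute first differences as the gradient proxy, and the 5th-percentile cut-off are illustrative assumptions; the abstract does not fix these choices.

```python
import numpy as np

def block_low_quantile_features(gray, block=32, q=5):
    """Split a grayscale frame into large blocks, take first-order
    differences as a cheap gradient proxy, and keep a low quantile
    per block to suppress the influence of extreme gradients."""
    h, w = gray.shape
    feats = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            patch = gray[i:i + block, j:j + block].astype(np.float64)
            # First-order differences along both axes.
            dx = np.abs(np.diff(patch, axis=1))
            dy = np.abs(np.diff(patch, axis=0))
            grad = np.concatenate([dx.ravel(), dy.ravel()])
            feats.append(np.percentile(grad, q))  # low-quantile statistic
    return np.array(feats)

gray = np.random.default_rng(1).integers(0, 256, (128, 128)).astype(np.uint8)
print(block_low_quantile_features(gray).shape)  # one value per 32x32 block
```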
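The two texture-type feature families can be sketched with scikit-image. The GLCM distances and angles, the HOG cell sizes, and the particular summary statistics are placeholder choices, not the thesis settings.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops, hog

gray = np.random.default_rng(2).integers(0, 256, (64, 64)).astype(np.uint8)

# (1) Angular Second Moment (ASM) from a gray-level co-occurrence matrix.
glcm = graycomatrix(gray, distances=[1], angles=[0, np.pi / 2],
                    levels=256, symmetric=True, normed=True)
asm = graycoprops(glcm, 'ASM').ravel()

# (2) Summary statistics of the HOG descriptor instead of the raw
# high-dimensional vector, keeping the total feature count small.
h = hog(gray, orientations=9, pixels_per_cell=(8, 8),
        cells_per_block=(2, 2), feature_vector=True)
hog_stats = np.array([h.mean(), h.std(), np.median(h)])

print(asm, hog_stats)
```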
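Video-level data splitting and frame-to-video aggregation can be sketched as below. The logistic-regression classifier, the synthetic features and labels, the 10-frames-per-video layout, and the 0.5 thresholds are all assumptions for illustration; the thesis evaluates several statistical and machine learning classifiers over 500 repeated splits.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(3)
X = rng.normal(size=(600, 31))             # 31 frame-level features
video_id = np.repeat(np.arange(60), 10)    # 10 sampled frames per video
y = np.repeat(rng.integers(0, 2, 60), 10)  # real(0)/fake(1), one label per video

# Split by video so no video contributes frames to both sets (no leakage).
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train, test = next(splitter.split(X, y, groups=video_id))

clf = LogisticRegression(max_iter=1000).fit(X[train], y[train])
prob = clf.predict_proba(X[test])[:, 1]    # frame-level fake probabilities

# Aggregate frame predictions into one decision per test video.
for vid in np.unique(video_id[test]):
    m = video_id[test] == vid
    vote = int((prob[m] > 0.5).mean() > 0.5)  # majority voting
    med = int(np.median(prob[m]) > 0.5)       # median aggregation
    print(vid, vote, med)
```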
References: | [1] Chen, H.-S. (陳慧霜) (2023). "Image Analysis and Detection of Deepfake Videos", Master's thesis, Department of Statistics, National Chengchi University.
[2] Ahmed, N., Natarajan, T., & Rao, K. R. (1974). "Discrete Cosine Transform", IEEE Transactions on Computers, C-23(1), 90–93.
[3] Akiba, T., Sano, S., Yanase, T., Ohta, T., & Koyama, M. (2019). "Optuna: A Next-Generation Hyperparameter Optimization Framework", Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2623–2631.
[4] Amari, S. (1967). "A Theory of Adaptive Pattern Classifiers", IEEE Transactions on Electronic Computers, EC-16(3), 299–307.
[5] Bertasius, G., Wang, H., & Torresani, L. (2021). "Is Space-Time Attention All You Need for Video Understanding?", Proceedings of the 38th International Conference on Machine Learning (ICML).
[6] Blanz, V., & Vetter, T. (2023). "A Morphable Model for the Synthesis of 3D Faces", Seminal Graphics Papers: Pushing the Boundaries, Volume 2, 157–164.
[7] Breiman, L. (2001). "Random Forests", Machine Learning, 45(1), 5–32.
[8] Chen, T., & Guestrin, C. (2016). "XGBoost: A Scalable Tree Boosting System", Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794.
[9] Cortes, C., & Vapnik, V. (1995). "Support-Vector Networks", Machine Learning, 20, 273–297.
[10] Cover, T., & Hart, P. (1967). "Nearest Neighbor Pattern Classification", IEEE Transactions on Information Theory, 13(1), 21–27.
[11] Cox, D. R. (1958). "The Regression Analysis of Binary Sequences", Journal of the Royal Statistical Society, Series B, 20(2), 215–232.
[12] Dalal, N., & Triggs, B. (2005). "Histograms of Oriented Gradients for Human Detection", Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 1, 886–893.
[13] Dolhansky, B., Bitton, J., Pflaum, B., Lu, J., Howes, R., Wang, M., & Ferrer, C. C. (2020). "The Deepfake Detection Challenge (DFDC) Dataset", arXiv preprint arXiv:2006.07397.
[14] Gabor, D. (1946). "Theory of Communication. Part 1: The Analysis of Information", Journal of the Institution of Electrical Engineers – Part III: Radio and Communication Engineering, 93(26), 429–441.
[15] Haralick, R. M., Shanmugam, K., & Dinstein, I. H. (1973). "Textural Features for Image Classification", IEEE Transactions on Systems, Man, and Cybernetics, SMC-3(6), 610–621.
[16] Horn, B. K. P., & Schunck, B. G. (1981). "Determining Optical Flow", Artificial Intelligence, 17(1–3), 185–203.
[17] Kim, H., Garrido, P., Tewari, A., Xu, W., Thies, J., Niessner, M., ... & Theobalt, C. (2018). "Deep Video Portraits", ACM Transactions on Graphics (TOG), 37(4), 1–14.
[18] LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). "Gradient-Based Learning Applied to Document Recognition", Proceedings of the IEEE, 86(11), 2278–2324.
[19] Li, L., Bao, J., Zhang, T., Yang, H., Chen, D., Wen, F., & Guo, B. (2020). "Face X-Ray for More General Face Forgery Detection", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 5001–5010.
[20] Li, Y., Yang, X., Sun, P., Qi, H., & Lyu, S. (2020). "Celeb-DF: A Large-Scale Challenging Dataset for Deepfake Forensics", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 3207–3216.
[21] Liu, Y., Zhang, K., Li, Y., Yan, Z., Gao, C., Chen, R., ... & Sun, L. (2024). "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models", arXiv preprint arXiv:2402.17177.
[22] Matern, F., Riess, C., & Stamminger, M. (2019). "Exploiting Visual Artifacts to Expose Deepfakes and Face Manipulations", 2019 IEEE Winter Applications of Computer Vision Workshops (WACVW), 83–92.
[23] Nirkin, Y., Keller, Y., & Hassner, T. (2019). "FSGAN: Subject Agnostic Face Swapping and Reenactment", Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 7184–7193.
[24] Pérez, P., Gangnet, M., & Blake, A. (2023). "Poisson Image Editing", Seminal Graphics Papers: Pushing the Boundaries, Volume 2, 577–582.
[25] Polyak, A., Zohar, A., Brown, A., Tjandra, A., Sinha, A., Lee, A., ... & Du, Y. (2024). "Movie Gen: A Cast of Media Foundation Models", arXiv preprint arXiv:2410.13720.
[26] Rossler, A., Cozzolino, D., Verdoliva, L., Riess, C., Thies, J., & Nießner, M. (2019). "FaceForensics++: Learning to Detect Manipulated Facial Images", Proceedings of the IEEE/CVF International Conference on Computer Vision, 1–11.
[27] Siarohin, A., Lathuilière, S., Tulyakov, S., Ricci, E., & Sebe, N. (2019). "First Order Motion Model for Image Animation", Advances in Neural Information Processing Systems, 32.
[28] Tolosana, R., Vera-Rodriguez, R., Fierrez, J., Morales, A., & Ortega-Garcia, J. (2020). "Deepfakes and Beyond: A Survey of Face Manipulation and Fake Detection", Information Fusion, 64, 131–148.
[29] Wiles, O., Koepke, A., & Zisserman, A. (2018). "X2Face: A Network for Controlling Face Generation Using Images, Audio, and Pose Codes", Proceedings of the European Conference on Computer Vision (ECCV), 670–686.
[30] Yang, X., Li, Y., & Lyu, S. (2019). "Exposing Deep Fakes Using Inconsistent Head Poses", ICASSP 2019 – IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 8261–8265.
[31] Zhao, H., Zhou, W., Chen, D., Wei, T., Zhang, W., & Yu, N. (2021). "Multi-Attentional Deepfake Detection", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2185–2194.
[32] Zhang, K., Zhang, Z., Li, Z., & Qiao, Y. (2016). "Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks", IEEE Signal Processing Letters, 23(10), 1499–1503.
[33] Zhao, T., Xu, X., Xu, M., Ding, H., Xiong, Y., & Xia, W. (2021). "Learning Self-Consistency for Deepfake Detection", Proceedings of the IEEE/CVF International Conference on Computer Vision, 15023–15033.
Description: | Master's thesis, Department of Statistics, National Chengchi University, 112354020
Source: | http://thesis.lib.nccu.edu.tw/record/#G0112354020
Data type: | thesis
Appears in collections: | [Department of Statistics] Theses
Files in This Item:
File | Description | Size | Format
402001.pdf | | 3901Kb | Adobe PDF
All items in the NCCU Institutional Repository are protected by the original copyright.