Please use this persistent URL to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/157107
Title: CapST: Leveraging Capsule Networks and Temporal Attention for Accurate Model Attribution in Deep-fake Videos
Authors: 汪新 Ahmad, Wasim; Peng, Yan Tsung; Chang, Yuan-Hao; Ganfure, Gaddisa Olani; Khan, Sarwar
Contributors: 群智博五
Date: 2025-04
Uploaded: 2025-05-27 11:09:35 (UTC+8)
Abstract: Deep-fake videos, generated through AI face-swapping techniques, have garnered considerable attention due to their potential for impactful impersonation attacks. While existing research primarily distinguishes real from fake videos, attributing a deep-fake to its specific generation model or encoder is crucial for forensic investigation, enabling precise source tracing and tailored countermeasures. This approach not only enhances detection accuracy by leveraging unique model-specific artifacts but also provides insights essential for developing proactive defenses against evolving deep-fake techniques. Addressing this gap, this article investigates the model attribution problem for deep-fake videos using two datasets—Deepfakes from Different Models (DFDM) and GANGen-Detection, which comprise deep-fake videos and images generated by GAN models. We select only fake images from the GANGen-Detection dataset to align with the DFDM dataset, consistent with the goal of this study: model attribution rather than real/fake classification. This study formulates deep-fake model attribution as a multiclass classification task, introducing a novel Capsule-Spatial-Temporal (CapST) model that effectively integrates a modified VGG19 (utilizing only the first 26 out of 52 layers) for feature extraction, combined with Capsule Networks and a Spatio-Temporal attention mechanism. The Capsule module captures intricate feature hierarchies, enabling robust identification of deep-fake attributes, while a video-level fusion technique leverages temporal attention mechanisms to process concatenated feature vectors and capture temporal dependencies in deep-fake videos. By aggregating insights across frames, our model achieves a comprehensive understanding of video content, resulting in more precise predictions. Experimental results on the DFDM and GANGen-Detection datasets demonstrate the efficacy of CapST, achieving substantial improvements in accurately categorizing deep-fake videos over baseline models, all while demanding fewer computational resources.
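The abstract describes the CapST pipeline at a high level: a truncated VGG19 feature extractor, capsule-style feature encoding, and temporal attention that fuses per-frame features into a video-level model-attribution prediction. The PyTorch code below is only a minimal sketch of that kind of pipeline; the class name CapSTSketch, the exact VGG19 cut-off, the feature dimensions, and the squash/attention details are illustrative assumptions and not the authors' implementation.

# Minimal sketch (not the authors' code): truncated VGG19 features,
# a capsule-style squash nonlinearity, and temporal attention over frames.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg19

def squash(x, dim=-1, eps=1e-8):
    # Capsule-style "squash": preserves direction, bounds vector length in [0, 1).
    norm2 = (x ** 2).sum(dim=dim, keepdim=True)
    return (norm2 / (1.0 + norm2)) * x / torch.sqrt(norm2 + eps)

class CapSTSketch(nn.Module):
    def __init__(self, num_classes=5, feat_dim=128):
        super().__init__()
        # Truncated VGG19 feature extractor; the cut at layer 26 of the
        # convolutional stack (and the untrained weights) are assumptions.
        self.backbone = nn.Sequential(*list(vgg19(weights=None).features.children())[:26])
        self.proj = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(512, feat_dim))
        # Temporal attention: score each frame, then take a weighted average.
        self.attn = nn.Linear(feat_dim, 1)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, clips):                      # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        x = self.backbone(clips.flatten(0, 1))     # (B*T, 512, h, w)
        x = squash(self.proj(x))                   # capsule-style per-frame vectors
        x = x.view(b, t, -1)                       # (B, T, feat_dim)
        w = F.softmax(self.attn(x), dim=1)         # (B, T, 1) frame attention weights
        video_feat = (w * x).sum(dim=1)            # attention-weighted video-level fusion
        return self.classifier(video_feat)         # logits over candidate source models

# Usage: logits = CapSTSketch()(torch.randn(2, 8, 3, 224, 224))

The attention-weighted average lets frames carrying stronger model-specific artifacts dominate the video-level representation, which is the role the abstract attributes to the temporal attention fusion.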
Relation: ACM Transactions on Multimedia Computing, Communications and Applications, Vol.21, No.4, pp.1-23
Type: article
DOI link: https://doi.org/10.1145/3715138
DOI: 10.1145/3715138
Appears in Collections: [社群網路與人智計算國際研究生博士學位學程(TIGP)] Journal Articles
Files in This Item:
File | Description | Size | Format | Views
index.html | | 0Kb | HTML | 165
All items in the NCCU Institutional Repository are protected by copyright.