Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/158709
Title: | FedADKD: A Federated Learning Approach Based on Adaptive Decoupled Knowledge Distillation |
Authors: | 周秉賢 Chou, Ping-Hsien |
Contributors: | Jang, Hung-Chin (張宏慶); Chou, Ping-Hsien (周秉賢) |
Keywords: | Federated Learning; Knowledge Distillation; Non-IID; Knowledge Forgetting; Decoupled Knowledge Distillation |
Date: | 2025 |
Issue Date: | 2025-08-04 15:10:16 (UTC+8) |
Abstract: | With the rapid advancement of the Internet of Things (IoT), mobile devices, and intelligent applications, data generation and storage have become increasingly decentralized, intensifying the demand for decentralized model training. Against this backdrop, Federated Learning (FL) has emerged as a promising paradigm, enabling clients to collaboratively train a global model without centralizing raw data. However, real-world data are typically Non-Independent and Identically Distributed (Non-IID): local data vary considerably across clients. This heterogeneity often leads to "global knowledge forgetting" during the aggregation step of conventional FL methods such as FedAvg, degrading global model performance.

Recent studies have attempted to alleviate this problem by incorporating knowledge distillation into FL. For instance, FedNTD preserves global knowledge by aligning clients' non-target class predictions. Nevertheless, distilling only non-target class information may not integrate knowledge comprehensively. Moreover, most existing approaches use fixed distillation weights and ignore the varying degrees of heterogeneity across clients, leaving an important research gap.

To address these challenges, we propose FedADKD (Federated Learning via Adaptive Decoupled Knowledge Distillation). The framework decouples knowledge distillation into True-Class Knowledge Distillation (TCKD) and Non-True-Class Knowledge Distillation (NCKD), quantifies the heterogeneity of each client's data distribution, and adaptively adjusts the TCKD weight accordingly. Specifically, the TCKD weight is reduced for clients with highly heterogeneous data, allowing them to retain more local characteristics, and increased for clients with more balanced data to reinforce the learning of cross-client shared knowledge. This dynamic balancing of each client's contribution to the global model mitigates Non-IID-induced global knowledge forgetting while transmitting only a few additional scalar values, a communication overhead that is negligible.

We design multiple Non-IID scenarios on the CIFAR-10 and CIFAR-100 image classification datasets to evaluate FedADKD comprehensively. The results show that FedADKD consistently outperforms the conventional FedAvg and the existing FedNTD method across diverse heterogeneous distributions, achieving clear improvements in global accuracy and significantly reducing the global knowledge forgetting rate. Further analyses confirm that FedADKD retains knowledge from different clients more stably, reconciling "local adaptation" with "global integration." Ablation studies additionally demonstrate the contribution of the adaptive TCKD weighting mechanism to these performance gains.

In sum, FedADKD offers an efficient solution for federated learning in heterogeneous data environments without introducing additional privacy risks, and is of both theoretical and practical significance. |
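To make the mechanism described in the abstract concrete, the following is a minimal PyTorch sketch of an adaptively weighted decoupled distillation loss of the kind outlined above. It assumes the standard TCKD/NCKD decomposition of the KD loss and a per-client heterogeneity score in [0, 1]; the heterogeneity measure (total variation distance from a uniform label distribution), the weighting rule alpha = 1 - heterogeneity, and all names (heterogeneity_score, adaptive_dkd_loss, temperature, beta) are illustrative assumptions, not the thesis implementation.

import torch
import torch.nn.functional as F

def heterogeneity_score(label_counts: torch.Tensor) -> torch.Tensor:
    # Illustrative measure: total variation distance between the client's label
    # distribution and the uniform distribution (0 = balanced, close to 1 = extremely skewed).
    p = label_counts.float() / label_counts.sum()
    uniform = torch.full_like(p, 1.0 / p.numel())
    return 0.5 * (p - uniform).abs().sum()

def adaptive_dkd_loss(student_logits, teacher_logits, target, het, temperature=4.0, beta=1.0):
    # Decoupled KD (TCKD + NCKD) with a TCKD weight that shrinks as heterogeneity grows.
    t = temperature
    num_classes = student_logits.size(1)
    mask = F.one_hot(target, num_classes).bool()

    p_s = F.softmax(student_logits / t, dim=1)
    p_t = F.softmax(teacher_logits / t, dim=1)

    # TCKD: binary KL divergence between the (target, non-target) probability masses.
    ps_tgt = p_s[mask].unsqueeze(1)
    pt_tgt = p_t[mask].unsqueeze(1)
    b_s = torch.cat([ps_tgt, 1.0 - ps_tgt], dim=1)
    b_t = torch.cat([pt_tgt, 1.0 - pt_tgt], dim=1)
    tckd = F.kl_div(b_s.log(), b_t, reduction="batchmean") * (t ** 2)

    # NCKD: KL divergence over the non-target classes only (target logit masked out).
    neg_inf_on_target = mask.float() * (-1e9)
    log_ps_non = F.log_softmax(student_logits / t + neg_inf_on_target, dim=1)
    pt_non = F.softmax(teacher_logits / t + neg_inf_on_target, dim=1)
    nckd = F.kl_div(log_ps_non, pt_non, reduction="batchmean") * (t ** 2)

    alpha = 1.0 - het  # assumed rule: lower TCKD weight on highly heterogeneous clients
    return alpha * tckd + beta * nckd

Under these assumptions, each client would compute its scalar heterogeneity score from its local label histogram once, share that scalar with the server (consistent with the abstract's claim that only a few additional scalar values are transmitted), and apply adaptive_dkd_loss during local training with the received global model acting as the teacher.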
Description: | Master's thesis, National Chengchi University, In-service Master's Program, Department of Computer Science, 112971005 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0112971005 |
Data Type: | thesis |
Appears in Collections: | [In-service Master's Program, Department of Computer Science] Theses |
Files in This Item:
File | Description | Size | Format
100501.pdf | | 3202 KB | Adobe PDF