National Chengchi University Institutional Repository (NCCUR): Item 140.119/139140


Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/139140

Title:	A Framework of Imbalanced Node Classification on Heterogeneous Graph Neural Network with GAN
Authors:	林庭樂 Lin, Ting-Le
Contributors:	王志宇 (Wang, Chih-Yu); 周珮婷 (Chou, Pei-Ting); 林庭樂 (Lin, Ting-Le)
Keywords:	Class Imbalance; Generative Adversarial Network; Graph Neural Network; Heterogeneous Graph
Date:	2022
Issue Date:	2022-03-01 16:38:46 (UTC+8)
Abstract:	Graph Neural Networks (GNNs) are deep learning models that have received a lot of attention in recent years. Because they can exploit the information in graph-structured data, they are widely used across tasks and perform very well. However, existing GNNs assume that every class has the same number of samples, whereas many real-world scenarios are class-imbalanced, so applying GNNs directly to them may not achieve good performance. Handling class imbalance is therefore an important problem for GNNs. Oversampling is a common remedy: it creates minority-class samples by duplication or synthesis to balance the sample size of each class. Yet oversampling may cause overfitting, and under the GNN framework the newly generated samples cannot be correctly connected to the original data. Furthermore, heterogeneous graphs, which are frequent in real-world applications, make establishing these connections harder. To address these problems, this work proposes a framework that builds on the idea of oversampling. It uses a Generative Adversarial Network (GAN) to generate samples that approximate real data, instead of duplicating or synthesizing from old samples, and trains deep neural networks to combine the generated samples with the original data. The framework is applied to and evaluated on Amazon Reviews datasets, where it outperforms all the other baselines on many metrics.
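As a rough illustration of the two conventional oversampling styles the abstract contrasts with GAN-based generation, the sketch below balances a toy dataset by duplication and by a SMOTE-like interpolation. It is not the thesis's method: the function names, the toy data, and the simplification of interpolating between two random same-class samples (rather than nearest neighbours) are all assumptions made for illustration.

```python
import random
from collections import Counter


def oversample_by_duplication(features, labels, seed=0):
    """Copy random minority-class samples until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    new_features, new_labels = list(features), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(features, labels) if y == cls]
        for _ in range(target - n):
            new_features.append(list(rng.choice(pool)))
            new_labels.append(cls)
    return new_features, new_labels


def oversample_by_interpolation(features, labels, seed=0):
    """SMOTE-like synthesis (simplified): place each new sample on the segment
    between two randomly chosen samples of the same class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    new_features, new_labels = list(features), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(features, labels) if y == cls]
        for _ in range(target - n):
            a, b = rng.choice(pool), rng.choice(pool)
            t = rng.random()
            new_features.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
            new_labels.append(cls)
    return new_features, new_labels


# Hypothetical toy data: six majority-class (0) and two minority-class (1) samples.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.0, 0.4], [0.2, 0.2],
     [1.0, 1.0], [1.2, 0.8]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_dup, y_dup = oversample_by_duplication(X, y)
X_syn, y_syn = oversample_by_interpolation(X, y)
print(Counter(y_dup), Counter(y_syn))
```

Neither routine gives the new samples any edges to existing nodes, which is the gap the abstract describes: in a graph setting the generated samples still have to be connected to the original data.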
Description:	Master's thesis; National Chengchi University; Department of Statistics; 108354018
Source URI:	http://thesis.lib.nccu.edu.tw/record/#G0108354018
Data Type:	thesis
DOI:	10.6814/NCCU202200296
Appears in Collections:	[Department of Statistics] Theses

Files in This Item:

File	Description	Size	Format
401801.pdf		1032Kb	Adobe PDF

All items in 政大典藏 are protected by copyright, with all rights reserved.


