National Chengchi University Institutional Repository (NCCUR): Item 140.119/153378

Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/153378

Title:	基於深度學習以情感辭典增強情緒分析 Emotion Analysis Enhanced with Sentiment Lexicons Based on Deep Learning
Authors:	張禎尹
Contributors:	邱淑怡 (Chiu, Shu-I); 張禎尹
Keywords:	emotion analysis; multi-label; data imbalance; synonym replacement; NRC Emotion Lexicon (EmoLex); Bidirectional Long Short-Term Memory (BiLSTM); Masked Language Model (MLM)
Date:	2024
Issue Date:	2024-09-04 14:59:43 (UTC+8)
Abstract:	This study combines a Bidirectional Long Short-Term Memory (BiLSTM) network with the NRC Emotion Lexicon (EmoLex) in a model named EmoBiLSTM, with the aim of improving the accuracy of emotion recognition in Taiwanese social media texts. As COVID-19 spread worldwide, daily life and mental health were significantly affected, and timely, accurate tracking of changes in public emotion matters for public health policy making. Emotion analysis was especially valuable during the pandemic because it can help governments understand the public's emotional state and respond with targeted measures. Existing emotion analysis techniques, however, still fall short in accuracy and adaptability, particularly on imbalanced multi-label data. This study addresses those shortcomings by combining deep learning with a sentiment lexicon. To handle multi-label data imbalance, it uses synonym replacement and a Masked Language Model (MLM) for data augmentation. Synonym replacement generates new training samples by substituting some of the words in a sentence, which increases the amount of data for minority classes. The MLM is trained to predict randomly masked words in a sentence, so it learns word context and sentence structure, which improves text generation and augmentation. The model combines a CNN and a BiLSTM: the CNN extracts local text features and the BiLSTM captures global contextual information. To further strengthen emotion recognition, the model incorporates EmoLex, whose emotion vocabulary helps it identify and process emotional information in text more accurately. Model parameters are tuned to optimize performance, the model is trained on the training dataset, and it is evaluated with accuracy, recall, and F1-score. Results show that synonym replacement combined with EmoLex and the BiLSTM model performs well on every reported metric, with a marked advantage on imbalanced multi-label data, and that the approach is accurate and stable for emotion recognition in Taiwanese social media texts. This indicates that combining deep learning with emotion lexicons is effective for emotion analysis of social media text under multi-label data imbalance.
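The synonym-replacement augmentation described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the toy `SYNONYMS` table, the English tokens, the `target` threshold, and the function names `replace_synonyms` and `augment_minority` are all assumptions made for illustration. The thesis works on Taiwanese social media text, and the abstract does not say where its synonyms come from.

```python
import random
from collections import Counter

# Hypothetical toy synonym table (an assumption; the real synonym source is not given).
SYNONYMS = {
    "sad": ["unhappy", "sorrowful"],
    "afraid": ["scared", "fearful"],
}


def replace_synonyms(tokens, rng, prob=0.5):
    """Return a copy of tokens where listed words are swapped for a synonym with probability prob."""
    out = []
    for tok in tokens:
        options = SYNONYMS.get(tok)
        if options and rng.random() < prob:
            out.append(rng.choice(options))
        else:
            out.append(tok)
    return out


def augment_minority(samples, target, seed=0):
    """Append augmented copies of samples that carry a label seen fewer than target times.

    samples: list of (tokens, labels) pairs, where labels is a set (multi-label).
    """
    rng = random.Random(seed)
    counts = Counter(label for _, labels in samples for label in labels)
    augmented = list(samples)
    for tokens, labels in samples:
        # Only augment samples that still have at least one under-represented label.
        if not any(counts[label] < target for label in labels):
            continue
        new_tokens = replace_synonyms(tokens, rng, prob=1.0)
        if new_tokens != tokens:  # skip sentences with no replaceable word
            augmented.append((new_tokens, set(labels)))
            counts.update(labels)
    return augmented


samples = [
    (["i", "am", "sad"], {"sadness"}),
    (["i", "am", "happy"], {"joy"}),
    (["so", "happy"], {"joy"}),
    (["very", "happy"], {"joy"}),
]
augmented = augment_minority(samples, target=3)
```

In this toy run only the sample labeled `sadness` is duplicated, because `joy` already reaches the threshold. An augmented sample keeps all of its labels, so in multi-label data it also raises the count of any frequent label it happens to carry, which is one reason imbalance is harder to correct there than in single-label data.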
Description:	Master's thesis, National Chengchi University, Department of Computer Science, 111753136
Source URI:	http://thesis.lib.nccu.edu.tw/record/#G0111753136
Data Type:	thesis
Appears in Collections:	[Department of Computer Science] Theses

Files in This Item:

File	Description	Size	Format
313601.pdf		4321Kb	Adobe PDF

All items in 政大典藏 are protected by copyright, with all rights reserved.


Copyright Announcement

1. The digital content of this website is held by the National Chengchi University Institutional Repository and is provided free of charge for non-commercial uses such as academic research and public education. Please use it in a proper and reasonable manner and respect the rights of copyright owners. For commercial use, please obtain authorization from the copyright owner in advance.

2. Every effort has been made in building this website to avoid infringing the rights of copyright owners. If you believe that any digital content on the website infringes copyright, please notify the site maintainers (nccur@nccu.edu.tw), who will promptly take remedial measures such as removing the work.
