    Please use this identifier to cite or link to this item: http://nccur.lib.nccu.edu.tw/handle/140.119/130957


    Title: 應用情感分析技術於電影評論分類與評分系統 — 以Yahoo!奇摩電影為例
    Application of Sentiment Analysis Technology in Movie Reviews Opinion Classification and Ranking System -Taking Yahoo! Movie for Example
    Authors: 蔡廷儀
    Tsai, Ting-Yi
    Contributors: 鄭宇庭
    蔡廷儀
    Tsai, Ting-Yi
    Keywords: 自動評分
    意見探勘
    情緒分類
    Yahoo!奇摩電影
    網際網路
    Opinion auto ranking
    Opinion mining
    Sentiment analysis
    Yahoo! Movie
    Internet
    Date: 2020
    Issue Date: 2020-08-03 17:31:37 (UTC+8)
    Abstract: 在近十年內,網際網路迅速的竄起,與80年代當時的web2.0尚未普及使用相比,人與人之間的交往模式從寫信給特定想發送的對象,至今慢慢地轉為傾向於自願性的發表以及分享個人言論於公開的網路平台或是論壇上,例如:消費者對於產品後的使用心得、經驗分享,或是針對影劇、新聞媒體的觀後評論與意見等等。往後,也隨著行動裝置越來越便利普及,當人們在無法做決定、有選擇性障礙時,往往會參考有經驗的人或是過去消費者們的想法。透過網路搜尋關鍵字,取得來自各種論壇、公開評論網站、新聞媒體以及個人部落格等等的資訊。例如:台大批踢踢實業坊、痞客邦等屬於提供各方面領域訊息的網站。如果想針對不同領域進行資訊的查詢,像是想了解電影相關的的訊息的話,例如:Yahoo!奇摩電影、IMDB這種評論網站提供的則是針對電影相關的影評、新聞文章、電影簡介等訊息給使用者。然而網際網路的盛行也進而引進企業界人士的投入,帶來有用的商業智慧,並提供有效的行銷決策。另外,對於網路使用者來說也能獲取來自四面八方的主觀評論意見,作為消費前或是觀看電影前的參考依據。
    有鑒於此,本論文針對Yahoo!奇摩電影的短篇評論,設計一個專屬電影的意見情緒分類器與評論評分系統,分成訓練模型和測試集合驗證兩部分。在訓練集合部分,包含資料處理、人工擷取意見詞和屬性詞、建立相關詞庫、計算意見詞分數以及訓練模型的建立。首先,我們將訓練集合資料利用CKIP斷詞系統進行斷詞後,以人工標記的方式,蒐集帶有明顯情緒的意見詞以及電影相關的屬性詞,來建立情緒特徵詞庫,再針對訓練集評論中具有加強和否定語義的詞彙建立程度詞庫以及否定詞庫。接著,透過事前建立的意見詞庫、程度詞庫、否定詞庫,定義五種情緒特徵,分別為「極度正向」、「正向」、「中立」、「負向」、「極度負向」,針對訓練集合的評論進行情緒特徵的擷取,再轉為特徵向量,透過監督式的機器學習法SVM(Support Vector Machine),訓練出一個情緒分類模型。在測試集合驗證部分,利用訓練好的支持向量機,將評論進行正向情緒和負向情緒的分類,再將分類結果與評論網站上提供的星等分數做比較,計算出整體的正確率為85.55%以及AUC為92.55%,代表此系統有不錯的鑑別度和可信度。最後根據評論內容自動化對產生的電影評分,並且搭配電影的四大屬性類別的得分狀況,來提供給使用者在看電影前最直接且可信的參考指標。
    Over the past decade the Internet has grown rapidly. Compared with the 1980s, before Web 2.0 was in common use, communication between people has gradually shifted from writing letters to specific recipients toward voluntarily posting and sharing personal opinions on public web platforms and forums. Examples include consumers' accounts of their experience with products, and reviews and opinions about films, television dramas, and news media. As mobile devices have become more convenient and widespread, people who cannot make a decision, or who are torn between options, often consult the views of experienced users or past consumers. By searching for keywords online, they can obtain information from forums, public review sites, news media, and personal blogs. PTT and PIXNET, for example, are sites that provide information across many fields. For a specific field such as film, review sites like “Yahoo! Movies” and “IMDB” provide users with reviews, news articles, and movie introductions. The prevalence of the Internet has also drawn in businesses, yielding useful business intelligence and supporting effective marketing decisions. Internet users, in turn, can draw on subjective reviews from many sources as a reference before making a purchase or watching a movie.
    In light of this, we designed a sentiment classifier and review scoring system specifically for movies, based on the short reviews on Yahoo! Movies. The system has two parts: model training and verification on a test set. The training part covers data processing, manual extraction of opinion words and attribute words, construction of the related lexicons, calculation of opinion word scores, and building the trained model. First, we segmented the training data into words with the CKIP (Chinese Knowledge and Information Processing) word segmentation system and then, by manual annotation, collected opinion words carrying clear sentiment together with movie-related attribute words to build a sentiment feature lexicon. We also built a degree lexicon and a negation lexicon from the intensifying and negating terms found in the training reviews. Next, using the opinion, degree, and negation lexicons, we defined five sentiment features: “extremely positive”, “positive”, “neutral”, “negative”, and “extremely negative”. We extracted these features from the training reviews, converted them into feature vectors, and trained a Support Vector Machine (SVM) sentiment classification model. In the verification part, the trained SVM classifies each review as positive or negative. Comparing the classification results with the star ratings provided on the review site gave an overall accuracy of 85.55% and an AUC of 92.55%, which indicates that the system has good discriminative power and reliability. Finally, each movie is scored automatically from the content of its reviews, and this score, together with the scores for the movie's four attribute categories, gives users a direct and reliable reference before watching a movie.
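    The lexicon-based feature extraction described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the English toy lexicons, the doubling weight for degree words, the score thresholds, and the names `extract_features` and `feature_index` are all assumptions made for the example. Tokens are assumed to be already segmented (the thesis uses CKIP for this step).

```python
from typing import Dict, List, Set

# Hypothetical toy lexicons (assumptions for illustration only; the thesis
# builds its own opinion, degree, and negation lexicons by manual annotation).
OPINION_SCORES: Dict[str, float] = {
    "great": 1.0, "good": 1.0, "okay": 0.0, "boring": -1.0, "awful": -1.0,
}
DEGREE_WEIGHTS: Dict[str, float] = {"very": 2.0, "extremely": 2.0}
NEGATION_WORDS: Set[str] = {"not", "never"}

# The five sentiment features named in the abstract, in vector order.
FEATURES = ["extremely positive", "positive", "neutral",
            "negative", "extremely negative"]


def feature_index(score: float) -> int:
    """Map a signed opinion score to one of the five feature slots.

    The thresholds (+/-2.0) are an assumption, not taken from the thesis.
    """
    if score >= 2.0:
        return 0
    if score > 0.0:
        return 1
    if score == 0.0:
        return 2
    if score > -2.0:
        return 3
    return 4


def extract_features(tokens: List[str]) -> List[int]:
    """Count how many opinion words fall into each of the five features.

    Degree and negation words modify the next opinion word that follows
    them; the modifiers are reset once that opinion word is scored.
    """
    counts = [0] * len(FEATURES)
    weight, negated = 1.0, False
    for token in tokens:
        if token in DEGREE_WEIGHTS:
            weight *= DEGREE_WEIGHTS[token]
        elif token in NEGATION_WORDS:
            negated = not negated
        elif token in OPINION_SCORES:
            score = OPINION_SCORES[token] * weight
            if negated:
                score = -score
            counts[feature_index(score)] += 1
            weight, negated = 1.0, False
    return counts


# "very good" counts as extremely positive, "not great" as negative.
print(extract_features(["very", "good", "but", "not", "great"]))
```

    Each review then becomes a five-element count vector. Such vectors could be passed to any SVM implementation, for example `sklearn.svm.SVC` from scikit-learn; the abstract does not say which library was used, so that choice is also an assumption.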
    Description: Master's
    National Chengchi University (國立政治大學)
    Department of Statistics (統計學系)
    107354015
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0107354015
    Data Type: thesis
    DOI: 10.6814/NCCU202000663
    Appears in Collections: [Department of Statistics (統計學系)] Theses

    Files in This Item:

    File          Size       Format
    401501.pdf    13100Kb    Adobe PDF


    All items in 政大典藏 are protected by copyright, with all rights reserved.

