    Please use this identifier to cite or link to this item: http://nccur.lib.nccu.edu.tw/handle/140.119/100571

    Title: 基於主題模型之社群媒體內容分析探索
    Exploring Topic Models for Analyzing the Contents of Social Media
    Authors: 廖舒婷
    Liao, Shu Ting
    Contributors: 陳恭
    Chen, Kung
    Liao, Shu Ting
    Keywords: Topic Analysis
    Topic Models
    Text Mining
    Social Media
    Date: 2016
    Issue Date: 2016-08-22 13:40:38 (UTC+8)
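    Abstract: As the volume of online articles grows rapidly, traditional content analysis can no longer process complex texts and parse their latent meaning effectively within a short time. This study therefore attempts to build a tool centered on unsupervised topic modeling techniques which, combined with natural language processing, helps researchers quickly process and explore large amounts of Chinese data and mine the knowledge it contains. An integrated automated evaluation mechanism provides a reference for how well the model performs. In addition, because the results produced by topic models still require human reading, this study presents them with visualization techniques to assist researchers in interpreting the results.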
    Recently, the volume of data retrieved from the internet has grown too large for traditional content analysis methods to process and extract high-quality insights from in a reasonable amount of time. To address this issue, we develop a data analysis system based on an unsupervised topic modeling method. In particular, we focus on applying this tool to Chinese texts. By integrating the Chinese tokenization tool jieba, our system can explore and analyze Chinese documents rapidly and effectively. The system also automatically performs a quantitative evaluation of the quality of the generated model, which gives the user a quick indication of how well the model works. Finally, because the outputs of topic modeling rely on human interpretation, we present a method for visualizing topic modeling results to help end users understand and interpret the topics that have been discovered.
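The abstract does not say which quantitative measure the system computes. As one possible illustration, the sketch below implements the UMass topic coherence score in plain Python, which rates a topic by how often its top words co-occur in the same documents. The function name `umass_coherence` and the toy corpus are hypothetical, and the English tokens stand in for the output of a tokenizer such as jieba.

```python
import math


def umass_coherence(top_words, tokenized_docs):
    """UMass coherence of one topic.

    Sums log((D(w_m, w_l) + 1) / D(w_l)) over every pair of top words
    where w_l is ranked above w_m. D counts the documents that contain
    the given word(s). Scores closer to zero indicate a more coherent topic.
    """
    doc_sets = [set(doc) for doc in tokenized_docs]

    def doc_freq(*words):
        # Number of documents containing all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            denom = doc_freq(top_words[l])
            if denom == 0:
                continue  # word never appears in the corpus; skip the pair
            joint = doc_freq(top_words[m], top_words[l])
            score += math.log((joint + 1) / denom)
    return score


# Hypothetical toy corpus, already tokenized (illustration only).
docs = [
    ["student", "protest", "parliament"],
    ["student", "protest", "trade"],
    ["weather", "rain", "typhoon"],
    ["rain", "typhoon", "student"],
]

coherent = umass_coherence(["student", "protest"], docs)
mixed = umass_coherence(["student", "typhoon"], docs)
print(coherent, mixed)
```

In this toy corpus the pair that co-occurs more often ("student", "protest") scores higher than the pair that rarely appears together, which is the behavior an automatic quality check relies on.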
    To evaluate our system, we run experiments on six Chinese text data sets drawn from different online media sources. The results show that the proposed system can be applied to analyze large volumes of unlabeled Chinese text, reducing manual work and shortening the time required. We then compare the topics found in social media with those found in online news. We observe that Taiwan's Sunflower Movement not only received great attention from people in Taiwan; overseas users in Hong Kong or China also expressed their concerns and opinions through social media. Furthermore, the topic distribution makes it easy to identify hot topics.
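One simple way to read hot topics off a topic distribution is to average each topic's share across all documents and rank the topics by that average. The sketch below assumes a document-topic matrix is already available from a fitted topic model; the function name `rank_topics` and the numbers are hypothetical and are not taken from the thesis.

```python
def rank_topics(doc_topic):
    """Rank topics by their mean proportion across documents.

    doc_topic is a list of rows, one per document, where each row holds
    that document's topic proportions. Returns (topic_index, mean_share)
    pairs sorted from most to least prevalent.
    """
    n_docs = len(doc_topic)
    n_topics = len(doc_topic[0])
    mean_share = [
        sum(row[k] for row in doc_topic) / n_docs for k in range(n_topics)
    ]
    return sorted(enumerate(mean_share), key=lambda kv: kv[1], reverse=True)


# Hypothetical document-topic proportions (4 documents, 3 topics).
doc_topic = [
    [0.7, 0.2, 0.1],
    [0.6, 0.1, 0.3],
    [0.1, 0.1, 0.8],
    [0.8, 0.1, 0.1],
]

ranking = rank_topics(doc_topic)
for topic, share in ranking:
    print(f"topic {topic}: mean share {share:.3f}")
```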
    Finally, we conduct experiments to evaluate and understand the limiting factors of the proposed system. An interesting finding is that our system can act as a data filtering tool: the composition of data sets can be computed and used to define filters for quickly selecting relevant data sets from large data sets.
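The filtering idea can be sketched as follows, under the assumption that each document already has a topic distribution from a fitted model. The helper names `composition` and `filter_by_topic`, the threshold `min_share`, and the sample data are all hypothetical; the abstract does not describe how the thesis defines its filters.

```python
from collections import Counter


def composition(doc_topic):
    """Fraction of documents whose dominant topic is each topic index."""
    dominant = [max(range(len(row)), key=row.__getitem__) for row in doc_topic]
    counts = Counter(dominant)
    return {topic: n / len(doc_topic) for topic, n in counts.items()}


def filter_by_topic(doc_ids, doc_topic, topic, min_share=0.5):
    """Keep the documents whose share of the given topic is at least min_share."""
    return [
        doc_id
        for doc_id, row in zip(doc_ids, doc_topic)
        if row[topic] >= min_share
    ]


# Hypothetical document IDs and topic proportions (illustration only).
doc_ids = ["post-1", "post-2", "post-3", "post-4"]
doc_topic = [
    [0.7, 0.2, 0.1],
    [0.6, 0.1, 0.3],
    [0.1, 0.1, 0.8],
    [0.8, 0.1, 0.1],
]

mix = composition(doc_topic)
selected = filter_by_topic(doc_ids, doc_topic, topic=0, min_share=0.5)
print(mix, selected)
```

Here the composition shows which topics dominate the collection, and the filter then pulls out only the documents that are mostly about the topic of interest.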
    Description: Master's degree
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0103971002
    Data Type: thesis
    Appears in Collections: [In-service Master's Program, Department of Computer Science] Theses

    Files in This Item:

    File          Size      Format
    100201.pdf    4057Kb    Adobe PDF
