    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/118220


    Title: 潛在類別分析於文字探勘之應用
    Applying Latent Class Analysis on Text Mining
    Authors: 廖彥婷
    Liaw, Yen-Ting
    Contributors: 江振東
    廖彥婷
    Liaw, Yen-Ting
    Keywords: Classification
    Latent class analysis
    Similarity detection
    Text mining
    Date: 2018
    Issue Date: 2018-07-03 17:23:43 (UTC+8)
    Abstract: A large share of the information on the web is in text form, and as internet usage has grown, text mining has become a popular method for information retrieval. Latent Class Analysis (LCA) is a technique often used in the social sciences to uncover the latent classes underlying observed data. In this thesis, we apply LCA to text mining and assess whether it is a suitable method for this purpose. Two case studies are presented: one is a similarity detection task that compares two novels, Water Margin and Romance of the Three Kingdoms; the other is a classification task on news articles, used to identify important keywords. Conclusions and suggestions are provided.
    Description: Master's thesis
    National Chengchi University
    Department of Statistics
    105354029
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0105354029
    Data Type: thesis
    DOI: 10.6814/THE.NCCU.STAT.004.2018.B03
    Appears in Collections: [Department of Statistics] Theses

    Files in This Item:

    File          Size      Format
    402901.pdf    1298Kb    Adobe PDF

