    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/159040


    Title: A Statistical Approach to Identifying Master's and ChatGPT-Generated Abstracts in the Field of Economics (以統計方法辨識碩士及ChatGPT生成之經濟學類論文摘要)
    Authors: Chang, You-yu (張祐瑜)
    Contributors: Yu, Qing-Xiang (余清祥)
    Yang, Xiao-Wen (楊曉文)
    Chang, You-yu (張祐瑜)
    Keywords: Text analysis
    Exploratory Data Analysis (EDA)
    ChatGPT
    Large Language Models
    Writing Style
    Date: 2025
    Issue Date: 2025-09-01 14:49:47 (UTC+8)
    Abstract: Written records give later generations a view of the cultural development, social conditions, and technological change of each historical period. Among them, an abstract is a condensed version of an article or book that lets readers quickly grasp its main thread and key points, and its style and structure differ from those of ordinary prose. Rapid advances in computing have made large language models a source of convenience and new possibilities in daily life, but they also bring potential risks. In recent years, award-winning articles and creative works have repeatedly been reported to rely on AI tools such as ChatGPT, which has prompted debate about fairness and upended traditional practices and roles in education, research, and innovation. ChatGPT was released in 2022 and soon became widely known, so this study takes abstracts of Taiwanese master's theses in economics from 2018 to 2020, a period before generative AI came into common use, as its research subject. An equal number of abstracts was generated with ChatGPT, and the writing styles of the two sets of texts were compared through exploratory data analysis (EDA) and statistical methods in order to tell human-written abstracts from generated ones.
    The study compiles basic statistics for both sets of texts, covering characters, words, sentence length, and diversity, and combines them with frequently occurring words and ambiguous words as explanatory variables. Logistic regression was used to select statistically significant features, and the results were compared with machine learning models. The results indicate that ChatGPT-generated abstracts tend to be built from short sentences, whereas master's students' abstracts alternate between long and short sentences. Generated texts use words such as 「提升」 ("enhance"), 「提供」 ("provide"), 「建議」 ("suggest"), and 「此外」 ("furthermore") at a higher rate, while master's students' abstracts use the function word 「之」 (roughly "of") more frequently. Using the explanatory variables selected through EDA, logistic regression and machine learning models such as random forest classified the abstracts as human-written or generated with good accuracy. The approach in this study reaches similar performance with fewer variables and less computation, and it shows readers the main differences that separate the two writing styles.
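    The kind of features the abstract describes (sentence length and its variability, lexical diversity, and the rate of specific marker words) can be illustrated with a minimal Python sketch. This is not the thesis's code. The function name `style_features`, the English marker words (glosses of the Chinese terms reported above), and the regex-based tokenisation are all assumptions made for illustration; the thesis analysed Chinese text, which would need a proper word segmenter.

```python
import re
from statistics import mean, pstdev

# Hypothetical marker list: English glosses of the words the abstract reports
# as more frequent in generated text. The thesis worked with Chinese terms.
MARKER_WORDS = ("enhance", "provide", "suggest", "furthermore")


def style_features(text, markers=MARKER_WORDS):
    """Return simple stylometric features for one abstract (illustrative only)."""
    # Split on Western and CJK sentence-ending punctuation; this naive split
    # also breaks on decimal points and abbreviations.
    sentences = [s.strip() for s in re.split(r"[.!?。!?]+", text) if s.strip()]
    # Crude English tokenisation; Chinese input would need a word segmenter.
    words = re.findall(r"[A-Za-z']+", text.lower())
    lengths = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    n_words = len(words)

    features = {
        "n_sentences": len(sentences),
        # Average sentence length in words.
        "mean_sentence_len": mean(lengths) if lengths else 0.0,
        # Spread of sentence lengths; a mix of long and short sentences raises it.
        "sd_sentence_len": pstdev(lengths) if lengths else 0.0,
        # Type-token ratio as a rough lexical diversity measure.
        "type_token_ratio": len(set(words)) / n_words if n_words else 0.0,
    }
    # Relative frequency of each marker word.
    for m in markers:
        features[f"rate_{m}"] = words.count(m) / n_words if n_words else 0.0
    return features


if __name__ == "__main__":
    sample = "We provide a model. It works."
    for name, value in style_features(sample).items():
        print(name, value)
```

    A table of such features, one row per abstract with a human/generated label, is the sort of input that a logistic regression or random forest classifier could then be fitted on; the exact variables and models used in the thesis are described only at the level of detail given in the abstract.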
    Description: Master's thesis
    National Chengchi University
    Department of Statistics
    112354021
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0112354021
    Data Type: thesis
    Appears in Collections: [Department of Statistics] Theses

    Files in This Item:

    File: 402101.pdf | Size: 2689Kb | Format: Adobe PDF