政大機構典藏-National Chengchi University Institutional Repository(NCCUR):Item 140.119/124825



Please use this permanent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/124825

Title:	基於主動式學習之古漢語斷句系統發展與應用研究 (Development and Application of an Ancient Chinese Sentence Segmentation System Based on Active Learning)
Author:	徐志帆 Hsu, Chih-Fan
Contributors:	陳志銘 Chen, Chih-Ming; 徐志帆 Hsu, Chih-Fan
Keywords:	digital humanities; active learning; machine learning; automatic ancient Chinese sentence segmentation; human-computer interaction
Date:	2019
Uploaded:	2019-08-07 16:26:24 (UTC+8)
Abstract:	This study develops an “Ancient Chinese Sentence Segmentation System Based on Active Learning” to support digital humanities research. The system combines active learning with machine learning algorithms so that, through human-computer cooperation, less training data is needed to build an automatic sentence segmentation model for ancient Chinese, and humanities scholars can segment previously uninterpreted documents more efficiently. To identify the most suitable algorithm and feature template for the system, the first experiment compared segmentation models built with different algorithms and feature templates under two text selection methods: sequential selection and active learning. The results show that conditional random fields (CRF) combined with a three-character feature template learn effectively under active learning and are suitable for developing the “active learning sentence segmentation model”. In the second experiment, scholars with humanities expertise used the system to segment ancient Chinese texts. The segmentation models built from each scholar's annotations were compared and analyzed, and semi-structured interviews were conducted to understand the scholars' experience of, and suggestions for, system-assisted segmentation. The results show that the system does learn effectively from the scholars' segmentation annotations and that the model's predictive ability keeps improving through human-computer cooperation. The analysis also found significant negative correlations between the model's predictive ability and both the annotation type ratio and the adjacent-character type ratio of the scholars' annotations. According to the interviews, the scholars evaluated the system's workflow and interface positively, and most interviewees considered its segmentation prediction function an effective aid for segmenting ancient Chinese. Future work could add a named entity model or feature templates designed around other rules of ancient Chinese to further improve prediction. The system could also be applied in humanities education and developed into a digital humanities education platform for training ancient Chinese sentence segmentation.
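The abstract names the ingredients of the approach (sequence labeling, a three-character feature template, and active learning) without giving details. The following is a minimal sketch of how those pieces might fit together, not the thesis's implementation. The label scheme (`B`/`N`), the exact window template, the least-confidence selection rule, and all function names are assumptions made for illustration; the feature dictionaries are in the general shape that CRF toolkits accept, but no CRF is trained here.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed set of punctuation marks that count as sentence breaks.
PUNCT = set("，。、；：？！")


def to_labels(punctuated: str) -> Tuple[List[str], List[str]]:
    """Strip punctuation and label each character:
    'B' if a break follows it, 'N' otherwise (assumed label scheme)."""
    chars: List[str] = []
    labels: List[str] = []
    for ch in punctuated:
        if ch in PUNCT:
            if labels:
                labels[-1] = "B"
        else:
            chars.append(ch)
            labels.append("N")
    return chars, labels


def trigram_features(chars: Sequence[str], i: int) -> Dict[str, str]:
    """Hypothetical three-character window template around position i:
    unigrams, bigrams, and the full trigram."""
    def at(j: int) -> str:
        return chars[j] if 0 <= j < len(chars) else "<PAD>"

    prev, cur, nxt = at(i - 1), at(i), at(i + 1)
    return {
        "c-1": prev, "c0": cur, "c+1": nxt,
        "c-1c0": prev + cur, "c0c+1": cur + nxt,
        "c-1c0c+1": prev + cur + nxt,
    }


def least_confidence(break_probs: Sequence[float]) -> float:
    """Mean per-character uncertainty, 1 - max(p, 1 - p), where p is the
    model's predicted probability that a break follows the character."""
    if not break_probs:
        return 0.0
    return sum(1 - max(p, 1 - p) for p in break_probs) / len(break_probs)


def select_for_annotation(pool_probs: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the ids of the k unlabeled passages the model is least sure
    about; these would be shown to the human annotator next."""
    ranked = sorted(pool_probs, key=lambda pid: least_confidence(pool_probs[pid]), reverse=True)
    return ranked[:k]
```

In an active learning loop of this kind, the model is retrained after each batch of human annotations and the pool is rescored, so annotation effort goes to the passages the current model finds hardest rather than to passages taken in sequential order.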
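Description:	Master's thesis; National Chengchi University; Graduate Institute of Library, Information and Archival Studies; 106155007
Source:	http://thesis.lib.nccu.edu.tw/record/#G0106155007
Type:	thesis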
DOI:	10.6814/NCCU201900543
Appears in collections:	[Graduate Institute of Library, Information and Archival Studies] Theses

Files in this item:

File	Size	Format
500701.pdf	1822Kb	Adobe PDF
