    Please use this permanent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/159318


    Title: ScoreRAG: A Retrieval-Augmented Generation Framework with Consistency-Relevance Scoring and Structured Summarization for News Generation
    Author: Lin, Pei-Yun
    Contributors: Tsai, Yen-Lung (advisor); Lin, Pei-Yun
    Keywords: Retrieval-Augmented Generation
    News Generation
    Large Language Models
    Semantic Reranking
    Graded Summarization
    Natural Language Processing
    Date: 2025
    Uploaded: 2025-09-01 16:30:18 (UTC+8)
    Abstract: This research introduces ScoreRAG, an approach to enhancing the quality of automated news generation. Despite advances in Natural Language Processing and large language models, current news generation methods still struggle with hallucinations, factual inconsistencies, and a lack of domain-specific expertise. ScoreRAG addresses these challenges with a multi-stage framework combining retrieval-augmented generation, consistency-relevance evaluation, and structured summarization. The system first retrieves relevant news documents from a vector database and maps them to complete articles in a news database. A large language model then assigns each retrieved document a consistency-relevance score; documents are reranked by score and low-relevance items are filtered out. Finally, the framework produces graded summaries according to the relevance scores, and these summaries, together with a system prompt, guide the language model in generating a complete news article that follows professional journalistic standards. Through this methodical approach, ScoreRAG aims to significantly improve the accuracy, coherence, informativeness, and professionalism of generated news articles while maintaining stability and consistency throughout the generation process. Code and demo: https://github.com/peiyun2260/ScoreRAG
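    The scoring, reranking, filtering, and graded-summarization stages described in the abstract can be sketched as follows. This is an illustrative outline only, not the thesis's actual implementation (see the linked GitHub repository for that): the names `Doc`, `score_and_filter`, and `graded_summaries`, the score threshold, and the pluggable `scorer` callable are all hypothetical. In the real system the scorer would be an LLM call repeated several times and averaged for consistency; here a deterministic stub stands in for it.

    ```python
    from dataclasses import dataclass, field
    from typing import Callable, List

    @dataclass
    class Doc:
        """A retrieved news document (hypothetical structure)."""
        doc_id: str
        text: str
        score: float = field(default=0.0)

    def score_and_filter(docs: List[Doc],
                         scorer: Callable[[str], float],
                         threshold: float = 2.0,
                         n_samples: int = 3) -> List[Doc]:
        """Consistency-relevance stage: sample the scorer several times per
        document, average the samples (the 'consistency' part), drop documents
        below the threshold, and rerank the rest by descending score."""
        for d in docs:
            samples = [scorer(d.text) for _ in range(n_samples)]
            d.score = sum(samples) / len(samples)
        kept = [d for d in docs if d.score >= threshold]
        return sorted(kept, key=lambda d: d.score, reverse=True)

    def graded_summaries(docs: List[Doc], high: float = 3.0) -> List[str]:
        """Graded summarization: give higher-scoring documents a larger
        sentence budget, so more relevant sources contribute more detail
        to the context fed to the generator."""
        out = []
        for d in docs:
            budget = 3 if d.score >= high else 1  # sentences kept per grade
            summary = " ".join(d.text.split(". ")[:budget])
            out.append(f"[score {d.score:.1f}] {summary}")
        return out
    ```

    In the full pipeline, the strings returned by `graded_summaries` would be concatenated with a system prompt and passed to the language model for final article generation; swapping the stub scorer for repeated LLM relevance judgments is what turns this skeleton into the consistency-scored version the abstract describes.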
    Description: Master's thesis
    National Chengchi University
    Department of Applied Mathematics
    111751002
    Source: http://thesis.lib.nccu.edu.tw/record/#G0111751002
    Type: thesis
    Appears in Collections: [Department of Applied Mathematics] Theses

    Files in This Item:

    File | Size | Format | Views
    100201.pdf | 13328 KB | Adobe PDF | 0 | View/Open


    All items in the NCCU Institutional Repository are protected by copyright, with all rights reserved.



    Copyright Announcement
    1. The digital content of this website is part of the National Chengchi University Institutional Repository. It provides free access for academic research and public education on a non-commercial basis. Please use it in a proper and reasonable manner and respect the rights of copyright owners. For commercial use, please obtain authorization from the copyright owner in advance.

    2. The NCCU Institutional Repository is maintained with care to protect the interests of copyright owners. If you believe that any material on this website infringes copyright, please contact our staff (nccur@nccu.edu.tw). We will remove the work from the repository and investigate your claim.