Please use this persistent URL to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/159317
| Title: | RAG-Free Contextual Extension for LLMs: A Study on KV-Cache and Calculus AI Tutoring (original title: 大型語言模型的非檢索式上下文延展機制研究:從鍵值緩存到微積分AI教師) |
| Author: | Sun, Yi-Jia (孫翊珈) |
| Contributors: | Tsai, Yen-Lung (蔡炎龍); Sun, Yi-Jia (孫翊珈) |
| Keywords: | Large Language Models; Key-Value Cache; Context Extension; AI Tutoring System; Retrieval-Free Generation |
| Date: | 2025 |
| Upload time: | 2025-09-01 16:30:03 (UTC+8) |
| Abstract: | With the rapid advancement of Large Language Models (LLMs) in natural language processing, their applications have increasingly extended into educational settings. However, LLMs are constrained by a limited context window, which makes it difficult to maintain contextual coherence and logical consistency across long instructional materials and multi-turn teaching dialogues. Traditional remedies such as Retrieval-Augmented Generation (RAG) can incorporate external knowledge, but they are prone to retrieval bias and context fragmentation, which reduces their effectiveness in educational applications.
This study proposes a retrieval-free context extension strategy based on the Key-Value Cache (KV-Cache) and implements a calculus-focused AI tutoring system built on LaTeX-based calculus materials. The system feeds the material into the model incrementally through a chunked prefill strategy and caches the intermediate key-value computations, allowing the model to retain context across subsequent teaching interactions, save computation, and improve semantic coherence. The experiments compare the proposed KV-Cache system with a RAG-based system and a no-cache baseline, evaluating memory usage, response latency, and teaching continuity.
The results show that the KV-Cache mechanism effectively improves contextual coherence and significantly reduces response latency in long-text teaching scenarios, demonstrating its potential for AI-driven tutoring systems. (A minimal illustrative sketch of the chunked-prefill mechanism is given after the record fields below.) |
| Description: | Master's thesis; National Chengchi University; Department of Applied Mathematics; 111751001 |
| Source: | http://thesis.lib.nccu.edu.tw/record/#G0111751001 |
| Data type: | thesis |
| Appears in collections: | [Department of Applied Mathematics] Theses |
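The sketch below illustrates the chunked-prefill KV-Cache idea described in the abstract: long teaching material is encoded once, chunk by chunk, and the accumulated cache is reused for later questions so the material is never re-encoded or retrieved. It is a minimal sketch assuming the Hugging Face transformers API; the model name, chunk size, file name, and generation settings are illustrative assumptions, not details taken from the thesis.

```python
# Minimal sketch (not the thesis code) of chunked prefill with KV-Cache reuse.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-0.5B-Instruct"   # placeholder; the thesis model is not specified here
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).to(device).eval()


def chunked_prefill(material: str, chunk_tokens: int = 512):
    """Prefill long teaching material chunk by chunk, accumulating the
    KV-Cache (past_key_values) so the material is encoded only once."""
    ids = tokenizer(material, return_tensors="pt").input_ids.to(device)
    past = None
    with torch.no_grad():
        for start in range(0, ids.shape[1], chunk_tokens):
            chunk = ids[:, start:start + chunk_tokens]
            out = model(input_ids=chunk, past_key_values=past, use_cache=True)
            past = out.past_key_values   # cache grows with each chunk
    return past, ids


def answer(question: str, past, prefix_ids, max_new_tokens: int = 200):
    """Answer a question by reusing the cached textbook context instead of
    re-encoding or retrieving it (retrieval-free context extension)."""
    q_ids = tokenizer(question, return_tensors="pt",
                      add_special_tokens=False).input_ids.to(device)
    full_ids = torch.cat([prefix_ids, q_ids], dim=1)
    out = model.generate(
        full_ids,
        attention_mask=torch.ones_like(full_ids),
        past_key_values=past,        # only the question tokens are newly processed
        use_cache=True,
        max_new_tokens=max_new_tokens,
    )
    # generate() returns prompt + completion; keep only the new tokens.
    return tokenizer.decode(out[0, full_ids.shape[1]:], skip_special_tokens=True)


# Hypothetical usage: prefill one LaTeX chapter once, then ask questions.
# In practice the prefill cache would be copied (e.g. copy.deepcopy) per question,
# because generate() appends new tokens to the cache in place.
# past, prefix_ids = chunked_prefill(open("calculus_ch1.tex").read())
# print(answer("How is the derivative of x^2 obtained from the limit definition?", past, prefix_ids))
```

Compared with a RAG baseline, this arrangement avoids a retrieval step entirely: latency after prefill depends mainly on the number of new question tokens, at the cost of holding the full cache in memory.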
Files in this item:
| File | Size | Format | Views |
| 100101.pdf | 1103Kb | Adobe PDF | 0 |
All items in the NCCU Institutional Repository (NCCUR) are protected by copyright.