National Chengchi University Institutional Repository (NCCUR): Item 140.119/153375


Please use this permanent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/153375

Title:	Learned Index for Subset Query of NoSQL Databases (NoSQL 資料庫子集查詢的學習索引)
Author:	Hsu, Hsuan-Hsiang (許軒祥)
Contributors:	Shan, Man-Kwan (沈錳坤); Hsu, Hsuan-Hsiang (許軒祥)
Keywords:	Learned Index; NoSQL Database; Subset Query
Date:	2024
Upload time:	2024-09-04 14:59:08 (UTC+8)
Abstract:	NoSQL databases handle semi-structured or unstructured data, and subset queries are common in them. In recent years, learned indexes, which apply machine learning to indexing, have opened new avenues for database indexing. Compared to traditional B-Trees, learned indexes offer significant advantages in query time: the query time of a traditional index is dominated by memory access, whereas that of a learned index is dominated by CPU computation. Existing research on learned indexes mainly targets queries over traditional relational databases. For subset queries, the only existing approach is the recent DGM, which is based on Deep Sets. DGM is designed for space efficiency but leaves room for improvement in query speed. This thesis proposes two novel learned index techniques, LI4Subset-D and LI4Subset-P, to improve the performance of subset queries in NoSQL databases. LI4Subset-D builds on Deep Sets, and LI4Subset-P builds on the PGM-index, a learned index. Experimental results show that LI4Subset-D is nearly 149 times faster than DGM in query speed, at the cost of an approximately 7-fold increase in memory space. LI4Subset-P is approximately 3235 times faster than DGM in query speed, at the cost of an approximately 4-fold increase in memory space.
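The abstract's contrast between memory-bound and CPU-bound lookups can be illustrated with a minimal sketch of a generic learned index over sorted scalar keys. This is not LI4Subset-D, LI4Subset-P, DGM, or the PGM-index itself. It assumes a single least-squares line as the model, and the names `fit_linear_index` and `lookup` are hypothetical. The model predicts a position arithmetically, and a binary search confined to the recorded error window corrects the prediction.

```python
from bisect import bisect_left


def fit_linear_index(keys):
    """Fit position ~= slope * key + intercept over sorted, non-empty keys.

    Returns (slope, intercept, err), where err is an integer bound on how far
    the truncated prediction can be from the true position of any stored key.
    """
    n = len(keys)
    mean_k = sum(keys) / n
    mean_p = (n - 1) / 2
    var = sum((k - mean_k) ** 2 for k in keys)
    cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys))
    slope = 0.0 if var == 0 else cov / var
    intercept = mean_p - slope * mean_k
    max_err = max(abs(i - (slope * k + intercept)) for i, k in enumerate(keys))
    # +1 absorbs the truncation of the prediction to an integer position.
    return slope, intercept, int(max_err) + 1


def lookup(keys, model, key):
    """Return the index of key in keys, or -1 if it is absent."""
    slope, intercept, err = model
    guess = int(slope * key + intercept)  # pure arithmetic, no tree traversal
    n = len(keys)
    lo = min(n, max(0, guess - err))
    hi = max(lo, min(n, guess + err + 1))
    i = bisect_left(keys, key, lo, hi)  # search only inside the error window
    return i if i < n and keys[i] == key else -1


keys = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
model = fit_linear_index(keys)
positions = [lookup(keys, model, k) for k in keys]
```

The narrower the error window, the fewer memory probes the final search needs. That is the general trade-off learned indexes exploit, independent of how the thesis adapts it to set-valued queries.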
References:
[1] T. Kraska, A. Beutel, E. H. Chi, J. Dean, and N. Polyzotis, The Case for Learned Index Structures, in Proceedings of the ACM 2018 International Conference on Management of Data (SIGMOD), pp. 489-504, 2018.
[2] A. Davitkova, D. Gjurovski, and S. Michel, Learning over Sets for Databases, in Proceedings of the 27th International Conference on Extending Database Technology (EDBT), pp. 68-80, 2024.
[3] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. R. Salakhutdinov, and A. J. Smola, Deep Sets, in Proceedings of Advances in Neural Information Processing Systems (NIPS), vol. 30, 2017.
[4] P. Ferragina and G. Vinciguerra, The PGM-index: A Fully-Dynamic Compressed Learned Index with Provable Worst-Case Bounds, in Proceedings of the VLDB Endowment, vol. 13, no. 8, pp. 1162-1175, 2020.
[5] U. Deppisch, S-tree: A Dynamic Balanced Signature Index for Office Retrieval, in Proceedings of the 9th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 77-87, 1986.
[6] M. Morzy, T. Morzy, A. Nanopoulos, and Y. Manolopoulos, Hierarchical Bitmap Index: An Efficient and Scalable Indexing Technique for Set-Valued Attributes, in Proceedings of 7th East European Conference on Advances in Databases and Information Systems, Springer, pp. 236-252, 2003.
[7] S. Helmer, R. Aly, T. Neumann, and G. Moerkotte, Indexing Set-Valued Attributes with a Multi-Level Extendible Hashing Scheme, in Proceedings of 18th International Conference on Database and Expert Systems Applications, Springer, pp. 98-108, 2007.
[8] S. Bevc and I. Savnik, Using Tries for Subset and Superset Queries, in Proceedings of the ITI 2009 31st International Conference on Information Technology Interfaces, IEEE, pp. 147-152, 2009.
[9] I. Savnik, Efficient Subset and Superset Queries, in DB&Local Proceedings, Citeseer, pp. 45-57, 2012.
[10] I. Savnik, Index Data Structure for Fast Subset and Superset Queries, in Proceedings of International Conference on Availability, Reliability, and Security, Springer, pp. 134-148, 2013.
[11] A. Galakatos, M. Markovitch, C. Binnig, R. Fonseca, and T. Kraska, FITing-Tree: A Data-Aware Index Structure, in Proceedings of the 2019 ACM International Conference on Management of Data (SIGMOD), pp. 1189-1206, 2019.
[12] J. Rao and K. A. Ross, Cache Conscious Indexing for Decision-Support in Main Memory, in Proceedings of the 25th VLDB Conference, 1999.
[13] A. Kipf et al., RadixSpline: A Single-Pass Learned Index, in Proceedings of the 3rd International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, pp. 1-5, 2020.
[14] R. Marcus et al., Benchmarking Learned Indexes, in Proceedings of the VLDB Endowment, vol. 14, no. 1, 2020.
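Description:	Master's thesis, National Chengchi University, Department of Computer Science, 111753122
Source:	http://thesis.lib.nccu.edu.tw/record/#G0111753122
Type:	thesis
Appears in collections:	[Department of Computer Science] Theses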

Files in this item:

File	Size	Format	Views
312201.pdf	1232Kb	Adobe PDF	0
