National Chengchi University Institutional Repository (NCCUR): Item 140.119/56330


Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/56330

Title:	Similar Top-k documents retrieval in highly distributed environments (在高度分散式環境下進行Top-k相似文件檢索)
Authors:	Wang, Chun Hung (王俊閎)
Contributors:	Chen, Arbee L.P. (陳良弼); Wang, Chun Hung (王俊閎)
Keywords:	distributed environments; similar top-k documents retrieval; peer-to-peer network
Date:	2012
Issue Date:	2012-12-03 11:27:23 (UTC+8)
Abstract:	In document databases, a similar top-k documents query retrieves, from a large collection, the documents most relevant to a query document: the documents are ranked by a similarity function and the k with the highest similarity are returned to the user. Centralized search engines, the traditional approach, suffer from limited coverage and scalability, which makes such rank-based queries costly in time and computation. Peer-to-peer (P2P) architectures have recently emerged as a way to address these issues, but supporting rank-based similarity queries in a highly distributed environment is difficult because there is no global knowledge and no suitable central coordinator. In this thesis, we propose a framework to solve these problems. We first perform local cluster pre-processing on each peer's database, then apply a zone-creation process [1] that partitions the P2P network into sub-zones, and finally build a feature index table. During query processing, the index table speeds up the selection of the top-k similar clusters and ensures that an adequate number of results is returned. In our experiments, we compare our approach with a traditional centralized search engine and with the SON-based approach [1]; in a highly distributed environment, our approach performs better than both on similar top-k documents queries.
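The pipeline outlined in the abstract (local clusters per peer, sub-zones, a feature index, then top-k selection) can be illustrated with a minimal single-process sketch. Everything here is an assumption made for illustration and is not taken from the thesis: the function names, the raw term-frequency weighting, the use of summed term vectors as cluster features, and the flat in-memory list standing in for a distributed index over a real P2P overlay.

```python
import heapq
import math
from collections import Counter


def tf_vector(text):
    """Raw term-frequency vector (assumed weighting; a real system may use TF-IDF)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_feature(vectors):
    """Summed term vector of a cluster; cosine similarity ignores its scale."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return total


def build_feature_index(zones):
    """Flatten {zone: {peer: {cluster: [doc, ...]}}} into a list of index entries.

    Each entry is (zone_id, peer_id, cluster_id, feature_vector, docs).
    """
    index = []
    for zone_id, peers in zones.items():
        for peer_id, clusters in peers.items():
            for cluster_id, docs in clusters.items():
                feature = cluster_feature([tf_vector(d) for d in docs])
                index.append((zone_id, peer_id, cluster_id, feature, docs))
    return index


def top_k_similar(query, index, k, n_clusters=2):
    """Pick the n_clusters most similar clusters, then rank only their documents."""
    q = tf_vector(query)
    best = heapq.nlargest(n_clusters, index, key=lambda entry: cosine(q, entry[3]))
    candidates = [(cosine(q, tf_vector(d)), d) for entry in best for d in entry[4]]
    return heapq.nlargest(k, candidates)


if __name__ == "__main__":
    # Hypothetical toy data: two zones, two peers, three local clusters.
    zones = {
        "z1": {"p1": {"c1": ["peer to peer network search", "peer network overlay"],
                      "c2": ["cooking pasta recipe"]}},
        "z2": {"p2": {"c1": ["top k query processing peer network"]}},
    }
    for score, doc in top_k_similar("peer network search", build_feature_index(zones), k=2):
        print(f"{score:.3f}  {doc}")
```

The point of the index in this sketch is that only documents in the selected clusters are scored against the query, so clusters with unrelated content are never visited. Choosing `n_clusters` trades recall against the number of peers contacted; the sketch leaves that choice to the caller.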
Reference:
[1] Christos Doulkeridis, Kjetil Nørvåg, Michalis Vazirgiannis. 2008. Peer-to-peer similarity search over widely distributed document collections. LSDS-IR 35-42.
[2] Stoica, I., Morris, R., Karger, D., Kaashoek, M.F., Balakrishnan, H. 2001. Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of the ACM SIGCOMM 149-160.
[3] Ratnasamy, S., Francis, P., Handley, M., Karp, R., Schenker, S. 2001. A scalable content-addressable network. In Proceedings of the ACM SIGCOMM 161-172.
[4] Rowstron, A., Druschel, P. 2001. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. In Proceedings of the Middleware.
[5] Chunqiang Tang, Zhichen Xu, Sandhya Dwarkadas. 2003. Peer-to-peer information retrieval using self-organizing semantic overlay networks. In Proceedings of the ACM SIGCOMM 175-186.
[6] BitTorrent. <http://bittorrent.com/>.
[7] eMula. <http://www.emula-project.net/>.
[8] Beverly Yang, Hector Garcia-Molina. 2003. Designing a Super-Peer Network. ICDE 49-60.
[9] The Gnutella protocol specification v0.6. <http://rfcgnutella.sourceforge.net>.
[10] KaZaA. <http://www.kazaa.com>.
[11] Salton, G., Wong, A., Yang, C.S. 1975. A vector space model for automatic indexing. Communications of the ACM 18(11) 613-620.
[12] Bernard J. Jansen, Soumen Chakrabarti. 2006. Mining the Web: Discovering Knowledge from Hypertext Data. Morgan-Kaufmann Publishers, 352 pp., ISBN: 1-55860-754-4. Inf. Process. Manage. (IPM) 42(1) 317-318.
[13] Christos Doulkeridis, Kjetil Nørvåg, Michalis Vazirgiannis. 2007. DESENT: Decentralized and distributed semantic overlay generation in P2P networks. IEEE Journal on Selected Areas in Communications (J-SAC) 25(1) 25-34.
[14] Hersh, W.R., Buckley, C., Leone, T.J., Hickam, D.H. 1994. OHSUMED: An interactive retrieval evaluation and new large test collection for research. In Proceedings of the ACM SIGIR 192-201.
[15] GT-ITM: Georgia Tech Internetwork Topology Models. <http://www.cc.gatech.edu/projects/gtitm/>.
[16] Wolf-Tilo Balke, Wolfgang Nejdl, Wolf Siberski, Uwe Thaden. 2005. Progressive Distributed Top k Retrieval in Peer-to-Peer Networks. ICDE 174-185.
[17] Wolf-Tilo Balke. 2005. Supporting Information Retrieval in Peer-to-Peer Systems. Peer-to-Peer Systems and Applications 337-352.
[18] C. Gkantsidis, M. Mihail, A. Saberi. 2005. Hybrid search schemes for unstructured peer-to-peer networks. In Proceedings of INFOCOM.
[19] Inderjit S. Dhillon, Dharmendra S. Modha. 2001. Concept Decompositions for Large Sparse Text Data Using Clustering. Machine Learning 42(1/2) 143-175.
[20] Akrivi Vlachou, Christos Doulkeridis, Kjetil Nørvåg, Michalis Vazirgiannis. 2008. On efficient top-k query processing in highly distributed environments. SIGMOD 753-764.
[21] Shiwei Zhu, Junjie Wu, Hui Xiong, Guoping Xia. 2011. Scaling up top-K cosine similarity search. Data Knowl. Eng. (DKE) 70(1) 60-83.
[22] Aoying Zhou, Rong Zhang, Weining Qian, Quang Hieu Vu, Tianming Hu. 2008. Adaptive indexing for content-based search in P2P systems. Data Knowl. Eng. (DKE) 67(3) 381-398.
Description:	Master's thesis; National Chengchi University; Department of Computer Science; 99753034; 101
Source URI:	http://thesis.lib.nccu.edu.tw/record/#G0099753034
Data Type:	thesis
Appears in Collections:	[Department of Computer Science] Theses

Files in This Item:

File	Size	Format
303401.pdf	1496Kb	Adobe PDF2	869	View/Open

All items in NCCUR are protected by copyright, with all rights reserved.


Copyright Announcement

1. The digital content of this website is part of the National Chengchi University Institutional Repository. It is provided free of charge for non-commercial uses such as academic research and public education. Please use it in a proper and reasonable manner and respect the rights of copyright owners. For commercial use, please obtain authorization from the copyright owner in advance.

2. Every effort has been made in building this website to avoid infringing the rights of copyright owners. If you believe that any material on the website infringes copyright, please contact our staff (nccur@nccu.edu.tw). We will promptly take remedial measures, such as removing the work from the repository.
