    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/111897


    Title: 基於大數據資料的非監督分散式分群演算法
    An Effective Distributed GHSOM Algorithm for Unsupervised Clustering on Big Data
    Authors: 邱垂暉
    Chiu, Chui Hui
    Contributors: 郁方
    Yu, Fang
    邱垂暉
    Chiu, Chui Hui
    Keywords: Unsupervised clustering
    GHSOM
    Actor model
    Malware detection
    Parallel computation
    Date: 2017
    Issue Date: 2017-08-10 11:13:04 (UTC+8)
    Abstract: Clustering techniques that group samples by attribute similarity are widely used in fields such as pattern recognition, feature extraction, and malicious behavior characterization. Because of their importance, various clustering techniques have been reimplemented on distributed frameworks for scalable computation, for example K-means on Hadoop in Apache Mahout. K-means requires the number of clusters to be given in advance, and self-organizing maps (SOM) require the map size to be given in advance. The growing hierarchical self-organizing map (GHSOM), by contrast, clusters samples dynamically until a tolerance on the variation between samples is satisfied, which makes it an attractive unsupervised learning method for data that carry too little information to fix the number of clusters beforehand. However, GHSOM has so far been a sequential rather than a distributed algorithm, which limits its application to big data. In this work, we present a novel distributed GHSOM algorithm. We implement it with the Scala actor model, assigning the horizontal and vertical expansion tasks of GHSOM construction to actors, and show a significant performance improvement. To evaluate the approach, we collect and analyze the real-world execution behaviors of thousands of malware samples and derive malware detection rules by unsupervised clustering on millions of samples, demonstrating the performance improvement, the effectiveness of the rules, and potential uses in practice.
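    Illustration: the two growth decisions named in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the thesis implementation. The threshold names tau1 and tau2 follow the general GHSOM literature, split_unit is a hypothetical stand-in for training a child map (it merely halves the samples along their widest dimension), and a Python thread pool stands in for the Scala actors. Each level of vertical expansions is dispatched concurrently, and a unit stops expanding once its mean quantization error is no longer above tau2 times the layer-0 error.

```python
from concurrent.futures import ThreadPoolExecutor
from math import dist
from statistics import fmean


def mean_vector(samples):
    """Component-wise mean of a list of equal-length vectors."""
    return [fmean(column) for column in zip(*samples)]


def mqe(samples):
    """Mean quantization error of samples against their own mean vector."""
    center = mean_vector(samples)
    return fmean(dist(s, center) for s in samples)


def needs_horizontal_growth(unit_mqes, parent_mqe, tau1):
    """A map keeps adding units while its average unit error has not yet
    dropped below the fraction tau1 of the error of the unit it refines."""
    return fmean(unit_mqes) >= tau1 * parent_mqe


def needs_vertical_growth(unit_mqe, mqe0, tau2):
    """A unit receives a child map while its error exceeds the fraction
    tau2 of the layer-0 error mqe0."""
    return unit_mqe > tau2 * mqe0


def split_unit(samples):
    """Hypothetical stand-in for training a child map (NOT SOM training):
    halve the samples along the dimension with the widest range."""
    dims = range(len(samples[0]))
    widest = max(dims, key=lambda d: max(s[d] for s in samples)
                 - min(s[d] for s in samples))
    ordered = sorted(samples, key=lambda s: s[widest])
    half = len(ordered) // 2
    return [ordered[:half], ordered[half:]]


def grow(samples, tau2, workers=4):
    """Expand units level by level (expects at least two samples).
    All expansions of one level run concurrently in the worker pool."""
    mqe0 = mqe(samples)
    pending, leaves = [samples], []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending:
            next_level = []
            for children in pool.map(split_unit, pending):
                for child in children:
                    if len(child) > 1 and needs_vertical_growth(
                            mqe(child), mqe0, tau2):
                        next_level.append(child)   # expand again
                    else:
                        leaves.append(child)       # final cluster
            pending = next_level
    return leaves


if __name__ == "__main__":
    # Toy data (an assumption for illustration): two tight groups in 2-D.
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
    for tau2 in (0.2, 0.05):
        print(tau2, [len(leaf) for leaf in grow(data, tau2)])
```

    A smaller tau2 tolerates less variation inside a cluster and therefore yields a deeper hierarchy with more, smaller clusters. The level-by-level dispatch is a deliberate simplification: it avoids tasks that block on their own subtasks inside a bounded pool, whereas an actor system would typically hand each expansion to an actor through asynchronous messages.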
    Description: Master's thesis
    National Chengchi University
    Department of Management Information Systems
    104356019
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0104356019
    Data Type: thesis
    Appears in Collections: [Department of Management Information Systems] Theses

    Files in This Item: 601901.pdf (1552 KB, Adobe PDF)