    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/110782

    Title: 基於資訊理論熵之特徵選取
    Entropy based feature selection
    Authors: 許立農
    Contributors: 周珮婷
    Keywords: Machine learning
    Feature selection
    Dimension reduction
    Date: 2017
    Issue Date: 2017-07-11 11:25:43 (UTC+8)
    Abstract: Feature selection is a common preprocessing technique in machine learning. Although many feature selection algorithms exist, none of them outperforms the others on every dataset. Because data now come in so many forms, the better practice is to develop new methods that reveal more information about the data and to choose a feature selection algorithm suited to the characteristics of the data at hand.
    In this study, we used the concept of entropy from information theory to build a similarity matrix between features, and then constructed a DCG-tree to separate the variables into clusters. Each core cluster consists of fairly homogeneous variables that share similar covariate information. Using the core clusters, we reduced the dimension of high-dimensional datasets. We assessed the method by comparing it with FCBF, Lasso, F-score, random forest, and a genetic algorithm. Prediction performance was evaluated on real-world datasets, using hierarchical clustering with a voting algorithm as the classifier. The results showed that the entropy method gives more stable prediction performance while still reducing the dimension of the datasets sufficiently.
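    The general idea in the abstract (an entropy-based similarity matrix between features, followed by grouping similar features into clusters) can be illustrated with a minimal sketch. Everything specific below is an assumption rather than the thesis's procedure: the similarity measure is symmetric uncertainty (normalized mutual information), ordinary average-linkage hierarchical clustering from SciPy stands in for the DCG-tree, features are assumed to be already discretized into integer codes, and the function names and the rule for picking one representative per cluster are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def entropy(a):
    """Shannon entropy (bits) of a discrete 1-D array."""
    _, counts = np.unique(a, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def joint_entropy(a, b):
    """Shannon entropy (bits) of the pair (a, b), one symbol per row."""
    pairs = np.stack([a, b], axis=1)
    _, counts = np.unique(pairs, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def symmetric_uncertainty(a, b):
    """2 * I(a; b) / (H(a) + H(b)); defined as 0 when both are constant."""
    ha, hb = entropy(a), entropy(b)
    if ha + hb == 0:
        return 0.0
    mutual_info = ha + hb - joint_entropy(a, b)
    return 2.0 * mutual_info / (ha + hb)


def similarity_matrix(X):
    """Pairwise symmetric uncertainty between the columns of X."""
    n = X.shape[1]
    S = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            S[i, j] = S[j, i] = symmetric_uncertainty(X[:, i], X[:, j])
    return S


def cluster_features(S, n_clusters):
    """Assumption: average-linkage clustering on 1 - S (not a DCG-tree)."""
    D = np.clip(1.0 - S, 0.0, None)
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")


def select_representatives(X, y, labels):
    """Hypothetical rule: per cluster, keep the feature most similar to y."""
    chosen = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        scores = [symmetric_uncertainty(X[:, j], y) for j in members]
        chosen.append(int(members[int(np.argmax(scores))]))
    return sorted(chosen)


# Toy data: features 0 and 1 are copies, 2 and 3 are relabelings of each
# other, and the two groups are statistically independent.
f0 = np.array([0, 0, 1, 1, 0, 0, 1, 1])
f2 = np.array([0, 1, 0, 1, 0, 1, 0, 1])
X = np.column_stack([f0, f0, f2, 1 - f2])

S = similarity_matrix(X)
labels = cluster_features(S, n_clusters=2)
selected = select_representatives(X, f0, labels)
print(S)
print(labels, selected)
```

    Continuous features would need to be binned before calling these functions, and the number of clusters is left as a free parameter here; how the thesis determines its core clusters is not stated in this record.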
    Reference: Akay, M. F. (2009). Support vector machines combined with feature selection for breast cancer diagnosis. Expert systems with applications, 36(2), 3240-3247.
    Chen, Y.-W., & Lin, C.-J. (2006). Combining SVMs with various feature selection strategies. In Feature extraction (pp. 315-324). Springer.
    Díaz-Uriarte, R., & Alvarez de Andrés, S. (2006). Gene selection and classification of microarray data using random forest. BMC Bioinformatics, 7(1), 3. doi:10.1186/1471-2105-7-3
    Fushing, H., & McAssey, M. P. (2010). Time, temperature, and data cloud geometry. Physical Review E, 82(6), 061110.
    Fushing, H., Wang, H., VanderWaal, K., McCowan, B., & Koehl, P. (2013). Multi-scale clustering by building a robust and self correcting ultrametric topology on data points. PloS one, 8(2), e56259.
    Golub, M. S., Hogrefe, C. E., Widaman, K. F., & Capitanio, J. P. (2009). Iron deficiency anemia and affective response in rhesus monkey infants. Developmental psychobiology, 51(1), 47-59.
    Lee, O. (2017). Data-driven computation for pattern information. ProQuest, UMI Dissertations Publishing.
    Raymer, M. L., Punch, W. F., Goodman, E. D., Kuhn, L. A., & Jain, A. K. (2000). Dimensionality reduction using genetic algorithms. IEEE Transactions on Evolutionary Computation, 4(2), 164-171. doi:10.1109/4235.850656
    Saeys, Y., Abeel, T., & Van de Peer, Y. (2008). Robust Feature Selection Using Ensemble Feature Selection Techniques. In W. Daelemans, B. Goethals, & K. Morik (Eds.), Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2008, Antwerp, Belgium, September 15-19, 2008, Proceedings, Part II (pp. 313-325). Berlin, Heidelberg: Springer Berlin Heidelberg.
    Saeys, Y., Inza, I., & Larrañaga, P. (2007). A review of feature selection techniques in bioinformatics. Bioinformatics, 23(19), 2507-2517.
    Svetnik, V., Liaw, A., Tong, C., & Wang, T. (2004). Application of Breiman’s Random Forest to Modeling Structure-Activity Relationships of Pharmaceutical Molecules. In F. Roli, J. Kittler, & T. Windeatt (Eds.), Multiple Classifier Systems: 5th International Workshop, MCS 2004, Cagliari, Italy, June 9-11, 2004. Proceedings (pp. 334-343). Berlin, Heidelberg: Springer Berlin Heidelberg.
    Yu, L., & Liu, H. (2003). Feature selection for high-dimensional data: A fast correlation-based filter solution. Paper presented at the ICML.
    Description: Master's degree
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0104354013
    Data Type: thesis
    Appears in Collections: [Department of Statistics] Theses

    Files in This Item:

    File          Size      Format
    401301.pdf    6354Kb    Adobe PDF

    All items in the NCCU Institutional Repository (政大典藏) are protected by copyright, with all rights reserved.