    Please use this identifier to cite or link to this item: http://nccur.lib.nccu.edu.tw/handle/140.119/66311

    Title: 高維度資料特徵選取之探討–應用於分類蛋白質質譜儀資料
    Other Titles: On Feature Selection of High Dimensional Data - Application on Classifying Proteomic Spectra Data
    Authors: 郭訓志;黃仁澤;薛慧敏
    Kuo, Hsun-Chih; Huang, Jen-Tse; Hsueh, Huey-Miin
    Contributors: Department of Statistics
    Keywords: feature selection, proteomic mass spectra data, support vector machine, cross-validation
    Date: 2011.06
    Issue Date: 2014-05-27 15:13:43 (UTC+8)
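    Abstract: The tumor markers used in routine health examinations have low sensitivity and specificity and cannot detect small tumors, so tumors usually cannot be diagnosed early. The data in this study are serum protein mass spectra obtained with protein chips and surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI); the serum samples come from healthy individuals and from three groups of prostate cancer patients at different stages. The aim is to select protein features that help distinguish the stages of prostate cancer. Using repeated random subsampling cross-validation and the support vector machine (SVM), all protein features are first ranked by the average p-value of the t test, the average p-value of the Kruskal-Wallis test, or the average misclassification rate; forward selection is then used to find the features of the model with the lowest misclassification rate. To obtain a more parsimonious model, the study also considers the classification results of features further extracted using the correlation coefficient and the coefficient of determination. Comparing the methods, minimum-p-value feature selection based on the Kruskal-Wallis test gives the better classification performance, and among the auxiliary extraction methods, maximum-correlation extraction reduces the number of features most effectively while preserving classification performance.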
    Tumor markers used in routine health examinations often have low sensitivity and specificity, so they cannot detect small tumors in time. This research aims to develop a classification tool for early diagnosis of tumors by studying proteomic mass spectra of prostate cancer at different stages. The data are Surface-Enhanced Laser Desorption/Ionization Time-of-Flight Mass Spectrometry (SELDI-TOF-MS) spectra generated from 327 serum samples. Of the 327 samples, 81 are from unaffected healthy men (HM), 78 are from patients diagnosed with benign prostatic hyperplasia (BPH), 84 are from patients with organ-confined PCA (T1/T2), and 84 are from patients with non-organ-confined PCA (T3/T4). The goal is to select features (peaks) of the mass spectra that are useful for classifying the stages of prostate cancer, using repeated random subsampling cross-validation. Two approaches are proposed: the forward minimum-p-value method (with p-values derived from the t test or the Kruskal-Wallis test) and the forward minimum-classification-error method, both combined with SVM. In addition, a maximum-correlation method and a maximum-R² method are considered for further feature selection. In comparison, the forward minimum-p-value method based on the Kruskal-Wallis test often outperforms the other methods in terms of classification rate. Moreover, the maximum-correlation method effectively reduces the number of features while preserving the classification rate.
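
The ranking-then-forward-selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`chi2_sf`, `kruskal_wallis_p`, `rank_features`, `forward_select`), the 70% training fraction, and the number of repeats are all assumptions made for illustration. The Kruskal-Wallis p-value uses the chi-square approximation without tie correction, and the SVM step is left out: `error_fn` is a hypothetical callback where a cross-validated SVM misclassification rate would be plugged in.

```python
import math
import random

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    half = x / 2
    if df % 2 == 0:
        return math.exp(-half) * sum(half ** i / math.factorial(i)
                                     for i in range(df // 2))
    tail = sum(half ** (i - 0.5) / math.gamma(i + 0.5)
               for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * tail

def kruskal_wallis_p(values, labels):
    """Kruskal-Wallis p-value for one feature (midranks, no tie correction)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # midrank shared by tied values
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    rank_sums, counts = {}, {}
    for r, g in zip(ranks, labels):
        rank_sums[g] = rank_sums.get(g, 0.0) + r
        counts[g] = counts.get(g, 0) + 1
    h = (12 / (n * (n + 1))
         * sum(rank_sums[g] ** 2 / counts[g] for g in counts)
         - 3 * (n + 1))
    return chi2_sf(max(h, 0.0), len(counts) - 1)

def rank_features(X, y, n_repeats=50, train_frac=0.7, seed=0):
    """Rank feature indices by mean p-value over random training subsamples."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    totals = [0.0] * p
    for _ in range(n_repeats):
        train = rng.sample(range(n), int(train_frac * n))
        labels = [y[i] for i in train]
        for j in range(p):
            totals[j] += kruskal_wallis_p([X[i][j] for i in train], labels)
    mean_p = [t / n_repeats for t in totals]
    return sorted(range(p), key=lambda j: mean_p[j]), mean_p

def forward_select(ranked, error_fn):
    """Add features in ranked order; keep the prefix with the lowest error."""
    best_k, best_err = 1, float("inf")
    for k in range(1, len(ranked) + 1):
        err = error_fn(ranked[:k])  # e.g. a cross-validated SVM error rate
        if err < best_err:
            best_k, best_err = k, err
    return ranked[:best_k], best_err
```

Here `X` is assumed to be a list of samples, each a list of peak intensities, and `y` the list of class labels. Averaging the p-values over subsamples, rather than computing them once on all the data, keeps the ranking from depending on a single split.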
    Relation: Journal of Data Analysis, 6(3), 67-80
    Data Type: article
    Appears in Collections: [Department of Statistics] Journal Articles

    Files in This Item:

    File         Description    Size      Format
    72-83.pdf                   1081Kb    Adobe PDF

    All items in 政大典藏 are protected by copyright, with all rights reserved.
