    Please use this identifier to cite or link to this item: http://nccur.lib.nccu.edu.tw/handle/140.119/3862

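    Title: Statistical Machine Learning and Its Applications: Disease Classification and Data Reduction, with Application to Cancer Detection Using a Proteomic Database (2/2)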
    Other Titles: Disease Classification and Data Reduction--- Application to Cancer Detection Based on Proteomic
    Authors: 余清祥
    Keywords: Data reduction; Classification; Diagnosis; Simulation
    Date: 2005
    Issue Date: 2007-04-18 16:36:53 (UTC+8)
    Publisher: Taipei: Department of Statistics, National Chengchi University
    Abstract: Large, heterogeneous databases often make timeliness the overriding concern: an approximate but acceptable answer obtained quickly can be worth more than an exact answer that arrives late, particularly in exploratory data analysis where interactive response times are critical. For example, a physician reading a cancer patient's laboratory report must decide promptly whether the patient needs immediate surgery or chemotherapy, or needs no treatment yet but should be kept under follow-up observation. Because a smaller data set usually means less analysis time and lower cost, data reduction is a natural choice when speed matters and approximate answers suffice. Common data reduction methods include histograms, singular value decomposition (SVD), index trees, sampling, and wavelets.
    This project uses proteomic data from prostate cancer patients, covering about 300 patients and almost 50,000 variables. The goal is to compare common data reduction methods by how well they support correct classification of cases, that is, by how small a classification error they allow. The project is planned to run for three years. The first year uses mass spectrometry data that were screened manually (only 779 variables, with possible errors corrected) and compares four frequently used classification methods, namely support vector machine (SVM), neural network, classification and regression tree (CART), and logistic regression, to find the best method for binary classification. The second year uses the raw data (about 48,000 variables, errors not corrected), again with binary classification as the target, and pairs the better classifiers from the first year with data reduction methods to find the reduction method that retains the most information. The third year attempts to combine the two samples taken from each patient, with multi-class classification as the target, in order to reach a correct diagnosis.
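The reduce-then-classify workflow described in the abstract can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the spectra are synthetic (Gaussian noise, with a raised band in the "diseased" class), the function names are hypothetical, a Haar wavelet truncation stands in for the data reduction step, and a nearest-centroid rule stands in for the classifiers named in the abstract (SVM, neural network, CART, logistic regression). It is not the project's data or method.

```python
import math
import random


def haar_transform(signal):
    """Orthonormal Haar wavelet transform; len(signal) must be a power of two.

    Output layout: [overall average term, coarsest detail, ..., finest details].
    """
    coeffs = list(signal)
    n = len(coeffs)
    while n > 1:
        half = n // 2
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        coeffs[:n] = averages + details
        n = half
    return coeffs


def reduce_spectrum(signal, k):
    """Data reduction: keep only the k coarsest Haar coefficients."""
    return haar_transform(signal)[:k]


def make_spectrum(rng, diseased, length=1024):
    """Synthetic spectrum (assumption): unit Gaussian noise, plus a raised band if diseased."""
    spectrum = [rng.gauss(0.0, 1.0) for _ in range(length)]
    if diseased:
        for i in range(256, 320):
            spectrum[i] += 1.0
    return spectrum


def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def nearest_centroid_predict(x, centroids):
    """Return the label whose centroid is closest to x in squared Euclidean distance."""
    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(centroids, key=lambda label: sqdist(x, centroids[label]))


def demo(seed=0, k=32, n_train=40, n_test=40):
    """Reduce 1024-point spectra to k coefficients, then classify; returns test accuracy."""
    rng = random.Random(seed)

    def sample(n):
        data = []
        for i in range(n):
            label = i % 2  # alternate healthy (0) and diseased (1)
            data.append((reduce_spectrum(make_spectrum(rng, label == 1), k), label))
        return data

    train, test = sample(n_train), sample(n_test)
    centroids = {
        label: centroid([x for x, y in train if y == label]) for label in (0, 1)
    }
    hits = sum(nearest_centroid_predict(x, centroids) == y for x, y in test)
    return hits / len(test)


if __name__ == "__main__":
    print("test accuracy with 32 of 1024 coefficients:", demo())
```

In this toy setting the classifier sees 32 numbers per case instead of 1024, which is the kind of trade between data volume and classification error that the project sets out to measure on real proteomic data.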
    Description: Approved funding: NT$323,000
    Data Type: report
    Appears in Collections: [Department of Statistics] National Science Council Research Projects

    Files in This Item:

    File                Description  Size   Format
    942118M004001.pdf                462Kb  Adobe PDF

    All items in 政大典藏 are protected by copyright, with all rights reserved.
