National Chengchi University Institutional Repository (NCCUR): Item 140.119/53403
    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/53403


    Title: 運用隨機森林分類方法在檢定基因集的顯著性
    Other Titles: Using Random Forest in Testing the Significance of a Gene-Set
    Authors: 薛慧敏;蔡政安
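    Contributors: Department of Statistics, National Chengchi University
    National Science Council, Executive Yuan
    Keywords: Statistics; Random Forest classification; gene-set testing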
    Date: 2011
    Issue Date: 2012-08-30 09:59:20 (UTC+8)
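    Abstract: In recent gene microarray experiments, a growing number of researchers have extended their aim from testing the association between individual genes and a phenotype variable to testing the significance of a specific gene set. Genes are grouped by biological function, and several public databases now provide gene-set information. Significance tests for gene sets fall into two categories. The first, the competitive test, examines whether a specific gene set is notably more significant than other gene sets. The second, the self-contained test, examines whether the specific gene set itself is significantly expressed. In this study, we build a classifier based on the gene set and use its prediction error rate to assess the association between the set and the phenotype variable; the classifier is built with Random Forest. Because the two tests have different null hypotheses, their null distributions also differ, and we investigate how to compute the P-value for each test. Finally, we apply our method to real data for comparison with other methods, and design computer simulation experiments to verify its validity.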
    In DNA microarray studies, gene-set analysis (GSA) is used to evaluate the association between the expression of biological pathways, or a priori defined gene sets, and a particular phenotype. Two types of tests for differential expression are of research interest: the competitive test and the self-contained test. The competitive test determines whether a specific gene set is differentially expressed relative to other gene sets. The self-contained test determines whether the gene set alone is differentially expressed. The two tests involve different null distributions. To take into account the interaction or correlation within the gene set, we assess the significance of the gene set by the performance of a classifier built on that gene set. In this study, Random Forest classification is applied. For each of the two tests, the empirical P-value of the observed out-of-bag (OOB) error rate of the classifier is obtained using an appropriate resampling method. Several real examples will be analyzed for comparison. A simulation study will be conducted for verification.
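The abstract does not say which resampling schemes are used, so the following is only a minimal sketch of how the two empirical P-values might be computed. It assumes that the self-contained null is generated by permuting phenotype labels and the competitive null by drawing random gene sets of the same size. To stay dependency-free, a leave-one-out nearest-centroid error stands in for the Random Forest OOB error rate; all function names and the toy data are hypothetical and not taken from the report.

```python
import random

def loo_centroid_error(X, y):
    """Leave-one-out error of a nearest-centroid classifier.
    Stand-in (assumption) for the Random Forest OOB error rate."""
    n = len(X)
    errors = 0
    for i in range(n):
        # Class centroids computed without sample i.
        centroids = {}
        for label in set(y):
            rows = [X[j] for j in range(n) if j != i and y[j] == label]
            if rows:
                centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        pred = min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(X[i], centroids[c])))
        errors += pred != y[i]
    return errors / n

def empirical_p(observed, null_errors):
    """Share of null error rates no larger than the observed one (+1 correction)."""
    hits = sum(e <= observed for e in null_errors)
    return (1 + hits) / (1 + len(null_errors))

def subset(X, genes):
    """Restrict the expression matrix (samples x genes) to the given gene indices."""
    return [[row[g] for g in genes] for row in X]

def self_contained_p(X, y, gene_set, n_resamples=200, seed=0):
    """Assumed scheme: permute phenotype labels, keep the gene set fixed."""
    rng = random.Random(seed)
    sub = subset(X, gene_set)
    observed = loo_centroid_error(sub, y)
    null = []
    for _ in range(n_resamples):
        y_perm = y[:]
        rng.shuffle(y_perm)
        null.append(loo_centroid_error(sub, y_perm))
    return observed, empirical_p(observed, null)

def competitive_p(X, y, gene_set, n_resamples=200, seed=0):
    """Assumed scheme: compare against random gene sets of the same size."""
    rng = random.Random(seed)
    observed = loo_centroid_error(subset(X, gene_set), y)
    null = []
    for _ in range(n_resamples):
        rand_set = rng.sample(range(len(X[0])), len(gene_set))
        null.append(loo_centroid_error(subset(X, rand_set), y))
    return observed, empirical_p(observed, null)

# Toy data (hypothetical): 20 samples, 30 genes, genes 0-4 shifted in class 1.
data_rng = random.Random(1)
y = [0] * 10 + [1] * 10
X = [[data_rng.gauss(2.0 * label if g < 5 else 0.0, 1.0) for g in range(30)]
     for label in y]
gene_set = [0, 1, 2, 3, 4]

err_sc, p_sc = self_contained_p(X, y, gene_set)
err_c, p_c = competitive_p(X, y, gene_set)
print("self-contained: error", err_sc, "P-value", p_sc)
print("competitive:    error", err_c, "P-value", p_c)
```

A small error rate relative to the null distribution gives a small P-value in either test. With a Random Forest implementation available, `loo_centroid_error` would be replaced by one minus the forest's OOB accuracy, leaving the resampling loops unchanged.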
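    Relation: Applied research
    Academic grant
    Research period: 10008~10107 (ROC calendar year and month, i.e. 2011-08 to 2012-07)
    Research funding: 441 thousand NT dollars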
    Data Type: report
    Appears in Collections: [Department of Statistics] NSC Projects

    Files in This Item:

    File                     Description   Size    Format
    100-2118-M004-004.pdf                  792Kb   Adobe PDF   2606


    All items in 政大典藏 are protected by copyright, with all rights reserved.