National Chengchi University Institutional Repository (NCCUR): Item 140.119/158718

Please use this permanent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/158718

Title:	A Biclustering Approach to Missing-Value Imputation Based on the PLAID Algorithm (基於 Plaid 演算法的雙向分群缺失值插補方法)
Author:	Lin, Yung-Sheng (林詠盛)
Contributors:	Wu, Han-Ming (吳漢銘); Lin, Yung-Sheng (林詠盛)
Keywords:	Missing data imputation; Biclustering; PLAID algorithm
Date:	2025
Upload time:	2025-08-04 15:12:22 (UTC+8)
Abstract:	Missing value imputation is a critical step in data analysis, especially in bioinformatics, where datasets frequently contain missing entries that can undermine the validity of results. Current imputation methods, such as multiple imputation and k-nearest neighbors (KNN), have notable limitations. Multiple imputation depends on strong, and often untestable, stochastic assumptions, while KNN performs poorly on high-dimensional data. To address these challenges, we propose a new imputation framework based on the PLAID biclustering algorithm. PLAID detects overlapping patterns and block structures in the data, capturing the localized co-variation and functional modules commonly found in gene expression and clinical datasets. By using these structures to guide imputation, our method aims for biologically coherent and context-aware handling of missing data. We compare our approach with existing methods through simulation studies and real-world data analyses. The results show that leveraging biclustering structures leads to more accurate and more biologically meaningful imputation than conventional techniques.
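Description:	Master's thesis; National Chengchi University, Department of Statistics; 112354029
Source:	http://thesis.lib.nccu.edu.tw/record/#G0112354029
Type:	thesis
Appears in collections:	[Department of Statistics] Theses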
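
The abstract describes using bicluster structure to guide imputation. As a rough illustration only, and not the thesis's actual procedure, the sketch below assumes a single plaid-style layer whose row and column membership is already known (the `rows` and `cols` arguments are hypothetical inputs; in practice they would come from a PLAID fit). Inside that bicluster it fits an additive model `mu + alpha_i + beta_j` to the observed cells by backfitting and uses the fitted value for each missing cell; missing cells outside the bicluster fall back to column means. The function name `impute_with_layer` and all parameter names are invented for this example.

```python
import math


def impute_with_layer(X, rows, cols, n_iter=50):
    """Fill NaNs in X: additive fit inside one bicluster, column means elsewhere.

    X is a list of equal-length lists of floats, with float('nan') marking
    missing cells. rows/cols are the (assumed known) bicluster memberships.
    """
    n, p = len(X), len(X[0])
    out = [list(r) for r in X]  # work on a copy; X itself is left untouched

    # Fallback: replace every missing cell by the mean of its column's
    # observed values (0.0 is an arbitrary choice for a fully missing column).
    for j in range(p):
        seen = [X[i][j] for i in range(n) if not math.isnan(X[i][j])]
        mean_j = sum(seen) / len(seen) if seen else 0.0
        for i in range(n):
            if math.isnan(X[i][j]):
                out[i][j] = mean_j

    # Observed cells that belong to the bicluster.
    obs = [(i, j) for i in rows for j in cols if not math.isnan(X[i][j])]
    if not obs:
        return out
    mu = sum(X[i][j] for i, j in obs) / len(obs)
    alpha = {i: 0.0 for i in rows}  # row effects
    beta = {j: 0.0 for j in cols}   # column effects

    # Backfitting: alternately update row and column effects using only
    # the observed cells of the bicluster.
    for _ in range(n_iter):
        for i in rows:
            r = [X[i][j] - mu - beta[j] for j in cols if not math.isnan(X[i][j])]
            alpha[i] = sum(r) / len(r) if r else 0.0
        for j in cols:
            r = [X[i][j] - mu - alpha[i] for i in rows if not math.isnan(X[i][j])]
            beta[j] = sum(r) / len(r) if r else 0.0

    # Missing cells inside the bicluster take the fitted layer value,
    # overriding the column-mean fallback.
    for i in rows:
        for j in cols:
            if math.isnan(X[i][j]):
                out[i][j] = mu + alpha[i] + beta[j]
    return out
```

When the observed cells of the bicluster follow the additive pattern closely, the fitted value for a missing cell is driven by its own row and column effects rather than by a global column average. On real data the quality of the result would depend on how well the detected biclusters match that additive assumption.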

Files in this item:

File	Description	Size	Format	Views
402901.pdf		5278Kb	Adobe PDF	0
