Please use this permanent URL to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/158718
Title: A Biclustering Approach to Missing-Value Imputation Based on the PLAID Algorithm
Author: Lin, Yung-Sheng (林詠盛)
Contributors: Wu, Han-Ming (吳漢銘); Lin, Yung-Sheng (林詠盛)
Keywords: missing data imputation; biclustering; PLAID algorithm
Date: 2025
Date uploaded: 2025-08-04 15:12:22 (UTC+8)
Abstract: Missing-value imputation is a critical step in data analysis. This is especially true in bioinformatics, where datasets frequently contain missing entries that can undermine the validity of results. Widely used imputation methods such as multiple imputation and k-nearest neighbors (KNN) have notable limitations. Multiple imputation depends on strong and often untestable stochastic assumptions, while KNN performs poorly on high-dimensional data. To address these challenges, we propose a new imputation framework based on the PLAID biclustering algorithm. PLAID detects overlapping patterns and block structures in the data, capturing the localized co-variation and functional modules commonly found in gene expression and clinical datasets. By using these structures to guide imputation, our method produces biologically coherent, context-aware estimates of the missing values. We compare our approach with existing methods in simulation studies and real-data analyses. The results show that leveraging biclustering structure yields more accurate and more biologically meaningful imputations than conventional techniques.
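The abstract describes the framework only at a high level. In the plaid model of Lazzeroni and Owen (2002), each bicluster (layer) contributes an additive effect of the form μ + α_i + β_j to the cells it covers. As a rough illustration of how such structure could guide imputation, here is a minimal sketch that makes several assumptions:

- One bicluster (a row set and a column set) has already been found, for example by a PLAID fit.
- The missing cells inside that bicluster are filled with the layer effect fitted by least squares to its observed cells.
- Cells outside the bicluster fall back to observed column means.

The function names, the backfitting fit and the column-mean fallback are all hypothetical choices for this example. They are not the thesis's actual procedure.

```python
import numpy as np


def fit_plaid_layer(block, n_iter=100):
    """Fit mu + alpha_i + beta_j to the observed (non-NaN) cells of a
    bicluster sub-matrix by least squares, using backfitting."""
    obs = ~np.isnan(block)
    row_n = np.maximum(obs.sum(axis=1), 1)  # guard rows with no observed cell
    col_n = np.maximum(obs.sum(axis=0), 1)  # guard columns with no observed cell
    mu = np.nanmean(block)
    alpha = np.zeros(block.shape[0])
    beta = np.zeros(block.shape[1])
    for _ in range(n_iter):
        # Each update is the exact least-squares minimizer given the others;
        # missing cells contribute 0 to the sums and are excluded from counts.
        resid = np.where(obs, block - mu - beta[None, :], 0.0)
        alpha = resid.sum(axis=1) / row_n
        resid = np.where(obs, block - mu - alpha[:, None], 0.0)
        beta = resid.sum(axis=0) / col_n
        resid = np.where(obs, block - alpha[:, None] - beta[None, :], 0.0)
        mu = resid.sum() / obs.sum()
    return mu, alpha, beta


def impute_with_bicluster(Y, rows, cols, n_iter=100):
    """Hypothetical helper: fill NaNs inside one bicluster (rows x cols) with
    the fitted layer effect, and NaNs elsewhere with observed column means."""
    Y = np.array(Y, dtype=float)          # work on a copy
    col_means = np.nanmean(Y, axis=0)     # computed before any filling
    idx = np.ix_(rows, cols)
    block = Y[idx]
    mu, alpha, beta = fit_plaid_layer(block, n_iter)
    fitted = mu + alpha[:, None] + beta[None, :]
    Y[idx] = np.where(np.isnan(block), fitted, block)
    # Fallback for cells outside the bicluster (an assumption of this sketch).
    return np.where(np.isnan(Y), col_means[None, :], Y)
```

Take an exactly additive block with one cell removed. The fitted layer recovers the removed value, because the remaining cells still determine μ + α_i + β_j for that cell. A real PLAID fit also involves a background layer and possibly overlapping layers, which this sketch ignores.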
References:
Aittokallio, T. (2010). Dealing with missing values in large-scale studies: Microarray data imputation and beyond. Briefings in Bioinformatics, 11(2), 253–264.
Andrews, T. S., & Hemberg, M. (2019). False signals induced by single-cell imputation. F1000Research, 7, 1740. https://doi.org/10.12688/f1000research.16613.2
Azur, M. J., Stuart, E. A., Frangakis, C., & Leaf, P. J. (2011). Multiple imputation by chained equations: What is it and how does it work? International Journal of Methods in Psychiatric Research, 20(1), 40–49. https://doi.org/10.1002/mpr.329
Bishop, C. M. (1999). Variational principal components. In 1999 Ninth International Conference on Artificial Neural Networks ICANN 99 (Conf. Publ. No. 470) (Vol. 1, pp. 509–514). IET.
Jadhav, A., Pramod, D., & Ramanathan, K. (2019). Comparison of performance of data imputation methods for numeric dataset. Applied Artificial Intelligence, 33(10), 913–933.
Jin, L., Bi, Y., Hu, C., Qu, J., Shen, S., Wang, X., & Tian, Y. (2021). A comparative study of evaluating missing value imputation methods in label-free proteomics. Scientific Reports, 11(1), 1760.
Lazzeroni, L., & Owen, A. (2002). Plaid models for gene expression data. Statistica Sinica, 12, 61–86.
Liao, S. G., Lin, Y., Kang, D. D., Chandra, D., Bon, J., Kaminski, N., & Tseng, G. C. (2014). Missing value imputation in high-dimensional phenomic data: Imputable or not, and how? BMC Bioinformatics, 15(1), 1–12.
Liew, A. W.-C., Law, N.-F., & Yan, H. (2011). Missing value imputation for gene expression data: Computational techniques to recover missing data from available information. Briefings in Bioinformatics, 12(5), 498–513.
Oba, S., Sato, M. A., Takemasa, I., Monden, M., Matsubara, K. I., & Ishii, S. (2003). A Bayesian missing value estimation method for gene expression profile data. Bioinformatics, 19(16), 2088–2096.
Samad, M., Kowsar, I., Rabbani, S., & Hou, Y. (2024). DeepIFSAC: Deep imputation of missing values using feature and sample attention within contrastive framework. Available at SSRN 5137008.
Schmitt, P., Mandel, J., & Guedj, M. (2015). A comparison of six methods for missing data imputation. Journal of Biometrics & Biostatistics, 6(1), 1.
Stacklies, W., Redestig, H., Scholz, M., Walther, D., & Selbig, J. (2007). pcaMethods: A Bioconductor package providing PCA methods for incomplete data. Bioinformatics, 23(9), 1164–1167. https://doi.org/10.1093/bioinformatics/btm069
Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., & Altman, R. B. (2001). Missing value estimation methods for DNA microarrays. Bioinformatics, 17(6), 520–525. https://doi.org/10.1093/bioinformatics/17.6.520
Turner, H., Bailey, T., & Krzanowski, W. (2005). Improved biclustering of microarray data demonstrated through systematic performance tests. Computational Statistics & Data Analysis, 48(2), 235–254.
Van Buuren, S., & Oudshoorn, K. (1999). Flexible multivariate imputation by MICE (Tech. Rep.). TNO Report, TNO.
Van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3), 1–67. https://doi.org/10.18637/jss.v045.i03
Yang, Y., Xu, Z., & Song, D. (2016). Missing value imputation for microRNA expression data by using a GO-based similarity measure. BMC Bioinformatics, 17(Suppl 17), 109–116. https://doi.org/10.1186/s12859-016-1275-2
Zappia, L., Phipson, B., & Oshlack, A. (2017). Splatter: Simulation of single-cell RNA sequencing data. Genome Biology, 18(1), 174.
Description: Master's thesis, Department of Statistics, National Chengchi University, 112354029
Source: http://thesis.lib.nccu.edu.tw/record/#G0112354029
Type: thesis
Appears in collections: [Department of Statistics] Theses
Files in this item:
File | Size | Format
402901.pdf | 5278 KB | Adobe PDF
All items in the NCCU Institutional Repository are protected by their original copyright.