政大機構典藏 - National Chengchi University Institutional Repository (NCCUR): Item 140.119/158718

Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/158718

Title:	基於 Plaid 演算法的雙向分群缺失值插補方法 A Biclustering Approach to Missing-Value Imputation Based on the PLAID Algorithm
Authors:	Lin, Yung-Sheng (林詠盛)
Contributors:	Wu, Han-Ming (吳漢銘); Lin, Yung-Sheng (林詠盛)
Keywords:	Missing data imputation; Biclustering; PLAID algorithm
Date:	2025
Issue Date:	2025-08-04 15:12:22 (UTC+8)
Abstract:	Missing-value imputation is a critical step in data analysis, especially in bioinformatics, where datasets frequently contain missing entries that can undermine the validity of results. Widely used imputation methods, such as multiple imputation and k-nearest neighbors (KNN) imputation, have notable limitations: multiple imputation depends on strong and often untestable stochastic assumptions, while KNN performs poorly on high-dimensional data. To address these challenges, we propose an imputation framework based on the PLAID biclustering algorithm. PLAID detects overlapping patterns and block structures in the data, capturing the localized co-variation and functional modules commonly found in gene expression and clinical datasets. By using these structures to guide imputation, the method produces biologically coherent, context-aware estimates of missing values. We compare the approach with existing methods through simulation studies and real-data analyses; the results show that leveraging biclustering structure yields more accurate and more biologically meaningful imputations than conventional techniques.
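The abstract does not state exactly how the bicluster structure is turned into imputed values. The following is a minimal sketch under stated assumptions, not the thesis's implementation. It assumes the plaid model of Lazzeroni and Owen (2002), Y_ij = μ_0 + Σ_k (μ_k + α_ik + β_jk) ρ_ik κ_jk + ε_ij, where ρ_ik and κ_jk are the 0/1 row and column memberships of layer k, and it fills each missing cell with its fitted value from the layers covering that cell. The function names `fit_additive` and `plaid_guided_impute`, the `(row_indices, col_indices)` input format, and the constant background μ_0 (the plaid model also allows row and column effects in the background) are all hypothetical simplifications. Bicluster memberships are taken as given rather than searched for.

```python
import numpy as np


def fit_additive(block, n_iter=100):
    """Least-squares fit of mu + alpha_i + beta_j to a 2-D block using only its
    observed (non-NaN) cells; returns fitted values for every cell, missing ones included."""
    mask = ~np.isnan(block)
    n_obs = max(int(mask.sum()), 1)
    n_row = mask.sum(axis=1)  # observed cells per row
    n_col = mask.sum(axis=0)  # observed cells per column
    mu = np.where(mask, block, 0.0).sum() / n_obs
    alpha = np.zeros(block.shape[0])
    beta = np.zeros(block.shape[1])
    for _ in range(n_iter):
        # Block coordinate descent: each step is an exact least-squares solve for one
        # parameter group given the others. A row or column with no observed cells keeps effect 0.
        r = np.where(mask, block - mu - beta[None, :], 0.0)
        alpha = np.divide(r.sum(axis=1), n_row, out=np.zeros_like(alpha), where=n_row > 0)
        r = np.where(mask, block - mu - alpha[:, None], 0.0)
        beta = np.divide(r.sum(axis=0), n_col, out=np.zeros_like(beta), where=n_col > 0)
        r = np.where(mask, block - alpha[:, None] - beta[None, :], 0.0)
        mu = r.sum() / n_obs
    return mu + alpha[:, None] + beta[None, :]


def plaid_guided_impute(Y, layers, n_rounds=10):
    """Replace NaNs in Y with mu0 + sum_k (mu_k + alpha_ik + beta_jk) over the layers
    covering (i, j). `layers` is a hypothetical list of (row_indices, col_indices)
    assumed to come from a biclustering step run beforehand."""
    Y = np.asarray(Y, dtype=float)
    observed = ~np.isnan(Y)
    mu0 = Y[observed].mean()  # simplified constant background layer (assumption)
    fits = [np.zeros((len(r), len(c))) for r, c in layers]

    def total_fit(skip=None):
        # Background plus every layer's fitted block, optionally leaving one layer out.
        fit = np.full(Y.shape, mu0)
        for k, (rows, cols) in enumerate(layers):
            if k != skip:
                fit[np.ix_(rows, cols)] += fits[k]
        return fit

    # Backfitting: refit each layer to the residual left by all the others,
    # so overlapping biclusters can share cells.
    for _ in range(n_rounds):
        for k, (rows, cols) in enumerate(layers):
            resid = (Y - total_fit(skip=k))[np.ix_(rows, cols)]
            fits[k] = fit_additive(resid)
    # Keep observed values; fill only the missing cells.
    return np.where(observed, Y, total_fit())
```

Separating structure discovery from imputation keeps the sketch small. In practice the memberships could come from a plaid fit, for example the `BCPlaid` method of the R package `biclust`, which lies outside this Python example. In this sketch, cells not covered by any layer fall back to the background mean.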
Description:	Master's thesis, Department of Statistics, National Chengchi University, 112354029
Source URI:	http://thesis.lib.nccu.edu.tw/record/#G0112354029
Data Type:	thesis
Appears in Collections:	[Department of Statistics] Theses

Files in This Item:

File	Size	Format
402901.pdf	5278 KB	Adobe PDF

All items in the NCCU Institutional Repository (政大典藏) are protected by copyright, with all rights reserved.

Copyright Announcement

1. The digital content of this website is part of the National Chengchi University Institutional Repository. It is provided free of charge for non-commercial uses such as academic research and public education. Please use it in a proper and reasonable manner and respect the rights of copyright owners. For commercial use, please obtain authorization from the copyright owner in advance.

2. The NCCU Institutional Repository has made every effort to avoid infringing the rights of copyright owners. If you believe that any material on this website infringes copyright, please notify our staff (nccur@nccu.edu.tw). We will immediately take remedial action, such as removing the work from the repository, and investigate your claim.