政大機構典藏-National Chengchi University Institutional Repository(NCCUR):Item 140.119/132067

Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/132067

Title:	HiCBin: Deconvoluting metagenomic assemblies by Hi-C connect network (original title: HiCBin：利用 Hi-C 交互網路對總體基因組裝進行反捲積)
Authors:	鄭惟文 Cheng, Wei-Wen
Contributors:	張家銘 Chang, Jia-Ming; 鄭惟文 Cheng, Wei-Wen
Keywords:	Hi-C; Metagenomics; Metagenome-assembled genomes; Connect network; Genome binning; Smart local moving (SLM)
Date:	2020
Issue Date:	2020-10-05 15:16:42 (UTC+8)
Abstract:	Background: Metagenomics is the study of recovering the collective microbial genomes from an environmental sample. Because most microorganisms cannot be cultured independently of their native community, it is challenging to identify the genomes of individual species, namely metagenome-assembled genomes (MAGs), from a metagenome. Previous works such as MetaPhase, ProxiMeta, and bin3C have described methods that apply Hi-C data to recover MAGs. Results: In this work, in addition to using Hi-C data for genome binning, we further analyze the properties of Hi-C connect networks. The results show that Hi-C connect networks follow a truncated power-law distribution, a variant of the power-law distribution. We therefore use the smart local moving (SLM) algorithm for genome binning, because it performed very well in previous works on clustering networks that follow a power-law distribution. We then compare our method, HiCBin, against two related tools, bin3C and ProxiMeta, on real biological data. HiCBin recovers more near-complete MAGs than the other tools and also recovers more MAGs at or above the “Moderate” level. Conclusions: Although HiCBin follows most of the steps of bin3C, it achieves better genome binning performance. This indicates that analyzing the properties of the Hi-C connect network and choosing a suitable clustering algorithm can lead to better binning results. HiCBin offers a new perspective from which Hi-C-based metagenomic deconvolution methods may be improved in the future. The source code for the whole experiment is publicly available at https://github.com/changlabtw/HiCBin.
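Illustrative note: SLM is a modularity-based community detection method (Waltman and van Eck, 2013), so the quantity it tries to maximize can be shown on a small example. The sketch below is not taken from the HiCBin source code. It computes the standard Newman-Girvan modularity of a partition of a weighted graph, on a hypothetical toy contact graph in which nodes stand for contigs and edge weights stand for Hi-C contact counts. The function name `modularity`, the node labels, and the weights are all assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Newman-Girvan modularity of a partition of an undirected weighted graph.

    edges: iterable of (u, v, weight) tuples with u != v (no self-loops here)
    membership: dict mapping each node to its cluster (bin) label
    """
    total_weight = 0.0               # m: sum of all edge weights
    internal = defaultdict(float)    # edge weight falling inside each cluster
    degree_sum = defaultdict(float)  # summed weighted degree of each cluster
    for u, v, w in edges:
        total_weight += w
        degree_sum[membership[u]] += w
        degree_sum[membership[v]] += w
        if membership[u] == membership[v]:
            internal[membership[u]] += w
    if total_weight == 0:
        return 0.0
    # Q = sum over clusters of (internal weight / m) - (cluster degree / 2m)^2
    return sum(
        internal[c] / total_weight - (degree_sum[c] / (2 * total_weight)) ** 2
        for c in degree_sum
    )


# Hypothetical toy graph: two triangles of contigs joined by one weak link.
toy_edges = [
    ("a1", "a2", 1.0), ("a2", "a3", 1.0), ("a1", "a3", 1.0),
    ("b1", "b2", 1.0), ("b2", "b3", 1.0), ("b1", "b3", 1.0),
    ("a3", "b1", 1.0),
]
nodes = {n for u, v, _ in toy_edges for n in (u, v)}
two_bins = {n: n[0] for n in nodes}   # bin "a" and bin "b"
one_bin = {n: "all" for n in nodes}   # everything in a single bin

q_two = modularity(toy_edges, two_bins)
q_one = modularity(toy_edges, one_bin)
print(f"two bins: Q = {q_two:.4f}; one bin: Q = {q_one:.4f}")
```

Separating the two triangles scores higher than lumping all contigs together, which is the kind of preference a modularity-maximizing method such as SLM searches for over much larger networks.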
Description:	Master's thesis, Department of Computer Science, National Chengchi University, 106753031
Source URI:	http://thesis.lib.nccu.edu.tw/record/#G0106753031
Data Type:	thesis
DOI:	10.6814/NCCU202001729
Appears in Collections:	[Department of Computer Science] Theses

Files in This Item:

File	Description	Size	Format
303101.pdf		7373Kb	Adobe PDF

All items in 政大典藏 are protected by copyright, with all rights reserved.


Copyright Announcement

1. The digital content of this website is part of the National Chengchi University Institutional Repository. It is provided free of charge for academic research, public education, and other non-commercial uses. Please use it in a proper and reasonable manner and respect the rights of copyright owners. For commercial use, please obtain authorization from the copyright owner in advance.

2. The NCCU Institutional Repository has made every effort to avoid infringing the rights of copyright owners. If you believe that any material on the website infringes copyright, please contact our staff (nccur@nccu.edu.tw). We will remove the work from the repository and investigate your claim.
