000069736 001__ 69736
000069736 005__ 20190709135535.0
000069736 0247_ $$2doi$$a10.1093/gigascience/gix096
000069736 0248_ $$2sideral$$a104805
000069736 037__ $$aART-2017-104805
000069736 041__ $$aeng
000069736 100__ $$aDe Anda, V.
000069736 245__ $$aMEBS, a software platform to evaluate large (meta)genomic collections according to their metabolic machinery: Unraveling the sulfur cycle
000069736 260__ $$c2017
000069736 5060_ $$aAccess copy available to the general public$$fUnrestricted
000069736 5203_ $$aThe increasing number of metagenomic and genomic sequences has dramatically improved our understanding of microbial diversity, yet our ability to infer metabolic capabilities in such datasets remains challenging. We describe the Multigenomic Entropy Based Score pipeline (MEBS), a software platform designed to evaluate, compare, and infer complex metabolic pathways in large "omic" datasets, including entire biogeochemical cycles. MEBS is open source and available through https://github.com/eead-csic-compbio/metagenome_Pfam_score. To demonstrate its use, we modeled the sulfur cycle by exhaustively curating the molecular and ecological elements involved (compounds, genes, metabolic pathways, and microbial taxa). This information was reduced to a collection of 112 characteristic Pfam protein domains and a list of complete-sequenced sulfur genomes. Using the mathematical framework of relative entropy (H'), we quantitatively measured the enrichment of these domains among sulfur genomes. The entropy of each domain was used both to build up a final score that indicates whether a (meta)genomic sample contains the metabolic machinery of interest and to propose marker domains in metagenomic sequences such as DsrC (PF04358). MEBS was benchmarked with a dataset of 2107 non-redundant microbial genomes from RefSeq and 935 metagenomes from MG-RAST. Its performance, reproducibility, and robustness were evaluated using several approaches, including random sampling, linear regression models, receiver operator characteristic plots, and the area under the curve metric (AUC). Our results support the broad applicability of this algorithm to accurately classify (AUC = 0.985) hard-to-culture genomes (e.g., Candidatus Desulforudis audaxviator), previously characterized ones, and metagenomic environments such as hydrothermal vents, or deep-sea sediment. Our benchmark indicates that an entropy-based score can capture the metabolic machinery of interest and can be used to efficiently classify large genomic and metagenomic datasets, including uncultivated/unexplored taxa.
000069736 536__ $$9info:eu-repo/grantAgreement/ES/MINECO/CSIC13-4E-2490
000069736 540__ $$9info:eu-repo/semantics/openAccess$$aby$$uhttp://creativecommons.org/licenses/by/3.0/es/
000069736 590__ $$a7.267$$b2017
000069736 591__ $$aMULTIDISCIPLINARY SCIENCES$$b7 / 64 = 0.109$$c2017$$dQ1$$eT1
000069736 592__ $$a5.022$$b2017
000069736 593__ $$aHealth Informatics$$c2017$$dQ1
000069736 593__ $$aComputer Science Applications$$c2017$$dQ1
000069736 655_4 $$ainfo:eu-repo/semantics/article$$vinfo:eu-repo/semantics/publishedVersion
000069736 700__ $$aZapata-Peñasco, I.
000069736 700__ $$aPoot-Hernandez, A.C.
000069736 700__ $$aEguiarte, L.E.
000069736 700__ $$0(orcid)0000-0002-5462-907X$$aContreras-Moreira, B.$$uUniversidad de Zaragoza
000069736 700__ $$aSouza, V.
000069736 7102_ $$11002$$2060$$aUniversidad de Zaragoza$$bDpto. Bioq.Biolog.Mol. Celular$$cÁrea Bioquímica y Biolog.Mole.
000069736 773__ $$g6, 11 (2017), 1-17$$pGigaScience.$$tGigaScience$$x2047-217X
000069736 8564_ $$s741799$$uhttps://zaguan.unizar.es/record/69736/files/texto_completo.pdf$$yVersión publicada
000069736 8564_ $$s96624$$uhttps://zaguan.unizar.es/record/69736/files/texto_completo.jpg?subformat=icon$$xicon$$yVersión publicada
000069736 909CO $$ooai:zaguan.unizar.es:69736$$particulos$$pdriver
000069736 951__ $$a2019-07-09-12:04:49
000069736 980__ $$aARTICLE