Theoretical Models and Computational Techniques for the Analysis of Microbial Communities

  1. Riera Roca, Gabriel
Supervised by:
  1. Maria de la Mercè Llabrés Segura Director
  2. Francesc Andreu Rosselló Llompart Director

Defence university: Universitat de les Illes Balears

Defence date: 15 September 2023

Committee:
  1. José María Sempere Luna Chair
  2. Arnau Mir Torres Secretary
  3. Josefa Antón Botella Committee member

Type: Thesis

Abstract

Microbial communities are complex ecosystems comprising diverse microorganisms that interact within a shared living space. Understanding their diversity and the relationships between population compositions is crucial for comprehending their dynamics and ecological significance. In this thesis, we focus on two key aspects: (1) assessing biodiversity in microbial communities and (2) analyzing virus-host relations in metagenomic and metaviromic samples using computational techniques. To assess microbial biodiversity, measures based on phylogenetic information have been proposed. The most popular such measure is Faith’s phylogenetic diversity (PD), which quantifies the diversity of phenotypic characters in a set of species using a phylogenetic tree. However, in microbial evolution, reticulate events such as genetic recombinations and lateral gene transfers play significant roles, making the use of phylogenetic networks necessary. We develop an exchange property for the extension of PD to phylogenetic networks, rooted phylogenetic subnet diversity (rPSD), allowing the characterization of subsets of species with maximal rPSD scores on up to semi-binary level-2 networks or semi-ternary level-1 networks via a polynomial-time greedy algorithm. Furthermore, in the same context, we investigate the application of interaction indices from game theory to phylogenetic networks. These indices evaluate the contributions of coalitions of species to the overall phylogenetic diversity. We derive simplified expressions for the Shapley interaction index and the Banzhaf interaction index for various cooperative games with phylogenetic meaning defined on phylogenetic networks, including rooted and unrooted phylogenetic subnet diversity on rooted phylogenetic networks and phylogenetic subnet diversity on split networks, a widely used type of unrooted phylogenetic network. These expressions deepen our understanding of how value and power are distributed among species and groups of species.
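As an illustration of the tree-based starting point described above, the following is a minimal sketch of Faith's PD and of greedy subset selection on a rooted phylogenetic tree. It is not the thesis's algorithm for networks: the toy tree, its branch lengths, and all function names are hypothetical, and PD is taken in its rooted form (the total length of the edges on the paths from the root to the selected species).

```python
from itertools import combinations

# Hypothetical toy rooted tree: child -> (parent, branch length).
TREE = {
    "x": ("root", 1.0),
    "y": ("root", 2.0),
    "a": ("x", 3.0),
    "b": ("x", 1.0),
    "c": ("y", 2.0),
    "d": ("y", 0.5),
}
LEAVES = ["a", "b", "c", "d"]


def edges_to_root(leaf):
    """Edges (each named by its child node) on the path from leaf to root."""
    edges = set()
    node = leaf
    while node in TREE:
        edges.add(node)
        node = TREE[node][0]
    return edges


def pd(species):
    """Rooted PD: total length of the edges spanned by root and `species`."""
    covered = set()
    for leaf in species:
        covered |= edges_to_root(leaf)
    return sum(TREE[edge][1] for edge in covered)


def greedy_max_pd(k):
    """Repeatedly add the leaf that yields the largest PD of the chosen set."""
    chosen = []
    for _ in range(k):
        candidates = [leaf for leaf in LEAVES if leaf not in chosen]
        best = max(candidates, key=lambda leaf: pd(chosen + [leaf]))
        chosen.append(best)
    return chosen


def brute_force_max_pd(k):
    """Exhaustive check: the best PD over all subsets of size k."""
    return max(pd(list(subset)) for subset in combinations(LEAVES, k))


if __name__ == "__main__":
    for k in range(1, len(LEAVES) + 1):
        picked = greedy_max_pd(k)
        print(k, picked, pd(picked), brute_force_max_pd(k))
```

On this toy tree the greedy choice matches the exhaustive optimum for every subset size, which is the behaviour known for trees; the abstract's contribution concerns when an analogous greedy strategy remains optimal for rPSD on certain classes of networks.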
In the second part, we delve into the analysis of virus-host relations within microbial communities, which is crucial for understanding the dynamics and impact of viruses in these communities. We begin by addressing the challenge of classifying viruses in metaviromic samples. Although viruses are the most abundant life forms on Earth, there has been a lack of software for the taxonomic classification of metaviromic data. We propose a new tool, VPF-Class, based on Viral Protein Families (VPFs), that provides both a taxonomic classification and a host prediction. We then introduce METEOR, a tool that integrates VPF-Class with metagenomic assignment tools such as MegaBLAST and TANGO. The host predictions of viral sequences generated by VPF-Class are cross-validated and enriched with evidence about putative hosts present in a metagenomic sample obtained from the same microbial community, resulting in more accurate host predictions restricted to hosts present in the metagenomic sample. Finally, we address the challenge of aligning virus-host protein-protein interaction networks (PPINs). We present a compact integer linear programming formulation of the PPIN alignment problem, which can be solved using state-of-the-art mathematical modeling and integer linear programming software tools. We also provide empirical results demonstrating that small biological networks, such as the virus-host PPINs in the STRING Viruses database, can be aligned in a reasonable amount of time on a personal computer, yielding structurally coherent and biologically meaningful alignments.