Microbial pan-genomes are shaped by a complex combination of stochastic and deterministic forces. Even closely related genomes often exhibit extensive variation in their gene content. Understanding what drives this variation requires exploring the interactions of genes with each other and with their external environments. However, to date, conceptual models of pan-genome dynamics often represent genes as independent units and provide limited information about their mechanistic interactions. Here, we use pan-reactomes as proxies for pan-genomes, since they can explicitly represent the interactions between the genes that code for metabolic reactions and can simulate complex phenotypes that interact with the metabolic environment. We interpreted pan-reactomes as dynamic pools of metabolic reactions that can be gained or lost, and simulated the routes along which different lineages lose reactions in alternative environments. We performed these simulations on the pan-reactomes of 46 bacterial and archaeal families covering a broad taxonomic range. The simulations allowed us to separate metabolic reactions whose presence depends on the metabolite composition of the external environment from those whose presence does not, and thereby to identify reactions constrained "by nutrition" and "by nature", respectively. By comparing the frequencies of reactions from the first group in our simulations with their observed frequencies in bacterial and archaeal families, we predicted the metabolic niches that shaped the genomic composition of these lineages in their evolutionary past. Moreover, we found that the lineages that were shaped by a more diverse metabolic niche also occur in more diverse biomes, as assessed by global environmental sequencing datasets. Overall, we introduce a computational framework for analyzing and interpreting pan-reactomes that provides new insights into the ecological and evolutionary drivers of pan-genome composition.
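To make the idea of simulated reaction loss concrete, the following is a minimal toy sketch and not the framework described above. It assumes a four-reaction pan-reactome with invented identifiers (`R1` to `R4`), invented metabolites (`glc`, `ace`, `pyr`, `biomass`), and a simple reachability test in place of whatever viability criterion the actual simulations use. Reactions are removed in random order unless their loss would prevent biomass production, and a reaction is labeled "by nature" when its retention frequency is the same in every environment and "by nutrition" otherwise.

```python
import random

# Hypothetical toy pan-reactome: reaction id -> (substrates, products).
# All identifiers and metabolites are invented for illustration.
PAN_REACTOME = {
    "R1": ({"glc"}, {"pyr"}),
    "R2": ({"ace"}, {"pyr"}),
    "R3": ({"pyr"}, {"biomass"}),
    "R4": ({"glc"}, {"ace"}),
}


def producible(reactions, environment):
    """Return all metabolites reachable from the environment (network expansion)."""
    pool = set(environment)
    changed = True
    while changed:
        changed = False
        for rid in reactions:
            substrates, products = PAN_REACTOME[rid]
            if substrates <= pool and not products <= pool:
                pool |= products
                changed = True
    return pool


def is_viable(reactions, environment, target="biomass"):
    """Assumed viability criterion: the target metabolite must be producible."""
    return target in producible(reactions, environment)


def simulate_loss(environment, rng):
    """Try to delete each reaction once, in random order; keep it if its loss is lethal."""
    kept = set(PAN_REACTOME)
    order = sorted(kept)
    rng.shuffle(order)
    for rid in order:
        if is_viable(kept - {rid}, environment):
            kept.discard(rid)
    return kept


def retention_frequency(environment, n_runs=200, seed=0):
    """Fraction of simulated loss routes in which each reaction is retained."""
    rng = random.Random(seed)
    counts = {rid: 0 for rid in PAN_REACTOME}
    for _ in range(n_runs):
        for rid in simulate_loss(environment, rng):
            counts[rid] += 1
    return {rid: count / n_runs for rid, count in counts.items()}


def classify(environments, n_runs=200, tol=0.05):
    """Label a reaction 'by nutrition' if its retention varies across environments."""
    freqs = {name: retention_frequency(env, n_runs) for name, env in environments.items()}
    labels = {}
    for rid in PAN_REACTOME:
        values = [f[rid] for f in freqs.values()]
        labels[rid] = "by nutrition" if max(values) - min(values) > tol else "by nature"
    return freqs, labels


if __name__ == "__main__":
    freqs, labels = classify({"glucose": {"glc"}, "acetate": {"ace"}})
    for rid in sorted(labels):
        print(rid, labels[rid], {env: f[rid] for env, f in freqs.items()})
```

In this toy network, `R3` is the only route to biomass and is retained in every run regardless of environment, whereas the retention of the other reactions depends on which carbon source is supplied. The tolerance `tol` is an arbitrary choice to absorb sampling noise; a real analysis would need a statistically grounded criterion.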