C7: Publications
Erroneous energy-generating cycles in published genome scale metabolic networks: Identification and removal
Fritzemeier C.J., Hartleb D., Szappanos B., Papp B., Lercher M.J., 18. Apr. 2017, PLOS Computational Biology 13: e1005494, https://doi.org/10.1371/journal.pcbi.1005494
Energy metabolism is central to cellular biology. Thus, genome-scale models of heterotrophic unicellular species must account appropriately for the utilization of external nutrients to synthesize energy metabolites such as ATP. However, metabolic models designed for flux-balance analysis (FBA) may contain thermodynamically impossible energy-generating cycles: without nutrient consumption, these models are still capable of charging energy metabolites (such as ADP→ATP or NADP+→NADPH). Here, we show that energy-generating cycles occur in over 85% of metabolic models without extensive manual curation, such as those contained in the ModelSEED and MetaNetX databases; in contrast, such cycles are rare in the manually curated models of the BiGG database. Energy generating cycles may represent model errors, e.g., erroneous assumptions on reaction reversibilities. Alternatively, part of the cycle may be thermodynamically feasible in one environment, while the remainder is thermodynamically feasible in another environment; as standard FBA does not account for thermodynamics, combining these into an FBA model allows erroneous energy generation. The presence of energy-generating cycles typically inflates maximal biomass production rates by 25%, and may lead to biases in evolutionary simulations. We present efficient computational methods (i) to identify energy generating cycles, using FBA, and (ii) to identify minimal sets of model changes that eliminate them, using a variant of the GlobalFit algorithm.
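The FBA-based identification step (i) can be written as a small linear program. The following is a sketch under stated assumptions rather than the paper's exact formulation: the symbols $S$ (stoichiometric matrix), $E$ (set of exchange reactions), $l_j$, $u_j$ (flux bounds), and $v_{\mathrm{diss}}$ (flux through an added reaction that dissipates the energy metabolite, e.g. ATP hydrolysis) are notation introduced here for illustration.

```latex
\begin{aligned}
\max_{v}\quad & v_{\mathrm{diss}} \\
\text{s.t.}\quad & S\,v = 0, \\
& v_i = 0 \quad \text{for all } i \in E \quad \text{(no nutrient exchange)}, \\
& l_j \le v_j \le u_j \quad \text{for all other reactions } j .
\end{aligned}
```

Under these assumptions, an optimum $v_{\mathrm{diss}}^{*} > 0$ means the model can charge the energy metabolite without consuming any nutrient, which indicates an energy-generating cycle.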
Supra-operonic clusters of functionally related genes (SOCs) are a source of horizontal gene co-transfers
Pang T.Y., Lercher M.J., 9. Jan. 2017, Scientific Reports 7:40294, DOI: 10.1038/srep40294
Adaptation of bacteria occurs predominantly via horizontal gene transfer (HGT). While it is widely recognized that horizontal acquisitions frequently encompass multiple genes, it is unclear what the size distribution of successfully transferred DNA segments looks like and what evolutionary forces shape this distribution. Here, we identified 1790 gene family pairs that were consistently co-gained on the same branches across a phylogeny of 53 E. coli strains. We estimated a lower limit of their genomic distances at the time they were transferred to their host genomes; this distribution shows a sharp upper bound at 30 kb. The same gene-pairs can have larger distances (up to 70 kb) in other genomes. These more distant pairs likely represent recent acquisitions via transduction that involve the co-transfer of excised prophage genes, as they are almost always associated with intervening phage-associated genes. The observed distribution of genomic distances of co-transferred genes is much broader than expected from a model based on the co-transfer of genes within operons; instead, this distribution is highly consistent with the size distribution of supra-operonic clusters (SOCs), groups of co-occurring and co-functioning genes that extend beyond operons. Thus, we propose that SOCs form a basic unit of horizontal gene transfer.
Adaptive evolution of complex innovations through stepwise metabolic niche expansion
Balázs Szappanos, Jonathan Fritzemeier, Bálint Csörgő, Viktória Lázár, Xiaowen Lu, Gergely Fekete, Balázs Bálint, Róbert Herczeg, István Nagy, Richard A. Notebaart, Martin J. Lercher, Csaba Pál, Balázs Papp, Nature Communications (2016) DOI: 10.1038/ncomms11607
A central challenge in evolutionary biology concerns the mechanisms by which complex metabolic innovations requiring multiple mutations arise. Here, we propose that metabolic innovations accessible through the addition of a single reaction serve as stepping stones towards the later establishment of complex metabolic features in another environment. We demonstrate the feasibility of this hypothesis through three complementary analyses. First, using genome-scale metabolic modelling, we show that complex metabolic innovations in Escherichia coli can arise via changing nutrient conditions. Second, using phylogenetic approaches, we demonstrate that the acquisition patterns of complex metabolic pathways during the evolutionary history of bacterial genomes support the hypothesis. Third, we show how adaptation of laboratory populations of E. coli to one carbon source facilitates the later adaptation to another carbon source. Our work demonstrates how complex innovations can evolve through series of adaptive steps without the need to invoke non-adaptive processes.
Energy efficiency trade-offs drive nucleotide usage in transcribed regions
Wei-Hua Chen, Guanting Lu, Peer Bork, Songnian Hu, Martin J. Lercher, Nature Communications (2016) DOI: 10.1038/ncomms11334
Efficient nutrient usage is a trait under universal selection. A substantial part of cellular resources is spent on making nucleotides. We thus expect preferential use of cheaper nucleotides especially in transcribed sequences, which are often amplified thousand-fold compared with genomic sequences. To test this hypothesis, we derive a mutation-selection-drift equilibrium model for nucleotide skews (strand-specific usage of ‘A’ versus ‘T’ and ‘G’ versus ‘C’), which explains nucleotide skews across 1,550 prokaryotic genomes as a consequence of selection on efficient resource usage. Transcription-related selection generally favours the cheaper nucleotides ‘U’ and ‘C’ at synonymous sites. However, the information encoded in mRNA is further amplified through translation. Due to unexpected trade-offs in the codon table, cheaper nucleotides encode on average energetically more expensive amino acids. These trade-offs apply to both strand-specific nucleotide usage and GC content, causing a universal bias towards the more expensive nucleotides ‘A’ and ‘G’ at non-synonymous coding sites.
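As a rough illustration of what a nucleotide skew measures, the sketch below computes AT and GC skews for a single strand. The normalized difference (X - Y) / (X + Y) is a common convention and is an assumption here; the abstract does not state the exact definition used in the model, and the function name `nucleotide_skews` is hypothetical.

```python
from collections import Counter


def nucleotide_skews(seq: str) -> dict:
    """Return AT and GC skew for one strand of a DNA sequence.

    Assumed convention: skew = (X - Y) / (X + Y), which is positive
    when X is used more often than Y on the given strand.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    # Report 0.0 when neither base of a pair occurs, to avoid division by zero.
    at_skew = (a - t) / (a + t) if (a + t) else 0.0
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    return {"AT": at_skew, "GC": gc_skew}


if __name__ == "__main__":
    # Skews are strand-specific: the reverse complement has the opposite sign.
    print(nucleotide_skews("AAATGGGC"))
    print(nucleotide_skews("GCCCATTT"))
```

Because the measure is strand-specific, evaluating the reverse complement of a sequence flips the sign of both skews.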
Horizontally transferred genes cluster spatially and metabolically
Dilthey A., Lercher M.J., 21. Dec. 2015, Biology Direct 10:72, https://doi.org/10.1186/s13062-015-0102-5
Background
Genomic uptake of DNA by prokaryotes often encompasses more than a single gene. In many cases, several horizontally transferred genes may be acquired together. Accordingly, we expect that horizontally transferred genes cluster spatially in the genome more often than expected if transfers were independent. Further, genes that depend on each other functionally may be unlikely to have beneficial fitness effects when taken up individually by a foreign genome. Hence, we also expect the co-acquisition of functionally related genes, resulting in the clustering of horizontally transferred genes in functional networks.
Results
Analysing spatial and metabolic clustering of recent horizontal (or lateral) gene transfers among 21 γ-proteobacteria, we confirm both predictions. When comparing two datasets of predicted transfers that differ in their expected false-positive rate, we find that the more stringent dataset shows a stronger enrichment of clustered pairs.
Conclusions
The enrichment of interdependent metabolic genes among predicted transfers supports a biologically significant role of horizontally transferred genes in metabolic adaptation. Our results further suggest that spatial and metabolic clustering may be used as a benchmark for methods that predict recent horizontal gene transfers.
Reviewers
This article was reviewed by Peter Gogarten in collaboration with Luiz Thiberio Rangel, and by Yuri Wolf.
CycleFreeFlux: efficient removal of thermodynamically infeasible loops from flux distributions
Amer Desouki A., Jarre F., Gelius-Dietrich G., Lercher M.J., 1. July 2015, Bioinformatics 31:2159-65, https://doi.org/10.1093/bioinformatics/btv096
Motivation: Constraint-based metabolic modeling methods such as Flux Balance Analysis (FBA) are routinely used to predict metabolic phenotypes, e.g. growth rates, ATP yield or the fitness of gene knockouts. One frequent difficulty of constraint-based solutions is the inclusion of thermodynamically infeasible loops (or internal cycles), which add nonbiological fluxes to the predictions.
Results: We propose a simple postprocessing of constraint-based solutions, which removes internal cycles from any given flux distribution v(0) without disturbing other fluxes not involved in the loops. This new algorithm, termed CycleFreeFlux, works by minimizing the sum of absolute fluxes ||v||1 while (i) conserving the exchange fluxes and (ii) using the fluxes of the original solution to bound the new flux distribution. This strategy reduces internal fluxes until at least one reaction of every possible internal cycle is inactive, a necessary and sufficient condition for the thermodynamic feasibility of a flux distribution. If alternative representations of the input flux distribution in terms of elementary flux modes exist that differ in their inclusion of internal cycles, then CycleFreeFlux is biased towards solutions that maintain the direction given by v(0) and towards solutions with lower total flux ||v||1. Our method requires only one additional linear optimization, making it computationally very efficient compared to alternative strategies.
Availability and implementation: We provide freely available R implementations for the enumeration of thermodynamically infeasible cycles as well as for cycle-free FBA solutions, flux variability calculations and random sampling of solution spaces.
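The postprocessing step described in this abstract can be written out as a single optimization problem. This is a sketch reconstructed from the abstract's wording, not copied from the paper: $S$ (stoichiometric matrix) and $E$ (set of exchange reactions) are symbols introduced here, and $v^{(0)}$ is the input flux distribution written as v(0) in the text.

```latex
\begin{aligned}
\min_{v}\quad & \lVert v \rVert_1 = \sum_i \lvert v_i \rvert \\
\text{s.t.}\quad & S\,v = 0, \\
& v_i = v_i^{(0)} \quad \text{for all } i \in E \quad \text{(exchange fluxes conserved)}, \\
& 0 \le v_i \le v_i^{(0)} \quad \text{if } v_i^{(0)} \ge 0, \\
& v_i^{(0)} \le v_i \le 0 \quad \text{if } v_i^{(0)} < 0 .
\end{aligned}
```

Because the bounds fix the sign of every $v_i$, each $\lvert v_i \rvert$ equals $\operatorname{sign}(v_i^{(0)})\,v_i$, so the objective is linear. This is consistent with the statement that only one additional linear optimization is needed.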
Plant and animal glycolate oxidases have a common eukaryotic ancestor and convergently duplicated to evolve long-chain 2-hydroxy acid oxidases.
Esser C, Kuhn A, Groth G, Lercher MJ, Maurino VG. Mol Biol Evol. 2014 May;31(5):1089-101. doi: 10.1093/molbev/msu041. Epub 2014 Jan 9. PMID: 24408912
Glycolate oxidase (GOX) is a crucial enzyme of plant photorespiration. The encoding gene is thought to have originated from endosymbiotic gene transfer between the eukaryotic host and the cyanobacterial endosymbiont at the base of plantae. However, animals also possess GOX activities. Plant and animal GOX belong to the gene family of (L)-2-hydroxyacid-oxidases ((L)-2-HAOX). We find that all (L)-2-HAOX proteins in animals and archaeplastida go back to one ancestral eukaryotic sequence; the sole exceptions are green algae of the chlorophyta lineage. Chlorophyta replaced the ancestral eukaryotic (L)-2-HAOX with a bacterial ortholog, a lactate oxidase that may have been obtained through the primary endosymbiosis at the base of plantae; independent losses of this gene may explain its absence in other algal lineages (glaucophyta, rhodophyta, and charophyta). We also show that in addition to GOX, plants possess (L)-2-HAOX proteins with different specificities for medium- and long-chain hydroxyacids (lHAOX), likely involved in fatty acid and protein catabolism. Vertebrates possess lHAOX proteins acting on similar substrates as plant lHAOX; however, the existence of GOX and lHAOX subfamilies in both plants and animals is not due to shared ancestry but is the result of convergent evolution in the two most complex eukaryotic lineages. On the basis of targeting sequences and predicted substrate specificities, we conclude that the biological role of plantae (L)-2-HAOX in photorespiration evolved by co-opting an existing peroxisomal protein.
PopGenome: an efficient Swiss army knife for population genomic analyses in R
Pfeifer B, Wittelsbürger U, Ramos-Onsins SE, Lercher MJ, Mol Biol Evol, 31(7):1929-36 (2014)
Although many computer programs can perform population genetics calculations, they are typically limited in the analyses and data input formats they offer; few applications can process the large data sets produced by whole-genome resequencing projects. Furthermore, there is no coherent framework for the easy integration of new statistics into existing pipelines, hindering the development and application of new population genetics and genomics approaches. Here, we present PopGenome, a population genomics package for the R software environment (a de facto standard for statistical analyses). PopGenome can efficiently process genome-scale data as well as large sets of individual loci. It reads DNA alignments and single-nucleotide polymorphism (SNP) data sets in most common formats, including those used by the HapMap, 1000 human genomes, and 1001 Arabidopsis genomes projects. PopGenome also reads associated annotation files in GFF format, enabling users to easily define regions or classify SNPs based on their annotation; all analyses can also be applied to sliding windows. PopGenome offers a wide range of diverse population genetics analyses, including neutrality tests as well as statistics for population differentiation, linkage disequilibrium, and recombination. PopGenome is linked to Hudson's MS and Ewing's MSMS programs to assess statistical significance based on coalescent simulations. PopGenome's integration in R facilitates effortless and reproducible downstream analyses as well as the production of publication-quality graphics. Developers can easily incorporate new analyses methods into the PopGenome framework. PopGenome and R are freely available from CRAN (http://cran.r-project.org/) for all major operating systems under the GNU General Public License.
The role of photorespiration during the evolution of C4 photosynthesis in the genus Flaveria
Mallmann J, Heckmann D, Bräutigam A, Lercher MJ, Weber AP, Westhoff P, Gowik U, Elife, doi:10.7554/eLife.02478 (2014)
C4 photosynthesis represents a most remarkable case of convergent evolution of a complex trait, which includes the reprogramming of the expression patterns of thousands of genes. Anatomical, physiological, and phylogenetic analyses as well as computational modeling indicate that the establishment of a photorespiratory carbon pump (termed C2 photosynthesis) is a prerequisite for the evolution of C4. However, a mechanistic model explaining the tight connection between the evolution of C4 and C2 photosynthesis is currently lacking. Here we address this question through comparative transcriptomic and biochemical analyses of closely related C3, C3–C4, and C4 species, combined with Flux Balance Analysis constrained through a mechanistic model of carbon fixation. We show that C2 photosynthesis creates a misbalance in nitrogen metabolism between bundle sheath and mesophyll cells. Rebalancing nitrogen metabolism requires anaplerotic reactions that resemble at least parts of a basic C4 cycle. Our findings thus show how C2 photosynthesis represents a pre-adaptation for the C4 system, where the evolution of the C2 system establishes important C4 components as a side effect.
Horizontal gene acquisitions by eukaryotes as drivers of adaptive evolution
Schönknecht G, Weber AP, Lercher MJ, Bioessays, 36(1):9-20 (2014)
In contrast to vertical gene transfer from parent to offspring, horizontal (or lateral) gene transfer moves genetic information between different species. Bacteria and archaea often adapt through horizontal gene transfer. Recent analyses indicate that eukaryotic genomes, too, have acquired numerous genes via horizontal transfer from prokaryotes and other lineages. Based on this we raise the hypothesis that horizontally acquired genes may have contributed more to adaptive evolution of eukaryotes than previously assumed. Current candidate sets of horizontally acquired eukaryotic genes may just be the tip of an iceberg. We have recently shown that adaptation of the thermoacidophilic red alga Galdieria sulphuraria to its hot, acid, toxic-metal laden, volcanic environment was facilitated by the acquisition of numerous genes from extremophile bacteria and archaea. Other recently published examples of horizontal acquisitions involved in adaptation include ice-binding proteins in marine algae, enzymes for carotenoid biosynthesis in aphids, and genes involved in fungal metabolism.
Sybil--efficient constraint-based modelling in R
Gelius-Dietrich G, Desouki AA, Fritzemeier CJ, Lercher MJ, BMC Syst Biol, 7:125 (2013)
BACKGROUND: Constraint-based analyses of metabolic networks are widely used to simulate the properties of genome-scale metabolic networks. Publicly available implementations tend to be slow, impeding large scale analyses such as the genome-wide computation of pairwise gene knock-outs, or the automated search for model improvements. Furthermore, available implementations cannot easily be extended or adapted by users.
RESULTS: Here, we present sybil, an open source software library for constraint-based analyses in R; R is a free, platform-independent environment for statistical computing and graphics that is widely used in bioinformatics. Among other functions, sybil currently provides efficient methods for flux-balance analysis (FBA), MOMA, and ROOM that are about ten times faster than previous implementations when calculating the effect of whole-genome single gene deletions in silico on a complete E. coli metabolic model.
CONCLUSIONS: Due to the object-oriented architecture of sybil, users can easily build analysis pipelines in R or even implement their own constraint-based algorithms. Based on its highly efficient communication with different mathematical optimisation programs, sybil facilitates the exploration of high-dimensional optimisation problems on small time scales. Sybil and all its dependencies are open source. Sybil and its documentation are available for download from the comprehensive R archive network (CRAN).
Predicting C4 Photosynthesis Evolution: Modular, Individually Adaptive Steps on a Mount Fuji Fitness Landscape
Heckmann D, Schulze S, Denton A, Gowik U, Westhoff P, Weber APM, Lercher MJ, Cell 153, No. 6 (2013)
An ultimate goal of evolutionary biology is the prediction and experimental verification of adaptive trajectories on macroevolutionary timescales. This aim has rarely been achieved for complex biological systems, as models usually lack clear correlates of organismal fitness. Here, we simulate the fitness landscape connecting two carbon fixation systems: C3 photosynthesis, used by most plant species, and the C4 system, which is more efficient at ambient CO2 levels and elevated temperatures and which repeatedly evolved from C3. Despite extensive sign epistasis, C4 photosynthesis is evolutionarily accessible through individually adaptive steps from any intermediate state. Simulations show that biochemical subtraits evolve in modules; the order and constitution of modules confirm and extend previous hypotheses based on species comparisons. Plant-species-designated C3-C4 intermediates lie on predicted evolutionary trajectories, indicating that they indeed represent transitory states. Contrary to expectations, we find no slowdown of adaptation and no diminishing fitness gains along evolutionary trajectories.
Gene Transfer from Bacteria and Archaea Facilitated Evolution of an Extremophilic Eukaryote
Schönknecht G, Chen WH, Ternes CM, Barbier GG, Shrestha RP, Stanke M, Bräutigam A, Baker BJ, Banfield JF, Garavito RM, Carr K, Wilkerson C, Rensing SA, Gagneul D, Dickenson NE, Oesterhelt C, Lercher MJ, Weber APM, Science, Vol. 339 no. 6124 pp. 1207-1210 (2013)
Some microbial eukaryotes, such as the extremophilic red alga Galdieria sulphuraria, live in hot, toxic metal-rich, acidic environments. To elucidate the underlying molecular mechanisms of adaptation, we sequenced the 13.7-megabase genome of G. sulphuraria. This alga shows an enormous metabolic flexibility, growing either photoautotrophically or heterotrophically on more than 50 carbon sources. Environmental adaptation seems to have been facilitated by horizontal gene transfer from various bacteria and archaea, often followed by gene family expansion. At least 5% of protein-coding genes of G. sulphuraria were probably acquired horizontally. These proteins are involved in ecologically important processes ranging from heavy-metal detoxification to glycerol uptake and metabolism. Thus, our findings show that a pan-domain gene pool has facilitated environmental adaptation in this unicellular eukaryote.
Horizontal gene transfers as metagenomic gene duplications.
Grassi L, Caselle M, Lercher MJ, Lagomarsino MC, Molecular BioSystems 8, 790-795 (2012)
While it is well accepted that horizontal gene transfer plays an important role in the evolution and the diversification of prokaryotic genomes, many questions remain open regarding its functional mechanisms of action and its interplay with the extant genome. This study addresses the relationship between proteome innovation by horizontal gene transfer and genome content in Proteobacteria. We characterize the transferred genes, focusing on the protein domain compositions and their relationships with the existing protein domain superfamilies in the genome. In agreement with previous observations, we find that the protein domain architectures of horizontally transferred genes are significantly shorter than the genomic average. Furthermore, protein domains that are more common in the total pool of genomes appear to have a proportionally higher chance to be transferred. This suggests that transfer events behave as if they were drawn randomly from a cross-genomic community gene pool, much like gene duplicates are drawn from a genomic gene pool. Finally, horizontally transferred genes carry domains of exogenous families less frequently for larger genomes, although they might do it more than expected by chance.
OGEE: an online gene essentiality database.
Chen W, Minguez P, Lercher MJ, Bork P, Nucleic Acids Research 40, D901-6 (2012)
OGEE is an Online GEne Essentiality database. Its main purpose is to enhance our understanding of the essentiality of genes. This is achieved by collecting not only experimentally tested essential and non-essential genes, but also associated gene features such as expression profiles, duplication status, conservation across species, evolutionary origins and involvement in embryonic development. We focus on large-scale experiments and complement our data with text-mining results. Genes are organized into data sets according to their sources. Genes with variable essentiality status across data sets are tagged as conditionally essential, highlighting the complex interplay between gene functions and environments. Linked tools allow the user to compare gene essentiality among different gene groups, or compare features of essential genes to non-essential genes, and visualize the results. OGEE is freely available at http://ogeedb.embl.de.
An integrated approach to characterize genetic interaction networks in yeast metabolism.
Szappanos B, Kovacs K, Szamecz B, Honti F, Costanzo M, Baryshnikova A, Gelius-Dietrich G, Lercher MJ, Jelasity M, Myers CL, Andrews BJ et al., Nature Genetics 43, 656-662 (2011)
Although experimental and theoretical efforts have been applied to globally map genetic interactions, we still do not understand how gene-gene interactions arise from the operation of biomolecular networks. To bridge the gap between empirical and computational studies, we (i) quantitatively measured genetic interactions between approximately 185,000 metabolic gene pairs in Saccharomyces cerevisiae, (ii) superposed the data on a detailed systems biology model of metabolism and (iii) introduced a machine-learning method to reconcile empirical interaction data with model predictions. We systematically investigated the relative impacts of functional modularity and metabolic flux coupling on the distribution of negative and positive genetic interactions. We also provide a mechanistic explanation for the link between the degree of genetic interaction, pleiotropy and gene dispensability. Last, we show the feasibility of automated metabolic model refinement by correcting misannotations in NAD biosynthesis and confirming them by in vivo experiments.
A gene's ability to buffer variation is predicted by its fitness contribution and genetic interactions.
Wang G, Liu J, Wang W, Zhang H, Lercher MJ, PloS One 6, e17650 (2011)
BACKGROUND: Many single-gene knockouts result in increased phenotypic (e.g., morphological) variability among the mutant's offspring. This has been interpreted as an intrinsic ability of genes to buffer genetic and environmental variation. A phenotypic capacitor is a gene that appears to mask phenotypic variation: when knocked out, the offspring shows more variability than the wild type. Theory predicts that this phenotypic potential should be correlated with a gene's knockout fitness and its number of negative genetic interactions. Based on experimentally measured phenotypic capacity, it was suggested that knockout fitness was unimportant, but that phenotypic capacitors tend to be hubs in genetic and physical interaction networks. METHODOLOGY/PRINCIPAL FINDINGS: We re-analyse the available experimental data in a combined model, which includes knockout fitness and network parameters as well as expression level and protein length as predictors of phenotypic potential. Contrary to previous conclusions, we find that the strongest predictor is in fact haploid knockout fitness (responsible for 9% of the variation in phenotypic potential), with an additional contribution from the genetic interaction network (5%); once these two factors are taken into account, protein-protein interactions do not make any additional contribution to the variation in phenotypic potential. CONCLUSIONS/SIGNIFICANCE: We conclude that phenotypic potential is not a mysterious "emergent" property of cellular networks. Instead, it is very simply determined by the overall fitness reduction of the organism (which in its compromised state can no longer compensate for multiple factors that contribute to phenotypic variation), and by the number (and presumably nature) of genetic interactions of the knocked-out gene. 
In this light, Hsp90, the prototypical phenotypic capacitor, may not be representative: typical phenotypic capacitors are not direct "buffers" of variation, but are simply genes encoding central cellular functions.
The effects of network neighbours on protein evolution.
Wang G, Lercher MJ, PloS One 6, e18288 (2011)
Interacting proteins may often experience similar selection pressures. Thus, we may expect that neighbouring proteins in biological interaction networks evolve at similar rates. This has been previously shown for protein-protein interaction networks. Similarly, we find correlated rates of evolution of neighbours in networks based on co-expression, metabolism, and synthetic lethal genetic interactions. While the correlations are statistically significant, their magnitude is small, with network effects explaining only between 2% and 7% of the variation. The strongest known predictor of the rate of protein evolution remains expression level. We confirmed the previous observation that similar expression levels of neighbours indeed explain their similar evolution rates in protein-protein networks, and showed that the same is true for metabolic networks. In co-expression and synthetic lethal genetic interaction networks, however, neighbouring genes still show somewhat similar evolutionary rates even after simultaneously controlling for expression level, gene essentiality and gene length. Thus, similar expression levels and related functions (as inferred from co-expression and synthetic lethal interactions) seem to explain correlated evolutionary rates of network neighbours across all currently available types of biological networks.
Assessing the influence of adjacent gene orientation on the evolution of gene upstream regions in Arabidopsis thaliana.
He F, Chen W, Collins S, Acquisti C, Goebel U, Ramos-Onsins S, Lercher MJ, de Meaux J, Genetics 185, 695-701 (2010)
The orientation of flanking genes may influence the evolution of intergenic regions in which cis-regulatory elements are likely to be located: divergently transcribed genes share their 5' regions, resulting either in smaller "private" spaces or in overlapping regulatory elements. Thus, upstream sequences of divergently transcribed genes (bi-directional upstream regions, or URs) may be more constrained than those of uni-directional gene pairs. We investigated this effect by analyzing nucleotide variation segregating within and between Arabidopsis species. Compared to uni-directional URs, bi-directional URs indeed display lower population mutation rate, as well as more low-frequency polymorphisms. Furthermore, we find that bi-directional regions undergo selection for the maintenance of intergenic distance. Altogether, however, we observe considerable variation in evolutionary rates, with putative signatures of selection on two uni-directional upstream regions.
Co-expression of neighbouring genes in Arabidopsis: separating chromatin effects from direct interactions.
Chen W, de Meaux J, Lercher MJ, BMC Genomics 11, 178 (2010)
BACKGROUND: In all eukaryotic species examined, genes that are chromosomal neighbours are more similar in their expression than random gene pairs. Currently, it is still unclear how much of this local co-expression is caused by direct transcriptional interactions, and how much is due to shared chromatin environments.
RESULTS: We analysed neighbouring genes in Arabidopsis thaliana. At large intergenic distances (>400 bp), divergently and convergently transcribed gene pairs show very similar levels of co-expression, mediated most likely by shared chromatin environments. At gene distances below 400 bp, co-expression is strongly enhanced only for divergently transcribed gene pairs, indicating bi-directional transcription from a single promoter. Conversely, co-expression is suppressed for short convergently or uni-directionally transcribed pairs. This suppression points to transcriptional interference concentrated at the 3' end, e.g., in the context of transcription termination.
CONCLUSIONS: Classifying linked gene pairs by their orientation, we are able to partially tease apart the different levels of regional expression modulation. (i) Regional chromatin characteristics modulate the accessibility for regulation and transcription, regardless of gene orientation; the strength of this chromatin effect can be assessed from divergently or convergently transcribed distant neighbours. (ii) Shared promoter regions up to 400 bp in length enhance the co-expression of close bi-directional neighbours. (iii) Transcriptional interference of close neighbours is concentrated at the 3' ends of genes, and reduces co-expression on average by 40%.
Integration of horizontally transferred genes into regulatory interaction networks takes many million years.
Lercher MJ, Pal C, Molecular Biology And Evolution 25, 559-567 (2008)
Adaptation of bacteria to new or changing environments is often associated with the uptake of foreign genes through horizontal gene transfer. However, it has remained unclear how (and how fast) new genes are integrated into their host's cellular networks. Combining the regulatory and protein interaction networks of Escherichia coli with comparative genomics tools, we provide the first systematic analysis of this issue. Genes transferred recently have fewer interaction partners compared to nontransferred genes in both regulatory and protein interaction networks. Thus, horizontally transferred genes involved in complex regulatory and protein-protein interactions are rarely favored by selection. Only few protein-protein interactions are gained after the initial integration of genes following the transfer event. In contrast, transferred genes are gradually integrated into the regulatory network of their host over evolutionary time. During adaptation to the host cellular environment, horizontally transferred genes recruit existing transcription factors of the host, reflected in the fast evolutionary rates of the cis-regulatory regions of transferred genes. Further, genes resulting from increasingly ancient transfer events show increasing numbers of transcriptional regulators as well as improved coregulation with interacting proteins. Fine-tuned integration of horizontally transferred genes into the regulatory network spans more than 8-22 million years and encompasses accelerated evolution of regulatory regions, stabilization of protein-protein interactions, and changes in codon usage.
An integrated view of protein evolution.
Pal C, Papp B, Lercher MJ, Nature Reviews. Genetics 7, 337-348 (2006)
Why do proteins evolve at different rates? Advances in systems biology and genomics have facilitated a move from studying individual proteins to characterizing global cellular factors. Systematic surveys indicate that protein evolution is not determined exclusively by selection on protein structure and function, but is also affected by the genomic position of the encoding genes, their expression patterns, their position in biological networks and possibly their robustness to mistranslation. Recent work has allowed insights into the relative importance of these factors. We discuss the status of a much-needed coherent view that integrates studies on protein evolution with biochemistry and functional and structural genomics.
Chance and necessity in the evolution of minimal metabolic networks.
Pal C, Papp B, Lercher MJ, Csermely P, Oliver SG, Hurst LD, Nature 440, 667-670 (2006)
It is possible to infer aspects of an organism's lifestyle from its gene content. Can the reverse also be done? Here we consider this issue by modelling evolution of the reduced genomes of endosymbiotic bacteria. The diversity of gene content in these bacteria may reflect both variation in selective forces and contingency-dependent loss of alternative pathways. Using an in silico representation of the metabolic network of Escherichia coli, we examine the role of contingency by repeatedly simulating the successive loss of genes while controlling for the environment. The minimal networks that result are variable in both gene content and number. Partially different metabolisms can thus evolve owing to contingency alone. The simulation outcomes do preserve a core metabolism, however, which is over-represented in strict intracellular bacteria. Moreover, differences between minimal networks based on lifestyle are predictable: by simulating their respective environmental conditions, we can model evolution of the gene content in Buchnera aphidicola and Wigglesworthia glossinidia with over 80% accuracy. We conclude that, at least for the particular cases considered here, gene content of an organism can be predicted with knowledge of its distant ancestors and its current lifestyle.
Adaptive evolution of bacterial metabolic networks by horizontal gene transfer.
Pal C, Papp B, Lercher MJ, Nature Genetics 37, 1372-1375 (2005)
Numerous studies have considered the emergence of metabolic pathways, but the modes of recent evolution of metabolic networks are poorly understood. Here, we integrate comparative genomics with flux balance analysis to examine (i) the contribution of different genetic mechanisms to network growth in bacteria, (ii) the selective forces driving network evolution and (iii) the integration of new nodes into the network. Most changes to the metabolic network of Escherichia coli in the past 100 million years are due to horizontal gene transfer, with little contribution from gene duplicates. Networks grow by acquiring genes involved in the transport and catalysis of external nutrients, driven by adaptations to changing environments. Accordingly, horizontally transferred genes are integrated at the periphery of the network, whereas central parts remain evolutionarily stable. Genes encoding physiologically coupled reactions are often transferred together, frequently in operons. Thus, bacterial metabolic networks evolve by direct uptake of peripheral reactions in response to changed environments.