C1: Publications
Adaptive Evolution of Gene Expression in Drosophila
Armita Nourmohammad, Joachim Rambeau, Torsten Held, Viera Kovacova, Johannes Berg, Michael Lässig, Cell Reports, Volume 20, Issue 6, 1385 - 1395, 8. August 2017, http://dx.doi.org/10.1016/j.celrep.2017.07.033
Gene expression levels are important quantitative traits that link genotypes to molecular functions and fitness. In Drosophila, population-genetic studies have revealed substantial adaptive evolution at the genomic level, but the evolutionary modes of gene expression remain controversial. Here, we present evidence that adaptation dominates the evolution of gene expression levels in flies. We show that 64% of the observed expression divergence across seven Drosophila species are adaptive changes driven by directional selection. Our results are derived from time-resolved data of gene expression divergence across a family of related species, using a probabilistic inference method for gene-specific selection. Adaptive gene expression is stronger in specific functional classes, including regulation, sensory perception, sexual behavior, and morphology. Moreover, we identify a large group of genes with sex-specific adaptation of expression, which predominantly occurs in males. Our analysis opens an avenue to map system-wide selection on molecular quantitative traits independently of their genetic basis.
Inverse statistical problems: from the inverse Ising problem to data science
Nguyen H.C., Zecchina R., Berg J., Advances in Physics, 66:197-261, 29. June 2017, https://doi.org/10.1080/00018732.2017.1341604
Inverse problems in statistical physics are motivated by the challenges of ‘big data’ in different fields, in particular high-throughput experiments in biology. In inverse problems, the usual procedure of statistical physics needs to be reversed: Instead of calculating observables on the basis of model parameters, we seek to infer parameters of a model based on observations. In this review, we focus on the inverse Ising problem and closely related problems, namely how to infer the coupling strengths between spins given observed spin correlations, magnetizations, or other data. We review applications of the inverse Ising problem, including the reconstruction of neural connections, protein structure determination, and the inference of gene regulatory networks. For the inverse Ising problem in equilibrium, a number of controlled and uncontrolled approximate solutions have been developed in the statistical mechanics community. A particularly strong method, pseudolikelihood, stems from statistics. We also review the inverse Ising problem in the non-equilibrium case, where the model parameters must be reconstructed based on non-equilibrium statistics.
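The pseudolikelihood idea mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the review itself. It assumes spins s_i = ±1 and the energy convention E(s) = -Σ_{i<j} J_ij s_i s_j - Σ_i h_i s_i. The function names `exact_boltzmann` and `pseudolikelihood_fit` are hypothetical, and plain gradient ascent is chosen only for brevity. Each spin's conditional distribution given all other spins is fitted separately, and the two resulting estimates of each coupling are averaged.

```python
import itertools
import numpy as np


def exact_boltzmann(J, h):
    """Enumerate all 2^N states of a small Ising model; return states and probabilities."""
    n = len(h)
    states = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
    # Assumed convention: log-weight = sum_{i<j} J_ij s_i s_j + sum_i h_i s_i
    # (J symmetric with zero diagonal, hence the factor 0.5).
    log_w = 0.5 * np.einsum("ti,ij,tj->t", states, J, states) + states @ h
    w = np.exp(log_w - log_w.max())
    return states, w / w.sum()


def pseudolikelihood_fit(states, weights, steps=2000, lr=0.2):
    """Maximize the weighted log-pseudolikelihood by plain gradient ascent.

    states  : (T, N) array of +/-1 configurations
    weights : (T,) array of non-negative weights summing to 1
              (use 1/T for each of T sampled configurations)
    """
    n = states.shape[1]
    J = np.zeros((n, n))  # row i holds the couplings inferred from spin i's conditional
    h = np.zeros(n)
    for _ in range(steps):
        theta = states @ J.T + h          # local field on each spin, per configuration
        resid = states - np.tanh(theta)   # s_i minus its conditional mean
        grad_J = (weights[:, None] * resid).T @ states
        np.fill_diagonal(grad_J, 0.0)     # no self-couplings
        grad_h = weights @ resid
        J += lr * grad_J
        h += lr * grad_h
    return 0.5 * (J + J.T), h             # average the two estimates of each J_ij
```

Passing the exactly enumerated states and their Boltzmann probabilities as `weights` mimics the limit of infinitely many samples. With real data, each sampled configuration would get weight 1/T, and the fixed step size would need to be checked for larger systems.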
Multiple-Line Inference of Selection on Quantitative Traits
N. Riedel, B. S. Khatri, M. Lässig, and J. Berg; Genetics September 2015 201:305-322; Early online July 2, 2015
Dynamic BMP signaling polarized by Toll patterns the dorsoventral axis in a hemimetabolous insect
L. Sachs, Y.-T. Chen, A. Drechsler, J. A. Lynch, K. A. Panfilio, M. Lässig, J. Berg, and S. Roth, eLife 4:e05502 (2015)
Toll-dependent patterning of the dorsoventral axis in Drosophila represents one of the best understood gene regulatory networks. However, its evolutionary origin has remained elusive. Outside the insects Toll is not known for a patterning function, but rather for a role in pathogen defense. Here, we show that in the milkweed bug Oncopeltus fasciatus, whose lineage split from Drosophila's more than 350 million years ago, Toll is only required to polarize a dynamic BMP signaling network. A theoretical model reveals that this network has self-regulatory properties and that shallow Toll signaling gradients are sufficient to initiate axis formation. Such gradients can account for the experimentally observed twinning of insect embryos upon egg fragmentation and might have evolved from a state of uniform Toll activity associated with protecting insect eggs against pathogens.
A Genomics-Based Classification of Human Lung Tumors
Seidel D et al., Science Translational Medicine, 5 (209) p. 209ra153 (2013)
We characterized genome alterations in 1255 clinically annotated lung tumors of all histological subgroups to identify genetically defined and clinically relevant subtypes. More than 55% of all cases had at least one oncogenic genome alteration potentially amenable to specific therapeutic intervention, including several personalized treatment approaches that are already in clinical evaluation. Marked differences in the pattern of genomic alterations existed between and within histological subtypes, thus challenging the original histomorphological diagnosis. Immunohistochemical studies confirmed many of these reassigned subtypes. The reassignment eliminated almost all cases of large cell carcinomas, some of which had therapeutically relevant alterations. Prospective testing of our genomics-based diagnostic algorithm in 5145 lung cancer patients enabled a genome-based diagnosis in 3863 (75%) patients, confirmed the feasibility of rational reassignments of large cell lung cancer, and led to improvement in overall survival in patients with EGFR-mutant or ALK-rearranged cancers. Thus, our findings provide support for broad implementation of genome-based diagnosis of lung cancer.
How epigenetic mutations can affect genetic evolution: Model and mechanism
Klironomos F, Berg J, Collins S, Bioessays, 35 (6), 571-578 (2013)
We hypothesize that heritable epigenetic changes can affect rates of fitness increase as well as patterns of genotypic and phenotypic change during adaptation. In particular, we suggest that when natural selection acts on pure epigenetic variation in addition to genetic variation, populations adapt faster, and adaptive phenotypes can arise before any genetic changes. This may make it difficult to reconcile the timing of adaptive events detected using conventional population genetics tools based on DNA sequence data with environmental drivers of adaptation, such as changes in climate. Epigenetic modifications are frequently associated with somatic cell differentiation, but recently epigenetic changes have been found that can be transmitted over many generations. Here, we show how the interplay of these heritable epigenetic changes with genetic changes can affect adaptive evolution, and how epigenetic changes affect the signature of selection in the genetic record.
Evolution of molecular phenotypes under stabilizing selection
Nourmohammad A, Schiffels S, Lässig M, Journal Of Statistical Mechanics, P01012 (34 pages) (2013)
Molecular phenotypes are important links between genomic information and organismic functions, fitness, and evolution. Complex phenotypes, which are also called quantitative traits, often depend on multiple genomic loci. Their evolution builds on genome evolution in a complicated way, which involves selection, genetic drift, mutations and recombination. Here we develop a coarse-grained evolutionary statistics for phenotypes, which decouples from details of the underlying genotypes. We derive approximate evolution equations for the distribution of phenotype values within and across populations. This dynamics covers evolutionary processes at high and low recombination rates, that is, it applies to sexual and asexual populations. In a fitness landscape with a single optimal phenotype value, the phenotypic diversity within populations and the divergence between populations reach evolutionary equilibria, which describe stabilizing selection. We compute the equilibrium distributions of both quantities analytically and we show that the ratio of mean divergence and diversity depends on the strength of selection in a universal way: it is largely independent of the phenotype's genomic encoding and of the recombination rate. This establishes a new method for the inference of selection on molecular phenotypes beyond the genome level. We discuss the implications of our findings for the predictability of evolutionary processes.
GraphAlignment: Bayesian pairwise alignment of biological networks
Kolář M, Meier J, Mustonen V, Lässig M, Berg J, BMC Systems Biology, 6:144 doi:10.1186/1752-0509-6-144 (2012)
Background
With increased experimental availability and accuracy of bio-molecular networks, tools for their comparative and evolutionary analysis are needed. A key component for such studies is the alignment of networks.
Results
We introduce the Bioconductor package GraphAlignment for pairwise alignment of bio-molecular networks. The alignment incorporates information both from network vertices and network edges and is based on an explicit evolutionary model, allowing inference of all scoring parameters directly from empirical data. We compare the performance of our algorithm to an alternative algorithm, Græmlin 2.0.
On simulated data, GraphAlignment outperforms Græmlin 2.0 in several benchmarks except for computational complexity. When there is little or no noise in the data, GraphAlignment is slower than Græmlin 2.0. It is faster than Græmlin 2.0 when processing noisy data containing spurious vertex associations. Its typical case complexity grows approximately as 𝒪(N^2.6).
On empirical bacterial protein-protein interaction networks (PIN) and gene co-expression networks, GraphAlignment outperforms Græmlin 2.0 with respect to coverage and specificity, albeit by a small margin. On large eukaryotic PIN, Græmlin 2.0 outperforms GraphAlignment.
Conclusions
The GraphAlignment algorithm is robust to spurious vertex associations, correctly resolves paralogs, and shows very good performance in identification of homologous vertices defined by high vertex and/or interaction similarity. The simplicity and generality of GraphAlignment edge scoring makes the algorithm an appropriate choice for global alignment of networks.
Mean-field theory for the inverse Ising problem at low temperatures
Nguyen HC, Berg J, Phys. Rev. Lett. 109, 050602 (2012)
The large amounts of data from molecular biology and neuroscience have led to a renewed interest in the inverse Ising problem: how to reconstruct parameters of the Ising model (couplings between spins and external fields) from a number of spin configurations sampled from the Boltzmann measure. To invert the relationship between model parameters and observables (magnetizations and correlations), mean-field approximations are often used, allowing the determination of model parameters from data. However, all known mean-field methods fail at low temperatures with the emergence of multiple thermodynamic states. Here, we show how clustering spin configurations can approximate these thermodynamic states and how mean-field methods applied to thermodynamic states allow an efficient reconstruction of Ising models also at low temperatures.
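As a rough illustration of the mean-field inversion this abstract refers to, the sketch below implements only the standard naive mean-field reconstruction, J_ij = -(C^-1)_ij for i ≠ j and h_i = artanh(m_i) - Σ_j J_ij m_j. It does not implement the clustering step proposed in the paper. The function names `naive_mean_field` and `moments` are hypothetical, and spins are assumed to take the values ±1.

```python
import numpy as np


def naive_mean_field(m, C):
    """Naive mean-field reconstruction of couplings J and fields h.

    m : (N,) magnetizations <s_i>
    C : (N, N) connected correlations <s_i s_j> - <s_i><s_j>
    """
    m = np.asarray(m, dtype=float)
    J = -np.linalg.inv(np.asarray(C, dtype=float))  # J_ij = -(C^-1)_ij for i != j
    np.fill_diagonal(J, 0.0)                        # discard the diagonal entries
    h = np.arctanh(m) - J @ m                       # invert m_i = tanh(h_i + sum_j J_ij m_j)
    return J, h


def moments(samples):
    """Magnetizations and connected correlations from an array of +/-1 configurations."""
    s = np.asarray(samples, dtype=float)
    m = s.mean(axis=0)
    C = s.T @ s / len(s) - np.outer(m, m)
    return m, C
```

In the spirit of the abstract, one could apply `moments` and `naive_mean_field` separately to each cluster of configurations instead of to the pooled sample. How the paper combines the per-cluster results is not spelled out here and is left open.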
Bethe-Peierls approximation and the inverse Ising problem
Chau Nguyen H, Berg J, Journal Of Statistical Mechanics 03, 004 (2012)
We apply the Bethe-Peierls approximation to the inverse Ising problem and show how the linear response relation leads to a simple method for reconstructing couplings and fields of the Ising model. This reconstruction is exact on tree graphs, yet its computational expense is comparable to those of other mean-field methods. We compare the performance of this method to the independent-pair, naive mean-field, and Thouless-Anderson-Palmer approximations, the Sessak-Monasson expansion, and susceptibility propagation on the Cayley tree, SK model and random graph with fixed connectivity. At low temperatures, Bethe reconstruction outperforms all of these methods, while at high temperatures it is comparable to the best method available so far (the Sessak-Monasson method). The relationship between Bethe reconstruction and other mean-field methods is discussed.
Nonlinear fitness landscape of a molecular pathway.
Perfeito L, Ghozzi S, Berg J, Schnetz K, Lässig M, PLoS Genetics 7(7): e1002160 (2011)
Genes are regulated because their expression involves a fitness cost to the organism. The production of proteins by transcription and translation is a well-known cost factor, but the enzymatic activity of the proteins produced can also reduce fitness, depending on the internal state and the environment of the cell. Here, we map the fitness costs of a key metabolic network, the lactose utilization pathway in Escherichia coli. We measure the growth of several regulatory lac operon mutants in different environments inducing expression of the lac genes. We find a strikingly nonlinear fitness landscape, which depends on the production rate and on the activity rate of the lac proteins. A simple fitness model of the lac pathway, based on elementary biophysical processes, predicts the growth rate of all observed strains. The nonlinearity of fitness is explained by a feedback loop: production and activity of the lac proteins reduce growth, but growth also affects the density of these molecules. This nonlinearity has important consequences for molecular function and evolution. It generates a cliff in the fitness landscape, beyond which populations cannot maintain growth. In viable populations, there is an expression barrier of the lac genes, which cannot be exceeded in any stationary growth process. Furthermore, the nonlinearity determines how the fitness of operon mutants depends on the inducer environment. We argue that fitness nonlinearities, expression barriers, and gene–environment interactions are generic features of fitness landscapes for metabolic pathways, and we discuss their implications for the evolution of regulation.
Significance analysis and statistical mechanics: an application to clustering.
Łuksza M, Lässig M, Berg J, Physical Review Letters 105, 220601 (2010)
This Letter addresses the statistical significance of structures in random data: given a set of vectors and a measure of mutual similarity, how likely is it that a subset of these vectors forms a cluster with enhanced similarity among its elements? The computation of this cluster p value for randomly distributed vectors is mapped onto a well-defined problem of statistical mechanics. We solve this problem analytically, establishing a connection between the physics of quenched disorder and multiple-testing statistics in clustering and related problems. In an application to gene expression data, we find a remarkable link between the statistical significance of a cluster and the functional relationships between its genes.
Bayesian analysis of biological networks: clusters, motifs, cross-species correlations
Berg J, Lässig M, in Statistical And Evolutionary Analysis Of Biological Network Data, M. Stumpf and C. Wiuf (Eds.), Imperial College Press (2010)
An important part of the analysis of bio-molecular networks is to detect different functional units. Different functions are reflected in a different evolutionary dynamics, and hence in different statistical characteristics of network parts. In this sense, the global statistics of a biological network, e.g., its connectivity distribution, provides a background, and local deviations from this background signal functional units. In the computational analysis of biological networks, we thus typically have to discriminate between different statistical models governing different parts of the dataset. The nature of these models depends on the biological question asked. We illustrate this rationale here with three examples: identification of functional parts as highly connected network clusters, finding network motifs, which occur in a similar form at different places in the network, and the analysis of cross-species network correlations, which reflect evolutionary dynamics between species.
Adaptive gene regulatory networks
Stauffer F, Berg J, EPL 88, 48004 (2009)
Regulatory interactions between genes show a large amount of cross-species variability, even when the underlying functions are conserved: there are many ways to achieve the same function. Here we investigate the ability of regulatory networks to reproduce given expression levels within a simple model of gene regulation. We find an exponentially large space of regulatory networks compatible with a given set of expression levels, giving rise to an extensive entropy of networks. Typical realisations of regulatory networks are found to share a bias towards symmetric interactions, in line with empirical evidence.
From protein interactions to functional annotation: graph alignment in Herpes.
Kolár M, Lässig M, Berg J, BMC Systems Biology 2, 90 (2008)
Background
Sequence alignment is a prolific basis of functional annotation, but remains a challenging problem in the 'twilight zone' of high sequence divergence or short gene length. Here we demonstrate how information on gene interactions can help to resolve ambiguous sequence alignments. We compare two distant Herpes viruses by constructing a graph alignment, which is based jointly on the similarity of their protein interaction networks and on sequence similarity. This hybrid method provides functional associations between proteins of the two organisms that cannot be obtained from sequence or interaction data alone.
Results
We find proteins where interaction similarity and sequence similarity are individually weak, but together provide significant evidence of orthology. There are also proteins with high interaction similarity but without any detectable sequence similarity, providing evidence of functional association beyond sequence homology. The functional predictions derived from our alignment are consistent with genomic position and gene expression data.
Conclusion
Our approach shows that evolutionary conservation is a powerful filter to make protein interaction data informative about functional similarities between the interacting proteins, and it establishes graph alignment as a powerful tool for the comparative analysis of data from highly diverged species.
Cross-species analysis of biological networks by Bayesian alignment.
Berg J, Lässig M, Proceedings Of The National Academy Of Sciences Of The United States Of America 103, 10967-10972 (2006)
Complex interactions between genes or proteins contribute a substantial part to phenotypic evolution. Here we develop an evolutionarily grounded method for the cross-species analysis of interaction networks by alignment, which maps bona fide functional relationships between genes in different organisms. Network alignment is based on a scoring function measuring mutual similarities between networks, taking into account their interaction patterns as well as sequence similarities between their nodes. High-scoring alignments and optimal alignment parameters are inferred by a systematic Bayesian analysis. We apply this method to analyze the evolution of coexpression networks between humans and mice. We find evidence for significant conservation of gene expression clusters and give network-based predictions of gene function. We discuss examples where cross-species functional relationships between genes do not concur with sequence similarity.