C4: Publications
The neutral frequency spectrum of linked sites
Luca Ferretti, Alexander Klassmann, Emanuele Raineri, Sebastián E. Ramos-Onsins, Thomas Wiehe, Guillaume Achaz, Theoretical Population Biology, 28. June 2018, https://doi.org/10.1016/j.tpb.2018.06.001
We introduce the conditional Site Frequency Spectrum (SFS) for a genomic region linked to a focal mutation of known frequency. An exact expression for its expected value is provided for the neutral model without recombination. Its relation with the expected SFS for two sites, 2-SFS, is discussed. These spectra derive from the coalescent approach of Fu (1995) for finite samples, which is reviewed. Remarkably simple expressions are obtained for the linked SFS of a large population, which are also solutions of the multi-allelic Kolmogorov equations. These formulae are the immediate extensions of the well known single site neutral SFS. Besides the general interest in these spectra, they relate to relevant biological cases, such as structural variants and introgressions. As an application, a recipe to adapt Tajima’s D and other SFS-based neutrality tests to a non-recombining region containing a neutral marker is presented.
Detecting Recent Positive Selection with a Single Locus Test Bipartitioning the Coalescent Tree
Yang Z., Li J., Wiehe T., Li H., 1. Feb. 2018, Genetics 208:791, https://doi.org/10.1534/genetics.117.300401
Many population genomic studies have been conducted in the past to search for traces of recent events of positive selection. These traces, however, can be obscured by temporal variation of population size or other demographic factors. To reduce the confounding impact of demography, the coalescent tree topology has been used as an additional source of information for detecting recent positive selection in a population or a species. Based on the branching pattern at the root, we partition the hypothetical coalescent tree, inferred from a sequence sample, into two subtrees. The reasoning is that positive selection could impose a strong impact on branch length in one of the two subtrees while demography has the same effect on average on both subtrees. Thus, positive selection should be detectable by comparing statistics calculated for the two subtrees. Simulations demonstrate that the proposed test based on these principles has high power to detect recent positive selection even when DNA polymorphism data from only one locus is available, and that it is robust to the confounding effect of demography. One feature is that all components in the summary statistics can be computed analytically. Moreover, misinference of derived and ancestral alleles is seen to have only a limited effect on the test, and it therefore avoids a notorious problem when searching for traces of recent positive selection.
The Diverging Routes of BORIS and CTCF: An Interactomic and Phylogenomic Analysis
Jabbari K., Heger P., Sharma R., Wiehe T., 30. Jan. 2018, Life 8:4, https://doi.org/10.3390/life8010004
The CCCTC-binding factor (CTCF) is multi-functional, ubiquitously expressed, and highly conserved from Drosophila to human. It has important roles in transcriptional insulation and the formation of a high-dimensional chromatin structure. CTCF has a paralog called “Brother of Regulator of Imprinted Sites” (BORIS) or “CTCF-like” (CTCFL). It binds DNA at sites similar to those of CTCF. However, the expression profiles of the two proteins are quite different. We investigated the evolutionary trajectories of the two proteins after the duplication event using a phylogenomic and interactomic approach. We find that CTCF has 52 direct interaction partners while CTCFL only has 19. Almost all interactors already existed before the emergence of CTCF and CTCFL. The unique secondary loss of CTCF from several nematodes is paralleled by a loss of two of its interactors, the polycomb repressive complex subunit SuZ12 and the multifunctional transcription factor TYY1. In contrast to earlier studies reporting the absence of BORIS from birds, we present evidence for a multigene synteny block containing CTCFL that is conserved in mammals, reptiles, and several species of birds, indicating that not the entire lineage of birds experienced a loss of CTCFL. Within this synteny block, BORIS and its genomic neighbors seem to be partitioned into two nested chromatin loops. The high expression of SPO11, RAE1, RBM38, and PMEPA1 in male tissues suggests a possible link between CTCFL, meiotic recombination, and fertility-associated phenotypes. Using the 65,700 exomes and the 1000 genomes data, we observed a higher number of intergenic, non-synonymous, and loss-of-function mutations in CTCFL than in CTCF, suggesting a reduced strength of purifying selection, perhaps due to less functional constraint.
The Protistan Microbiome of Grassland Soil: Diversity in the Mesoscale
Paul Christiaan Venter, Frank Nitsche, Anne Domonell, Peter Heger, Hartmut Arndt, Protist, Volume 168, Issue 5, Pages 546-564, ISSN 1434-4610, November 2017, https://doi.org/10.1016/j.protis.2017.03.005
Genomic data for less than one quarter of ∼1.8 million named species on earth exist in public databases like GenBank. Little information exists on the estimated one million small sized (1–100 μm) heterotrophic nanoflagellates and ciliates and their taxa-area relationship. We analyzed environmental DNA from 150 geo-referenced grassland plots representing topographical and land-use ranges typical for Central Europe. High-throughput barcoding allowed the identification of operational taxonomic units (OTUs) at species level, with high pairwise identity to reference sequences (≥99.7%), but also the identification of sequences at the genus (≥97%) and class (≥80%) taxonomic level. Species richness analyses revealed, on average, 100 genus level OTUs (332 unique individual reads (UIR)) and 56 class level OTUs per gram of soil sample in the mesoscale (1–1000 km). Database shortfalls were highlighted by increased uncertain taxonomic lineages at lower resolution (≥80% sequence identity). No single barcode occurred ubiquitously across all sites. Taxa-area relationships indicated that OTUs spread over the entire mesoscale were more similar than in the local scale and increased land-use (fertilization, mowing and grazing) promoted taxa-area separation. Only a small fraction of sequences strictly matched reference library sequences, suggesting a large protistan “dark matter” in soil which warrants further research.
Decomposing the Site Frequency Spectrum: The Impact of Tree Topology on Neutrality Tests
Ferretti L., Ledda A., Wiehe T., Achaz G., Ramos-Onsins S.E., 1. Sept. 2017, Genetics 207: 229, https://doi.org/10.1534/genetics.116.188763
We investigate the dependence of the site frequency spectrum on the topological structure of genealogical trees. We show that basic population genetic statistics, for instance, estimators of θ or neutrality tests such as Tajima’s D, can be decomposed into components of waiting times between coalescent events and of tree topology. Our results clarify the relative impact of the two components on these statistics. We provide a rigorous interpretation of positive or negative values of an important class of neutrality tests in terms of the underlying tree shape. In particular, we show that values of Tajima’s D and Fay and Wu’s H depend in a direct way on a peculiar measure of tree balance, which is mostly determined by the root balance of the tree. We present a new test for selection in the same class as Fay and Wu’s H and discuss its interpretation and power. Finally, we determine the trees corresponding to extreme expected values of these neutrality tests and present formulas for these extreme values as a function of sample size and number of segregating sites.
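As a hedged illustration of how such statistics are built from the site frequency spectrum, the following minimal Python sketch computes the classical Tajima's D from an unfolded SFS. It uses the standard textbook constants of Tajima (1989) and does not reproduce the decomposition into waiting times and topology described in the paper; the function name `tajimas_d` and the input layout are assumptions made for this example.

```python
from math import sqrt


def tajimas_d(sfs):
    """Classical Tajima's D from an unfolded SFS.

    sfs[i - 1] is the number of sites whose derived allele occurs
    i times in a sample of n sequences, for i = 1 .. n - 1.
    (Illustrative helper; name and input layout are assumptions.)
    """
    n = len(sfs) + 1
    s = sum(sfs)  # number of segregating sites
    if n < 4 or s <= 0:
        # For n <= 3 the numerator is identically zero; for s = 0 D is undefined.
        raise ValueError("need at least 4 sequences and one segregating site")

    # Mean pairwise difference: a site at frequency i/n contributes
    # i * (n - i) differing pairs out of n * (n - 1) / 2 pairs.
    pairs = n * (n - 1) / 2
    pi = sum(x * i * (n - i) for i, x in enumerate(sfs, start=1)) / pairs

    # Standard normalising constants.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # Difference between the pairwise and the Watterson estimator of theta,
    # scaled by its estimated standard deviation.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    # An excess of singletons (star-like tree) gives a negative D.
    print(tajimas_d([10, 0, 0, 0, 0, 0, 0, 0, 0]))
```

Feeding in the neutral expectation, with entries proportional to 1/i, makes the two estimators of θ coincide, so D is zero up to rounding.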
Transcriptomic data from panarthropods shed new light on the evolution of insulator binding proteins in insects
Pauli T., Vedder L., Dowling D., Petersen M., Meusemann K., Donath A., Peters R.S., Podsiadlowski L., Mayer C., Liu S., Zhou X., Heger P., Wiehe T., Hering L., Mayer G., Misof B., Niehuis O., 3. Nov. 2016, BMC Genomics 17:861, https://doi.org/10.1186/s12864-016-3205-1
Background
Body plan development in multi-cellular organisms is largely determined by homeotic genes. Expression of homeotic genes, in turn, is partially regulated by insulator binding proteins (IBPs). While only a few enhancer blocking IBPs have been identified in vertebrates, the common fruit fly Drosophila melanogaster harbors at least twelve different enhancer blocking IBPs. We screened recently compiled insect transcriptomes from the 1KITE project and genomic and transcriptomic data from public databases, aiming to trace the origin of IBPs in insects and other arthropods.
Results
Our study shows that the last common ancestor of insects (Hexapoda) already possessed a substantial number of IBPs. Specifically, of the known twelve insect IBPs, at least three (i.e., CP190, Su(Hw), and CTCF) already existed prior to the evolution of insects. Furthermore we found GAF orthologs in early branching insect orders, including Zygentoma (silverfish and firebrats) and Diplura (two-pronged bristletails). Mod(mdg4) is most likely a derived feature of Neoptera, while Pita is likely an evolutionary novelty of holometabolous insects. Zw5 appears to be restricted to schizophoran flies, whereas BEAF-32, ZIPIC and the Elba complex, are probably unique to the genus Drosophila. Selection models indicate that insect IBPs evolved under neutral or purifying selection.
Conclusions
Our results suggest that a substantial number of IBPs either pre-date the evolution of insects or evolved early during insect evolution. This suggests an evolutionary history of insulator binding proteins in insects different to that previously thought. Moreover, our study demonstrates the versatility of the 1KITE transcriptomic data for comparative analyses in insects and other arthropods.
Structure and evolutionary history of a large family of NLR proteins in the zebrafish
Kerstin Howe, Philipp H Schiffer, Julia Zielinski, Thomas Wiehe, Gavin K Laird, John Marioni, Onuralp Soylemez, Fyodor Kondrashov, Maria Leptin, Open Biology (2015) doi: http://dx.doi.org/10.1101/022061
NACHT- and Leucine-Rich-Repeat-containing domain (NLR) proteins act as cytoplasmic sensors for pathogen- and danger-associated molecular patterns and are found throughout the plant and animal kingdoms. In addition to having a small set of conserved NLRs, the genomes in some animal lineages contain massive expansions of this gene family. One of these arose in fishes, after the creation of a gene fusion that combined the core NLR domains with another domain used for immune recognition, the PRY/SPRY or B30.2 domain. We have analysed the expanded NLR gene family in zebrafish, which contains 368 genes, and studied its evolutionary history. The encoded proteins share a defining overall structure, but individual domains show different evolutionary trajectories. Our results suggest gene conversion homogenizes NACHT and B30.2 domain sequences among different gene subfamilies; however, the functional implications of its action remain unclear. The majority of the genes are located on the long arm of chromosome 4, interspersed with several other large multi-gene families, including a new family encoding proteins with multiple tandem arrays of Zinc fingers. This suggests that chromosome 4 may be a hotspot for rapid evolutionary change in zebrafish.
On the sub-permutations of pattern avoiding permutations
Disanto F, Wiehe T, Discrete Mathematics, Volume 337, Pages 127–141, 28 December 2014
There is a deep connection between permutations and trees. Certain sub-structures of permutations, called sub-permutations, bijectively map to sub-trees of binary increasing trees. This opens a powerful tool set to study enumerative and probabilistic properties of sub-permutations and to investigate the relationships between 'local' and 'global' features using the concept of pattern avoidance. First, given a pattern μ, we study how the avoidance of μ in a permutation π affects the presence of other patterns in the sub-permutations of π. More precisely, considering patterns of length 3, we solve instances of the following problem: given a class of permutations K and a pattern μ, we ask for the number of permutations π ∈ Av_n(μ) whose sub-permutations in K satisfy certain additional constraints on their size. Second, we study the probability for a generic pattern to be contained in a random permutation π of size n without being present in the sub-permutations of π generated by the entry 1≤k≤n. These theoretical results can be useful to define efficient randomized pattern-search procedures based on classical algorithms of pattern-recognition, while the general problem of pattern-search is NP-complete.
Demography-adjusted tests of neutrality based on genome-wide SNP data
Rafajlović M, Klassmann A, Eriksson A, Wiehe T, Mehlig B, Theoretical Population Biology, Vol. 95, pp. 1–12 (2014)
Tests of the neutral evolution hypothesis are usually built on the standard null model which assumes that mutations are neutral and the population size remains constant over time. However, it is unclear how such tests are affected if the last assumption is dropped. Here, we extend the unifying framework for tests based on the site frequency spectrum, introduced by Achaz and Ferretti, to populations of varying size. Key ingredients are the first two moments of the site frequency spectrum. We show how these moments can be computed analytically if a population has experienced two instantaneous size changes in the past. We apply our method to data from ten human populations gathered in the 1000 genomes project, estimate their demographies and define demography-adjusted versions of Tajima’s D, Fay & Wu’s H, and Zeng’s E. Our results show that demography-adjusted test statistics facilitate the direct comparison between populations and that most of the differences among populations seen in the original unadjusted tests can be explained by their underlying demographies. Upon carrying out whole-genome screens for deviations from neutrality, we identify candidate regions of recent positive selection. We provide track files with values of the adjusted and unadjusted tests for upload to the UCSC genome browser.
New tools in the box: An evolutionary synopsis of chromatin insulators
Heger P, Wiehe T, Trends in Genetics, Vol. 30, Issue 5, pp. 161–171 (2014)
Despite progress in understanding genome organization and gene expression during the last decade, the evolutionary pathways that led to the intricate patterns of gene expression in different cells of an organism are still poorly understood. Important steps in this regulation take place at the level of chromatin, where the (epi)genomic environment of a gene determines its expression in time and space. Although the basic mechanisms of gene expression apply to all eukaryotes, multicellular organisms face the additional challenge of coordinating gene expression during development. In this review we summarize and put into evolutionary context current knowledge about chromatin insulators, an important class of regulatory factors mediating these tasks. Our interpretation of historical and recent findings points to a dynamic and ongoing evolution of insulator proteins characterized by multiple instances of convergent evolution, gene loss, and binding site changes in different organisms. The idea of two autonomously evolving insulator functions (as a barrier element and an enhancer blocker) further suggests that the evolution of metazoans and their enhancer-rich gene regulatory repertoire might be connected to the radiation of enhancer blocking insulators. Although speculative at the moment, such coevolution might create tools for complex gene regulation and therefore influence the evolutionary roadmaps of metazoans.
Yule-generated trees constrained by node imbalance
Disanto F, Schlizio A, Wiehe T, Mathematical Biosciences (online access) August 13 (2013)
The Yule process generates a class of binary trees which is fundamental to population genetic models and other applications in evolutionary biology. In this paper, we introduce a family of sub-classes of ranked trees, called Ω-trees, which are characterized by imbalance of internal nodes. The degree of imbalance is defined by an integer 0⩽ω. For caterpillars, the extreme case of unbalanced trees, ω=0. Under models of neutral evolution, for instance the Yule model, trees with small ω are unlikely to occur by chance. Indeed, imbalance can be a signature of permanent selection pressure, such as observable in the genealogies of certain pathogens. From a mathematical point of view it is interesting to observe that the space of Ω-trees maintains several statistical invariants although it is drastically reduced in size compared to the space of unconstrained Yule trees. Using generating functions, we study here some basic combinatorial properties of Ω-trees. We focus on the distribution of the number of subtrees with two leaves. We show that expectation and variance of this distribution match those for unconstrained trees already for very small values of ω.
Successive gain of insulator proteins in arthropod evolution
Heger P, George R, Wiehe T, Evolution (online access) June 4 (2013)
Alteration of regulatory DNA elements or their binding proteins may have drastic consequences for morphological evolution. Chromatin insulators are one example of such proteins and play a fundamental role in organizing gene expression. While a single insulator protein, CTCF (CCCTC-binding factor), is known in vertebrates, Drosophila melanogaster utilizes six additional factors. We studied the evolution of these proteins and show here that, in contrast to the bilaterian-wide distribution of CTCF, all other D. melanogaster insulators are restricted to arthropods. The full set is present exclusively in the genus Drosophila whereas only two insulators, Su(Hw) and CTCF, existed at the base of the arthropod clade and all additional factors have been acquired successively at later stages. Secondary loss of factors in some lineages further led to the presence of different insulator subsets in arthropods. Thus, the evolution of insulator proteins within arthropods is an ongoing and dynamic process that reshapes and supplements the ancient CTCF-based system common to bilaterians. Expansion of insulator systems may therefore be a general strategy to increase an organism's gene regulatory repertoire and its potential for morphological plasticity.
Coalescent tree imbalance and a simple test for selective sweeps based on microsatellite variation
Li H, Wiehe T, PLoS Computational Biology, 9(5): e1003060 (2013)
It is one of the major interests in population genetics to contrast the properties and consequences of neutral and non-neutral modes of evolution. As is well-known, positive Darwinian selection and genetic hitchhiking drastically change the profile of genetic diversity compared to neutral expectations. The present-day observable genetic diversity in a sample of DNA sequences depends on events in their evolutionary history, and in particular on the shape of the underlying genealogical tree. In this paper we study how the shape of coalescent trees is affected by the presence of positively selected mutations. We define a measure of tree topology and study its properties under scenarios of neutrality and positive selection. We show that this measure can reliably be estimated from experimental data, and define an easy-to-compute statistical test of the neutral evolution hypothesis. We apply this test to data from a population of the malaria parasite Plasmodium falciparum and confirm the signature of recent positive selection in the vicinity of a drug resistance locus.
Exact enumeration of cherries and pitchforks in ranked trees under the coalescent model.
Disanto F, Wiehe T, Mathematical Biosciences 242, 195-200 (2013)
We consider exact enumerations and probabilistic properties of ranked trees when generated under the random coalescent process. Using a new approach, based on generating functions, we derive several statistics such as the exact probability of finding k cherries in a ranked tree of fixed size n. We then extend our method to consider also the number of pitchforks. We find a recursive formula to calculate the joint and conditional probabilities of cherries and pitchforks when the size of the tree is fixed. These results provide insights into structural properties of coalescent trees under the model of neutral evolution.
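To make the notion of an exact cherry distribution concrete, here is a minimal Python sketch. It does not use the generating-function approach of the paper. Instead it relies on the well-known leaf-splitting recursion for Yule-type trees, which is an assumption of this example: when a uniformly chosen leaf of a tree with m leaves splits, the number of cherries stays at k with probability 2k/m and grows to k + 1 otherwise. The function name `cherry_distribution` is hypothetical.

```python
from fractions import Fraction


def cherry_distribution(n):
    """Exact distribution of the number of cherries in a random
    Yule/coalescent tree with n leaves, as {k: P(k cherries)}.

    Illustrative sketch based on the leaf-splitting recursion,
    not on the generating functions used in the paper.
    """
    if n < 2:
        raise ValueError("a tree needs at least two leaves")

    dist = {1: Fraction(1)}  # two leaves always form exactly one cherry
    for m in range(2, n):  # grow the tree from m to m + 1 leaves
        nxt = {}
        for k, p in dist.items():
            # Splitting one of the 2k cherry leaves destroys one cherry
            # and creates another, so the count is unchanged.
            stay = Fraction(2 * k, m)
            if stay > 0:
                nxt[k] = nxt.get(k, 0) + p * stay
            # Splitting any other leaf adds a new cherry.
            if stay < 1:
                nxt[k + 1] = nxt.get(k + 1, 0) + p * (1 - stay)
        dist = nxt
    return dist


if __name__ == "__main__":
    for k, p in sorted(cherry_distribution(6).items()):
        print(k, p)
```

Working with `Fraction` keeps every probability exact, so the results can be compared directly with closed-form expressions; the mean of the returned distribution equals n/3 for n ≥ 3.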
The effect of single recombination events on coalescent tree height and shape
Ferretti L, Disanto F, Wiehe T, PLoS One, 8(4):e60123 (2013)
The coalescent with recombination is a fundamental model to describe the genealogical history of DNA sequence samples from recombining organisms. Considering recombination as a process which acts along genomes and which creates sequence segments with shared ancestry, we study the influence of single recombination events upon tree characteristics of the coalescent. We focus on properties such as tree height and tree balance and quantify analytically the changes in these quantities incurred by recombination in terms of probability distributions. We find that changes in tree topology are often relatively mild under conditions of neutral evolution, while changes in tree height are on average quite large. Our results add to a quantitative understanding of the spatial coalescent and provide the neutral reference to which the impact by other evolutionary scenarios, for instance tree distortion by selective sweeps, can be compared.
The chromatin insulator CTCF and the emergence of metazoan diversity.
Heger P, Marin B, Bartkuhn M, Schierenberg E, Wiehe T, Proceedings of the National Academy of Sciences of the United States of America 109, 17507-17512 (2012)
The great majority of metazoans belong to bilaterian phyla. They diversified during a short interval in Earth's history known as the Cambrian explosion, ~540 million years ago. However, the genetic basis of these events is poorly understood. Here we argue that the vertebrate genome organizer CTCF (CCCTC-binding factor) played an important role for the evolution of bilaterian animals. We provide evidence that the CTCF protein and a genome-wide abundance of CTCF-specific binding motifs are unique to bilaterian phyla, but absent in other eukaryotes. We demonstrate that CTCF-binding sites within vertebrate and Drosophila Hox gene clusters have been maintained for several hundred million years, suggesting an ancient origin of the previously known interaction between Hox gene regulation and CTCF. In addition, a close correlation between the presence of CTCF and Hox gene clusters throughout the animal kingdom suggests conservation of the Hox-CTCF link across the Bilateria. On the basis of these findings, we propose the existence of a Hox-CTCF kernel as principal organizer of bilaterian body plans. Such a kernel could explain (i) the formation of Hox clusters in Bilateria, (ii) the diversity of bilaterian body plans, and (iii) the uniqueness and time of onset of the Cambrian explosion.
Estimating mutation distances from unaligned genomes.
Haubold B, Pfaffelhuber P, Domazet-Loso M, Wiehe T, Journal of Computational Biology 16, 1487-1500 (2009)
Alignment-free distance measures are generally less accurate but more efficient than traditional alignment-based metrics. In the context of genome sequence analysis, the efficiency gain is often so substantial that it outweighs the loss in accuracy. However, a further disadvantage of alignment-free distances is that their relationship to evolutionary events such as substitutions is generally unknown. We have therefore derived an estimator of the number of substitutions per site between two unaligned DNA sequences, K(r). Simulations show that this estimator works well with "ideal" data. We compare K(r) to two alternative alignment-free distances: a k-tuple distance and a measure of relative entropy based on average common substring length. All three measures are applied to 27 primate mitochondrial genomes, eight whole genomes of Streptococcus agalactiae strains, and 12 whole genomes of Drosophila species. In each case, the cluster diagrams based on K(r) are equivalent to or significantly better than those based on the two alternative measures. This is due to the fact that in contrast to the alternative measures K(r) is derived from an explicit model of evolution. The computation of K(r) is efficiently implemented in the program kr, which can be downloaded freely from the internet.
Simulation of DNA sequence evolution under models of recent directional selection.
Kim Y, Wiehe T, Briefings in Bioinformatics 10, 84-96 (2009)
Computer simulation is an essential tool in the analysis of DNA sequence variation for mapping events of recent adaptive evolution in the genome. Various simulation methods are employed to predict the signature of selection in sequence variation. The most informative and efficient method currently in use is coalescent simulation. However, this method is limited to simple models of directional selection. Whole-population forward-in-time simulations are the alternative to coalescent simulations for more complex models. The notorious problem of excessive computational cost in forward-in-time simulations can be overcome by various simplifying amendments. Overall, the success of simulations depends on the creative application of some population genetic theory to the simulation algorithm.
Properties of Sequence Conservation in Upstream Regulatory and Protein Coding Sequences among Paralogs in Arabidopsis thaliana
Richardson DN, Wiehe T, Comparative Genomics 5817, 217 (2009)
Whole genome duplication (WGD) has catalyzed the formation of new species, genes with novel functions, altered expression patterns, complexified signaling pathways and has provided organisms a level of genetic robustness. We studied the long-term evolution and interrelationships of 5’ upstream regulatory sequences (URSs), protein coding sequences (CDSs) and expression correlations (EC) of duplicated gene pairs in Arabidopsis. Three distinct methods revealed significant evolutionary conservation between paralogous URSs and were highly correlated with microarray-based expression correlation of the respective gene pairs. Positional information on exact matches between sequences unveiled the contribution of micro-chromosomal rearrangements on expression divergence. A three-way rank analysis of URS similarity, CDS divergence and EC uncovered specific gene functional biases. Transcription factor activity was associated with gene pairs exhibiting conserved URSs and divergent CDSs, whereas a broad array of metabolic enzymes was found to be associated with gene pairs showing diverged URSs but conserved CDSs.
Identification of selective sweeps in closely related populations of the house mouse based on microsatellite scans.
Teschke M, Mukabayire O, Wiehe T, Tautz D, Genetics 180, 1537-1545 (2008)
Genome scans of polymorphisms promise to provide insights into the patterns and frequencies of positive selection under natural conditions. The use of microsatellites as markers has the potential to focus on very recent events, since in contrast to SNPs, their high mutation rates should remove signatures of older events. We assess this concept here in a large-scale study. We have analyzed two population pairs of the house mouse, one pair of the subspecies Mus musculus domesticus and the other of M. m. musculus. A total of 915 microsatellite loci chosen to cover the whole genome were assessed in a prescreening procedure, followed by individual typing of candidate loci. Schlötterer's ratio statistics (lnRH) were applied to detect loci with significant deviations from patterns of neutral expectation. For eight loci from each population pair we have determined the size of the potential sweep window and applied a second statistical procedure (linked locus statistics). For the two population pairs, we find five and four significant sweep loci, respectively, with an average estimated window size of 120 kb. On the basis of the analysis of individual allele frequencies, it is possible to identify the most recent sweep, for which we estimate an onset of 400-600 years ago. Given the known population history for the French-German population pair, we infer that the average frequency of selective sweeps in these populations is higher than 1 in 100 generations across the whole genome. We discuss the implications for adaptation processes in natural populations.
Second-order moments of segregating sites under variable population size.
Zivković D, Wiehe T, Genetics 180, 341-357 (2008)
The identification of genomic regions that have been exposed to positive selection is a major challenge in population genetics. Since selective sweeps are expected to occur during environmental changes or when populations are colonizing a new habitat, statistical tests constructed on the assumption of constant population size are biased by the co-occurrence of population size changes and selection. To delimit this problem and gain better insights into demographic factors, theoretical results regarding the second-order moments of segregating sites, such as the variance of segregating sites, have been derived. Driven by emerging genomewide surveys, which allow the estimation of demographic parameters, a generalized version of Tajima's D has been derived that takes into account a previously estimated demographic scenario to test single loci for traces of selection against the null hypothesis of neutral evolution under variable population size.
A pooling approach to detect signatures of selective sweeps in genome scans using microsatellites
Thomas M, Möller F, Wiehe T, Tautz D, Molecular Ecology Notes 7, 400-403 (2007)
We have evaluated a pooling approach that can reduce the number of polymerase chain reactions in a screen for selective sweeps by more than an order of magnitude. We show that the complex peak pattern that results from pooling of all samples from a given population is a faithful reflection of the composite pattern of the individual alleles, although with an under-representation of the larger alleles. Candidate loci for selective sweeps can be identified by visual inspection of the pool patterns. We have also implemented a software tool, which can find suitable microsatellite loci in the vicinity of annotated genes.
Identification of selective sweeps using a dynamically adjusted number of linked microsatellites.
Wiehe T, Nolte V, Zivković D, Schlötterer C, Genetics 175, 207-218 (2007)
There is currently large interest in distinguishing the signatures of genetic variation produced by demographic events from those produced by natural selection. We propose a simple multilocus statistical test to identify candidate sites of selective sweeps with high power. The test is based on the variability profile measured in an array of linked microsatellites. We also show that the analysis of flanking markers drastically reduces the number of false positives among the candidates that are identified in a genomewide survey of unlinked loci and find that this property is maintained in many population-bottleneck scenarios. However, for a certain range of intermediately severe population bottlenecks we find genomic signatures that are very similar to those produced by a selective sweep. While in these worst-case scenarios the power of the proposed test remains high, the false-positive rate reaches values close to 50%. Hence, selective sweeps may be hard to identify even if multiple linked loci are analyzed. Nevertheless, the integration of information from multiple linked loci always leads to a considerable reduction of the false-positive rate compared to a genome scan of unlinked loci. We discuss the application of this test to experimental data from Drosophila melanogaster.