C9: Publications
Accurate Promoter and Enhancer Identification in 127 ENCODE and Roadmap Epigenomics Cell Types and Tissues by GenoSTAN
Zacher B., Michel M., Schwalb B., Cramer P., Tresch A., Gagneur J., 5. Jan. 2017, PLoS One 12(1), doi: 10.1371/journal.pone.0169249
Accurate maps of promoters and enhancers are required for understanding transcriptional regulation. Promoters and enhancers are usually mapped by integration of chromatin assays charting histone modifications, DNA accessibility, and transcription factor binding. However, current algorithms are limited by unrealistic data distribution assumptions. Here we propose GenoSTAN (Genomic STate ANnotation), a hidden Markov model overcoming these limitations. We map promoters and enhancers for 127 cell types and tissues from the ENCODE and Roadmap Epigenomics projects, today’s largest compendium of chromatin assays. Extensive benchmarks demonstrate that GenoSTAN generally identifies promoters and enhancers with significantly higher accuracy than previous methods. Moreover, GenoSTAN-derived promoters and enhancers showed significantly higher enrichment of complex trait-associated genetic variants than current annotations. Altogether, GenoSTAN provides an easy-to-use tool to define promoters and enhancers in any system, and our annotation of human transcriptional cis-regulatory elements constitutes a rich resource for future research in biology and medicine.
Simultaneous characterization of sense and antisense genomic processes by the double-stranded hidden Markov model
Julia Glas, Sebastian Dümcke, Benedikt Zacher, Don Poron3, Julien Gagneur and Achim Tresch, Nucl. Acids Res. (2015) doi: 10.1093/nar/gkv1184
Hidden Markov models (HMMs) have been extensively used to dissect the genome into functionally distinct regions using data such as RNA expression or DNA binding measurements. It is a challenge to disentangle processes occurring on complementary strands of the same genomic region. We present the double-stranded HMM (dsHMM), a model for the strand-specific analysis of genomic processes. We applied dsHMM to yeast using strand specific transcription data, nucleosome data, and protein binding data for a set of 11 factors associated with the regulation of transcription.The resulting annotation recovers the mRNA transcription cycle (initiation, elongation, termination) while correctly predicting strand-specificity and directionality of the transcription process. We find that pre-initiation complex formation is an essentially undirected process, giving rise to a large number of bidirectional promoters and to pervasive antisense transcription. Notably, 12% of all transcriptionally active positions showed simultaneous activity on both strands. Furthermore, dsHMM reveals that antisense transcription is specifically suppressed by Nrd1, a yeast termination factor.
Annotation of genomics data using bidirectional hidden Markov models unveils variations in Pol II transcription cycle
Zacher B, Lidschreiber M, Cramer P, Gagneur J, Tresch A, DOI 10.15252/msb.20145654 | Published online 19.12.2014 Molecular Systems Biology(2014)10:768
DNA replication, transcription and repair involve the recruitment of protein complexes that change their composition as they progress along the genome in a directed or strand‐specific manner. Chromatin immunoprecipitation in conjunction with hidden Markov models (HMMs) has been instrumental in understanding these processes, as they segment the genome into discrete states that can be related to DNA‐associated protein complexes. However, current HMM‐based approaches are not able to assign forward or reverse direction to states or properly integrate strand‐specific (e.g., RNA expression) with non‐strand‐specific (e.g., ChIP) data, which is indispensable to accurately characterize directed processes. To overcome these limitations, we introduce bidirectional HMMs which infer directed genomic states from occupancy profiles de novo. Application to RNA polymerase II‐associated factors in yeast and chromatin modifications in human T cells recovers the majority of transcribed loci, reveals gene‐specific variations in the yeast transcription cycle and indicates the existence of directed chromatin state patterns at transcribed, but not at repressed, regions in the human genome. In yeast, we identify 32 new transcribed loci, a regulated initiation–elongation transition, the absence of elongation factors Ctk1 and Paf1 from a class of genes, a distinct transcription mechanism for highly expressed genes and novel DNA sequence motifs associated with transcription termination. We anticipate bidirectional HMMs to significantly improve the analyses of genome‐associated directed processes.
Genome-wide quantitative analysis of DNA methylation from bisulfite sequencing data
Akman K, Haaf T, Gravina S, Vijg J, Tresch A, Bioinformatics, 30(13) (2014)
Although many computer programs can perform population genetics calculations, they are typically limited in the analyses and data input formats they offer; few applications can process the large data sets produced by whole-genome resequencing projects. Furthermore, there is no coherent framework for the easy integration of new statistics into existing pipelines, hindering the development and application of new population genetics and genomics approaches. Here, we present PopGenome, a population genomics package for the R software environment (a de facto standard for statistical analyses). PopGenome can efficiently process genome-scale data as well as large sets of individual loci. It reads DNA alignments and single-nucleotide polymorphism (SNP) data sets in most common formats, including those used by the HapMap, 1000 human genomes, and 1001 Arabidopsis genomes projects. PopGenome also reads associated annotation files in GFF format, enabling users to easily define regions or classify SNPs based on their annotation; all analyses can also be applied to sliding windows. PopGenome offers a wide range of diverse population genetics analyses, including neutrality tests as well as statistics for population differentiation, linkage disequilibrium, and recombination. PopGenome is linked to Hudson's MS and Ewing's MSMS programs to assess statistical significance based on coalescent simulations. PopGenome's integration in R facilitates effortless and reproducible downstream analyses as well as the production of publication-quality graphics. Developers can easily incorporate new analyses methods into the PopGenome framework. PopGenome and R are freely available from CRAN (http://cran.r-project.org/) for all major operating systems under the GNU General Public License.
Periodic mRNA synthesis and degradation co‐operate during cell cycle gene expression
Eser P, Demel C, Maier KC, Schwalb B, Pirkl N, Martin DE, Cramer P, Tresch A, Mol Syst Biol. 10:717 (2014)
During the cell cycle, the levels of hundreds of mRNAs change in a periodic manner, but how this is achieved by alterations in the rates of mRNA synthesis and degradation has not been studied systematically. Here, we used metabolic RNA labeling and comparative dynamic transcriptome analysis (cDTA) to derive mRNA synthesis and degradation rates every 5 min during three cell cycle periods of the yeast
Saccharomyces cerevisiae. A novel statistical model identified 479 genes that show periodic changes in mRNA synthesis and generally also periodic changes in their mRNA degradation rates. Peaks of mRNA degradation generally follow peaks of mRNA synthesis, resulting in sharp and high peaks of mRNA levels at defined times during the cell cycle. Whereas the timing of mRNA synthesis is set by upstream DNA motifs and their associated transcription factors (TFs), the synthesis rate of a periodically expressed gene is apparently set by its core promoter.
back to project area C