Understanding transcription regulation

Transcription (the copying of genomic DNA into RNA) is one of the most fascinating processes in biology, particularly in animals, in which different cell types transcribe different genes to acquire different morphologies and functions. This differential gene transcription is a prerequisite of multicellularity, the main driver of animal development, and underlies most of the differences we observe between closely related species; failures in this tightly regulated process cause many diseases, including cancer.

To us, the discrepancy between the seeming simplicity of transcriptional regulation and the apparent difficulty of achieving satisfying mechanistic insight is particularly fascinating: enhancer sequences typically drive gene expression patterns reliably, both in their endogenous genomic contexts and in simple reporter systems. However, despite this defined sequence-to-function relationship, the exact sequence requirements for enhancer activity remain unknown. Similarly, how the combinations of transcription factor and cofactor proteins that are recruited to enhancers control enhancer function and mediate the transcriptional activation of target core-promoters has remained elusive.

Our aim is to understand how transcription is regulated at the level of the two key types of regulatory genomic elements, enhancers and core-promoters, and of the transcription factor and cofactor proteins that mediate transcription activation. To reach this goal, we follow an interdisciplinary approach, using genome-wide functional assays, bioinformatics, biochemistry, and mass spectrometry. We develop and employ highly controllable reporter assays that provide reliable functional readouts for each of the questions we ask, while circumventing the many confounding issues that exist in complex gene regulatory systems in vivo.

STARR-seq: Genome-wide quantitative enhancer activity maps

We have developed STARR-seq (self-transcribing-active-regulatory-region-sequencing), a massively parallel reporter assay based on next-generation sequencing. STARR-seq allows the identification of transcriptional enhancers in a direct and quantitative manner in entire genomes, yielding genome-wide quantitative enhancer activity maps. We are applying STARR-seq in Drosophila and human cells to understand the sequence basis of cell-type-specific enhancer activities.
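To illustrate what a quantitative activity score from such a screen might look like, the sketch below compares reporter-transcript read counts for one candidate region with that region's representation in the input library. This is an assumption about a commonly used style of analysis, not a description of our published pipeline; the function name, the pseudocount, and the library-size normalization are all illustrative choices.

```python
import math


def log2_enrichment(rna_count, input_count, rna_total, input_total, pseudocount=1.0):
    """Hypothetical activity score for one candidate region.

    rna_count / input_count: reads covering the region in the reporter-RNA
    and input-DNA libraries. rna_total / input_total: total mapped reads per
    library, used to normalize for sequencing depth. The pseudocount
    (an illustrative choice) avoids division by zero for unseen regions.
    """
    rna_rate = (rna_count + pseudocount) / rna_total
    input_rate = (input_count + pseudocount) / input_total
    return math.log2(rna_rate / input_rate)


# Made-up counts for illustration only, not real data.
score = log2_enrichment(rna_count=7, input_count=1, rna_total=1000, input_total=1000)
print(score)
```

A score near zero means the region is no more abundant among reporter transcripts than in the input library; positive scores indicate enrichment.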

STAP-seq: Genome-wide assessment of enhancer responsiveness

We have developed STAP-seq (Self-Transcribing Active core-Promoter-sequencing) to measure the enhancer responsiveness of core-promoter candidates genome-wide. While STARR-seq assesses enhancer function using a constant core-promoter, STAP-seq quantifies the ability of core-promoter candidates to initiate transcription when provided with activating input from a defined, constant enhancer. STAP-seq thereby enables the identification of transcription initiation sites (TSSs) and the quantification of how strongly each TSS activates transcription in response to enhancers, i.e. the TSS’s enhancer responsiveness.

Direct identification of regulatory activities of transcription factors and cofactors

We have developed an enhancer complementation assay that allows the testing of regulatory activities of TFs regardless of their endogenous DNA-binding specificities and developmental roles. By tethering TFs to different sites in mutant enhancers, we are able to quantify the TFs’ regulatory activities in different contexts. Our results on the function of 474 TFs and 338 transcriptional cofactors assayed in Drosophila S2 cells are available from factors.starklab.org.

Genome-scale enhancer characterization in Drosophila embryos

We have determined the temporal and spatial enhancer activity in Drosophila embryos of 7793 transcriptional reporter constructs integrated at a single defined genomic position in transgenic lines (Vienna-Tiles [VT] library). This provides the first genome-scale view of how developmental gene regulation is organized in an animal’s genome and how regulatory DNA sequences encode different patterns of gene expression. All data are available from enhancers.starklab.org.

Cracking the transcriptional regulatory code

We use bioinformatics and machine learning to dissect regulatory sequences and determine sequence features (e.g. TF motifs) that are predictive of regulatory function and required for enhancer activity. Our goal is to achieve a systematic understanding of the structure and functions of enhancers, i.e. to “crack” the regulatory code, predict enhancer activity from the DNA sequence, and understand how transcriptional networks define cellular and developmental programs.
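As a simple illustration of what a "sequence feature" can mean in practice, the sketch below counts exact occurrences of short motif strings on both DNA strands and returns the counts as a feature dictionary that a downstream model could consume. It is a minimal sketch under stated assumptions: the motif names and strings, the example sequence, and the exact-match rule are hypothetical placeholders, and real analyses typically use position weight matrices rather than exact strings.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_motif(seq, motif):
    """Count positions where the motif matches on either strand.

    A palindromic motif is counted once per position, because the forward
    and reverse-complement strings collapse into a single set entry.
    """
    seq, motif = seq.upper(), motif.upper()
    targets = {motif, reverse_complement(motif)}
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in targets)


def motif_features(seq, motifs):
    """Map each (hypothetical) motif name to its occurrence count in seq."""
    return {name: count_motif(seq, m) for name, m in motifs.items()}


# Illustrative inputs only; neither the motifs nor the sequence are real data.
example_motifs = {"motif_A": "GATAA", "motif_B": "CACGTG"}
features = motif_features("GATAAGGTTATC", example_motifs)
print(features)
```

Counting on both strands reflects that a binding site can sit in either orientation within a regulatory sequence; the resulting counts are one possible input representation, not a statement about which features are actually predictive.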

Comparative genomics: tracing sequences through evolution

Functional genomic elements are often under evolutionary selection to maintain their functions in related organisms. We study the conservation and divergence of TF binding and enhancer activity and the underlying regulatory DNA sequences across different Drosophila species. Conservation and divergence allow powerful insights into how gene regulation is encoded in the DNA sequence and how it changes during evolution.

Determining and understanding tissue-specific TF binding

Transcription factors are employed at several developmental stages and in different tissues, and they typically bind to and regulate different targets in each such context. We use ChIP-seq to determine context-specific TF binding and study its sequence determinants, focusing on embryonic mesoderm and muscle development, the circadian clock, and homeobox (Hox) transcription factors. The combinatorics of TF motifs and TF binding lie at the heart of enhancer activity, and understanding the sequence requirements of context-specific TF binding will give important insights into the regulatory code.