Halfon Lab
Marc S. Halfon Ph.D.
Assistant Professor
Department of Biochemistry
Center of Excellence in Bioinformatics and the Life Sciences
Dept. of Biological Sciences

State University of New York at Buffalo

Adjunct Assistant Professor
Molecular and Cellular Biology Dept.
Roswell Park Cancer Institute
mshalfon@buffalo.edu
(716) 829-3126

Rotation students welcome!

Genomic Approaches to Elucidating Developmental Regulatory Networks

My laboratory investigates the genetic regulatory circuitry responsible for assigning cell fates during development, using the Drosophila embryonic mesoderm as our primary model system. Our work combines genomics and bioinformatics with the traditional molecular and genetic techniques of Drosophila research to investigate two key components of developmental regulatory networks: intercellular signaling and transcriptional regulation. This combination of in silico and in vivo approaches enables us not only to make predictions but also to test them within specific biological contexts. Our approaches are broadly applicable to genomes other than that of Drosophila, including the human genome. Current research in the laboratory falls into two main areas: (a) discovery and characterization of transcriptional cis-regulatory modules (CRMs), and (b) mechanisms of specificity in receptor tyrosine kinase (RTK) signaling. Together, these studies will provide insight into gene regulation, genome structure, intercellular signaling, and the regulatory networks that govern embryonic development.

Regulatory Networks

Gene expression is controlled by the binding of transcription factors to specific cis-regulatory elements. In higher eukaryotes, these elements can lie upstream (5') or downstream (3') of a gene or within its introns; in some cases, they are even found within protein-coding sequences! Spatial and temporal aspects of gene expression are often controlled in a modular fashion, with individual cis-regulatory elements (termed "modules" or "enhancers") each regulating expression at a particular time and place. An emerging theme is that a specific combination of transcription factors, activated as a result of intercellular signaling, binds a regulatory module together with tissue-specific transcription factors ("selectors"), forming a "transcriptional code" that regulates the expression of a given gene (see Figure 1).

Together, the signaling and transcriptional events form a network of interactions in which signaling induces gene transcription, which can in turn trigger further signaling events that induce additional gene expression, and so on. Transcriptional cascades also occur, in which transcription factors induce the expression of other transcription factors, which can in turn regulate still others. These developmental regulatory networks are often complex, with multiple levels of cross-talk between signaling pathways as well as both positive and negative feedback loops (see Figure 1, right). Our ultimate goal is to describe all of the regulatory interactions involved in embryonic development. As a tractable step in this direction, we have begun to identify and characterize specific cis-regulatory elements and signaling pathways involved in mesoderm development.


Defining cis-regulatory elements

cis-Regulatory modules (CRMs) are critical nodes in developmental regulatory networks, because it is here that signaling pathways and transcription factors are integrated to produce changes in the expression of specific genes. Mutations within CRMs have been implicated in a number of diseases, underscoring the importance of being able to identify and characterize them. However, CRM identification has traditionally been difficult, relying on trial-and-error testing of the non-coding DNA flanking the gene of interest. We are using several computational approaches to locate, rapidly and comprehensively, the CRMs responsible for directing specific patterns of gene expression. In particular, we have focused on elements that regulate expression in the progenitors of the Drosophila embryonic muscles. Our basic strategy is to define at least one model enhancer empirically and in detail, identifying the majority of the transcription factors that bind to it. We then search the entire genome for other regions that contain a compact cluster of binding sites for the same factors. This approach is effective, although it has a high false-positive rate (i.e., it yields many incorrect predictions). We are finding that incorporating comparative genomic data from additional Drosophila species considerably improves accuracy, as true regulatory modules tend to be well conserved over the course of evolution. All of our predictions are extensively tested in vivo using reporter gene assays in the fly embryo, so that we can definitively assess our success rate and refine our approach to achieve better performance.
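
The genome-wide search for compact clusters of binding sites can be illustrated with a minimal Python sketch. It is not our actual software; it assumes exact-match consensus sites for three hypothetical factors ("factorA" to "factorC"), and the window size and thresholds (window, min_sites, min_factors) are arbitrary placeholders. Real analyses generally use binding sites defined from a model enhancer, often as position weight matrices, and add conservation filters.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical consensus sites for three unnamed factors (illustration only).
# A real analysis would use sites defined experimentally from a model enhancer,
# typically as position weight matrices rather than exact strings.
MOTIFS: Dict[str, str] = {
    "factorA": "TGACGT",
    "factorB": "CATATG",
    "factorC": "GGAAAT",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_sites(seq: str, motifs: Dict[str, str]) -> List[Tuple[int, str]]:
    """Return sorted (position, factor) pairs for exact matches on either strand."""
    seq = seq.upper()
    hits = []
    for name, motif in motifs.items():
        rev_comp = motif.translate(_COMPLEMENT)[::-1]
        for pattern in {motif, rev_comp}:  # a set, so palindromic sites count once
            # A zero-width lookahead also reports overlapping matches.
            for m in re.finditer(f"(?={pattern})", seq):
                hits.append((m.start(), name))
    return sorted(hits)


def scan_clusters(seq: str, motifs: Dict[str, str], window: int = 200,
                  min_sites: int = 3, min_factors: int = 2):
    """Report windows, each starting at a site, that contain a compact site cluster.

    A window qualifies if it holds at least `min_sites` sites recognized by at
    least `min_factors` different factors. Thresholds are arbitrary placeholders.
    """
    sites = find_sites(seq, motifs)
    clusters = []
    for i, (start, _) in enumerate(sites):
        in_window = [s for s in sites[i:] if s[0] < start + window]
        factors = sorted({name for _, name in in_window})
        if len(in_window) >= min_sites and len(factors) >= min_factors:
            clusters.append((start, start + window, len(in_window), factors))
    return clusters


# Toy sequence: three different sites within 60 bp of one another.
spacer = "A" * 20
seq = "A" * 50 + "TGACGT" + spacer + "CATATG" + spacer + "GGAAAT" + "A" * 500
print(scan_clusters(seq, MOTIFS))  # [(50, 250, 3, ['factorA', 'factorB', 'factorC'])]
```

Because a match to any single short motif is common by chance, requiring several sites for different factors within one window is what gives this kind of search its specificity, although, as noted above, false positives remain frequent.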

In addition to this basic strategy, we are exploring other computational and empirical approaches to characterizing cis-regulatory elements, such as combining phylogenetic footprinting with subsequence profiling ("word" counts) to identify functionally related enhancers, developing high-throughput methods to test predictions in vivo, and incorporating data from transcriptional profiling experiments (see below).
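
The idea behind subsequence profiling can be sketched as follows: describe each sequence by its counts of all short DNA "words" and rank candidate sequences by how similar their profiles are to a known enhancer. This is a simplified illustration, not the published method; the word length (k = 3), the use of Pearson correlation as the similarity measure, and the toy sequences are all assumptions, and the phylogenetic footprinting step (restricting the analysis to conserved sequence) is omitted.

```python
from collections import Counter
from itertools import product
from math import sqrt
from typing import Dict, List, Tuple


def word_profile(seq: str, k: int = 3) -> List[int]:
    """Count every DNA word of length k, in a fixed A/C/G/T order (4**k entries)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(w)] for w in product("ACGT", repeat=k)]


def pearson(x: List[int], y: List[int]) -> float:
    """Pearson correlation; returns 0.0 if either profile has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_by_similarity(query: str, candidates: Dict[str, str],
                       k: int = 3) -> List[Tuple[str, float]]:
    """Rank candidate sequences by how closely their word profiles match the query."""
    q = word_profile(query, k)
    scores = {name: pearson(q, word_profile(s, k)) for name, s in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy sequences, purely illustrative.
query = "ACGTTGCA" * 10
candidates = {"similar": "ACGTTGCA" * 8, "poly_A": "A" * 80}
print(rank_by_similarity(query, candidates))
```

Unlike the cluster search, this comparison needs no prior knowledge of which factors bind, which is why word counts can suggest functionally related enhancers even when their binding sites are unknown.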

Although our primary focus has been on CRM discovery, much can also be learned by applying bioinformatics approaches to already-known CRMs. However, such studies are significantly hindered by the lack of readily available data on large numbers of CRMs. To address this shortcoming, we have constructed REDfly, a database of published Drosophila CRMs. It contains more than 600 CRMs associated with over 200 genes, along with their sequences and the expression patterns they direct. Computational analysis of this collection will allow us to discover previously unrecognized transcription factor binding sites (see below) and to begin exploring the "grammar" of CRMs: how the order and spacing of individual binding sites affect the overall function of the module. We can also address questions such as the extent to which binding-site clustering is important for enhancer activity, how often regulatory elements are discrete modules rather than nested or interrelated, and how subtle differences in structure and sequence affect enhancer activity.
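
As a rough illustration of the kind of "grammar" features such an analysis might extract, the sketch below summarizes the order, spacing, and density of annotated binding sites within a single CRM. The Site record layout, factor names, and coordinates are hypothetical and do not reflect REDfly's actual data format; the density measure (sites per 100 bp of the span they cover) is one simple choice among many.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Site:
    """A binding site within one CRM (hypothetical record layout, not REDfly's schema)."""
    factor: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def site_grammar(sites: List[Site]) -> Tuple[List[str], List[int], float]:
    """Summarize order, spacing and density of binding sites in one module.

    Returns (factor order along the sequence, gaps in bp between adjacent sites,
    sites per 100 bp of the span from the first site start to the last site end).
    A negative gap means the two adjacent sites overlap.
    """
    ordered = sorted(sites, key=lambda s: s.start)
    order = [s.factor for s in ordered]
    gaps = [b.start - a.end for a, b in zip(ordered, ordered[1:])]
    span = max(s.end for s in ordered) - ordered[0].start if ordered else 0
    density = 100 * len(ordered) / span if span > 0 else 0.0
    return order, gaps, density


# Hypothetical annotations for a single CRM.
crm_sites = [Site("factorC", 120, 130), Site("factorA", 10, 18), Site("factorB", 40, 48)]
print(site_grammar(crm_sites))  # (['factorA', 'factorB', 'factorC'], [22, 72], 2.5)
```

Computed across hundreds of CRMs, summaries like these would make it possible to ask whether particular factor orders or spacings recur more often than expected by chance.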


Downstream responses to intercellular signaling

To characterize developmental regulatory networks, we must also understand how upstream intercellular signaling events establish the transcriptional codes that act at cis-regulatory elements. To this end, we have used DNA microarrays to identify the downstream target genes of signaling pathways in the embryo. By marking subpopulations of cells with GFP or cell-surface markers, we can isolate RNA from cells of interest in wild-type embryos as well as in loss- and gain-of-function genetic backgrounds. We are extending these studies to examine the downstream effects of combinations of signaling pathways, to better understand the combinatorial nature of intercellular signaling. In addition to microarrays, we make extensive use of real-time quantitative RT-PCR to measure expression levels and of whole-mount RNA in situ hybridization to visualize gene expression patterns.

An important yet unresolved issue in microarray experiments is how best to analyze the data. Although numerous algorithms have been proposed for normalizing arrays, producing gene expression summaries, and statistically validating the results, it is unclear which of these methods work best. The main obstacle to answering this question has been the lack of a comprehensive data set in which all of the RNAs hybridized to the array, and their relative concentrations, are known. We have created such a set of over 4000 RNAs and hybridized them to Affymetrix arrays with relative concentration differences ranging from 1- to 4-fold. Using this data set, we have assessed various analysis methods to determine which produce results that most accurately reflect the known input. In the future, we will repeat these experiments using other microarray platforms and analysis methods. These data sets should be of considerable value for benchmarking both existing and newly developed microarray analysis algorithms. Our manuscript describing these results was published in Genome Biology (Choe et al., 2005; see Selected Recent Publications).
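
Because the true status of every RNA in such a control data set is known, each analysis method can be scored directly against that truth. The sketch below uses the area under the ROC curve (AUC) as the score; this is one common choice and an assumption here, not necessarily the exact metric reported in the manuscript. The method names and per-gene scores are hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], is_changed: Sequence[bool]) -> float:
    """Area under the ROC curve, computed as the Mann-Whitney probability that a
    truly changed gene scores higher than a truly unchanged one (ties count 1/2)."""
    pos = [s for s, t in zip(scores, is_changed) if t]
    neg = [s for s, t in zip(scores, is_changed) if not t]
    if not pos or not neg:
        raise ValueError("need at least one changed and one unchanged gene")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Ground truth is known by construction in a defined control data set:
# True = RNA present at different concentrations, False = equal concentrations.
truth = [True, True, False, False, False]
# Hypothetical per-gene differential-expression scores from two analysis pipelines.
method_scores = {
    "method_1": [2.1, 1.8, 0.3, 1.9, 0.2],
    "method_2": [2.4, 2.0, 0.1, 0.5, 0.3],
}
for name, scores in method_scores.items():
    print(name, round(roc_auc(scores, truth), 3))
```

An AUC of 1.0 means a method ranks every truly changed RNA above every unchanged one, while 0.5 is no better than chance; with thousands of RNAs of known status, differences between normalization and summarization choices become directly measurable.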

We have been particularly interested in receptor tyrosine kinase (RTK) signaling pathways, including those activated by Heartless (a fibroblast growth factor (FGF) receptor homolog) and Egfr (an epidermal growth factor (EGF) receptor homolog), which play important roles in establishing mesodermal cell fates. Although the FGF and EGF receptors are often assumed to act through identical downstream signaling cascades, our data suggest that these pathways diverge at significant points. Thus, these studies will contribute to our understanding of RTK pathway regulation as well as identify additional genes important for mesodermal development.


Selected Recent Publications

Halfon, M. S., Gallo, S. M. and Bergman, C. M. (2007). REDfly 2.0: an integrated database of cis-regulatory modules and transcription factor binding sites in Drosophila. Nucleic Acids Research. doi:10.1093/nar/gkm876.

Li, L., Zhu, Q., He, X., Sinha, S. and Halfon, M. S. (2007). Large-scale analysis of transcriptional cis-regulatory modules reveals both common features and distinct subclasses. Genome Biology 8:R101.

Halfon, M. S. (2006). (Re)modeling the transcriptional enhancer. Nature Genetics 38(10):1102-1103.

Choe, S. E., Boutros, M., Michelson, A. M., Church, G. M. and Halfon, M. S. (2005). Preferred analysis methods for Affymetrix GeneChips revealed by a wholly-defined control dataset. Genome Biology 6:R16.

Grad, Y., Roth, F. P., Halfon, M. S. and Church, G. M. (2004). Prediction of similarly-acting cis-regulatory modules by subsequence profiling and comparative genomics in D. melanogaster. Bioinformatics 20:2738-2750.

Halfon, M. S., Grad, Y., Church, G. and Michelson, A. M. (2002). Computation-based discovery of related transcriptional regulatory modules and motifs using a combinatorial model. Genome Research 12:1019-1028.

Halfon, M. S. and Michelson, A. M. (2002). Exploring genetic regulatory networks in metazoan development: methods and models. Physiological Genomics 10:131-143.

Halfon, M. S., Carmena, A., Gisselbrecht, S., Sackerson, C. M., Jiménez, F., Baylies, M. K. and Michelson, A. (2000). Ras pathway specificity is determined by the integration of multiple signal-activated and tissue-restricted transcription factors. Cell 103:63-74.
