Genboree BCM


The construction of the Human Epigenome Atlas is funded by the NIH Epigenomics Roadmap Project. Project participants include Reference Epigenome Mapping Centers, the NCBI, and the Epigenome Data Analysis and Coordination Center (EDACC) at Baylor College of Medicine.

The Human Epigenome Atlas will include human reference epigenomes and the results of their integrative and comparative analyses. Successive releases of the Atlas will provide progressively more detailed insights into locus-specific epigenomic states, including histone marks and DNA methylation marks across specific tissues and cell types, developmental stages, physiological conditions, genotypes, and disease states. The first release of the Atlas is anticipated during the first half of 2010. Return to this page for updates on our progress toward construction of the Atlas.

The data that will be used to construct the first release of the Atlas may be accessed by querying the NCBI GEO DataSets database using the query "Epigenomics Roadmap[Project]".
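The same query can also be issued programmatically. The following is a minimal sketch, assuming the public NCBI E-utilities ESearch endpoint with `db=gds` (GEO DataSets); the function names are ours, and whether the `[Project]` field still returns results for this term is an assumption rather than something this page guarantees.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_gds_search_url(term, retmax=20):
    """Return an ESearch URL that searches GEO DataSets (db=gds) for `term`."""
    params = {"db": "gds", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def fetch_gds_ids(term, retmax=20):
    """Run the search over the network and return the matching record IDs."""
    with urlopen(build_gds_search_url(term, retmax)) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]


# Building the URL needs no network access; fetch_gds_ids() does.
url = build_gds_search_url("Epigenomics Roadmap[Project]")
```

Separating URL construction from the network call keeps the query string easy to inspect before any request is sent.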

The Atlas releases will consist of metadata and data at Levels 0-4. Most Level 0 data will likely consist of DNA sequence reads (*-seq assays) but may also include chip data (*-chip assays). Level 1 data for *-seq assays is downloadable as a SAM/BAM file, and Level 2 data is downloadable as a WIG file, from the GEO DataSets database at NCBI. Level 2 data is also displayed through the NCBI Sequence Viewer. A sample GEO record (H1 H3K4me3 signal obtained by a ChIP-seq assay) with all these features is available.

The development of the methods and pipelines to generate Level 3 data (normalized signals for epigenome comparisons across platforms) and Level 4 data (sample- and locus-specific epigenomic states) is in progress.
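For quick reference, the data levels described on this page can be summarized as a small lookup table. This is only an illustrative sketch: the `DataLevel` class and its field names are hypothetical, and a file format is recorded only where the text names one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataLevel:
    level: int
    content: str
    file_format: str  # empty where this page names no download format


ATLAS_LEVELS = [
    DataLevel(0, "raw assay output, mostly sequence reads", ""),
    DataLevel(1, "read mappings", "SAM/BAM"),
    DataLevel(2, "assay-specific signal such as read mapping densities", "WIG"),
    DataLevel(3, "normalized signals for comparisons across platforms", ""),
    DataLevel(4, "sample- and locus-specific epigenomic states", ""),
]


def levels_with_named_format(levels):
    """Return the level numbers for which a download format is stated."""
    return [entry.level for entry in levels if entry.file_format]
```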

The construction of the Atlas requires informatic infrastructure, standards, and practices that will scale with increasing data throughputs, accommodate an increasing diversity of experimental and computational methodologies, and serve a diverse repertoire of research projects including reference epigenome mapping and disease-focused projects. EDACC has developed and partially implemented the following three aspects of such informatic infrastructure using the Genboree system:
 
1. Data flow. We have developed data models, quality standards, and online tools to support data flow from REMCs and disease projects to NCBI, thus creating an epigenomic data commons. We have also developed tools for metadata creation (study, sample, and experimental assay information) and verification for ChIP-seq, bisulfite-seq, and other epigenomic assays, as well as tools for project monitoring and coordination.

Epigenomic metadata submission: As part of the submission process, metadata is validated prior to final submission to the SRA/GEO archives at NCBI.
This is a partially functional prototype of the submission process.
(This demonstration version does not make actual submissions to SRA/GEO, but is otherwise functional.)
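To illustrate what validating metadata before submission can look like, here is a minimal sketch. It is not the EDACC validator: the section names follow the study, sample, and assay grouping mentioned above, but every field name, the list of recognized assay types, and the error messages are hypothetical.

```python
# Hypothetical required fields per metadata section (not the real SRA/GEO schema).
REQUIRED_FIELDS = {
    "study": ["title", "center"],
    "sample": ["name", "tissue_or_cell_type"],
    "assay": ["assay_type", "platform"],
}

# Assay types named on this page; a real validator would accept more.
KNOWN_ASSAY_TYPES = {"ChIP-seq", "bisulfite-seq"}


def validate_metadata(record):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for section, fields in REQUIRED_FIELDS.items():
        values = record.get(section)
        if not isinstance(values, dict):
            errors.append(f"missing section: {section}")
            continue
        for field in fields:
            value = values.get(field)
            if value is None or not str(value).strip():
                errors.append(f"missing field: {section}.{field}")

    assay = record.get("assay")
    if isinstance(assay, dict):
        assay_type = assay.get("assay_type")
        if assay_type and assay_type not in KNOWN_ASSAY_TYPES:
            errors.append(f"unrecognized assay type: {assay_type}")
    return errors
```

Collecting all problems in one pass, rather than stopping at the first, lets a submitter fix a record in a single round trip.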

2. Primary data analysis. The Level 0 data (principally reads) are processed using reference analysis pipelines. The first step in sequence data analysis is the mapping of large volumes of reads (mappings = Level 1 data) followed by processing appropriate to the assay type such as producing read mapping densities for ChIP-seq experiments or methylation calls for bisulfite-seq experiments (Level 2 data).
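As an illustration of the step from Level 1 to Level 2 data, the sketch below turns mapped read start positions on a single chromosome into a read mapping density by counting reads in fixed-width bins. It is a simplified stand-in, not the reference pipeline: the function name, the use of read start positions only, and the bin size are all assumptions.

```python
from collections import Counter


def read_density(read_starts, bin_size):
    """Count mapped read start positions per fixed-width bin.

    Returns a dict mapping each bin's start coordinate to its read count,
    ordered by coordinate. Bins with no reads are omitted.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    counts = Counter((pos // bin_size) * bin_size for pos in read_starts)
    return dict(sorted(counts.items()))


# Six reads on one chromosome, summarized in 100-base bins.
density = read_density([5, 12, 18, 105, 199, 250], bin_size=100)
# density == {0: 3, 100: 2, 200: 1}
```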

Genboree pipeline modules: The aforementioned pipelines are exposed in a Galaxy installation and are loosely coupled with Genboree through the Genboree REST API where appropriate.
These pipelines have only recently been exposed; tuning, further automation, and tighter integration with the Genboree system are underway.

3. Integrative and comparative analysis. New methods are being developed for the integrative analysis of histone marks, DNA methylation, small RNA, and mRNA. Normalization methods that make data comparable across diverse assay types (Level 3 data), along with similarity measures and comparison algorithms that detect global similarities and local differences between epigenomes (Level 4 data), are also being developed. In the near future EDACC intends to deploy web-based tools for epigenome comparisons for quality control and other purposes.
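Because the actual normalization and comparison methods are still under development, the following sketch uses generic stand-ins to make the idea concrete: z-score standardization as the normalization, Pearson correlation as the global similarity measure, and a thresholded per-bin difference as the local comparison. None of these choices, names, or thresholds should be read as the EDACC methods.

```python
from math import sqrt


def zscore(signal):
    """Standardize a signal track to mean 0 and standard deviation 1."""
    n = len(signal)
    mean = sum(signal) / n
    variance = sum((x - mean) ** 2 for x in signal) / n
    if variance == 0:
        raise ValueError("a constant signal cannot be standardized")
    sd = sqrt(variance)
    return [(x - mean) / sd for x in signal]


def global_similarity(a, b):
    """Pearson correlation between two equal-length signal tracks."""
    if len(a) != len(b):
        raise ValueError("signals must cover the same bins")
    za, zb = zscore(a), zscore(b)
    return sum(x * y for x, y in zip(za, zb)) / len(za)


def local_differences(a, b, threshold):
    """Indices of bins whose standardized signals differ by more than threshold."""
    if len(a) != len(b):
        raise ValueError("signals must cover the same bins")
    za, zb = zscore(a), zscore(b)
    return [i for i, (x, y) in enumerate(zip(za, zb)) if abs(x - y) > threshold]


# Two tracks on different scales but with the same shape are globally similar.
track_a = [1, 2, 3, 4, 5]
track_b = [2, 4, 6, 8, 10]
similarity = global_similarity(track_a, track_b)

# A track with one divergent bin is flagged at that bin only.
track_c = [1, 2, 10, 4, 5]
divergent_bins = local_differences(track_a, track_c, threshold=1.5)
```

Standardizing before comparing is what lets tracks measured on different scales be compared bin by bin.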

H1 - IMR90 Methylation Comparison Demo
H1 Histone Modification Comparison Demo
The results reported and linked on these pages are preliminary, although they are based on analysis of epigenomic data.

Genboree Workbench
The Genboree Workbench is currently a prototype depicting the overall direction and functionality that is under development. Currently, some precomputed epigenomic comparison results are accessible for tracks within the 'Epigenome Atlas' group.

 

 


Bioinformatics Research Laboratory
The Epigenome Atlas, EDACC-BCM, and Genboree are hosted & maintained by the
Bioinformatics Research Laboratory at Baylor College of Medicine.
Genboree is a hosted service, but code is available (free for academic use).
© 2001-2010 Bioinformatics Research Laboratory
    (400D Jewish Wing, MS:BCM225, 1 Baylor Plaza, Houston, TX 77030, 713-798-5433)
Questions or comments?
Genboree Community Support Site