The construction of the Human Epigenome Atlas is funded by the NIH Epigenomics Roadmap Project. Project participants include the Reference Epigenome Mapping Centers (REMCs), the NCBI, and the Epigenome Data Analysis and Coordination Center (EDACC) at Baylor College of Medicine.

The Human Epigenome Atlas will include human reference epigenomes and the results of their integrative and comparative analyses. Successive releases of the Atlas will provide progressively more detailed insights into locus-specific epigenomic states, including histone marks and DNA methylation marks, across specific tissues and cell types, developmental stages, physiological conditions, genotypes, and disease states. The first release of the Atlas is anticipated during the first half of 2010. Return to this page for updates on our progress toward construction of the Atlas.

The data that will be used to construct the first release of the Atlas may be accessed by querying the NCBI GEO DataSets database using the query "Epigenomics Roadmap[Project]".

The Atlas releases will consist of metadata and data at Levels 0-4. Most Level 0 data will likely consist of DNA sequence reads (*-seq assays) but may also include chip data (*-chip assays). Level 1 data for *-seq assays is downloadable as a SAM/BAM file, and Level 2 data is downloadable as a WIG file, from the GEO DataSets database at NCBI. Level 2 data is also displayed through the NCBI Sequence Viewer. A sample GEO record (H1 H3K4me3 signal obtained by a ChIP-seq assay) with all these features is available. The methods and pipelines to generate Level 3 data (normalized signals for epigenome comparisons across platforms) and Level 4 data (sample- and locus-specific epigenomic states) are still in development.
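The data levels and the GEO DataSets query described above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the Atlas tooling: the names `ATLAS_LEVELS` and `build_geo_query_url` are hypothetical, the level descriptions are paraphrased from the text, and the example assumes that the query can be passed to the NCBI E-utilities `esearch` endpoint with the GEO DataSets database name `gds`. The code only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Data levels as described in the text (descriptions are paraphrases).
ATLAS_LEVELS = {
    0: "DNA sequence reads (*-seq assays) or chip data (*-chip assays)",
    1: "*-seq data downloadable as SAM/BAM",
    2: "*-seq data downloadable as WIG; shown in the NCBI Sequence Viewer",
    3: "normalized signals for epigenome comparisons across platforms",
    4: "sample- and locus-specific epigenomic states",
}

# NCBI E-utilities search endpoint; "gds" is the GEO DataSets database.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_query_url(term="Epigenomics Roadmap[Project]", retmax=20):
    """Return an esearch URL for the given GEO DataSets query (no network access)."""
    params = {"db": "gds", "term": term, "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    for level, description in sorted(ATLAS_LEVELS.items()):
        print(f"Level {level}: {description}")
    print(build_geo_query_url())
```

Keeping the query term as a parameter makes it easy to narrow the search later, for example by combining the project query with an assay or cell-type term.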
The construction of the Atlas requires informatic infrastructure, standards, and practices that will scale with increasing data throughput, accommodate an increasing diversity of experimental and computational methodologies, and serve a diverse repertoire of research projects, including reference epigenome mapping and disease-focused projects. EDACC has developed and partially implemented the following three aspects of such informatic infrastructure using the Genboree system:

1. Data flow. We have developed data models, quality standards, and online tools to support data flow from REMCs and disease projects to NCBI, thus creating an epigenomic data commons. We have also developed tools for metadata creation (study, sample, and experimental assay information) and verification for ChIP-seq, bisulfite-seq, and other epigenomic assays, as well as tools for project monitoring and coordination.
Epigenomic metadata submission: As part of the submission process, metadata is validated prior to final submission to the SRA/GEO archives at NCBI.
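A pre-submission check of this kind might look roughly like the following. This is a minimal sketch under stated assumptions, not the EDACC validator: the three sections (study, sample, experiment) follow the metadata categories named in the text, but the field names, the `KNOWN_ASSAYS` set, and the function `validate_metadata` are hypothetical and chosen only for illustration.

```python
# Hypothetical required fields per metadata section (illustration only).
REQUIRED_FIELDS = {
    "study": ["title"],
    "sample": ["name", "cell_type"],
    "experiment": ["assay"],
}

# Assay names mentioned in the text; a real validator would know more.
KNOWN_ASSAYS = {"ChIP-seq", "bisulfite-seq"}


def validate_metadata(record):
    """Return a list of problems found in a metadata record (empty if none)."""
    errors = []
    for section, fields in REQUIRED_FIELDS.items():
        block = record.get(section)
        if not isinstance(block, dict):
            errors.append(f"missing section: {section}")
            continue
        for field in fields:
            value = block.get(field)
            if value is None or not str(value).strip():
                errors.append(f"missing field: {section}.{field}")
    experiment = record.get("experiment")
    if isinstance(experiment, dict):
        assay = experiment.get("assay")
        if assay and assay not in KNOWN_ASSAYS:
            errors.append(f"unrecognized assay: {assay}")
    return errors


if __name__ == "__main__":
    record = {
        "study": {"title": "Reference epigenome mapping"},
        "sample": {"name": "H1", "cell_type": "embryonic stem cell"},
        "experiment": {"assay": "ChIP-seq"},
    }
    problems = validate_metadata(record)
    print("ready to submit" if not problems else problems)
```

Returning every problem at once, rather than stopping at the first, lets a submitter fix a record in one pass before it is sent on to the archives.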
Genboree pipeline modules: The aforementioned pipelines are exposed in a Galaxy installation and are loosely coupled with Genboree through the Genboree REST API where appropriate.
H1 - IMR90 Methylation Comparison Demo
Genboree Workbench