The Release 9 primary data compendium contains uniformly pre-processed and mapped data from multiple profiling experiments (technical and biological replicates from multiple individuals and/or datasets from multiple centers). All datasets were uniformly pre-processed by mapping reads onto the hg19 assembly of the human genome using the Pash 3.0 read mapper. Complete metadata for each dataset in this collection is archived at the Gene Expression Omnibus (GEO) and describes the samples, assays, data processing details and quality metrics collected for each profiling experiment.
To reduce redundancy, improve data quality and achieve the uniformity required for the integrative analyses in the Roadmap Epigenomics Consortium paper, experiments were subjected to additional processing to obtain comprehensive data for 111 consolidated epigenomes. Each consolidated epigenome was assigned a numeric epigenome identifier (EID; for example, E001) and a mnemonic epigenome name. For additional details about the consolidated epigenome IDs, see the Epigenome Class Summary sheet of Supplementary Table 1 in the Roadmap Epigenomics Consortium paper. Data sets corresponding to 16 cell lines from the ENCODE project (epigenome IDs E114 to E129) were also included in the uniformly processed dataset.

To avoid artificial differences due to mappability, the raw mapped reads of each consolidated data set were uniformly truncated to 36 bp and then refiltered using a custom 36-bp mappability track to retain only reads that map to uniquely mappable positions. Reads were also randomly subsampled to 30 million reads to ensure uniform sequencing depth. Where necessary, the uniformly processed data sets were then merged across technical/biological replicates to obtain a single consolidated sample for every histone mark or DNase-seq data set in each standardized epigenome.
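The truncation, mappability filtering and subsampling steps above can be sketched in Python as follows. This is a minimal illustration, not the consortium's pipeline. The `Read` record, the `unique_starts` set standing in for the 36-bp mappability track (keyed by chromosome and the 0-based leftmost coordinate of a uniquely mappable 36-mer), the dropping of reads shorter than 36 bp, and the choice to pool replicates before subsampling are all hypothetical assumptions made for the example.

```python
import random
from dataclasses import dataclass, replace
from typing import Iterable, List, Optional, Set, Tuple

READ_LEN = 36               # truncation length stated in the text
TARGET_DEPTH = 30_000_000   # subsampling depth stated in the text


@dataclass(frozen=True)
class Read:
    """Hypothetical aligned-read record (not a real file format)."""
    chrom: str
    start: int   # 0-based leftmost reference coordinate
    end: int     # exclusive end coordinate
    strand: str  # "+" or "-"


def truncate(read: Read, length: int = READ_LEN) -> Optional[Read]:
    """Keep the 5'-most `length` bases of a read.

    Assumption: reads shorter than `length` are dropped (returns None).
    """
    if read.end - read.start < length:
        return None
    if read.strand == "+":
        # 5' end is the leftmost base on the forward strand
        return replace(read, end=read.start + length)
    # 5' end is the rightmost base on the reverse strand
    return replace(read, start=read.end - length)


def is_mappable(read: Read, unique_starts: Set[Tuple[str, int]]) -> bool:
    """Hypothetical mappability check: the truncated read's window must
    start at a position whose 36-mer is unique in the genome."""
    return (read.chrom, read.start) in unique_starts


def consolidate(replicates: Iterable[Iterable[Read]],
                unique_starts: Set[Tuple[str, int]],
                depth: int = TARGET_DEPTH,
                seed: int = 0) -> List[Read]:
    """Truncate, filter, pool replicates, then subsample to at most `depth`."""
    kept: List[Read] = []
    for replicate in replicates:          # assumption: pool before subsampling
        for read in replicate:
            t = truncate(read)
            if t is not None and is_mappable(t, unique_starts):
                kept.append(t)
    if len(kept) > depth:
        # Fixed seed so the random subsample is reproducible
        kept = random.Random(seed).sample(kept, depth)
    return kept
```

Subsampling cannot add reads, so in this sketch a pooled sample that is already below the target depth is returned unchanged; a fixed random seed keeps the subsample reproducible between runs.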
Alignments and processing are described in detail here.