What it does
Takes a binary matrix and finds modules that have patterns of recurrence and mutual exclusivity.
One use is the automated detection of modules of functionally related genes. To do this, preprocess the data into a binary matrix in which each row is a gene and each column is a sample.
The tool builds a weighted graph using the Winnow algorithm, then searches it for modules that are highly recurrent (across samples) and have high levels of mutual exclusivity. The significance of these patterns is determined using the algorithmic significance test.
Note
The binary matrix should be constructed such that the first row and column contain labels, and every other value is either a one or a zero (see example below).
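As a rough illustration of the layout described in this note, the following Python sketch builds a small labelled binary matrix. The gene and sample names are hypothetical, and the tab delimiter and the "gene" corner label are assumptions made for the sketch; the note only requires labels in the first row and column and ones or zeros everywhere else.

```python
import csv
import io

# Hypothetical labels and alterations, for illustration only.
samples = ["sample1", "sample2", "sample3", "sample4"]
alterations = {
    "GENE_A": {"sample1", "sample2"},
    "GENE_B": {"sample3"},
    "GENE_C": {"sample2", "sample4"},
}

def build_matrix_rows(alterations, samples, corner_label="gene"):
    """Return a header row of sample labels, then one 0/1 row per gene."""
    rows = [[corner_label] + list(samples)]
    for gene in sorted(alterations):
        altered = alterations[gene]
        rows.append([gene] + [1 if s in altered else 0 for s in samples])
    return rows

def to_tab_delimited(rows):
    """Serialize the rows as tab-delimited text (delimiter is an assumption)."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
    return buf.getvalue()

rows = build_matrix_rows(alterations, samples)
matrix_text = to_tab_delimited(rows)
print(matrix_text)
```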
Parameters
Number of attributes: To allow for multiple testing correction, include all genes assayed, even if they are not represented in the matrix (there is little reason to include genes in the matrix that have no mutations in any sample).
Maximum module size: The largest module size that the algorithm will attempt to find. Warning! Because the algorithm uses combinatorial search, its running time grows very quickly with this value. Values over 5 applied to very large matrices may take a long time and are not suitable for running through this web service. Please use the stand-alone tool for these.
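To give a feel for the growth behind this warning, the sketch below counts how many attribute subsets exist up to a given module size. This is only an upper bound for an exhaustive enumeration, not a description of the tool's actual search, which prunes candidates using the Winnow graph. The function name and the minimum module size of 2 are assumptions made for the illustration.

```python
from math import comb

def candidate_subsets(n_attributes, max_module_size):
    """Count attribute subsets of size 2..max_module_size.

    Assumes a module has at least two attributes. This is an upper
    bound on exhaustive enumeration, not the tool's pruned search.
    """
    return sum(comb(n_attributes, k) for k in range(2, max_module_size + 1))

# Compare the subset counts for 1000 attributes at a few module sizes.
for size in (3, 5, 6):
    print(size, candidate_subsets(1000, size))
```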
Winnow threshold: The algorithm speeds up the search process by excluding poor edges. This parameter controls the threshold score for an edge to be kept. Due to Winnow's design, these values should be powers of 2 (4, 8, 16, 32, ...).
The optimal value depends on the size of the input data. Filtering too aggressively may cause existing modules to be missed. Some suggested values:
- ~1000 attributes, ~200 samples: 4
- ~5000 attributes, ~500 samples: 32
- ~18000 attributes, ~500 samples: 128
Minimum Frequency: Attributes must be altered in this proportion of the samples to be considered for inclusion in a module. Recommended default is 0.10, as the false positive rate increases below that point.
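The effect of this parameter can be sketched as a simple row filter. The function below is a hypothetical illustration, not the tool's code; in particular, treating the threshold as inclusive (an attribute at exactly 0.10 is kept) is an assumption.

```python
def filter_by_min_frequency(matrix, min_frequency=0.10):
    """Keep attributes altered in at least min_frequency of the samples.

    matrix maps an attribute name to a list of 0/1 values, one per sample.
    The inclusive comparison is an assumption for this sketch.
    """
    kept = {}
    for name, values in matrix.items():
        frequency = sum(values) / len(values)
        if frequency >= min_frequency:
            kept[name] = values
    return kept

# Hypothetical data: three attributes across ten samples.
example = {
    "GENE_A": [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],  # altered in 3 of 10 samples
    "GENE_B": [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],  # altered in 1 of 10 samples
    "GENE_C": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # never altered
}
kept = filter_by_min_frequency(example)
print(sorted(kept))
```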
Background Rate: The expected odds of a particular attribute in a particular sample being altered. The default value assumes data composed of copy number and somatic mutation assays, and is derived from HapMap data and estimation of passenger mutation rates in glioblastoma multiforme.
Significance Threshold: The minimum algorithmic significance value that a module must exceed. Optimal values will depend on input data size. Suggested values:
- ~1000 genes, ~200 samples: 50
- ~5000 genes, ~500 samples: 100
- ~18000 genes, ~500 samples: 200
Output
The tool outputs two files. The first (raw output) is a list of all potential RME modules, along with significance scores. The second is a list of the top RME modules, based on choosing those with largest module size and highest significance score. This file contains four columns: Attribute Names, Coverage (% of samples), Exclusivity (% of covered samples that are mutually exclusive), and Algorithmic significance score.
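The Coverage and Exclusivity columns can be read as follows. This sketch is one interpretation of the column descriptions, not the tool's implementation: it assumes a sample is "covered" when at least one attribute in the module is altered, and "mutually exclusive" when exactly one is. The function name and data are hypothetical.

```python
def coverage_and_exclusivity(module_rows):
    """Return (coverage %, exclusivity %) for a module.

    module_rows is a list of equal-length 0/1 lists, one per attribute.
    Covered: at least one attribute altered in the sample.
    Exclusive: exactly one attribute altered in the sample.
    Both definitions are assumptions for this sketch.
    """
    n_samples = len(module_rows[0])
    hits_per_sample = [sum(column) for column in zip(*module_rows)]
    covered = sum(1 for hits in hits_per_sample if hits >= 1)
    exclusive = sum(1 for hits in hits_per_sample if hits == 1)
    coverage = 100.0 * covered / n_samples
    exclusivity = 100.0 * exclusive / covered if covered else 0.0
    return coverage, exclusivity

# Hypothetical module of three attributes across eight samples.
module = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 0],
]
coverage, exclusivity = coverage_and_exclusivity(module)
print(coverage, exclusivity)
```

In this made-up module, five of the eight samples are covered, and four of those five carry an alteration in exactly one attribute.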
Example
This file contains a binary matrix of simulated data. The first 1290 rows are simulated genes with mutations distributed according to those found in a large glioblastoma tumor set. The last 3 are generated such that mutations follow an RME pattern.
To run the example, upload this matrix using the "Get Data" tool. Then, return to this page, fill in "1295" for the number of attributes, and leave other parameters at their default settings.
Reference
More details can be found in the following paper:
Miller CA. Discovering functional modules relevant for cancer progression by identifying patterns of recurrent and mutually exclusive aberrations in tumor samples. (**in review, will be linked upon publishing**)
Download
This tool is available as a stand-alone software package here.