De-correlating expression in gene-set analysis

 Motivation: Group-wise analysis of gene expression patterns, known as gene set analysis (GSA), tests whether biologically predefined gene sets show differential expression. GSA has high statistical power and has revealed many novel biological processes associated with specific phenotypes. In most cases, however, GSA relies on the invalid assumption that the members of each gene set are sampled independently, and this assumption increases the number of false predictions.
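To see why ignoring correlation inflates false predictions, consider an illustrative gene-set statistic (an assumption chosen for simplicity, not one this abstract specifies): the scaled sum of n gene-level scores z_i, each with unit null variance and average pairwise correlation \bar\rho. A test that assumes independence treats Var(S) as 1, but the true null variance is larger:

```latex
S = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} z_i, \qquad
\operatorname{Var}(S) = \frac{1}{n}\Big( \sum_{i=1}^{n} \operatorname{Var}(z_i) + \sum_{i \neq j} \operatorname{Cov}(z_i, z_j) \Big)
= \frac{1}{n}\big( n + n(n-1)\bar\rho \big) = 1 + (n-1)\bar\rho .
% Example: n = 50, \bar\rho = 0.1 gives Var(S) = 1 + 49(0.1) = 5.9,
% so the true null standard deviation is about \sqrt{5.9} \approx 2.43 times
% the assumed one, and nominal p-values are far too small.
```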
 Results: We propose an algorithm, termed DECO, that removes (or alleviates) the bias caused by correlation in expression data in GSA. This is accomplished through the eigenvalue decomposition of covariance matrices and a series of linear transformations of the data. In particular, we propose moderate de-correlation methods that truncate or rescale eigenvalues for a more reliable analysis. Tests on simulated and real experimental data show that DECO effectively corrects the correlation structure of gene expression and improves prediction accuracy (specificity and sensitivity) for both gene-randomizing and sample-randomizing GSA methods.
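The eigenvalue-based de-correlation described above can be sketched as generic whitening of a gene set's expression matrix. This is a minimal illustration under stated assumptions, not DECO's published procedure: the covariance is estimated across the samples of one gene set, and the truncation floor (`floor_frac`) and rescaling exponent (`alpha`) are hypothetical parameters standing in for the "truncate" and "rescale" variants.

```python
import numpy as np


def decorrelate(X, mode="full", floor_frac=0.05, alpha=0.5):
    """De-correlate the genes (rows) of an expression matrix X (genes x samples).

    Generic whitening sketch (NOT the exact DECO rules):
      mode="full":     cov = V diag(lam) V^T is mapped to the identity.
      mode="truncate": hypothetical rule; eigenvalues below
                       floor_frac * max(lam) are raised to that floor
                       before inverting, so weak directions are not blown up.
      mode="rescale":  hypothetical rule; the transformed covariance has
                       eigenvalues lam ** (1 - alpha) (alpha=1 is full
                       whitening, alpha=0 leaves the data unchanged).
    """
    Xc = X - X.mean(axis=1, keepdims=True)      # center each gene across samples
    cov = np.cov(Xc)                            # genes x genes covariance (ddof=1)
    lam, V = np.linalg.eigh(cov)                # symmetric eigendecomposition
    lam = np.clip(lam, 1e-12, None)             # guard against tiny/negative round-off
    if mode == "full":
        s = lam
    elif mode == "truncate":
        s = np.maximum(lam, floor_frac * lam.max())
    elif mode == "rescale":
        s = lam ** alpha
    else:
        raise ValueError(f"unknown mode: {mode}")
    T = V @ np.diag(s ** -0.5) @ V.T            # symmetric (ZCA-style) linear transform
    return T @ Xc                               # transformed covariance: V diag(lam / s) V^T


# Usage: three strongly correlated genes measured in 200 samples.
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.0, 0.0],
              [0.9, 0.4, 0.0],
              [0.8, 0.3, 0.5]])
X = A @ rng.standard_normal((3, 200))
Y_full = decorrelate(X, "full")
Y_trunc = decorrelate(X, "truncate", floor_frac=0.2)
Y_half = decorrelate(X, "rescale", alpha=0.5)
```

The symmetric transform V diag(s^-1/2) V^T is used here instead of, for example, a Cholesky factor because it keeps each transformed row as close as possible to its original gene, so gene-level statistics computed afterward still refer to the same genes. Full whitening requires more samples than genes in the set; otherwise, the covariance is singular, which is one motivation for the moderate variants.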

Bioinformatics, Vol. 26 ECCB 2010 issue, pages i511–i516