dcVar – a computationally efficient Linux command line tool for uncovering variants that modulate differential correlation structures in gene expression data.
Citation: Lareau CA, White BC, Montgomery C and McKinney BA (2015). dcVar: A Method for Identifying Common Variants that Modulate Differential Correlation Structures in Gene Expression Data. Front. Genet. 6:312. doi: 10.3389/fgene.2015.00312
Command Line
dcVar is invoked with the command dcVar and has the following command line options:
Required Input Files
--bfile arg
PLINK .bed/.bim/.fam file prefix containing SNPs
--numeric-file arg
inbix format matrix of individuals (rows) by transcripts (columns)
Optional Commands
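--dcvar-pfilter
Enable p-value filtering
--dcvar-pfilter-value arg
P-value threshold used by the filter
--dcvar-pfilter-type arg
P-value filter type {bon|fdr}: Bonferroni or false discovery rate
--var-model arg
Genetic model: dom (dominant), rec (recessive), or hom (homozygous)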
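The two filter types can be illustrated with a short Python sketch. It assumes that bon applies a Bonferroni correction and that fdr uses the Benjamini-Hochberg step-up procedure. dcVar's own implementation, and exactly how it applies the --dcvar-pfilter-value threshold, are not documented here, so the function names and behavior below are hypothetical.

```python
# Hypothetical sketch of the two p-value filters selected by --dcvar-pfilter-type.
# "bon" = Bonferroni; "fdr" is ASSUMED to mean Benjamini-Hochberg.
# This is not dcVar's source code; its exact procedure may differ.

def bonferroni_filter(pvalues, alpha):
    """Return indices of p-values that pass the Bonferroni-corrected threshold."""
    m = len(pvalues)
    return [i for i, p in enumerate(pvalues) if p * m <= alpha]

def bh_filter(pvalues, alpha):
    """Return indices of p-values rejected by the Benjamini-Hochberg procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest 1-based rank k with p_(k) <= k * alpha / m; all ranks up to k pass.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            cutoff = rank
    return sorted(order[:cutoff])

def filter_pvalues(pvalues, alpha, filter_type):
    """Dispatch on the filter type name ("bon" or "fdr")."""
    if filter_type == "bon":
        return bonferroni_filter(pvalues, alpha)
    if filter_type == "fdr":
        return bh_filter(pvalues, alpha)
    raise ValueError(f"unknown filter type: {filter_type!r}")
```

Benjamini-Hochberg controls the expected proportion of false discoveries rather than the chance of any false positive, so it usually keeps more probe pairs than Bonferroni at the same threshold.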
Help
--help
Display this help message
Example Usage
./dcVar --bfile plinkBfile --numeric-file transcripts.num --out example
HapMap3 dcVar Analysis
Full annotated output from the HapMap3 dcVar analysis described in Lareau et al. (2015). Columns 1 and 3 contain the Illumina (ILMN) probe IDs that were differentially correlated when conditioning on the variant in column 5, and columns 2 and 4 contain the gene symbols annotated to those probes. Column 6 gives the STRING (string-db.org) interaction score for the two genes, and column 7 gives the genome-wide association annotation for the variant from general-population studies listed by NHGRI (https://www.genome.gov/26525384). Probes that did not readily map to a gene, gene pairs without a STRING interaction score, and variants without a GWAS annotation are labeled “#NA#” in the corresponding column.
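The seven-column layout can be read with a small Python helper that turns the “#NA#” placeholder into None. The column delimiter is an assumption (tab-separated here, since GWAS trait descriptions may contain spaces), and the names AnnotatedPair and parse_annotated_line are hypothetical.

```python
# Hypothetical reader for one line of the annotated HapMap3 dcVar table.
# ASSUMES tab-separated columns in the order described in the text above.
from typing import NamedTuple, Optional

MISSING = "#NA#"

class AnnotatedPair(NamedTuple):
    probe1: str                    # column 1: ILMN probe ID
    gene1: Optional[str]           # column 2: gene symbol for probe1
    probe2: str                    # column 3: ILMN probe ID
    gene2: Optional[str]           # column 4: gene symbol for probe2
    variant: str                   # column 5: conditioning variant
    string_score: Optional[float]  # column 6: STRING interaction score
    gwas: Optional[str]            # column 7: GWAS annotation for the variant

def _value(field):
    """Map the '#NA#' placeholder to None."""
    return None if field == MISSING else field

def parse_annotated_line(line):
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 7:
        raise ValueError(f"expected 7 columns, got {len(fields)}")
    score = _value(fields[5])
    return AnnotatedPair(
        probe1=fields[0],
        gene1=_value(fields[1]),
        probe2=fields[2],
        gene2=_value(fields[3]),
        variant=fields[4],
        string_score=None if score is None else float(score),
        gwas=_value(fields[6]),
    )
```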
dcVar Tutorial
Below is a summary of command line options, a usage scenario, and output for analyses performed using dcVar.
dcVar requires three input files: PLINK .bed/.bim/.fam or .map/.ped, a matrix file of transcript expression, and a coordinates file.
The matrix uses the inbix numeric format, which, like a PLINK PED file, begins with two ID columns (FID and IID) followed by one column of expression values per probe. The first row is a header naming the columns, and each subsequent row is one subject/observation.
FID IID ILMN_1804663 ILMN_1651799 ILMN_1712803
NA18939 NA18939 7.43694716359451 12.1440120474746 10.1206313020384
NA18940 NA18940 7.55399263595392 12.7853348338205 10.590979975707
NA18942 NA18942 7.55158390736588 12.2177602290744 10.2737707301728
NA18943 NA18943 7.48992435380628 11.912760771974 9.97233086800192
NA18944 NA18944 7.32994255785869 11.716949870639 10.0136601582572
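A minimal Python reader for this layout helps when preparing or checking an expression matrix before running dcVar. It assumes whitespace-separated fields and does not handle missing-value codes; the function name read_inbix_numeric is hypothetical and not part of dcVar or inbix.

```python
# Hypothetical reader for the inbix numeric format: a header row
# (FID, IID, then one column per probe) followed by one row per subject.
# ASSUMES whitespace-separated fields; missing-value codes are not handled.

def read_inbix_numeric(lines):
    """Parse inbix numeric text into (probe_ids, {(FID, IID): values}).

    `lines` can be an open file or any iterable of text lines.
    """
    rows = iter(lines)
    header = next(rows).split()  # FID, IID, then probe IDs
    probes = header[2:]
    data = {}
    for line_no, line in enumerate(rows, start=2):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} fields, got {len(fields)}")
        data[(fields[0], fields[1])] = [float(v) for v in fields[2:]]
    return probes, data
```

For example, `with open("transcripts.num") as handle: probes, data = read_inbix_numeric(handle)` returns the probe IDs and a per-subject list of expression values keyed by (FID, IID).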
Optional Commands
--fdr-on
Enable the FDR p-value filter
--fdr-off
Disable the FDR p-value filter
--fdr-value arg
FDR threshold
--var-model arg
Genetic model: dom (dominant), rec (recessive), or hom (homozygous)
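The genetic model determines which genotypes form the two groups whose correlations are compared (the log reports them as "affected" and "unaffected" individuals). The Python sketch below assumes genotypes coded as minor-allele counts (0, 1, 2, or None for missing) and assumes the conventional meanings: dom compares carriers of at least one minor allele with non-carriers, rec compares minor-allele homozygotes with everyone else, and hom compares the two homozygous groups and drops heterozygotes. These definitions and the name split_by_model are illustrative assumptions, not dcVar's documented behavior.

```python
# Hypothetical illustration of splitting subjects into two groups by genetic model.
# Genotypes are ASSUMED to be minor-allele counts (0, 1, 2) or None if missing.
# The dom/rec/hom definitions below are assumptions, not taken from dcVar's source.

def split_by_model(genotypes, model):
    """Return (group_a, group_b) lists of subject indices for one variant."""
    group_a, group_b = [], []
    for i, g in enumerate(genotypes):
        if g is None:
            continue  # missing genotype: subject excluded from both groups
        if model == "dom":    # one or two minor alleles vs none
            (group_a if g >= 1 else group_b).append(i)
        elif model == "rec":  # two minor alleles vs zero or one
            (group_a if g == 2 else group_b).append(i)
        elif model == "hom":  # minor homozygotes vs major homozygotes; hets dropped
            if g == 2:
                group_a.append(i)
            elif g == 0:
                group_b.append(i)
        else:
            raise ValueError(f"unknown model: {model!r}")
    return group_a, group_b
```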
Use Case
Run the dcVar analysis on all variants:
$ dcVar --bfile variants --numeric-file expression.matrix --dcvar --out tutorial
Writing this text to log file [ tutorial.log ]
Analysis started: Wed Mar 25 13:39:00 2015

Options in effect:
  --bfile variants
  --numeric-file expression.matrix
  --dcvar
  --out tutorial

Reading map (extended format) from [ variants.bim ]
100 markers to be included from [ variants.bim ]
Reading pedigree information from [ variants.fam ]
491 individuals read from [ variants.fam ]
491 individuals with nonmissing phenotypes
Assuming a disease phenotype (1=unaff, 2=aff, 0=miss)
Missing phenotype value is also -9
0 cases, 491 controls and 0 missing
249 males, 242 females, and 0 of unspecified sex
Reading genotype bitfile from [ variants.bed ]
Detected that binary PED file is v1.00 SNP-major mode
Read 1000 numeric attributes from [ expression.matrix ] with nonmissing values for 491 individuals
Before frequency and genotyping pruning, there are 100 SNPs
491 founders and 0 non-founders found
Total genotyping rate in remaining individuals is 0.998086
0 SNPs failed missingness test ( GENO > 1 )
0 SNPs failed frequency test ( MAF < 0 )
After frequency and genotyping pruning, there are 100 SNPs
After filtering, 0 cases, 491 controls and 0 missing
After filtering, 249 males, 242 females, and 0 of unspecified sex
Performing dcVar analysis
Converting data to Individual-major format
100 variants, and 1000 genes
FDR Corrected p-value: 1.001e-09
Writing results to [ rs235214.dcVarTest.txt ]
Performing z-tests with 489 degrees of freedom
WARNING: all main p-values are set to 1.
Computing coexpression for CASES and CONTROLS.
Detected 274 affected and 217 unaffected individuals
Loading case and control matrices
Computing covariance matrix
Computing correlation matrix
Computing covariance matrix
Computing correlation matrix
Performing Z-tests for interactions
Found [3] FDR tested p-values, min/max: 3.52465e-12 / 1
Writing results to [ rs214331.dcVarTest.txt ]
Performing z-tests with 489 degrees of freedom
WARNING: all main p-values are set to 1.
Computing coexpression for CASES and CONTROLS.
Detected 317 affected and 174 unaffected individuals
Loading case and control matrices
Computing covariance matrix
Computing correlation matrix
Computing covariance matrix
Computing correlation matrix
. . .
Outputs
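To reduce output file sizes, dcVar writes a separate output file for each variant analyzed, named after the variant (for example, rs235214.dcVarTest.txt). Each line lists a pair of probe IDs followed by the p-value of their differential correlation test.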
$ cat rs235214.dcVarTest.txt
ILMN_1676986 ILMN_1753115 2.11843e-11
ILMN_1753115 ILMN_1692706 3.52465e-12
ILMN_1653251 ILMN_1759989 6.55952e-10
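Because results are spread across one file per variant, a small Python script can gather them. This sketch assumes each file contains only whitespace-separated lines of two probe IDs and a p-value, as in the example, and that files follow the <variant>.dcVarTest.txt naming seen in the log; the function names are hypothetical.

```python
# Hypothetical helpers for collecting per-variant dcVar result files.
# ASSUMES lines of "probe1 probe2 p_value" and file names "<variant>.dcVarTest.txt".
from pathlib import Path

SUFFIX = ".dcVarTest.txt"

def parse_dcvar_results(lines):
    """Return (probe1, probe2, p_value) tuples sorted by ascending p-value."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        rows.append((fields[0], fields[1], float(fields[2])))
    return sorted(rows, key=lambda row: row[2])

def variant_from_filename(path):
    """Recover the variant ID from a name such as 'rs235214.dcVarTest.txt'."""
    name = Path(path).name
    return name[: -len(SUFFIX)] if name.endswith(SUFFIX) else name

def load_all_results(directory="."):
    """Map each variant ID to its sorted results, one entry per result file."""
    results = {}
    for path in sorted(Path(directory).glob("*" + SUFFIX)):
        with path.open() as handle:
            results[variant_from_filename(path)] = parse_dcvar_results(handle)
    return results
```

Calling `load_all_results()` in the run directory returns a dictionary keyed by variant ID, which makes it easy to rank variants by their smallest p-value.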
Using the R Version
Download dcVar.R and edit it to set the input files and run options. Then run dcVar with your settings using source("dcVar.R") or an IDE such as RStudio.
Installing the C++ Version
- Download dcVar-1.0 and unzip the file.
- Install the Armadillo C++ linear algebra library if it is not already installed.
- Build the dcVar executable with the command make -f Makefile.dcVar (or make -f Makefile.dcVar.osx on Mac OS X).
- Install the dcVar executable to /usr/local/bin with the command sudo make -f Makefile.dcVar install.