dcVar – a computationally efficient Linux command line tool for uncovering variants that modulate differential correlation structures in gene expression data.

Citation: Lareau CA, White BC, Montgomery C and McKinney BA (2015). dcVar: A Method for Identifying Common Variants that Modulate Differential Correlation Structures in Gene Expression Data. Front. Genet. 6:312. doi: 10.3389/fgene.2015.00312

Command Line

dcVar is invoked with the command dcVar and has the following command line options:

Required Input Files

--bfile arg
PLINK .bed/.bim/.fam file prefix containing SNPs

--numeric-file arg
inbix format matrix of individuals (rows) by transcripts (columns)

Optional Commands

--dcvar-pfilter
Enable p-value filtering

--dcvar-pfilter-value arg
P-value filter value

--dcvar-pfilter-type arg
P-value filter type {bon|fdr}

--var-model arg
Variant model {dom|rec|hom}
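
The two filter types named above, bon and fdr, presumably refer to Bonferroni and false discovery rate correction. The following is a minimal sketch of the textbook versions of both (Bonferroni, and Benjamini-Hochberg for FDR), assuming that is what the options mean; the function names are hypothetical and dcVar's internal implementation may differ.

```python
def bonferroni_keep(p_values, alpha):
    """Keep tests whose p-value is at most alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def fdr_keep(p_values, alpha):
    """Benjamini-Hochberg step-up procedure (assumed meaning of 'fdr').

    Returns one boolean per input p-value, in the original order.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff passes the filter.
    keep = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            keep[idx] = True
    return keep


if __name__ == "__main__":
    # Made-up p-values, for illustration only.
    p = [0.001, 0.02, 0.04, 0.5]
    print(bonferroni_keep(p, 0.05))
    print(fdr_keep(p, 0.05))
```

Bonferroni controls the family-wise error rate and is the stricter of the two; the FDR filter typically retains more gene pairs at the same nominal value.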

Help

--help
Displays this screen

Example Usage

./dcVar --bfile plinkBfile --numeric-file transcripts.num --out example
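
When dcVar is driven from a script, it can help to assemble the argument list programmatically. The sketch below builds the command from the options documented above; the helper name build_dcvar_command is hypothetical, and the way the three p-value filter options are combined is an assumption rather than documented behavior.

```python
import shlex


def build_dcvar_command(bfile, numeric_file, out,
                        pfilter_type=None, pfilter_value=None,
                        executable="dcVar"):
    """Return the dcVar argument list for the documented options."""
    cmd = [executable,
           "--bfile", bfile,
           "--numeric-file", numeric_file,
           "--out", out]
    if pfilter_type is not None:
        if pfilter_type not in ("bon", "fdr"):
            raise ValueError("pfilter_type must be 'bon' or 'fdr'")
        # Assumption: the filter is switched on and typed together.
        cmd += ["--dcvar-pfilter", "--dcvar-pfilter-type", pfilter_type]
        if pfilter_value is not None:
            cmd += ["--dcvar-pfilter-value", str(pfilter_value)]
    return cmd


if __name__ == "__main__":
    cmd = build_dcvar_command("plinkBfile", "transcripts.num", "example",
                              executable="./dcVar")
    # Print the command as a shell-quoted string; pass the list to
    # subprocess.run(cmd, check=True) to actually execute it.
    print(shlex.join(cmd))
```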

HapMap3 dcVar Analysis

Full annotated output from the HapMap3 dcVar analysis described in Lareau et al. (2015). Columns 1 and 3 give the ILMN probe IDs of the transcript pairs that were differentially correlated when conditioning on the variant in column 5. Columns 2 and 4 give the gene symbols annotated to those probe IDs. Column 6 lists the interaction score from STRING (string-db.org) for the two genes, and column 7 lists genome-wide association annotations from NHGRI (https://www.genome.gov/26525384) drawn from general population studies. Probes that did not readily map to a particular gene ID, gene pairs that lacked a STRING interaction score, and variants that did not carry a GWAS annotation are labeled "#NA#" in their respective columns.
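
The seven-column layout described above can be read into named fields with a few lines of code. This is a sketch only: the field names are hypothetical, the tab delimiter is an assumption about the file, and the example row is illustrative (it reuses IDs from the tutorial and leaves every annotation as "#NA#").

```python
NA_LABEL = "#NA#"

# Hypothetical field names for the seven columns described in the text.
COLUMNS = ("probe_1", "gene_1", "probe_2", "gene_2",
           "variant", "string_score", "gwas_annotation")


def parse_annotated_row(line, sep="\t"):
    """Split one output line into a dict, mapping '#NA#' to None."""
    fields = line.rstrip("\n").split(sep)
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
    return {name: (None if value == NA_LABEL else value)
            for name, value in zip(COLUMNS, fields)}


if __name__ == "__main__":
    # Illustrative row, not taken from the real HapMap3 output.
    example = "ILMN_1676986\t#NA#\tILMN_1753115\t#NA#\trs235214\t#NA#\t#NA#"
    print(parse_annotated_row(example))
```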

Tutorial

dcVar Tutorial

Below is a summary of command line options, a usage scenario, and output for analyses performed using dcVar.

dcVar requires three input files: PLINK .bed/.bim/.fam or .map/.ped, a matrix file of transcript expression, and a coordinates file.

The matrix uses the inbix numeric format, which, like PLINK PED files, begins each row with two ID columns (FID and IID), followed by one expression value per probe. The first line is a header naming the probes, and each subsequent row is a subject/observation.

FID IID ILMN_1804663  ILMN_1651799  ILMN_1712803
NA18939 NA18939 7.43694716359451  12.1440120474746  10.1206313020384
NA18940 NA18940 7.55399263595392  12.7853348338205  10.590979975707
NA18942 NA18942 7.55158390736588  12.2177602290744  10.2737707301728
NA18943 NA18943 7.48992435380628  11.912760771974 9.97233086800192
NA18944 NA18944 7.32994255785869  11.716949870639 10.0136601582572
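
To check a matrix file before running dcVar, the format shown above can be read with a short script. This is a minimal sketch assuming whitespace-delimited fields and a header that starts with FID and IID, as in the excerpt; the function name is hypothetical.

```python
import io


def read_numeric_matrix(handle):
    """Read an inbix-style numeric file.

    Returns (probe_ids, rows), where rows maps (FID, IID) to a list of
    expression values in the same order as probe_ids.
    """
    header = handle.readline().split()
    if header[:2] != ["FID", "IID"]:
        raise ValueError("header must start with FID and IID")
    probe_ids = header[2:]
    rows = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(header):
            raise ValueError(f"wrong number of fields for subject {fields[0]}")
        rows[(fields[0], fields[1])] = [float(x) for x in fields[2:]]
    return probe_ids, rows


if __name__ == "__main__":
    # First two subjects from the excerpt above.
    sample = io.StringIO(
        "FID IID ILMN_1804663  ILMN_1651799  ILMN_1712803\n"
        "NA18939 NA18939 7.43694716359451  12.1440120474746  10.1206313020384\n"
        "NA18940 NA18940 7.55399263595392  12.7853348338205  10.590979975707\n"
    )
    probes, rows = read_numeric_matrix(sample)
    print(probes, len(rows))
```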
    
Optional Commands

--fdr-on
Enable FDR p-value filter

--fdr-off
Disable FDR p-value filter

--fdr-value arg
FDR value

--var-model arg
Variant model {dom|rec|hom}

Use Case

Run the analysis for all variants:

$ dcVar --bfile variants --numeric-file expression.matrix --dcvar --out tutorial
Writing this text to log file [ tutorial.log ]
Analysis started: Wed Mar 25 13:39:00 2015

Options in effect:
  --bfile variants
  --numeric-file expression.matrix
  --dcvar
  --out tutorial

Reading map (extended format) from [ variants.bim ] 
100 markers to be included from [ variants.bim ]
Reading pedigree information from [ variants.fam ] 
491 individuals read from [ variants.fam ] 
491 individuals with nonmissing phenotypes
Assuming a disease phenotype (1=unaff, 2=aff, 0=miss)
Missing phenotype value is also -9
0 cases, 491 controls and 0 missing
249 males, 242 females, and 0 of unspecified sex
Reading genotype bitfile from [ variants.bed ] 
Detected that binary PED file is v1.00 SNP-major mode
Read 1000 numeric attributes from [ expression.matrix ] with nonmissing values for 491 individuals
Before frequency and genotyping pruning, there are 100 SNPs
491 founders and 0 non-founders found
Total genotyping rate in remaining individuals is 0.998086
0 SNPs failed missingness test ( GENO > 1 )
0 SNPs failed frequency test ( MAF < 0 )
After frequency and genotyping pruning, there are 100 SNPs
After filtering, 0 cases, 491 controls and 0 missing
After filtering, 249 males, 242 females, and 0 of unspecified sex
Performing dcVar analysis
Converting data to Individual-major format
100 variants, and 1000 genes
FDR Corrected p-value: 1.001e-09
Writing results to [ rs235214.dcVarTest.txt ]
Performing z-tests with 489 degrees of freedom
WARNING: all main p-values are set to 1.
Computing coexpression for CASES and CONTROLS.
Detected 274 affected and 217 unaffected individuals
Loading case and control matrices
Computing covariance matrix
Computing correlation matrix
Computing covariance matrix
Computing correlation matrix
Performing Z-tests for interactions
Found [3] FDR tested p-values, min/max: 3.52465e-12 / 1
Writing results to [ rs214331.dcVarTest.txt ]
Performing z-tests with 489 degrees of freedom
WARNING: all main p-values are set to 1.
Computing coexpression for CASES and CONTROLS.
Detected 317 affected and 174 unaffected individuals
Loading case and control matrices
Computing covariance matrix
Computing correlation matrix
Computing covariance matrix
Computing correlation matrix
.
.
.
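
The log above shows that, for each variant, dcVar computes a correlation matrix for each of two groups of individuals and then performs Z-tests. One standard way to test whether a correlation differs between two groups is the Fisher z-transform. The sketch below assumes that form, which may not match dcVar's exact statistic. The group sizes 274 and 217 come from the log; the correlation values are made up for illustration.

```python
import math


def diff_corr_z(r1, n1, r2, n2):
    """Z statistic for the difference between two Pearson correlations.

    Each correlation is Fisher-transformed (atanh); the difference is
    divided by its approximate standard error.
    """
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (math.atanh(r1) - math.atanh(r2)) / standard_error


def two_sided_p(z):
    """Two-sided p-value for a standard normal Z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical correlations of one gene pair in the two groups.
    z = diff_corr_z(0.6, 274, -0.2, 217)
    print(z, two_sided_p(z))
```

A gene pair whose correlation is the same in both groups gives Z = 0 and a p-value of 1; larger differences between the groups give larger |Z| and smaller p-values.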

Outputs

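To reduce output file sizes, dcVar writes a separate output file for each variant analyzed, named after the variant ID (for example, rs235214.dcVarTest.txt). Each line lists two probe IDs followed by the p-value for that pair.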

$ cat rs235214.dcVarTest.txt
ILMN_1676986  ILMN_1753115  2.11843e-11
ILMN_1753115  ILMN_1692706  3.52465e-12
ILMN_1653251  ILMN_1759989  6.55952e-10
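
For downstream work, a per-variant results file like the one above can be loaded and ranked by p-value. This is a minimal sketch assuming three whitespace-delimited fields per line, as in the listing; the function name is hypothetical.

```python
def read_dcvar_results(lines):
    """Parse 'probe probe p-value' lines and sort by ascending p-value."""
    results = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        results.append((fields[0], fields[1], float(fields[2])))
    return sorted(results, key=lambda row: row[2])


if __name__ == "__main__":
    # Contents of rs235214.dcVarTest.txt as shown above.
    listing = [
        "ILMN_1676986  ILMN_1753115  2.11843e-11",
        "ILMN_1753115  ILMN_1692706  3.52465e-12",
        "ILMN_1653251  ILMN_1759989  6.55952e-10",
    ]
    for probe_a, probe_b, p_value in read_dcvar_results(listing):
        print(probe_a, probe_b, p_value)
```

To read from disk, pass an open file handle instead of the list, since iterating over a file yields its lines.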

Using the R Version

Download dcVar.R and use your favorite editor to set the input files and run options. Then run dcVar with your settings by calling source("dcVar.R") in R or by opening the script in an IDE such as RStudio.

Installing the C++ Version

  • Download dcVar-1.0 and unzip the file.
  • Install the Armadillo library if it is not already installed.
  • Build the dcVar executable with the command: make -f Makefile.dcVar (or make -f Makefile.dcVar.osx on Mac OS X).
  • Install the dcVar executable program to /usr/local/bin with the command: sudo make -f Makefile.dcVar install.