Title: | The W-Test for Genetic Interactions Testing |
---|---|
Description: | Perform the calculation of W-test, diagnostic checking, calculate minor allele frequency (MAF) and odds ratio. |
Authors: | Rui Sun, Maggie Haitian Wang |
Maintainer: | Rui Sun <[email protected]> |
License: | GPL-2 |
Version: | 3.2 |
Built: | 2025-03-06 03:47:12 UTC |
Source: | https://github.com/cran/wtest |
This dataset contains pseudo positions of 200 CpG sites.
data(CpG.pos)
The format is: CpG sites by rows; column 1: names of CpG sites; column 2: positions of CpG sites.
A data frame containing 23 SNPs for 115 individuals.
data(diabetes.geno)
The format is: subjects by rows and genotypes by columns.
Wang, M. H., Li, J., Yeung, V. S. Y., Zee, B. C. Y., Yu, R. H. Y., Ho, S., & Waye, M. M. Y. (2014). Four pairs of gene-gene interactions associated with increased risk for type 2 diabetes (CDKN2BAS-KCNJ11), obesity (SLC2A9-IGF2BP2, FTO-APOA5), and hypertension (MC4R-IGF2BP2) in Chinese women. Meta Gene, 2, 384-391. http://doi.org/10.1016/j.mgene.2014.04.010
This simulated data frame contains 300 individuals and 200 SNPs.
data(genotype)
The format is: subjects by rows, and genotype by columns.
Estimate parameters (h and f) for the W-test.
hf(data, w.order, B = 400, n.sample = nrow(data), n.marker = "default.nmarker")
data |
a data frame or matrix containing genotypes in the columns and subjects in the rows. Genotypes should be coded as (0, 1, 2) or (0, 1). |
w.order |
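a numeric value specifying the order of the effect: 1 for main effect, 2 for pairwise interaction, and greater than 2 for high-order interaction. |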
B |
a numeric number specifying the number of replicates. Default is 400. |
n.sample |
a numeric number specifying the number of samples to be used for estimating parameters. Default is the total number of samples in the data. |
n.marker |
a numeric value, the number of biomarkers to include in bootstrapping. For |
a set of h and f values indexed by k, estimated automatically. For main effect, k is the number of levels of a predictor variable. For interactions, k is the number of categorical combinations of a variable pair.
Rui Sun, Maggie Haitian Wang
Maggie Haitian Wang, Rui Sun, Junfeng Guo, Haoyi Weng, Jack Lee, Inchi Hu, Pak Sham and Benny C.Y. Zee (2016). A fast and powerful W-test for pairwise epistasis testing. Nucleic Acids Research. doi:10.1093/nar/gkw347.
data(diabetes.geno)
# Please note that parameter B is recommended to be greater than 400.
# For high order interaction analysis (w.order > 2), it is recommended to use default n.sample.
hf1 <- hf(data = diabetes.geno, w.order = 1, B = 100)
hf2 <- hf(data = diabetes.geno, w.order = 2, B = 80)
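The index k used above can be illustrated with base R alone. The following is a minimal sketch, not the package's internal code: the helper `count_k` and the toy genotype matrix are hypothetical, and k is assumed to be the number of distinct levels (one variable) or level combinations (a set of variables) actually observed in the data.

```r
# Hypothetical helper: count the observed categorical combinations (k)
# for the columns `idx` of a genotype matrix coded 0/1/2.
count_k <- function(data, idx) {
  sub <- as.data.frame(data)[, idx, drop = FALSE]
  nrow(unique(sub))
}

# Toy genotype matrix: 6 subjects (rows) by 2 SNPs (columns).
geno_toy <- cbind(snp1 = c(0, 1, 2, 0, 1, 2),
                  snp2 = c(0, 0, 1, 1, 0, 1))

k_main <- count_k(geno_toy, 1)        # levels of snp1
k_pair <- count_k(geno_toy, c(1, 2))  # observed (snp1, snp2) combinations
```

Here `k_main` counts the genotype levels of a single SNP and `k_pair` counts the genotype combinations of the pair, which is how the rows of the returned h and f values would be looked up under this assumption.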
Estimate parameters (h and f) for the W-test.
hf.snps.meth(B = 400, geno, meth, y, geno.pos, meth.pos, window.size, n.sample = nrow(geno), n.pair = 1000)
B |
a numeric number specifying the number of bootstrapping times. Default is 400. |
geno |
a data frame or matrix containing genotypes in the columns. Genotypes should be coded as (0, 1, 2) or (0, 1). SNP names should be stored as column names. |
meth |
a data frame or matrix containing methylation data in the columns. Methylation data should be recoded as (0, 1, 2) or (0, 1). Names of CpG sites should be stored as column names. |
y |
a numeric vector of 0 or 1. |
geno.pos |
a data frame containing SNP names and positions in two columns. |
meth.pos |
a data frame containing CpG names and positions in two columns. |
window.size |
a numeric number specifying the size of genome distance. Interaction of the SNPs and CpG sites located within the size of genome distance will be evaluated exhaustively. |
n.sample |
a numeric number specifying the number of samples to be included for estimating parameters. Default is the total number of samples. |
n.pair |
a numeric value, the number of SNP-CpG pairs to use in bootstrapping. Default = min(P, 1000). P is the total number of pairs within the |
a set of h and f values indexed by k, estimated automatically. Variable k is the number of categorical combinations of a variable pair.
Rui Sun, Maggie Haitian Wang
Maggie Haitian Wang, Rui Sun, Junfeng Guo, Haoyi Weng, Jack Lee, Inchi Hu, Pak Sham and Benny C.Y. Zee (2016). A fast and powerful W-test for pairwise epistasis testing. Nucleic Acids Research. doi:10.1093/nar/gkw347.
data(SNP.pos)
data(CpG.pos)
data(genotype)
data(methylation)
data(phenotype2)
# Please note that parameter B is recommended to be greater than 400.
hf.pair <- hf.snps.meth(B = 80, geno = genotype, meth = methylation, y = phenotype2, geno.pos = SNP.pos, meth.pos = CpG.pos, window.size = 1000)
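The role of window.size can be sketched in base R. This is an illustration only, not the package's implementation: the helper `pairs_in_window` and the toy position tables are hypothetical, and the window boundary is assumed to be inclusive.

```r
# Hypothetical position tables in the two-column layout described above
# (column 1: name, column 2: position).
snp_pos <- data.frame(name = c("snp1", "snp2", "snp3"),
                      pos  = c(1000, 5000, 20000),
                      stringsAsFactors = FALSE)
cpg_pos <- data.frame(name = c("cpg1", "cpg2"),
                      pos  = c(1500, 19000),
                      stringsAsFactors = FALSE)

# Enumerate SNP-CpG pairs whose distance is at most `window_size`.
# Assumption: the boundary is inclusive (<=).
pairs_in_window <- function(snp_pos, cpg_pos, window_size) {
  d <- abs(outer(snp_pos[[2]], cpg_pos[[2]], "-"))  # SNP x CpG distance matrix
  hit <- which(d <= window_size, arr.ind = TRUE)
  data.frame(snp = snp_pos[[1]][hit[, "row"]],
             cpg = cpg_pos[[1]][hit[, "col"]],
             distance = d[hit],
             stringsAsFactors = FALSE)
}

pairs_1kb <- pairs_in_window(snp_pos, cpg_pos, window_size = 1000)
```

Only the pairs returned by such a step would be candidates for testing, so a larger window increases the number of pairs available to n.pair.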
Calculate minor allele frequency.
maf(data, which.snp = NULL)
data |
a data frame or matrix containing genotypes in the columns. Genotypes should be coded as (0, 1, 2) or (0, 1). |
which.snp |
a numeric value, indicating which SNP to calculate. When which.snp = NULL, MAF of all the markers is calculated. Default is NULL. |
The MAF of one marker.
data(diabetes.geno)
result <- maf(diabetes.geno, which.snp = 10)
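For readers who want to see the arithmetic, a minor allele frequency can be computed by hand as follows. This is a minimal sketch under stated assumptions, not the code behind maf(): the helper `maf_sketch` is hypothetical, genotypes are assumed diploid and coded 0/1/2 as allele counts, and missing values are simply dropped.

```r
# Hypothetical helper: minor allele frequency of one marker coded 0/1/2
# (copies of one allele per subject).
maf_sketch <- function(g) {
  g <- g[!is.na(g)]
  p <- sum(g) / (2 * length(g))  # frequency of the counted allele
  min(p, 1 - p)                  # report the rarer of the two alleles
}

g_toy <- c(0, 0, 1, 2, 0, 1, 0, 0)  # 8 subjects, 4 copies of the counted allele
maf_toy <- maf_sketch(g_toy)
```

With 4 copies among 16 alleles, the sketch gives a frequency of 0.25.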
This data frame contains 300 samples and 200 CpG sites.
data(methylation)
The format is: subjects by rows and methylation by columns.
Code a CpG variable into two levels (high and low) by the two-mean clustering method.
methylation.recode(data)
data |
a data frame or matrix containing methylation data in the columns. |
data(methylation)
data.recoded <- methylation.recode(methylation)
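The idea of splitting a CpG variable into high and low levels can be sketched with `kmeans()` from base R. This is an assumption about what two-mean clustering amounts to, not the package's own code: the helper `recode_two_means` is hypothetical, and the initial centers are fixed at the minimum and maximum so that the result is deterministic.

```r
# Hypothetical helper: split one CpG's methylation values into low (0) and
# high (1) using k-means with two clusters.
recode_two_means <- function(x) {
  km <- kmeans(x, centers = matrix(range(x), ncol = 1))  # start at min and max
  high <- which.max(km$centers[, 1])                     # cluster with larger mean
  as.integer(km$cluster == high)
}

beta_toy <- c(0.10, 0.15, 0.20, 0.80, 0.85, 0.90)
recoded <- recode_two_means(beta_toy)
```

Applying such a helper column by column, for example with `apply(data, 2, recode_two_means)`, would give a 0/1 matrix in the coding expected by the testing functions.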
Calculate the odds ratio for a single SNP or a pair of SNPs. The single-marker odds ratio is computed from a contingency table as the odds of disease at the minor allele versus the odds of disease at the major allele. The odds ratio of a pair of SNPs is calculated by logistic regression.
odds.ratio(data, y, w.order, which.marker)
data |
a data frame or matrix containing genotypes in the columns. Genotypes should be coded as (0, 1, 2) or (0, 1), according to minor allele count. |
y |
binary values. |
w.order |
a numeric number taking values 1 or 2. If w.order = 1, odds ratio of main effect is calculated. If w.order = 2, odds ratio of pairwise interaction is calculated. |
which.marker |
a numeric vector, when w.order = 1, a single value indicating the column index of the variable to calculate; when w.order = 2, a vector indicating the column index of a SNP-pair to calculate. |
The odds ratio of a SNP or a SNP-pair.
data(diabetes.geno)
data(phenotype1)
y <- as.numeric(phenotype1)
OR.snp4.snp8 <- odds.ratio(diabetes.geno, y, w.order = 2, which.marker = c(4, 8))
OR.snp4 <- odds.ratio(diabetes.geno, y, w.order = 1, which.marker = 4)
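The single-marker case described above can be worked through by hand. The sketch below is illustrative and may differ from what odds.ratio() does internally: the helper `allelic_or` is hypothetical, and the 2 x 2 table is assumed to count alleles (two per subject) with genotypes coded as minor allele counts. The pairwise case, which uses logistic regression, is not covered here.

```r
# Hypothetical helper: allelic odds ratio of one SNP coded 0/1/2 as minor
# allele count, with y = 1 for cases and y = 0 for controls.
allelic_or <- function(g, y) {
  minor_case <- sum(g[y == 1])
  major_case <- 2 * sum(y == 1) - minor_case
  minor_ctrl <- sum(g[y == 0])
  major_ctrl <- 2 * sum(y == 0) - minor_ctrl
  # Odds of disease at the minor allele versus at the major allele.
  (minor_case / minor_ctrl) / (major_case / major_ctrl)
}

g_toy <- c(2, 1, 1, 0, 0, 0, 1, 0)
y_toy <- c(1, 1, 1, 1, 0, 0, 0, 0)
or_toy <- allelic_or(g_toy, y_toy)
```

In the toy data the cases carry 4 minor and 4 major alleles and the controls carry 1 minor and 7 major alleles, so the odds ratio is (4/1) / (4/7) = 7.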
A binary variable indicating the status of 115 individuals (1 = affected, 0 = unaffected).
data(phenotype1)
The format is: 115 rows and 1 column.
Wang, M. H., Li, J., Yeung, V. S. Y., Zee, B. C. Y., Yu, R. H. Y., Ho, S., & Waye, M. M. Y. (2014). Four pairs of gene-gene interactions associated with increased risk for type 2 diabetes (CDKN2BAS-KCNJ11), obesity (SLC2A9-IGF2BP2, FTO-APOA5), and hypertension (MC4R-IGF2BP2) in Chinese women. Meta Gene, 2, 384-391. http://doi.org/10.1016/j.mgene.2014.04.010
The phenotype of the 300 individuals (1 = affected, 0 = unaffected).
data(phenotype2)
The format is: 300 rows and 1 column.
Positions of 200 SNPs.
data(SNP.pos)
The format is: SNPs by rows; column 1: names of SNPs, column 2: positions of SNPs.
Diagnostic checking of W-test probability distribution estimation.
w.diagnosis(data, w.order = c(1, 2), n.rep = 10, n.sample = nrow(data), n.marker = ncol(data), hf1 = "default.hf1", hf2 = "default.hf2", ...)
data |
a data frame or matrix containing genotypes in the columns. Genotypes should be coded as (0, 1, 2) or (0, 1). |
w.order |
an integer value of 1 or 2. |
n.rep |
a numeric value, the number of bootstrapping times. |
n.sample |
a numeric value, the number of samples to use in bootstrapping. Default is the total number of samples in the data. |
n.marker |
a numeric value, the number of markers to use in bootstrapping. Default is the total number of markers. |
hf1 |
h and f values to calculate main effect, organized as a matrix, with columns (k, h, f), k = 2 to 3. Needed when |
hf2 |
h and f values to calculate interaction associations, organized as a matrix, with columns (k, h, f), k = 2 to 9. Needed when |
... |
graphical parameters. |
This function evaluates the input W values of main or interaction effects using a set of null Y by the W-test, and the evaluation is performed in several bootstrap samples to achieve fast and stable output. The W histogram and its theoretical Chi-squared distribution density with f degrees of freedom are plotted indexed by k. Close overlaying of the histogram and the probability density curve indicates that the estimated h and f give a good test statistic probability distribution.
Rui Sun, Maggie Haitian Wang
Maggie Haitian Wang, Rui Sun, Junfeng Guo, Haoyi Weng, Jack Lee, Inchi Hu, Pak Sham and Benny C.Y. Zee (2016). A fast and powerful W-test for pairwise epistasis testing. Nucleic Acids Research. doi:10.1093/nar/gkw347.
data(diabetes.geno)
# Please note that parameter B is recommended to be greater than 400.
hf1 <- hf(data = diabetes.geno, w.order = 1, B = 100)
hf2 <- hf(data = diabetes.geno, w.order = 2, B = 50)
w.diagnosis(diabetes.geno, w.order = 1, n.rep = 100, hf1 = hf1, main = NULL, xlab = NULL, ylab = NULL)
w.diagnosis(diabetes.geno, w.order = 2, n.rep = 100, hf2 = hf2, main = NULL, xlab = NULL, ylab = NULL)
Draw a Q-Q plot for W-test
w.qqplot(data, y, w.order = c(1, 2), input.poolsize = 200, hf1 = "default.hf1", hf2 = "default.hf2", ...)
data |
a data frame or matrix containing genotypes in the columns. Genotypes should be coded as (0, 1, 2) or (0, 1). |
y |
a numeric vector of 0 or 1. |
w.order |
a numeric number taking values 1 or 2. |
input.poolsize |
a numeric number; The maximum number of SNPs to calculate the Q-Q plot. Default is 200. The |
hf1 |
h and f values to calculate main effect, organized as a matrix, with columns (k, h, f), k = 2 to 3. Needed when |
hf2 |
h and f values to calculate interaction associations, organized as a matrix, with columns (k, h, f), k = 2 to 9. Needed when |
... |
graphical parameters. |
Given data and y, the p-values of the W-test are calculated at the given h and f values and plotted against the theoretical distribution.
Q-Q plot
data(diabetes.geno)
data(phenotype1)
## Step 1. HF Calculation
# Please note that parameter B is recommended to be greater than 400.
hf1 <- hf(data = diabetes.geno, w.order = 1, B = 200)
## Step 2. Q-Q Plot
w.qqplot(data = diabetes.geno, y = phenotype1, w.order = 1, hf1 = hf1, cex = 0.5)
abline(0, 1)
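The coordinates of a p-value Q-Q plot can be computed in a few lines of base R. This is a generic sketch and not necessarily how w.qqplot() draws its plot: the helper `qq_points` is hypothetical, the -log10 scale is an assumed (common) convention, and the p-values are simulated for illustration only.

```r
# Hypothetical helper: coordinates of a p-value Q-Q plot on the -log10 scale.
qq_points <- function(p) {
  p <- sort(p[!is.na(p)])
  expected <- ppoints(length(p))  # uniform quantiles expected under the null
  data.frame(expected = -log10(expected), observed = -log10(p))
}

set.seed(1)
p_null <- runif(200)  # simulated null p-values, not real W-test output
qq <- qq_points(p_null)

# Draw the plot only in an interactive session.
if (interactive()) {
  plot(qq$expected, qq$observed, cex = 0.5,
       xlab = "Expected -log10(p)", ylab = "Observed -log10(p)")
  abline(0, 1)
}
```

Points lying close to the reference line drawn by `abline(0, 1)` indicate that the observed p-values behave as expected under the null distribution.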
This function performs the W-test to calculate main effect or pairwise interactions in case-control studies for categorical data sets. The test measures target variables' distributional difference between cases and controls via a combined log of odds ratio. It follows a Chi-squared probability distribution with data-adaptive degrees of freedom. For pairwise interaction calculation, the user has 3 options: (1) calculate a single pair's W-value; (2) calculate pairwise interaction for a list of variables whose p-values are smaller than a threshold (input.pval); (3) calculate the pairwise interaction exhaustively for all variables. For both main and interaction calculation, the output can be filtered by p-values, such that only sets with a p-value smaller than a threshold (output.pval) will be returned. An extension of the W-test for rare variant analysis is available in the zfa package.
wtest(data, y, w.order = c(1, 2), hf1 = "default.hf1", hf2 = "default.hf2", which.marker = NULL, output.pval = NULL, sort = TRUE, input.pval = 0.1, input.poolsize = 150)
data |
a data frame or matrix containing genotypes in the columns. Genotypes should be coded as (0, 1, 2) or (0, 1). |
y |
a numeric vector of 0 or 1. |
w.order |
an integer value of 1 or 2. |
hf1 |
h and f values to calculate main effect, organized as a matrix, with columns (k, h, f), k = 2 to 3. Needed when |
hf2 |
h and f values to calculate interaction associations, organized as a matrix, with columns (k, h, f), k = 2 to 9. Needed when |
which.marker |
a numeric vector, when |
output.pval |
a p-value threshold for filtering the output. If NULL, all the results will be listed; otherwise, the function will only output the results with p-values smaller than the |
sort |
a logical value indicating whether or not to sort the output by p-values in ascending order. Default = TRUE. |
input.pval |
a p-value threshold to select markers for pairwise calculation, used only when |
input.poolsize |
an integer, with value less than the number of input variables. It is an optional filter to control the maximum number of variables to include in pairwise calculation, used only when |
W-test is a model-free statistical test to measure main effect or pairwise interactions in case-control studies with categorical variables. Theoretically, the test statistic follows a Chi-squared distribution with f degrees of freedom. The data-adaptive degree of freedom f, and a scalar h in the test statistics allow the W-test to correct for distributional bias due to sparse data and small sample size. Let k be the number of columns of the 2 by k contingency table formed by a single variable or a variable pair. When the sample size is large and there is no population stratification, the h and f will approximate well to the theoretical value h = (k-1)/k, and f = k-1. When sample size is small and there is population stratification, the h and f will vary to correct for distributional bias caused by the data structure.
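To make the roles of h, f and k concrete, the following base R sketch computes a W-type statistic for a single 2 by k table. It is an illustration under assumptions, not the package's code: the helper `w_sketch` is hypothetical, the statistic is assumed to be h times the sum of squared standardized log odds ratios over the k cells, the standard error is the usual one for a log odds ratio from a 2 by 2 table, and h and f default to the large-sample values quoted above rather than data-adaptive estimates from hf().

```r
# Sketch of a W-type statistic for one 2 x k table.
# n1, n0: counts of cases and controls in each of the k cells (all > 0).
# h, f:   scalar and degrees of freedom; defaults are the large-sample values
#         h = (k - 1) / k and f = k - 1.
w_sketch <- function(n1, n0,
                     h = (length(n1) - 1) / length(n1),
                     f = length(n1) - 1) {
  N1 <- sum(n1)
  N0 <- sum(n0)
  # Log odds ratio of falling in cell i, cases versus controls.
  log_or <- log((n1 / (N1 - n1)) / (n0 / (N0 - n0)))
  # Usual standard error of a log odds ratio from a 2 x 2 table.
  se <- sqrt(1 / n1 + 1 / (N1 - n1) + 1 / n0 + 1 / (N0 - n0))
  W <- h * sum((log_or / se)^2)
  list(W = W, k = length(n1),
       p.value = pchisq(W, df = f, lower.tail = FALSE))
}

# Toy table: genotype counts (0, 1, 2) among 60 cases and 60 controls.
res <- w_sketch(n1 = c(10, 20, 30), n0 = c(30, 20, 10))
```

In practice the estimated h and f would be passed in place of the defaults, which is where the correction for sparse data and small samples enters.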
When w.order = 2, wtest() will automatically calculate the main effect first and then apply a pre-filter before calculating interactions. This filtering is to avoid overloading the memory before having a better understanding of the data. Users can specify a smaller input.pval, such as 0.05 or 0.001, for less output, or input.pval = 1 or NULL for exhaustive pairwise calculation. Another optional filter is input.poolsize. It takes the top input.poolsize variables, selected by smallest p-value, and calculates their pairwise effects exhaustively; when used together with input.pval, the smaller set will be passed to pairwise calculation.
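The pre-filtering logic can be sketched as follows. This is a reading of the description above rather than the package's code: the helper `select_markers` is hypothetical, and the threshold comparison is assumed to be strict.

```r
# Hypothetical helper: choose the markers passed on to pairwise calculation.
# p:              main-effect p-values, one per marker
# input.pval:     p-value threshold (NULL disables the threshold)
# input.poolsize: maximum number of markers to keep (NULL for no cap)
select_markers <- function(p, input.pval = 0.1, input.poolsize = 150) {
  keep <- order(p)  # most significant first
  if (!is.null(input.pval)) keep <- keep[p[keep] < input.pval]
  if (!is.null(input.poolsize)) keep <- head(keep, input.poolsize)
  keep
}

p_main <- c(0.50, 0.01, 0.20, 0.03, 0.08, 0.90)
idx <- select_markers(p_main, input.pval = 0.1, input.poolsize = 2)

# Marker pairs that would then be tested exhaustively.
pairs <- t(combn(sort(idx), 2))
```

Three markers pass the p-value threshold here, but the pool size of 2 is the tighter filter, so only the two most significant markers go forward.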
An object "wtest" containing:
order |
the "w.order" specified. |
results |
When |
hf1 |
The h and f values used in main effect calculation. |
hf2 |
The h and f values used in pairwise interaction calculation. |
Rui Sun, Maggie Haitian Wang
Maggie Haitian Wang, Rui Sun, Junfeng Guo, Haoyi Weng, Jack Lee, Inchi Hu, Pak Sham and Benny C.Y. Zee (2016). A fast and powerful W-test for pairwise epistasis testing. Nucleic Acids Research. doi:10.1093/nar/gkw347.
Maggie Haitian Wang, Haoyi Weng, Rui Sun, Jack Lee, William K.K. Wu, Ka Chun Chong, Benny C.Y. Zee. (2017). A Zoom-Focus algorithm (ZFA) to locate the optimal testing region for rare variant association tests. Bioinformatics, 33(15), 2330-2336.
data(diabetes.geno)
data(phenotype1)
## Step 1. HF Calculation
# Please note that parameter B is recommended to be greater than 400.
hf1 <- hf(data = diabetes.geno, w.order = 1, B = 100)
hf2 <- hf(data = diabetes.geno, w.order = 2, B = 50)
## Step 2. W-test Calculation
w1 <- wtest(diabetes.geno, phenotype1, w.order = 1, hf1 = hf1)
w2 <- wtest(diabetes.geno, phenotype1, w.order = 2, input.pval = 0.3, input.poolsize = 50, output.pval = 0.01, hf1 = hf1, hf2 = hf2)
w.pair <- wtest(diabetes.geno, phenotype1, w.order = 2, which.marker = c(10, 13), hf2 = hf2)
This function performs the W-test to calculate high-order interactions in case-control studies for categorical data sets. The test measures target variables' distributional difference between cases and controls via a combined log of odds ratio. It follows a Chi-squared probability distribution with data-adaptive degrees of freedom. For high-order interaction calculation, the user has 3 options: (1) calculate the W-test of a set of SNPs; (2) calculate high-order interaction for a list of variables whose p-values are smaller than a threshold (input.pval); (3) calculate high-order interaction exhaustively for all variables. Output can be filtered by p-values, such that only sets with a p-value smaller than a threshold (output.pval) will be returned.
wtest.high(data, y, w.order = 3, hf1 = "default.hf1", hf.high.order = "default.high", which.marker = NULL, output.pval = NULL, sort = TRUE, input.pval = 0.1, input.poolsize = 10)
data |
a data frame or matrix containing genotypes in the columns. Genotypes should be coded as (0, 1, 2) or (0, 1). |
y |
a numeric vector of 0 or 1. |
w.order |
an integer value, indicating the order of high-way interactions. For example, |
hf1 |
h and f values to calculate main effect, organized as a matrix, with columns (k, h, f), k = 2 to 3. |
hf.high.order |
h and f values to calculate high-order interactions, organized as a matrix, with columns (k, h, f), where k is the number of genotype combinations of a set of SNPs. |
which.marker |
a numeric vector indicating the column index of a set of SNPs to calculate. Default |
output.pval |
a p-value threshold for filtering the output. If NULL, all the results will be listed; otherwise, the function will only output the results with p-values smaller than the |
sort |
a logical value indicating whether or not to sort the output by p-values in ascending order. Default = TRUE. |
input.pval |
a p-value threshold to select markers for high-order interaction calculation, used only when |
input.poolsize |
an integer, with value less than the number of input variables. It is an optional filter to control the maximum number of variables to include in high-order interaction calculation, used only when |
W-test is a model-free statistical test originally proposed to measure main effect or pairwise interactions in case-control studies with categorical variables. It can be extended to high-order interaction detection by the wtest.high() function. Theoretically, the test statistic follows a Chi-squared distribution with f degrees of freedom. The data-adaptive degree of freedom f, and a scalar h in the test statistics allow the W-test to correct for distributional bias due to sparse data and small sample size. Let k be the number of columns of the 2 by k contingency table formed by a single variable or a variable pair. When the sample size is large and there is no population stratification, the h and f will approximate well to the theoretical value h = (k-1)/k, and f = k-1. When sample size is small and there is population stratification, the h and f will vary to correct for distributional bias caused by the data structure.
When w.order > 2, wtest.high() will automatically calculate the main effect first and then apply a pre-filter before calculating interactions. This filtering is to avoid overloading the memory before having a better understanding of the data. Users can specify a smaller input.pval, such as 0.05 or 0.001, for less output, or input.pval = 1 or NULL for exhaustive high-order interaction calculation. Another optional filter is input.poolsize. It will select the top input.poolsize variables, ranked by p-values, to calculate high-order interactions. When used together with input.pval, the algorithm selects the smaller set in the high-order calculation.
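The reason for pre-filtering becomes clear when counting the candidate sets. The short sketch below uses only `choose()` from base R; the helper name `n_sets` is hypothetical, 23 is the number of SNPs in diabetes.geno, and 10 is the default input.poolsize shown in the usage above.

```r
# Number of candidate SNP sets for m markers at a given interaction order.
n_sets <- function(m, w.order) choose(m, w.order)

set_counts <- c(pairs_23   = n_sets(23, 2),   # all pairs among 23 SNPs
                triples_23 = n_sets(23, 3),   # all triples among 23 SNPs
                triples_10 = n_sets(10, 3))   # triples after keeping 10 SNPs
```

The number of sets grows quickly with the interaction order, so restricting the pool to a small number of markers keeps the exhaustive calculation manageable.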
An object "wtest" containing:
order |
the "w.order" specified. |
results |
When order > 2 and which.marker = NULL, the test results include: (information of a set) [SNPs name, W-value, k, p-value]; (Information of the first variable in the set) [W-value, k, p-value]; (Information of the second variable in the set) [W-value, k, p-value] ... |
hf1 |
The h and f values used in main effect calculation. |
hf2 |
The h and f values used in high-order interaction calculation. |
Rui Sun, Maggie Haitian Wang
Maggie Haitian Wang, Rui Sun, Junfeng Guo, Haoyi Weng, Jack Lee, Inchi Hu, Pak Sham and Benny C.Y. Zee (2016). A fast and powerful W-test for pairwise epistasis testing. Nucleic Acids Research. doi:10.1093/nar/gkw347.
data(diabetes.geno)
data(phenotype1)
## Step 1. HF Calculation
# Please note that parameter B is recommended to be greater than 400 for w.order = 1 or 2.
# For high order interaction analysis (w.order > 2), it is recommended to use default n.sample.
hf1 <- hf(data = diabetes.geno, w.order = 1, B = 100)
hf.high <- hf(data = diabetes.geno, w.order = 3, B = 30, n.marker = 10)
## Step 2. W-test Calculation
w1 <- wtest.high(diabetes.geno, phenotype1, w.order = 1, hf1 = hf1)
w3 <- wtest.high(diabetes.geno, phenotype1, w.order = 3, input.pval = 0.3, input.poolsize = 50, output.pval = 0.5, hf1 = hf1, hf.high.order = hf.high)
w.set <- wtest.high(diabetes.geno, phenotype1, w.order = 3, which.marker = c(10, 13, 20), hf.high.order = hf.high)
Calculate the cis gene-methylation interaction of (SNP, CpG) pairs within a user-defined window; the calculation can be run in a genome-wide manner. The output can be filtered by p-values, such that only sets with a p-value smaller than the threshold (output.pval) will be returned.
wtest.snps.meth(geno, meth, y, geno.pos, meth.pos, window.size = 10000, hf = "default.hf", output.pval = NULL, sort = TRUE, which.marker = NULL)
geno |
a data frame or matrix containing genotypes in the columns and subjects in the rows. Genotypes should be coded as (0, 1, 2) or (0, 1). SNP names should be stored as column names of the data. |
meth |
a data frame or matrix containing methylation data in the columns. Methylation data should be recoded as (0, 1, 2) or (0, 1). Names of CpG sites should be stored as column names of the data. |
y |
a numeric vector of 0 or 1. |
geno.pos |
a data frame containing SNP names and positions in two columns. |
meth.pos |
a data frame containing CpG names and positions in two columns. |
window.size |
a numeric number specifying the size of genome distance. Interaction effects of the SNPs and CpG sites located within the size of genome distance will be evaluated exhaustively. |
hf |
h and f values to calculate gene-methylation interaction associations, organized as a matrix, with columns (k, h, f), k = 2 to 6. |
output.pval |
a p-value threshold for filtering the output. If NULL, all results will be listed; otherwise, the function will only output the results with p-values smaller than |
sort |
a logical value indicating whether or not to sort the output by p-values in ascending order. Default = TRUE. |
which.marker |
a vector indicating the column index of a SNP-CpG pair to calculate. Default |
An object "wtest.snps.meth" containing:
results |
The test results include: SNP name, CpG name, SNP position, CpG position, W value, k, and p-value. |
hf |
The h and f values used for each k in pairwise calculation, where k = 2 to 6. |
Rui Sun, Maggie Haitian Wang
Maggie Haitian Wang, Rui Sun, Junfeng Guo, Haoyi Weng, Jack Lee, Inchi Hu, Pak Sham and Benny C.Y. Zee (2016). A fast and powerful W-test for pairwise epistasis testing. Nucleic Acids Research. doi:10.1093/nar/gkw347.
data(SNP.pos)
data(CpG.pos)
data(genotype)
data(methylation)
data(phenotype2)
w <- 13000
# Recode methylation data
methylation <- methylation.recode(methylation)
## Step 1. HF Calculation.
# Please note that parameter B is recommended to be greater than 400.
hf.pair <- hf.snps.meth(B = 80, geno = genotype, meth = methylation, y = phenotype2, geno.pos = SNP.pos, meth.pos = CpG.pos, window.size = w)
## Step 2. Application
result <- wtest.snps.meth(geno = genotype, meth = methylation, y = phenotype2, geno.pos = SNP.pos, meth.pos = CpG.pos, window.size = w, hf = hf.pair, output.pval = 0.1)