Title: | Microbiome Based Sum of Powered Score (MiSPU) Tests |
---|---|
Description: | There is an increasing interest in investigating how the compositions of microbial communities are associated with human health and disease. In this package, we present a novel global testing method called aMiSPU, that is highly adaptive and thus high powered across various scenarios, alleviating the issue with the choice of a phylogenetic distance. Our simulations and real data analysis demonstrated that aMiSPU test was often more powerful than several competing methods while correctly controlling type I error rates. |
Authors: | Chong Wu, Wei Pan |
Maintainer: | Chong Wu <[email protected]> |
License: | GPL-2 |
Version: | 1.0 |
Built: | 2024-11-10 04:44:51 UTC |
Source: | https://github.com/chongwu-biostat/mispu |
There is an increasing interest in investigating how the compositions of microbial communities are associated with human health and disease. In this package, we present a novel global testing method called aMiSPU, that is highly adaptive and thus high powered across various scenarios, alleviating the issue with the choice of a phylogenetic distance. Our simulations and real data analysis demonstrated that aMiSPU test was often more powerful than several competing methods while correctly controlling type I error rates.
Package: | MiSPU |
Type: | Package |
Version: | 1.0 |
Date: | 2016-02-29 |
License: | GPL-2 |
Chong Wu, Wei Pan Maintainer: Chong Wu <[email protected]>
Pan, W., et al.(2014) A powerful and adaptive association test for rare variants, Genetics, 197(4), 1081-95
Chong, W., Pan, W. (2015) An Adaptive Association Test for Microbiome Data, submitted.
data(throat.otu.tab) data(throat.tree) data(throat.meta) Y.tmp =throat.meta[,3] Y = rep(0,dim(throat.meta)[1]) Y[Y.tmp=="Smoker"] = 1 cov.tmp = throat.meta[,c(10,12)] cov = matrix(1,dim(throat.meta)[1],2) cov[cov.tmp[,1]== "None",1] = 0 cov[cov.tmp[,2]== "Male",2] = 0 start.time = proc.time() X = as.matrix(throat.otu.tab) out = MiSPU(Y,X, throat.tree,cov,model = "binomial", pow = c(2:8, Inf), n.perm = 1000) out
data(throat.otu.tab) data(throat.tree) data(throat.meta) Y.tmp =throat.meta[,3] Y = rep(0,dim(throat.meta)[1]) Y[Y.tmp=="Smoker"] = 1 cov.tmp = throat.meta[,c(10,12)] cov = matrix(1,dim(throat.meta)[1],2) cov[cov.tmp[,1]== "None",1] = 0 cov[cov.tmp[,2]== "Male",2] = 0 start.time = proc.time() X = as.matrix(throat.otu.tab) out = MiSPU(Y,X, throat.tree,cov,model = "binomial", pow = c(2:8, Inf), n.perm = 1000) out
Finding the descendants for each taxa.
correspLeaves(tree)
correspLeaves(tree)
tree |
Rooted phylogenetic tree of R class “phylo” |
A list containing the descendants for each taxa.
Chong Wu
Chong, W., Pan, W. (2015) An Adaptive Association Test for Microbiome Data, submitted.
data(throat.tree) correspLeaves(throat.tree)
data(throat.tree) correspLeaves(throat.tree)
Density function and random number generation for the Dirichlet distribution
rdirichlet(n, alpha)
rdirichlet(n, alpha)
n |
number of random observations to draw |
alpha |
the Dirichlet distribution's parameters. Can be a vector (one set of parameters for all observations) or a matrix (a different set of parameters for each observation), see “Details” |
The Dirichlet distribution is a multidimensional generalization of the Beta distribution where each dimension is governed by an -parameter.
Formally this is
Usually, alpha
is a vector thus the same parameters will be used for all observations.
If alpha
is a matrix, a complete set of -parameters must be supplied for each observation.
returns a matrix with random numbers according to the supplied alpha vector or matrix.
Chong Wu
X1 <- rdirichlet(100, c(5, 5, 10)) X1
X1 <- rdirichlet(100, c(5, 5, 10)) X1
The estimate of Dirichlet-multinomial distribution. It just intermidates estimate.
data(dd)
data(dd)
The format is: matrix
It just intermidates estimate. Not very useful.
data(dd)
data(dd)
A generalized version of commonly used UniFrac distances. It is defined as:
where is the number of branches,
is the length of
th branch,
are the branch
proportion for community A and B.
Generalized UniFrac distance contains an extra parameter
controlling the weight on abundant lineages so the distance is not dominated
by highly abundant lineages.
has overall the best
power.
GUniFrac(otu.tab, tree,alpha = c(0,0.5,1))
GUniFrac(otu.tab, tree,alpha = c(0,0.5,1))
otu.tab |
OTU count table, row - n sample, column - q OTU |
tree |
Rooted phylogenetic tree of R class “phylo” |
alpha |
Parameter controlling weight on abundant lineages |
Return a list containing
d0 |
UniFrac(0) |
d5 |
UniFrac(0.5) |
d1 |
UniFrac(1), weighted UniFrac |
or a list containing
GUniFrac |
The distance matrix for different alpha |
alpha |
The weight |
The time consuming part is written in C and faster than the original one. The function only accepts rooted tree.
Chong Wu <[email protected]>
Chen, Jun, et al (2012). "Associating microbiome composition with environmental covariates using generalized UniFrac distances." Bioinformatics 28(16):2106-2113.
data(throat.otu.tab) data(throat.tree) data(throat.meta) groups <- throat.meta$SmokingStatus # Calculate the UniFracs unifracs <- GUniFrac(throat.otu.tab, throat.tree) unifracs
data(throat.otu.tab) data(throat.tree) data(throat.meta) groups <- throat.meta$SmokingStatus # Calculate the UniFracs unifracs <- GUniFrac(throat.otu.tab, throat.tree) unifracs
We propose a class of microbiome based sum of powered score (MiSPU) tests based on a newly defined generalized taxon proportion that combines observed microbial composition information with phylogenetic tree information. Different from the existing methods, a MiSPU test is based on a weighted score of the generalized taxon proportion in a general framework of regression, upweighting more likely to be associated microbial lineages. Our simulations demonstrated that one or more MiSPU tests were more powerful than MiRKAT while correctly controlling type I error rates. An adaptive MiSPU (aMiSPU) test is proposed to combine multiple MiSPU tests with various weights, approximating the most powerful MiSPU for a given scenario, consequently being highly adaptive and high powered across various scenarios.
MiSPU(y, X, tree, cov = NULL,model = c("gaussian", "binomial"), pow = c(2:8, Inf), n.perm = 1000)
MiSPU(y, X, tree, cov = NULL,model = c("gaussian", "binomial"), pow = c(2:8, Inf), n.perm = 1000)
y |
Outcome of interest. It can be a disease indicator; =0 for controls, =1 for cases. Or it can be a quantitative trait. A vector with length n (number of observations). |
X |
OTU count table, row - n sample, column - q OTU |
tree |
Rooted phylogenetic tree of R class “phylo” |
cov |
Covariates. A matrix with dimension n by p (n :number of observation, p : number of covariates). |
model |
Use "gaussian" for a quantitative trait, and use "binomial" for a binary trait. |
pow |
The gamma which controls the weight. Larger pow puts more weight on the variables that have larger absolute score. |
n.perm |
number of permutations or bootstraps. |
A list object, including the results for MiSPU_u, MiSPU_w and aMiSPU.
Chong Wu
Pan, W., et al.(2014) A powerful and adaptive association test for rare variants, Genetics, 197(4), 1081-95
Chong, W., Pan, W. (2015) An Adaptive Association Test for Microbiome Data, submitted.
data(throat.otu.tab) data(throat.tree) data(throat.meta) Y.tmp =throat.meta[,3] Y = rep(0,dim(throat.meta)[1]) Y[Y.tmp=="Smoker"] = 1 cov.tmp = throat.meta[,c(10,12)] cov = matrix(1,dim(throat.meta)[1],2) cov[cov.tmp[,1]== "None",1] = 0 cov[cov.tmp[,2]== "Male",2] = 0 start.time = proc.time() X = as.matrix(throat.otu.tab) out = MiSPU(Y,X, throat.tree,cov,model = "binomial", pow = c(2:8, Inf), n.perm = 1000) out
data(throat.otu.tab) data(throat.tree) data(throat.meta) Y.tmp =throat.meta[,3] Y = rep(0,dim(throat.meta)[1]) Y[Y.tmp=="Smoker"] = 1 cov.tmp = throat.meta[,c(10,12)] cov = matrix(1,dim(throat.meta)[1],2) cov[cov.tmp[,1]== "None",1] = 0 cov[cov.tmp[,2]== "Male",2] = 0 start.time = proc.time() X = as.matrix(throat.otu.tab) out = MiSPU(Y,X, throat.tree,cov,model = "binomial", pow = c(2:8, Inf), n.perm = 1000) out
Ranking the importance of each taxa.
ranking(y, X, tree, cov = NULL,gamma,g.taxon.index,model = "binomial")
ranking(y, X, tree, cov = NULL,gamma,g.taxon.index,model = "binomial")
y |
Outcome of interest. It can be a disease indicator; =0 for controls, =1 for cases. Or it can be a quantitative trait. A vector with length n (number of observations). |
X |
OTU count table, row - n sample, column - q OTU |
tree |
Rooted phylogenetic tree of R class “phylo” |
cov |
Covariates. A matrix with dimension n by p (n :number of observation, p : number of covariates). |
gamma |
The best gamma selected by aMiSPU test. |
g.taxon.index |
g.taxon.index = 1 stands for weigted generalized taxon proportion; otherwise means unweighted generalized taxon proportion. |
model |
Use "gaussian" for a quantitative trait, and use "binomial" for a binary trait. |
A matrix containing the ranking score, the higher the more important.
Chong Wu
Chong, W., Pan, W. (2015) An Adaptive Association Test for Microbiome Data, submitted.
data(throat.otu.tab) data(throat.tree) data(throat.meta) Y.tmp =throat.meta[,3] Y = rep(0,dim(throat.meta)[1]) Y[Y.tmp=="Smoker"] = 1 cov.tmp = throat.meta[,c(10,12)] cov = matrix(1,dim(throat.meta)[1],2) cov[cov.tmp[,1]== "None",1] = 0 cov[cov.tmp[,2]== "Male",2] = 0 start.time = proc.time() X = as.matrix(throat.otu.tab) #out = MiSPU(Y,X, throat.tree,cov,model = "binomial", pow = c(2:8, Inf), n.perm = 1000) out = ranking(Y,X, throat.tree,cov,gamma = 2, g.taxon.index =1)
data(throat.otu.tab) data(throat.tree) data(throat.meta) Y.tmp =throat.meta[,3] Y = rep(0,dim(throat.meta)[1]) Y[Y.tmp=="Smoker"] = 1 cov.tmp = throat.meta[,c(10,12)] cov = matrix(1,dim(throat.meta)[1],2) cov[cov.tmp[,1]== "None",1] = 0 cov[cov.tmp[,2]== "Male",2] = 0 start.time = proc.time() X = as.matrix(throat.otu.tab) #out = MiSPU(Y,X, throat.tree,cov,model = "binomial", pow = c(2:8, Inf), n.perm = 1000) out = ranking(Y,X, throat.tree,cov,gamma = 2, g.taxon.index =1)
We used a phylogenetic tree of OTUs from a real throat microbiome data set, which consists of 856 OTUs after discarding singleton OTUs. Then based on a complicate statistical meodel, we generated the OTU counts for each individual to simulate the feature of real microbiome data.
simulateData(nSam=100, s=12, ncluster = 20, mu = 1000, size = 25)
simulateData(nSam=100, s=12, ncluster = 20, mu = 1000, size = 25)
nSam |
Sample size, the default value is 100. |
s |
The indicator of informative cluster. |
ncluster |
Positive integer specifying the number of clusters of they phylogenetic tree. |
mu |
The mean of the negative binomial distribution. |
size |
The size of the negative binomial distribution |
A list object, including the informative OTU counts table and whole OTU counts table.
Chong Wu
Chong, W., Pan, W. (2015) An Adaptive Association Test for Microbiome Data, submitted.
OTU = simulateData() OTU
OTU = simulateData() OTU
It is part of a microbiome data set for studying the effect of smoking on the upper respiratory tract microbiome. The original data set contains samples from both throat and nose microbiomes, and from both body sides. This data set comes from the throat microbiome of left body side. It contains 60 subjects consisting of 32 nonsmokers and 28 smokers.
data(throat.meta)
data(throat.meta)
The format is: chr "throat.meta"
Charlson ES, Chen J, Custers-Allen R, Bittinger K, Li H, et al. (2010) Disordered Microbial Communities in the Upper Respiratory Tract of Cigarette Smokers. PLoS ONE 5(12): e15216.
data(throat.meta) ## maybe str(throat.meta) ; plot(throat.meta) ...
data(throat.meta) ## maybe str(throat.meta) ; plot(throat.meta) ...
It is part of a microbiome data set for studying the effect of smoking on the upper respiratory tract microbiome. The original data set contains samples from both throat and nose microbiomes, and from both body sides. This data set comes from the throat microbiome of left body side. It contains 60 subjects consisting of 32 nonsmokers and 28 smokers.
data(throat.otu.tab)
data(throat.otu.tab)
The format is: chr "throat.otu.tab"
The OTU table is produced by the QIIME software. Singleton OTUs have been discarded.
Charlson ES, Chen J, Custers-Allen R, Bittinger K, Li H, et al. (2010) Disordered Microbial Communities in the Upper Respiratory Tract of Cigarette Smokers. PLoS ONE 5(12): e15216.
data(throat.otu.tab) ## maybe str(throat.otu.tab) ; plot(throat.otu.tab) ...
data(throat.otu.tab) ## maybe str(throat.otu.tab) ; plot(throat.otu.tab) ...
The OTU tree is constructed using UPGMA on the K80 distance matrice of the OTUs. It is a rooted tree of class "phylo".
data(throat.tree)
data(throat.tree)
The format is: chr "throat.tree"
The OTUs are produced by the QIIME software. Singleton OTUs have been discarded.
Charlson ES, Chen J, Custers-Allen R, Bittinger K, Li H, et al. (2010) Disordered Microbial Communities in the Upper Respiratory Tract of Cigarette Smokers. PLoS ONE 5(12): e15216.
data(throat.tree) ## maybe str(throat.tree) ; plot(throat.tree) ...
data(throat.tree) ## maybe str(throat.tree) ; plot(throat.tree) ...