Title: | Selecting Combinations of Predictors by Leveraging Multiple AUCs for an Ordered Multilevel Outcome |
---|---|
Description: | Uses multiple AUCs to select a combination of predictors when the outcome has multiple (ordered) levels and the focus is discriminating one particular level from the others. This method is most naturally applied to settings where the outcome has three levels. (Meisner, A, Parikh, CR, and Kerr, KF (2017) <http://biostats.bepress.com/uwbiostat/paper423/>.) |
Authors: | Allison Meisner |
Maintainer: | Allison Meisner <[email protected]> |
License: | GPL-2 |
Version: | 0.1.0 |
Built: | 2025-03-07 03:18:59 UTC |
Source: | https://github.com/cran/multiselect |
When several predictors are available, there is often interest in combining a subset of predictors to diagnose disease or predict risk of a clinical outcome, D. In the context of an ordered outcome with K levels, where interest is in predicting D = K, there are multiple ways to select a combination. The traditional approach involves dichotomizing the outcome and using logistic regression to construct the combinations, then selecting a combination based on the estimated AUC for D = K vs. D < K for each fitted combination. An alternative approach, implemented here, constructs the combinations in the same way, but uses both the AUC for D = K vs. D < K and the AUC for D = K-1 vs. D < K-1. The combination with the best combined performance is then chosen. This function provides (i) the best combination defined solely by the AUC for D = K vs. D < K and (ii) the best combination defined by both the AUC for D = K vs. D < K and the AUC for D = K-1 vs. D < K-1. In the context where D indicates no, mild, or severe disease (K = 3), this is equivalent to (i) selecting a combination in terms of its ability to discriminate between individuals with severe vs. no or mild disease and (ii) selecting a combination in terms of its ability to discriminate between individuals with severe vs. no or mild disease and its ability to discriminate between individuals with mild vs. no disease.
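The two AUCs described above can be illustrated with a minimal base R sketch for a three-level outcome. This is not the package's code: the helper `auc`, the simulated variables `D`, `x1`, `x2`, and the effect sizes are all assumptions made for illustration.

```r
## Rank-based (Mann-Whitney) AUC: probability that a random case scores
## higher than a random control, with ties counted as 1/2.
## `auc` is a hypothetical helper, not a function exported by multiselect.
auc <- function(score, is_case) {
  r  <- rank(score)
  n1 <- sum(is_case)
  n0 <- sum(!is_case)
  (sum(r[is_case]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

set.seed(1)
n  <- 300
D  <- sample(1:3, n, replace = TRUE, prob = c(0.6, 0.3, 0.1))  # 1 = no, 2 = mild, 3 = severe
x1 <- rnorm(n, mean = 0.8 * (D - 1))
x2 <- rnorm(n, mean = 0.4 * (D - 1))

## Dichotomize the outcome (severe vs. not) and fit logistic regression.
y     <- as.numeric(D == 3)
fit   <- glm(y ~ x1 + x2, family = binomial)
score <- predict(fit)  # linear predictor for the fitted observations

## AUC for D = 3 vs. D < 3 (severe vs. no or mild disease).
auc_top <- auc(score, D == 3)
## AUC for D = 2 vs. D < 2 (mild vs. no disease), using only observations with D < 3.
auc_mid <- auc(score[D < 3], D[D < 3] == 2)
```

Both AUCs are computed from the same fitted combination; only the subset of observations and the definition of a "case" change.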
multiselect(data, size=2, Breps=40, nummod=10)
data |
The dataset to be used. An object of class ‘data.frame’ where the first column is the outcome and the subsequent columns are the predictors. All columns must be numeric. The outcome must take values 1,...,K |
size |
The size of the combinations. The function considers all possible subsets of the predictors of size size |
Breps |
The number of bootstrap replicates used to estimate the optimism due to resubstitution bias in the AUCs. The function estimates the apparent AUCs for each fitted combination. These apparent AUCs are then corrected by subtracting the optimism due to resubstitution bias, which is estimated using a bootstrap procedure. Default 40. |
nummod |
The number of predictor combinations to return. Using the optimism-corrected estimate of the AUC for D = K vs. D < K, the top nummod combinations are returned. Default 10. |
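As a rough illustration of the arguments above, the following sketch builds a toy data frame in the documented layout (outcome first, predictors after), applies checks that mirror the stated requirements, and counts how many combinations a given size implies. The object names (`toy`, `n_comb`, `combos`) and the checks themselves are assumptions for illustration, not the package's own validation code.

```r
## Toy data in the documented layout: outcome in column 1, predictors after.
set.seed(2)
toy <- data.frame(D  = sample(1:3, 50, replace = TRUE),
                  X1 = rnorm(50), X2 = rnorm(50),
                  X3 = rnorm(50), X4 = rnorm(50))

## Checks mirroring the documented requirements (illustrative only).
stopifnot(is.data.frame(toy),
          all(vapply(toy, is.numeric, logical(1))),    # all columns numeric
          all(toy[[1]] %in% seq_len(max(toy[[1]]))))   # outcome in 1,...,K

p    <- ncol(toy) - 1   # number of candidate predictors
size <- 2

n_comb <- choose(p, size)           # number of subsets of the given size
combos <- combn(seq_len(p), size)   # one column per candidate combination
```

The number of combinations grows quickly with the number of predictors, and each one is refit in every bootstrap replicate, so run time scales roughly with `choose(p, size) * Breps`.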
For each possible predictor combination of size size, the function fits the predictor combination using logistic regression comparing outcome D = K to D < K. The apparent AUCs for (a) D = K vs. D < K and (b) D = K-1 vs. D < K-1 are calculated. A bootstrapping procedure is then used to estimate the optimism due to resubstitution bias in these apparent AUCs. The AUCs are corrected by subtracting the estimated optimism due to resubstitution bias. Two combinations are then selected: the combination with the highest AUC for D = K vs. D < K ("single AUC" approach) and the combination with the best sum of ranks for the AUC for D = K vs. D < K and the AUC for D = K-1 vs. D < K-1 ("multi-AUC" approach). The selected combinations may be the same for the two approaches. The top nummod combinations, in terms of the AUC for D = K vs. D < K (corrected for optimism due to resubstitution bias), are also provided.
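The difference between the "single AUC" and "multi-AUC" selections can be sketched with made-up numbers. The AUC values below are hypothetical, and the ranking convention (rank 1 = highest AUC, smallest rank sum is best) is an assumption; the package may orient its ranks differently, which would not change which combination is selected.

```r
## Hypothetical optimism-corrected AUCs for four candidate combinations.
auc_top <- c(0.81, 0.84, 0.79, 0.83)  # highest outcome level vs. all lower levels
auc_mid <- c(0.70, 0.62, 0.66, 0.71)  # second-highest level vs. all lower levels

## "Single AUC" approach: pick the combination with the highest top-level AUC.
best_single <- which.max(auc_top)

## "Multi-AUC" approach: rank each AUC (1 = best) and pick the smallest rank sum.
rank_top <- rank(-auc_top)
rank_mid <- rank(-auc_mid)
rank_sum <- rank_top + rank_mid
best_multi <- which.min(rank_sum)
```

Here the second combination wins on the single AUC, while the fourth combination has the best rank sum because it performs well on both AUCs.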
If more than one combination is "best" in terms of either the AUC for D = K vs. D < K or the sum of ranks for the AUC for D = K vs. D < K and the AUC for D = K-1 vs. D < K-1 (i.e., in the event of ties), the first combination is returned. The order of the combinations for p candidate predictors is given by combn(1:p, size). If ties occur for either (i) the AUC for D = K vs. D < K or (ii) the sum of ranks for the AUC for D = K vs. D < K and the AUC for D = K-1 vs. D < K-1, a warning is given.
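The tie-breaking rule can be illustrated with a short sketch, assuming that "first" means the first column of `combn(1:p, size)` that attains the best value. The AUC values are hypothetical, and the `tied` flag only stands in for the condition under which the function is documented to warn.

```r
p    <- 4
size <- 2
combos <- combn(1:p, size)  # columns: (1,2) (1,3) (1,4) (2,3) (2,4) (3,4)

## Hypothetical corrected AUCs, one per column of `combos`; columns 2 and 3 tie.
auc_top <- c(0.80, 0.85, 0.85, 0.78, 0.80, 0.70)

best <- which.max(auc_top)                   # which.max returns the first maximum
tied <- sum(auc_top == max(auc_top)) > 1     # TRUE when the best value is not unique

best_predictors <- combos[, best]            # predictors in the selected combination
```

Because `which.max` returns the first maximum, the combination of predictors 1 and 3 is selected here even though predictors 1 and 4 perform equally well.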
A given bootstrap sample may not have observations from each of the outcome levels; if this occurs, a warning is given and the estimated optimism for that bootstrap sample for both the AUC for D = K vs. D < K and the AUC for D = K-1 vs. D < K-1 will be NA. NAs are removed in the calculation of the mean optimism (used to correct the AUC estimates for resubstitution bias), and the total number of NAs across the Breps bootstrap replicates (for either the AUC for D = K vs. D < K or the AUC for D = K-1 vs. D < K-1) is indicated by "numNA" in the output.
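The optimism correction and the handling of NAs can be sketched as follows. This assumes the common bootstrap optimism estimator (performance of the bootstrap-fitted model on the bootstrap sample minus its performance on the original data); the text above does not spell out the exact procedure, so the package may differ in detail. The helpers `auc` and `auc_top` and the simulated data are hypothetical, and only the AUC for the highest level is shown; the second AUC would be handled the same way.

```r
## Hypothetical rank-based AUC helper (ties count as 1/2).
auc <- function(score, is_case) {
  r  <- rank(score)
  n1 <- sum(is_case)
  n0 <- sum(!is_case)
  (sum(r[is_case]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

set.seed(3)
n <- 200
D <- sample(1:3, n, replace = TRUE, prob = c(0.6, 0.3, 0.1))
dat <- data.frame(D  = D,
                  x1 = rnorm(n, mean = 0.8 * (D - 1)),
                  x2 = rnorm(n, mean = 0.4 * (D - 1)))

## AUC for D = 3 vs. D < 3 of a model fitted on `train`, evaluated on `test`.
auc_top <- function(train, test) {
  fit <- glm(as.numeric(D == 3) ~ x1 + x2, family = binomial, data = train)
  auc(predict(fit, newdata = test), test$D == 3)
}

apparent <- auc_top(dat, dat)  # apparent (resubstitution) AUC

Breps    <- 20
optimism <- rep(NA_real_, Breps)
for (b in seq_len(Breps)) {
  boot <- dat[sample(n, replace = TRUE), ]
  ## If an outcome level is missing from the bootstrap sample, leave NA.
  if (length(unique(boot$D)) < 3) next
  optimism[b] <- auc_top(boot, boot) - auc_top(boot, dat)
}

numNA     <- sum(is.na(optimism))                      # replicates with missing optimism
corrected <- apparent - mean(optimism, na.rm = TRUE)   # optimism-corrected AUC
```

With a rare top outcome level and a small sample, more bootstrap samples would lack a level, so a large `numNA` relative to `Breps` suggests the mean optimism rests on few replicates.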
A list with the following components:
Best.Single |
The best predictor combination as chosen by the "single AUC" approach. The first |
Best.Multi |
The best predictor combination as chosen by the "multi-AUC" approach. The elements of |
Ranked.Rslts |
The results for the |
Meisner, A., Parikh, C.R., and Kerr, K.F. (2017). Using multilevel outcomes to construct and select biomarker combinations for single-level prediction. UW Biostatistics Working Paper Series, Working Paper 423.
library(MASS)
## example takes ~1 minute to run
set.seed(15)
p = 16 ## number of predictors
matX <- matrix(rep(0.3,p*p), nrow=p, ncol=p) ## covariance matrix for the predictors
diag(matX) <- rep(1,p)
simD <- apply(rmultinom(400, 1, c(0.6,0.335,0.065)),2,which.max)
simDord <- simD[order(simD)]
numobs <- table(simDord)
simX1 <- mvrnorm(numobs[1], rep(0,p), 2*matX)
simX2 <- mvrnorm(numobs[2], c(1.5, 1, rep(0.5,(p-2)/2), rep(0.1,(p-2)/2)), 2*matX)
simX3 <- mvrnorm(numobs[3], c(rep(2,2), rep(0.8,(p-2)/2), rep(0.1,(p-2)/2)), 2*matX)
simX <- rbind(simX1, simX2, simX3)
exdata <- data.frame("D"=simDord, simX)
multiselect(data=exdata, size=2, Breps=20, nummod=10)