| Title: | Estimating the Cluster Specific Treatment Effects in Partially Nested Designs |
|---|---|
| Description: | Implements the methods for assessing heterogeneous cluster-specific treatment effects in partially nested designs as described in Liu (2024) <doi:10.1037/met0000723>. The estimation uses the multiply robust method, allowing for the use of machine learning methods in model estimation (e.g., random forest, neural network, and the super learner ensemble). Partially nested designs (also known as partially clustered designs) are designs where individuals in the treatment arm are assigned to clusters (e.g., teachers, tutoring groups, therapists), whereas individuals in the control arm have no such clustering. |
| Authors: | Xiao Liu [aut, cre] |
| Maintainer: | Xiao Liu <[email protected]> |
| License: | GPL-2 |
| Version: | 0.1.0 |
| Built: | 2026-06-01 11:40:05 UTC |
| Source: | https://github.com/xliu12/pnd.heter |
Estimation of the cluster-specific treatment effects in the partially nested design.
atekCl( data_in, ttname, Kname, Yname, Xnames, Yfamily = "gaussian", learners_tt = c("SL.glm"), learners_k = c("SL.multinom"), learners_y = c("SL.glm"), sensitivity = NULL, cv_folds = 4L, seed = NULL )
data_in |
A |
ttname |
[ |
Kname |
[ |
Yname |
[ |
Xnames |
[ |
Yfamily |
[ |
learners_tt |
[ |
learners_k |
[ |
learners_y |
[ |
sensitivity |
Specification for sensitivity parameter values on the standardized mean difference scale, which can be |
cv_folds |
[ |
seed |
An integer that is used as argument by the |
A list containing the following components:
ate_K |
A The columns "ate_k", "std_error", "CI_lower", and "CI_upper" contain the estimate, standard error estimate, and lower and upper bounds of the 0.95 confidence interval of the cluster-specific treatment effect for the cluster (indicated by column "cluster") in the same row. |
cv_components |
A |
sens_results |
If the argument |

    library(tidyverse)
    library(SuperLearner)
    library(glue)
    library(nnet)

    # data
    data(data_in)
    data_in <- data_in

    # baseline covariates
    Xnames <- c(grep("X_dat", colnames(data_in), value = TRUE))

    estimates_ate_K <- PND.heter.cluster::atekCl(
      data_in = data_in,
      ttname = "tt",   # treatment variable
      Kname = "K",     # cluster assignment variable, coded as 0 for
                       # individuals in the (non-clustered) control arm
      Yname = "Y",     # outcome variable
      Xnames = Xnames,
      seed = 12345
    )

    estimates_ate_K$ate_K
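The columns "ate_k", "std_error", "CI_lower", and "CI_upper" described above can be related to one another with a small worked example. The sketch below assumes a normal-approximation (Wald) interval; the source does not state how the package computes its bounds, and the data frame `toy_ate_K` and its numbers are made up for illustration.

```r
# Hypothetical estimates and standard errors for three clusters (made-up numbers).
toy_ate_K <- data.frame(
  cluster   = c(1, 2, 3),
  ate_k     = c(0.30, -0.10, 0.55),
  std_error = c(0.12, 0.20, 0.15)
)

# Assumption: Wald-type 95% interval, estimate +/- z * standard error.
z <- qnorm(0.975)
toy_ate_K$CI_lower <- toy_ate_K$ate_k - z * toy_ate_K$std_error
toy_ate_K$CI_upper <- toy_ate_K$ate_k + z * toy_ate_K$std_error

# Flag clusters whose interval excludes zero.
toy_ate_K$excludes_zero <- toy_ate_K$CI_lower > 0 | toy_ate_K$CI_upper < 0
toy_ate_K
```

Reading the table row by row in this way makes it easier to see which cluster-specific effects are distinguishable from zero and which are not.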
Checking covariate balance based on estimated cluster assignment probabilities (principal score) and treatment assignment probabilities (propensity score).
balance(data_in, atekCl_results, covariate_names = "X_dat.1", ttname, Kname)
data_in |
A |
atekCl_results |
[ |
covariate_names |
[ |
ttname |
[ |
Kname |
[ |
A data.frame containing the covariate balance measures (smd, standardized mean difference) between each cluster in the treatment arm and the control arm, both before and after the weighting adjustment.
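To make the before/after comparison concrete, here is a minimal base-R sketch of a standardized mean difference between one treatment-arm cluster and the control arm, with and without weights. It is not the package's implementation: the helper `smd()`, the choice of a pooled unweighted standard deviation as the denominator, and the weights are all assumptions made for illustration.

```r
# Hypothetical helper: standardized mean difference between two groups,
# with optional weights for each group.
smd <- function(x1, x0, w1 = rep(1, length(x1)), w0 = rep(1, length(x0))) {
  m1 <- weighted.mean(x1, w1)
  m0 <- weighted.mean(x0, w0)
  # Assumption: scale by the pooled unweighted standard deviation so that
  # the before and after values share the same denominator.
  s <- sqrt((var(x1) + var(x0)) / 2)
  (m1 - m0) / s
}

set.seed(1)
x_cluster <- rnorm(50, mean = 0.5)   # covariate in one treatment-arm cluster
x_control <- rnorm(200, mean = 0)    # covariate in the control arm

# Hypothetical weights for the control arm; in practice they would be derived
# from the estimated principal scores and propensity scores.
w_control <- exp(0.5 * x_control)

balance_toy <- c(
  before = smd(x_cluster, x_control),
  after  = smd(x_cluster, x_control, w0 = w_control)
)
balance_toy
```

A value of "after" that is closer to zero than "before" suggests that the weighting has made the control arm more comparable to that cluster on the covariate.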
Estimation of the cluster-specific treatment effects in the partially nested design
cluster.specific.ate( data_in, Xnames, estimator = c("trt-cluster", "trt-y", "cluster-y", "triply-robust (linear)", "triply-robust (dml)"), y1model_lme = FALSE, randomized.tt = FALSE, randomized.ttprop = NULL )
data_in |
A |
Xnames |
A character vector of the names of the columns in "data_in" that correspond to baseline covariates (X) |
estimator |
A character vector of the names of the estimators to use for estimating the cluster-specific treatment effects (ATE_k, k = 1,...,J). The estimators currently supported are those described in Liu (2023): (i) trt-cluster, (ii) trt-y, (iii) cluster-y, (iv) triply-robust (linear), and (v) triply-robust (dml). Estimators (i)-(iv) are implemented with parametric models that include linear terms of the baseline covariates. Estimator (v), triply-robust (dml), is implemented with the double machine learning procedure (Chernozhukov et al., 2018 <https://doi.org/10.1111/ectj.12097>) with two-fold cross-fitting and data-adaptive packages. Specifically, the cluster assignment probability is estimated with the "xgboost" R package <https://cran.r-project.org/package=xgboost>; the treatment probability and the outcome mean are estimated with an ensemble of algorithms, including boosted trees (via the "xgboost" R package), random forest (via the "ranger" R package <https://cran.r-project.org/package=ranger>), and generalized additive model (via the "gam" R package <https://cran.r-project.org/package=gam>), combined through the super learner ensembling procedure (via the "SuperLearner" R package <https://cran.r-project.org/package=SuperLearner>). |
y1model_lme |
Whether to use the random-effects outcome regression for the estimators (ii) trt-y, (iii) cluster-y, and (iv) triply-robust (linear), where the outcome mean is involved. If "y1model_lme = TRUE", the random-intercept outcome regression (with the cluster-mean covariates and cluster-mean centered covariates) will be used; if "y1model_lme = FALSE", the fixed-effects outcome regression will be used. |
randomized.tt |
Whether the treatment assignment is randomized. If "randomized.tt = TRUE", the treatment probability will be a constant specified by the argument "randomized.ttprop", which is the proportion of individuals randomized to the treatment arm. |
randomized.ttprop |
The proportion of individuals randomized to the treatment arm. |
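The "y1model_lme = TRUE" option above refers to cluster-mean covariates and cluster-mean centered covariates. The following base-R sketch shows one common way to build such columns; how the package constructs them internally is not shown in the source, and the data frame `toy` and the column names `X_clmean` and `X_cwc` are hypothetical.

```r
# Toy data: six individuals in two clusters with one covariate X.
toy <- data.frame(
  K = c(1, 1, 1, 2, 2, 2),
  X = c(1, 2, 3, 10, 20, 30)
)

# Cluster-mean covariate: each individual receives the mean of X in their cluster.
toy$X_clmean <- ave(toy$X, toy$K, FUN = mean)

# Cluster-mean centered covariate: deviation of X from the cluster mean.
toy$X_cwc <- toy$X - toy$X_clmean

toy
```

Splitting a covariate this way separates its between-cluster part from its within-cluster part, which is the usual motivation for entering both into a random-intercept regression.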

    data(data_in)
    data_in <- data_in
    Xnames <- c(grep("X_dat", colnames(data_in), value = TRUE))

    # estimates_ate_K <- cluster.specific.ate(
    #   data_in = data_in,
    #   Xnames = Xnames,
    #   estimator = c("trt-cluster",
    #                 "trt-y",
    #                 "cluster-y",
    #                 "triply-robust (linear)",
    #                 "triply-robust (dml)"),
    #   y1model_lme = FALSE,
    #   randomized.tt = FALSE, randomized.ttprop = NULL
    # )
A simulated dataset from the 2/1 partially nested design with treatment-induced clustering
data_in
A data frame with 400 rows and 8 variables:
Outcome.
Cluster assignment in the treatment arm.
Treatment assignment. 1 for individuals assigned to the treatment arm. 0 for individuals assigned to the control arm. The control arm is unclustered.
Baseline covariates.
Baseline covariates.
Baseline covariates.
Baseline covariates.
Individual id.
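The layout described above can be mimicked with a small toy data frame, which may help when preparing your own data. This is only a sketch: the column names "Y", "K", "tt", and "X_dat.1" follow the usage examples elsewhere in this manual, the coding of K as 0 for the control arm follows the comment in the atekCl example, and the column name "id", the sample size, and all values are hypothetical.

```r
set.seed(2024)
n <- 12

# First half treated, second half control.
tt <- rep(c(1, 0), each = n / 2)

# Treated individuals alternate between two clusters; control individuals
# are unclustered and coded K = 0.
K <- ifelse(tt == 1, rep(c(1, 2), length.out = n), 0)

toy_pnd <- data.frame(
  id      = seq_len(n),   # hypothetical column name for the individual id
  tt      = tt,
  K       = K,
  X_dat.1 = rnorm(n),     # one simulated baseline covariate
  Y       = rnorm(n)      # simulated outcome
)

# Cross-tabulate arm by cluster to check the partially nested structure.
table(tt = toy_pnd$tt, K = toy_pnd$K)

# Select the covariate columns by their naming pattern.
Xnames <- grep("X_dat", colnames(toy_pnd), value = TRUE)
```

The cross-tabulation should show all control-arm rows under K = 0 and all treatment-arm rows under a positive cluster label.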
An example dataset with the 2/1 partially nested design where the clustering is induced by treatment delivery. The example was based on the public-use data of the National Center for Research on Early Childhood Education Teacher Professional Development Study (2007-2011; for details about the study, see this [website](https://www.childandfamilydataarchive.org/cfda/archives/cfda/studies/34848/versions/V2)). The participants were assigned to either the treatment arm or the control arm. The treatment arm was a one-on-one, web-mediated consultancy intervention in which the participants received online coaching from one of J = 12 coaches; that is, each coach represents a cluster in this example. The control arm participants had no such clustering.
partially_nested_data_example
A data frame with 308 rows and 8 variables:
The outcome variable, measuring the instructional support quality after the intervention program.
Coach (i.e., cluster) assignment for participants in the treatment arm.
Treatment assignment. 1 for participants assigned to the treatment arm to receive the intervention program. 0 for participants assigned to the control arm. The control arm is unclustered.
Baseline covariates.
Baseline covariates.
Baseline covariates.
Baseline covariates.
Baseline covariates.
Baseline covariates.
Baseline covariates.
Baseline covariates.
Baseline covariates.
Baseline covariates.
Baseline covariates.
Baseline covariates.
Baseline covariates.
Baseline covariates.
Baseline covariates.
Baseline covariates.
Baseline covariates.
Baseline covariates.
Baseline covariates.
Baseline covariates.
Participant id.