This notebook replicates the main results of the paper using a different seed (the results in the paper are obtained with 1234; below, 12345 is specified). The notebook should help to

- show the underlying code for researchers interested in replicating the analysis using the same data or adapting it to new datasets, and
- show that most of the results reported in the paper are not sensitive to the seed used to initialize the random number generators that are called when defining the cross-fitting folds and within the random forests.
The explanations throughout the notebook are kept short. Please see the paper for more explanations regarding the parameters of interest and their Double Machine Learning based estimation.
Click on the “Code” button on the top right of this notebook to “Hide All Code” and see only the results or “Download Rmd” to extract the underlying code.
Running the analyses in this notebook took roughly two days on a SWITCHengine with eight cores and 32 GB RAM.
The analysis is mostly based on the causalDML R-package that was developed in parallel to the paper. The preparation of the “Database_SALMP_MCK2020.RData” file that is loaded below can be replicated by requesting the data via FORSbase and running the data preparation file.
## Define and set seed
seed = 12345
set.seed(seed)
## Load relevant packages; if you have not yet installed causalDML, uncomment the next two lines
# library(devtools)
# install_github(repo="MCKnaus/causalDML")
library(causalDML)
library(policytree)
library(tidyverse)
library(lmtest)
library(sandwich)
library(matrixStats)
## Set to path where data are stored for replication
# setwd("C:/Users/Administrator/switchdrive/Papers/Multi Treat Labor/Paper/Data")
load("Database_SALMP_MCK2020.RData")
The causalDML command first estimates the cross-fitted nuisance parameters and uses them to compute the doubly robust scores for the APOs and ATEs, which are then used to estimate the APOs and ATE(T)s.
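For reference, the doubly robust (AIPW) score that underlies these estimators takes the following form, where \(\hat{\mu}_w(x)\) and \(\hat{e}_w(x)\) denote the cross-fitted outcome and propensity score predictions (a sketch in generic notation; see the paper for the exact definitions):

$$\hat{\Gamma}_{w,i} = \hat{\mu}_w(X_i) + \frac{\mathbf{1}\{W_i = w\}\,\big(Y_i - \hat{\mu}_w(X_i)\big)}{\hat{e}_w(X_i)}, \qquad \widehat{APO}_w = \frac{1}{N}\sum_{i=1}^{N} \hat{\Gamma}_{w,i}$$

ATEs are then estimated as averages of score differences, \(\widehat{ATE}_{w,w'} = N^{-1}\sum_i (\hat{\Gamma}_{w,i} - \hat{\Gamma}_{w',i})\).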
# Create the ML method to be used for nuisance parameter prediction
forest = create_method("forest_grf",args=list(tune.parameters = "all",seed=seed))
# Run the main function that outputs nuisance parameters, APO and ATE
cDML = causalDML(y,w,x,ml_w=list(forest),ml_y=list(forest))
The following table and plot show the DML-estimated APOs.
summary(cDML$APO)
APO SE
no program 14.79627 0.05607573
job search 13.75765 0.11484011
vocational 18.01463 0.42204324
computer 18.03404 0.42602710
language 17.28424 0.37482108
plot(cDML$APO,label_w)
The following table shows the results of all possible ATEs given the five different treatments:
summary(cDML$ATE)
ATE SE t p
job search - no program -1.038615 0.125267 -8.2912 < 2.2e-16 ***
vocational - no program 3.218358 0.425095 7.5709 3.757e-14 ***
computer - no program 3.237774 0.429024 7.5468 4.520e-14 ***
language - no program 2.487976 0.377771 6.5859 4.556e-11 ***
vocational - job search 4.256973 0.436936 9.7428 < 2.2e-16 ***
computer - job search 4.276389 0.440728 9.7030 < 2.2e-16 ***
language - job search 3.526591 0.391096 9.0172 < 2.2e-16 ***
computer - vocational 0.019416 0.599454 0.0324 0.9742
language - vocational -0.730382 0.563997 -1.2950 0.1953
language - computer -0.749798 0.566864 -1.3227 0.1859
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The following tables show the results for all possible ATETs given the five different treatments. The first line shows, for example, an effect of -1.062 of being in a job search program vs. no program for those who were actually observed in no program, or, formally adapting the notation of the paper, \(E[Y(job~search) - Y(no~program) \mid W = no~program]\). As we can condition on all five programs, we get five tables:
# Estimate ATETs for the programs
atets = matrix(NA,4,2)
for (i in 1:5) {
  print(label_w[i])
  APO_atet = APO_dml_atet(y,cDML$APO$m_mat,cDML$APO$w_mat,cDML$APO$e_mat,cDML$APO$cf_mat,treated=i)
  ATET = ATE_dml(APO_atet)
  summary(ATET)
  # Store the "program i vs. no program" ATET in rows 1-4 for i = 2,...,5;
  # for i = 1 (no program) the zero row index makes this line a silent no-op
  atets[i-1,] = ATET$results[i-1,1:2]
}
[1] "No program"
ATET SE t p
job search - no program -1.061727 0.130743 -8.1207 4.717e-16 ***
vocational - no program 3.250641 0.425757 7.6350 2.291e-14 ***
computer - no program 3.195185 0.432797 7.3826 1.571e-13 ***
language - no program 2.449338 0.383402 6.3884 1.688e-10 ***
vocational - job search 4.312368 0.438981 9.8236 < 2.2e-16 ***
computer - job search 4.256912 0.445830 9.5483 < 2.2e-16 ***
language - job search 3.511065 0.398179 8.8178 < 2.2e-16 ***
computer - vocational -0.055456 0.602127 -0.0921 0.9266
language - vocational -0.801303 0.568047 -1.4106 0.1584
language - computer -0.745847 0.573260 -1.3011 0.1932
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
[1] "Job search"
ATET SE t p
job search - no program -0.98007 0.12116 -8.0891 6.116e-16 ***
vocational - no program 3.02888 0.47650 6.3565 2.078e-10 ***
computer - no program 3.43079 0.46849 7.3231 2.452e-13 ***
language - no program 2.75514 0.42315 6.5110 7.520e-11 ***
vocational - job search 4.00895 0.48375 8.2873 < 2.2e-16 ***
computer - job search 4.41086 0.47582 9.2699 < 2.2e-16 ***
language - job search 3.73521 0.43112 8.6639 < 2.2e-16 ***
computer - vocational 0.40191 0.66017 0.6088 0.5427
language - vocational -0.27374 0.62994 -0.4345 0.6639
language - computer -0.67565 0.62361 -1.0834 0.2786
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
[1] "Vocational"
ATET SE t p
job search - no program -1.382461 0.155460 -8.8927 < 2.2e-16 ***
vocational - no program 2.787897 0.392552 7.1020 1.243e-12 ***
computer - no program 2.575450 0.467496 5.5090 3.622e-08 ***
language - no program 2.867676 0.402898 7.1176 1.110e-12 ***
vocational - job search 4.170358 0.423084 9.8570 < 2.2e-16 ***
computer - job search 3.957911 0.485932 8.1450 3.863e-16 ***
language - job search 4.250137 0.426540 9.9642 < 2.2e-16 ***
computer - vocational -0.212447 0.587166 -0.3618 0.7175
language - vocational 0.079779 0.545000 0.1464 0.8836
language - computer 0.292226 0.591884 0.4937 0.6215
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
[1] "Computer"
ATET SE t p
job search - no program -1.26680 0.15684 -8.0769 6.761e-16 ***
vocational - no program 3.10856 0.48660 6.3884 1.688e-10 ***
computer - no program 3.56250 0.39968 8.9134 < 2.2e-16 ***
language - no program 3.37081 0.47934 7.0322 2.053e-12 ***
vocational - job search 4.37536 0.50556 8.6545 < 2.2e-16 ***
computer - job search 4.82930 0.42993 11.2329 < 2.2e-16 ***
language - job search 4.63761 0.50019 9.2717 < 2.2e-16 ***
computer - vocational 0.45394 0.60194 0.7541 0.4508
language - vocational 0.26225 0.66088 0.3968 0.6915
language - computer -0.19169 0.59775 -0.3207 0.7485
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
[1] "Language"
ATET SE t p
job search - no program -0.60883 0.19432 -3.1332 0.0017301 **
vocational - no program 4.02450 0.79810 5.0426 4.605e-07 ***
computer - no program 3.36646 0.73306 4.5924 4.391e-06 ***
language - no program 1.04701 0.29307 3.5725 0.0003538 ***
vocational - job search 4.63333 0.80990 5.7208 1.065e-08 ***
computer - job search 3.97529 0.74590 5.3296 9.880e-08 ***
language - job search 1.65585 0.33816 4.8966 9.776e-07 ***
computer - vocational -0.65804 1.05962 -0.6210 0.5345920
language - vocational -2.97748 0.84299 -3.5320 0.0004127 ***
language - computer -2.31944 0.78089 -2.9703 0.0029766 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
As in the paper, here and in the following we focus on the comparisons of the programs vs. no program and ignore the comparisons between programs. The average effects are very similar to those reported in the paper, which shows that they are robust to the alternative seed.
# Plot ATE and ATET together
ates = cDML$ATE$results[1:4,1:2]
df = data.frame(Estimand = c(rep("ATE",4),rep("ATET",4)),
program = factor(rep(label_w[-1],2),label_w[-1]),
effect = c(ates[,1],atets[,1]))
df$low = df$effect - 1.96 * c(ates[,2],atets[,2])
df$up = df$effect + 1.96 * c(ates[,2],atets[,2])
ggplot(data = df,mapping = aes(x = program, y = effect,ymin = low,
ymax = up,group = Estimand, linetype = Estimand)) +
geom_point(size=2,position = position_dodge(width = 0.4)) +
geom_errorbar(width=0.2,position = position_dodge(width = 0.4)) +
theme_bw() + theme(axis.text.x = element_text(angle = 45, hjust = 1),axis.title.x=element_blank()) +
ylab("Average treatment effects") + geom_hline(yintercept = 0) +
theme(text=element_text(family="serif",size = 14, colour="black"))
The output of the causalDML command contains an APO_dml object and an ATE_dml object that carry a gamma and a delta matrix storing the doubly robust APO and ATE scores, respectively. The delta matrix of the ATE_dml object can be used to conveniently estimate heterogeneous effects.
Group average treatment effects (GATEs) represent the average effects in different subgroups. Such subgroup analyses are an integral part of most program evaluations and usually require rerunning the preferred method in subsamples. Here, using the doubly robust ATE score as a pseudo-outcome in a standard OLS regression with a group indicator as the explanatory variable estimates the GATEs.
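This works because the doubly robust score is a conditionally unbiased signal for the effect: with a binary group indicator \(G_i\) and a score \(\hat{\Delta}_i\) satisfying \(E[\hat{\Delta}_i \mid G_i] = E[\tau(X_i) \mid G_i]\) under consistent nuisance estimation, the OLS regression

$$\hat{\Delta}_i = \alpha + \beta G_i + u_i$$

recovers the GATE of the reference group as \(\alpha\) and the GATE difference as \(\beta\) (a sketch of the standard argument in generic notation, not the paper's formal statement).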
The results shown in the following are very similar to those presented in Table 4 of the paper, suggesting that the GATEs are also not sensitive to the choice of the seed.
## GATEs gender
for (i in 1:4) {
print(label_w[i+1])
temp_ols = lm(cDML$ATE$delta[,i] ~ z_blp$female)
print(coeftest(temp_ols, vcov = vcovHC(temp_ols)))
}
[1] "Job search"
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.30196 0.16520 -7.8811 3.296e-15 ***
z_blp$female 0.59799 0.25324 2.3613 0.01821 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
[1] "Vocational"
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.76423 0.54987 6.8457 7.679e-12 ***
z_blp$female -1.23955 0.86366 -1.4352 0.1512
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
[1] "Computer"
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.15071 0.59528 3.6129 0.000303 ***
z_blp$female 2.46847 0.85507 2.8869 0.003892 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
[1] "Language"
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.24428 0.46511 6.9753 3.082e-12 ***
z_blp$female -1.71740 0.77643 -2.2119 0.02698 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## GATEs foreigner
for (i in 1:4) {
print(label_w[i+1])
temp_ols = lm(cDML$ATE$delta[,i] ~ z_blp$foreigner)
print(coeftest(temp_ols, vcov = vcovHC(temp_ols)))
}
[1] "Job search"
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.29340 0.15933 -8.1175 4.843e-16 ***
z_blp$foreigner 0.70197 0.25746 2.7265 0.006403 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
[1] "Vocational"
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.41846 0.53053 4.5586 5.16e-06 ***
z_blp$foreigner 2.20381 0.88661 2.4857 0.01293 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
[1] "Computer"
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.66128 0.49838 7.3464 2.061e-13 ***
z_blp$foreigner -1.16680 0.93833 -1.2435 0.2137
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
[1] "Language"
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.47048 0.52174 6.6518 2.919e-11 ***
z_blp$foreigner -2.70691 0.71890 -3.7654 0.0001665 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## GATEs employability
for (i in 1:4) {
print(label_w[i+1])
temp_ols = lm(cDML$ATE$delta[,i] ~ z_blp$employability)
print(coeftest(temp_ols, vcov = vcovHC(temp_ols)))
}
[1] "Job search"
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.20813 0.32970 -0.6313 0.52786
z_blp$employability2 -0.90582 0.36020 -2.5148 0.01191 *
z_blp$employability3 -1.62687 0.49524 -3.2850 0.00102 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
[1] "Vocational"
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.6555 1.0312 5.4845 4.164e-08 ***
z_blp$employability2 -2.6602 1.1489 -2.3156 0.020585 *
z_blp$employability3 -4.7576 1.5087 -3.1534 0.001614 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
[1] "Computer"
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.3963 1.0863 4.9676 6.796e-07 ***
z_blp$employability2 -2.4153 1.1966 -2.0186 0.04353 *
z_blp$employability3 -3.7462 1.6566 -2.2613 0.02374 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
[1] "Language"
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.86716 0.84162 3.4067 0.0006579 ***
z_blp$employability2 -0.49344 0.95351 -0.5175 0.6048096
z_blp$employability3 -0.11146 1.46490 -0.0761 0.9393486
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
We can also model effect heterogeneity in a multivariate OLS. Again, the ATE score is reused as a pseudo-outcome in a plain-vanilla OLS regression with heteroscedasticity-robust standard errors. The estimated coefficients are again mostly in line with those reported in Table 5 of the paper.
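In generic notation, this OLS step estimates the best linear predictor of the CATE given the chosen covariates \(Z_i\),

$$\beta^{*} = \arg\min_{\beta}\; E\big[\left(\tau(X_i) - Z_i'\beta\right)^2\big],$$

since replacing the infeasible \(\tau(X_i)\) by the unbiased pseudo-outcome \(\hat{\Delta}_i\) leaves the population least-squares solution unchanged (a sketch of the underlying logic, not taken from the paper).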
## BLP CATEs
for (i in 1:4) {
print(label_w[i+1])
temp_df = data.frame(y = cDML$ATE$delta[,i])
temp_df = cbind(temp_df,z_blp)
temp_ols = lm("y ~ .",data = temp_df)
print(coeftest(temp_ols, vcov = vcovHC(temp_ols)))
}
[1] "Job search"
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.427987 0.703382 -0.6085 0.54288
female 0.236418 0.266986 0.8855 0.37589
age 0.022365 0.014863 1.5048 0.13238
foreigner 0.489428 0.264692 1.8491 0.06446 .
employability2 -0.617320 0.369749 -1.6696 0.09501 .
employability3 -1.148397 0.510107 -2.2513 0.02437 *
past_incomein10000 -0.264551 0.064185 -4.1217 3.766e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
[1] "Vocational"
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.573663 2.386915 1.9161 0.055352 .
female -2.169673 0.921052 -2.3556 0.018494 *
age 0.100046 0.049381 2.0260 0.042770 *
foreigner 1.583702 0.899843 1.7600 0.078417 .
employability2 -1.821302 1.169122 -1.5578 0.119277
employability3 -3.365911 1.538792 -2.1874 0.028719 *
past_incomein10000 -0.687101 0.230396 -2.9823 0.002862 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
[1] "Computer"
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.958203 2.447231 2.0260 0.04276 *
female 1.837452 0.911824 2.0151 0.04389 *
age 0.045730 0.050252 0.9100 0.36281
foreigner -1.507276 0.962790 -1.5655 0.11746
employability2 -2.129763 1.219728 -1.7461 0.08080 .
employability3 -3.350109 1.684136 -1.9892 0.04668 *
past_incomein10000 -0.405734 0.187346 -2.1657 0.03034 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
[1] "Language"
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.026019 2.091576 2.8811 0.003964 **
female -1.344921 0.809590 -1.6612 0.096671 .
age -0.067032 0.044410 -1.5094 0.131202
foreigner -2.649330 0.741761 -3.5717 0.000355 ***
employability2 -1.092292 0.978464 -1.1163 0.264284
employability3 -1.160974 1.502992 -0.7724 0.439856
past_incomein10000 0.324794 0.179486 1.8096 0.070366 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
rm(temp_df,temp_ols)
Next, we reuse the ATE score as a pseudo-outcome in a one-dimensional kernel regression. The causalDML package implements this based on the np package.
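For intuition, with a local-constant (Nadaraya–Watson) specification, kernel \(K\), and bandwidth \(h\), the estimated effect curve is a locally weighted average of the pseudo-outcomes (a sketch; the np package may use a different local specification and selects the bandwidth in a data-driven way):

$$\hat{\tau}(z) = \frac{\sum_{i=1}^{N} K\!\left(\tfrac{Z_i - z}{h}\right) \hat{\Delta}_i}{\sum_{i=1}^{N} K\!\left(\tfrac{Z_i - z}{h}\right)}$$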
The figures show the results in the following order: job search, vocational, computer and language. They are all very similar to the ones provided in the paper.
## Kernel regression CATEs
list_kr_cates_inc = vector("list",4)
for (i in 1:4) {
list_kr_cates_inc[[i]] = kr_cate(cDML$ATE$delta[,i],x[,"past_income"])
}
# Plot Kernel regression CATEs
for (i in 1:4) {
print(plot(list_kr_cates_inc[[i]],z_label="Past income",yrange=c(-5,10)))
}
Next, we reuse the ATE score as a pseudo-outcome in a one-dimensional series regression. The causalDML package implements this based on the crs package.
The figures show the results in the following order: job search, vocational, computer and language. The only notable difference occurs for the language programs: the estimated function in the paper is S-shaped, while here it is linear. However, the observation that job seekers with lower previous income benefit less from language training is by and large confirmed.
## Series regression CATEs
list_sr_cates_inc = vector("list",4)
for (i in 1:4) {
list_sr_cates_inc[[i]] = spline_cate(cDML$ATE$delta[,i],x[,"past_income"])
}
# Plot Series regression CATEs
for (i in 1:4) {
print(plot(list_sr_cates_inc[[i]],z_label="Past income",yrange=c(-5,10)))
}
We further predict the individualized average treatment effects (IATEs) using the DR- and the NDR-learner. As in the paper, we focus on the out-of-sample variant described in Algorithms 1 and 2 of the paper, which are implemented in the ndr_learner function of the causalDML package.
The results confirm the potentially erratic behavior of the DR-learner. As in the paper, predicted IATEs can take values outside of the feasible bounds. However, here the outliers also occur for vocational training. Most importantly, the NDR-learner again stabilizes the estimates substantially in this replication.
## (N)DR-learner
ndr = ndr_learner(y,w,x,ml_w = list(forest),ml_y = list(forest),ml_tau = list(forest),compare_all = FALSE)
# Plot the results
df_box = NULL
for (i in 1:4) {
df = data.frame("DRL" = ndr$cates[i,,1], "NDRL" = ndr$cates[i,,2])
df = gather(df)
df = cbind(label_w[i+1],df)
colnames(df)[1] = "label"
df_box = rbind(df_box,df)
}
ggplot(data=df_box) + geom_boxplot( aes(x=factor(label,label_w[-1]),y=value,fill=key)) +
theme_bw() + theme(axis.title.x=element_blank(),legend.title = element_blank()) +
ylab("Individualized average treatment effect") + geom_hline(yintercept = 0) + geom_hline(yintercept = -31,linetype="dashed") +
geom_hline(yintercept = 31,linetype="dashed") +
theme(text=element_text(family="serif",size = 16, colour="black"),axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_fill_grey(start = 0.9,end=0.4)
The classification analysis also shows that the same eight variables as in Table 6 of the paper are the most predictive of being in the highest or lowest quintile of the IATE distribution:
# Classification analysis
diff = matrix(NA,ncol(x),4)
rownames(diff) = label_x
colnames(diff) = label_w[-1]
for (i in 1:4) {
diff[,i] = clan(ndr$cates[i,,2],scale(x))[,5][-1] - clan(ndr$cates[i,,2],scale(x))[,1][-1]
}
round(diff[order(rowMaxs(abs(diff)),decreasing=T),][1:8,],digits=2)
Job search Vocational Computer Language
Previous job: unskilled worker 1.03 0.58 0.34 -1.32
Mother tongue other than German, French, Italian 0.74 0.57 0.00 -1.30
Past income -1.29 -0.92 -1.14 1.17
Swiss citizen -0.73 -0.49 0.07 1.25
Qualification: some degree -0.96 -0.54 -0.40 1.24
Qualification: unskilled 0.84 0.34 0.31 -1.14
Previous job: skilled worker -0.81 -0.31 -0.12 0.96
Fraction of months employed last 2 years -0.92 -0.34 -0.45 0.31
Finally, the gamma matrix of the APO_dml object can be used as input for the policytree R package. Figure 5 in the paper displayed the chosen program in the final leaf, which required modifying the code of the package. Here, we apply the standard plot function of the package, which numbers the programs: (1) no program, (2) job search, (3) vocational and (5) language — specifically: (1) no program, (2) job search, (3) vocational, (4) computer and (5) language.
We observe that the depth-one tree with five covariates (first graph below) is identical to Figure 5a. However, the other trees choose different splits, indicating that the choice of the seed might matter for the estimated assignment rules.
## Estimate optimal assignment rules
# Depth one
pt_low1 = policy_tree(x_pt_low,cDML$APO$gamma,depth = 1)
pt_high1 = policy_tree(x_pt_high,cDML$APO$gamma,depth = 1)
# Depth two
pt_low2 = policy_tree(x_pt_low,cDML$APO$gamma,depth = 2)
pt_high2 = policy_tree(x_pt_high,cDML$APO$gamma,depth = 2)
## Plot estimated optimal assignment rules
plot(pt_low1)
plot(pt_high1)
plot(pt_low2)
plot(pt_high2)
# save.image("Notebook_image.RData")