Goals:
- Handcode OLS
- Handcode the Frisch-Waugh theorem
Introducing the data
The Application Notebook builds on the dataset kindly provided in the hdm package. The data was used in Chernozhukov and Hansen (2004). Their paper investigates the effect of participation in the employer-sponsored 401(k) retirement savings plan (p401) on net assets (net_tfa). Since then, the data has been used to showcase many new methods. It is not the most comprehensive dataset, with essentially ten covariates/regressors/predictors:
age: age
db: defined benefit pension
educ: education (in years)
fsize: family size
hown: home owner
inc: income (in US $)
male: male
marr: married
pira: participation in individual retirement account (IRA)
twoearn: two earners
However, it is publicly available and the relatively few covariates
ensure that the programs do not run too long.
# Load the packages required for later
library(hdm)
library(tidyverse)
set.seed(1234) # for replicability
options(scipen = 10) # Switch off scientific notation
# Get data
data(pension)
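Before running any regressions, a quick inspection of the data helps to verify the dimensions and the variables listed above. This is an illustrative sketch that is not part of the original notebook; it only uses base R functions on the pension data frame:
# Quick sanity check of the data (illustrative sketch, not part of the original notebook)
dim(pension) # number of observations and variables
summary(pension[, c("net_tfa", "age", "db", "educ", "fsize",
                    "hown", "inc", "male", "marr", "pira", "twoearn")])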
Handcoding OLS
Run the OLS regression \(net\_tfa = \beta_0
+ \beta_1 age + \beta_2 db + \beta_3 educ + \beta_4 fsize + \beta_5 hown
+ \beta_6 inc + \beta_7 male + \beta_8 marr + \beta_9 pira + \beta_{10}
twoearn + \epsilon\) using the base command:
ols = lm(data = pension,
net_tfa ~ age + db + educ + fsize + hown + inc + male + marr + pira + twoearn )
summary(ols)
Call:
lm(formula = net_tfa ~ age + db + educ + fsize + hown + inc +
male + marr + pira + twoearn, data = pension)
Residuals:
Min 1Q Median 3Q Max
-509020 -17047 -2203 10232 1436184
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -31557.69509 4422.53582 -7.136 0.00000000000103 ***
age 610.20621 60.05623 10.161 < 2e-16 ***
db -3517.95396 1331.34350 -2.642 0.00824 **
educ -621.90195 228.96898 -2.716 0.00662 **
fsize -1092.95457 456.73577 -2.393 0.01673 *
hown 1619.33449 1322.02599 1.225 0.22065
inc 0.96194 0.02989 32.186 < 2e-16 ***
male -1141.88008 1517.82867 -0.752 0.45188
marr 611.68490 1820.21834 0.336 0.73684
pira 29662.86017 1467.19707 20.217 < 2e-16 ***
twoearn -19112.13608 1579.31262 -12.102 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 55790 on 9904 degrees of freedom
Multiple R-squared: 0.2295, Adjusted R-squared: 0.2287
F-statistic: 295 on 10 and 9904 DF, p-value: < 2.2e-16
Now implement the point estimator manually using the matrix
representation of the parameters \(\boldsymbol{\hat{\beta}} =
\boldsymbol{(X'X)^{-1}X'Y}\):
X = model.matrix(data = pension,
~ age + db + educ + fsize + hown + inc + male + marr + pira + twoearn )
Y = pension[,"net_tfa"]
hand_ols = solve(t(X) %*% X) %*% t(X) %*% Y
hand_ols
[,1]
(Intercept) -31557.6950949
age 610.2062081
db -3517.9539554
educ -621.9019512
fsize -1092.9545659
hown 1619.3344926
inc 0.9619398
male -1141.8800757
marr 611.6848981
pira 29662.8601743
twoearn -19112.1360850
The point estimates look very similar, but let's check explicitly
whether they are equal:
all.equal(as.numeric(hand_ols),as.numeric(ols$coefficients))
[1] TRUE
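The same matrix approach can be extended to the standard errors reported by summary(ols). Below is a minimal sketch of the classical (homoskedastic) variance estimator \(\hat{\sigma}^2 \boldsymbol{(X'X)^{-1}}\), reusing the X, Y and hand_ols objects from above; it is not part of the original notebook:
# Handcode the classical OLS standard errors (sketch, reuses X, Y and hand_ols)
n = nrow(X) # number of observations
k = ncol(X) # number of estimated parameters (incl. intercept)
res = Y - X %*% hand_ols # residuals
sigma2 = sum(res^2) / (n - k) # residual variance estimate
se_hand = sqrt(diag(sigma2 * solve(t(X) %*% X))) # standard errors
all.equal(as.numeric(se_hand), as.numeric(summary(ols)$coefficients[, "Std. Error"]))
If the hand-coded estimator is correct, this comparison should again return TRUE.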
Frisch-Waugh
The Frisch-Waugh theorem tells us that we can also obtain, e.g., \(\hat{\beta}_1\) in the following way:
1. Run regression \(age = \alpha_0 + \alpha_1
db + \alpha_2 educ + \alpha_3 fsize + \alpha_4 hown + \alpha_5 inc +
\alpha_6 male + \alpha_7 marr + \alpha_8 pira + \alpha_{9} twoearn +
V\)
ols_age = lm(data = pension,
age ~ db + educ + fsize + hown + inc + male + marr + pira + twoearn )
Vhat = ols_age$residuals
2. Run regression \(net\_tfa = \gamma_0 +
\gamma_1 db + \gamma_2 educ + \gamma_3 fsize + \gamma_4 hown + \gamma_5
inc + \gamma_6 male + \gamma_7 marr + \gamma_8 pira + \gamma_{9} twoearn
+ U\)
ols_net_tfa = lm(data = pension,
net_tfa ~ db + educ + fsize + hown + inc + male + marr + pira + twoearn )
Uhat = ols_net_tfa$residuals
3. Run the residual-on-residual regression \(\hat{U} = \tau \hat{V} + \epsilon\)
summary(lm(Uhat ~ 0 + Vhat))
Call:
lm(formula = Uhat ~ 0 + Vhat)
Residuals:
Min 1Q Median 3Q Max
-509020 -17047 -2203 10232 1436184
Coefficients:
Estimate Std. Error t value Pr(>|t|)
Vhat 610.21 60.03 10.17 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 55760 on 9914 degrees of freedom
Multiple R-squared: 0.01032, Adjusted R-squared: 0.01022
F-statistic: 103.3 on 1 and 9914 DF, p-value: < 2.2e-16
This indeed looks very similar to the 610.2062081 estimated by the lm() function, but let's check it formally:
all.equal(as.numeric(lm(Uhat ~ 0 + Vhat)$coefficients),
as.numeric(ols$coefficients[2]))
[1] TRUE
\(\Rightarrow\) The standard and the manual Frisch-Waugh \(\hat{\beta}_1\) are identical. (The standard error reported by the residual-on-residual regression differs slightly because it does not account for the degrees of freedom used up by the first two regressions.)
(optional): Check also the other parameters \(\hat{\beta}_2\) to \(\hat{\beta}_{10}\).
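One possible sketch for this check (not part of the original notebook, reusing the ols object and the pension data) loops the three Frisch-Waugh steps over all covariates and compares each residual-on-residual coefficient with the corresponding lm() coefficient:
# Sketch: Frisch-Waugh for every covariate (reuses ols and pension from above)
covariates = c("age", "db", "educ", "fsize", "hown",
               "inc", "male", "marr", "pira", "twoearn")
for (v in covariates) {
  others = setdiff(covariates, v) # controls: all covariates except v
  Vhat_v = lm(reformulate(others, response = v), data = pension)$residuals
  Uhat_v = lm(reformulate(others, response = "net_tfa"), data = pension)$residuals
  tau_v = lm(Uhat_v ~ 0 + Vhat_v)$coefficients # residual-on-residual coefficient
  cat(v, ":", all.equal(as.numeric(tau_v), as.numeric(ols$coefficients[v])), "\n")
}
Each line of output should report TRUE, confirming the theorem for all ten coefficients.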