# Trying out the DML package

## Partially Linear Regression Model
\[
\begin{aligned}
Y = D \theta_0 + g_0(X) + \zeta, \quad & \mathbb{E}(\zeta \mid D, X) = 0, \\
D = m_0(X) + V, \quad & \mathbb{E}(V \mid X) = 0.
\end{aligned}
\]
doubleml.DoubleMLPLR — DoubleML documentation
```python
class doubleml.DoubleMLPLR(obj_dml_data, ml_l, ml_m, ml_g=None, n_folds=5, n_rep=1, score='partialling out', draw_sample_splitting=True)
```
The nuisance functions are:

- `ml_l` estimates \(\ell_0(X) = E[Y|X]\)
- `ml_m` estimates \(m_0(X) = E[D|X]\)
- `ml_g` estimates \(g_0(X) = E[Y - D \theta_0|X]\), and is used only when `score` is `'IV-type'` (see the sketch below)
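As a hedged sketch of how these map onto the constructor, the snippet below passes all three learners and requests the `'IV-type'` score. `LassoCV` is an arbitrary choice of learner, and whether `ml_l` is actually consulted under `'IV-type'` depends on the package version; treat this as an illustration of the signature above, not authoritative usage.

```python
# Sketch: specifying all three nuisance learners (LassoCV is arbitrary).
import doubleml as dml
from doubleml.datasets import make_plr_CCDDHNR2018
from sklearn.linear_model import LassoCV

data_iv = make_plr_CCDDHNR2018(alpha=0.5, n_obs=500, dim_x=20)
plr_iv = dml.DoubleMLPLR(
    data_iv,
    ml_l=LassoCV(),  # ell_0(X) = E[Y|X]
    ml_m=LassoCV(),  # m_0(X)   = E[D|X]
    ml_g=LassoCV(),  # g_0(X)   = E[Y - D*theta_0|X]; used for 'IV-type'
    score='IV-type',
)
plr_iv.fit()
```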
The default `score` is `'partialling out'`, i.e. the Robinson (1988)-type score function
\[
\psi(W ; \theta, \eta):=\{Y-\ell(X)-\theta(D-m(X))\}(D-m(X)), \quad \eta=(\ell, m)
\]
As the estimator, this score function yields
\[
Y - \underbrace{ E[Y|X] }_{\ell_0(X)} = \theta_0 (D - \underbrace{ E[D|X]}_{m_0(X)} ) + U
\]
that is, a residual-on-residual regression estimator of \(\theta_0\). (The display follows from taking \(\mathbb{E}[\,\cdot \mid X]\) of the first model equation, which gives \(\ell_0(X) = \theta_0\, m_0(X) + g_0(X)\), and subtracting it from \(Y\).)
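To make the formula concrete, here is a hand-rolled version of this residual regression on simulated data. It deliberately skips the cross-fitting that DoubleML performs (in-sample residualization can overfit and bias the estimate), and all variable names here are made up for the sketch.

```python
# Hand-rolled partialling out: residualize Y and D on X, then regress
# the Y-residual on the D-residual (illustration only, no cross-fitting).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))
D = X[:, 0] + rng.normal(size=n)                    # m_0(X) = X_1
Y = 0.5 * D + np.sin(X[:, 0]) + rng.normal(size=n)  # theta_0 = 0.5

res_Y = Y - RandomForestRegressor(random_state=0).fit(X, Y).predict(X)
res_D = D - RandomForestRegressor(random_state=0).fit(X, D).predict(X)

theta_hat = res_D @ res_Y / (res_D @ res_D)
print(theta_hat)  # compare with theta_0 = 0.5; may be biased without cross-fitting
```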
```python
import numpy as np
import doubleml as dml
from doubleml.datasets import make_plr_CCDDHNR2018
from sklearn.ensemble import RandomForestRegressor
from sklearn.base import clone

np.random.seed(0)
learner = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=5,
                                min_samples_leaf=2, random_state=0)
ml_l = clone(learner)  # learner for ell_0(X) = E[Y|X]
ml_m = clone(learner)  # learner for m_0(X) = E[D|X]

obj_dml_data = make_plr_CCDDHNR2018(alpha=0.5, n_obs=500, dim_x=20)
dml_plr_obj = dml.DoubleMLPLR(obj_dml_data, ml_l, ml_m)
dml_plr_obj.fit().summary
```
| | coef | std err | t | P>\|t\| | 2.5 % | 97.5 % |
|---|---|---|---|---|---|---|
| d | 0.438602 | 0.048179 | 9.10358 | 8.740236e-20 | 0.344173 | 0.533032 |
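The data were generated with `alpha=0.5`, and the true \(\theta_0 = 0.5\) indeed falls inside the 95% interval \([0.344,\ 0.533]\). As a small follow-up on the fitted object above, the interval can also be retrieved directly:

```python
# 95% confidence interval for the coefficient on d
dml_plr_obj.confint(level=0.95)
```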
doubleml-for-py/doubleml/plm/plr.py at main · DoubleML/doubleml-for-py