Trying it out with the DoubleML package

Partially Linear Regression Model

\[\begin{aligned}
Y &= D \theta_0 + g_0(X) + \zeta, & &\mathbb{E}(\zeta \mid D,X) = 0, \\
D &= m_0(X) + V, & &\mathbb{E}(V \mid X) = 0,
\end{aligned}\]

doubleml.DoubleMLPLR — DoubleML documentation

class doubleml.DoubleMLPLR(obj_dml_data, ml_l, ml_m, ml_g=None, n_folds=5, n_rep=1, score='partialling out', draw_sample_splitting=True)

Nuisance functions:

  • ml_l: \(\ell_0(X) = E[Y|X]\)

  • ml_m: \(m_0(X) = E[D|X]\)

  • ml_g: \(g_0(X) = E[Y - D \theta_0|X]\); used only when score is 'IV-type'

The default score is 'partialling out', which is the Robinson (1988)-type score function

\[ \psi(W ; \theta, \eta):=\{Y-\ell(X)-\theta(D-m(X))\}(D-m(X)), \quad \eta=(\ell, m) \]

and it yields the residual-on-residual regression estimator

\[ Y - \underbrace{ E[Y|X] }_{\ell_0(X)} = \theta_0 (D - \underbrace{ E[D|X]}_{m_0(X)} ) + U. \]

```python
import numpy as np
import doubleml as dml
from doubleml.datasets import make_plr_CCDDHNR2018
from sklearn.ensemble import RandomForestRegressor
from sklearn.base import clone

np.random.seed(0)
learner = RandomForestRegressor(n_estimators=100, max_features=20,
                                max_depth=5, min_samples_leaf=2, random_state=0)
# Use clones so each nuisance function gets an independently fitted learner
ml_l = clone(learner)  # learner for ell_0(X) = E[Y|X]
ml_m = clone(learner)  # learner for m_0(X) = E[D|X]
obj_dml_data = make_plr_CCDDHNR2018(alpha=0.5, n_obs=500, dim_x=20)
dml_plr_obj = dml.DoubleMLPLR(obj_dml_data, ml_l, ml_m)
dml_plr_obj.fit().summary
```
```
       coef   std err        t         P>|t|     2.5 %    97.5 %
d  0.438602  0.048179  9.10358  8.740236e-20  0.344173  0.533032
```

doubleml-for-py/doubleml/plm/plr.py at main · DoubleML/doubleml-for-py