Causal Tree#

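A Causal Tree is a decision tree adapted to estimate the conditional average treatment effect (CATE).

However, applying it to observational data requires removing selection bias; for this purpose Athey et al. proposed Causal Tree-Transformed Outcome (CT-TO), a variant that uses the propensity score.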

Setup / Notation#

Potential outcome framework

\(Y_{a=1}, Y_{a=0} \in \mathbb{R}\): potential outcomes
\(X_j\): the \(j\)-th of the \(p\) pre-treatment covariates (\(j=1,\dots,p\))
\(A \in \{0,1\}\): treatment indicator
\(\pi(x)=\operatorname{Pr}(A=1 \mid X=x)\): propensity score

Assumptions

Consistency: \(Y=A Y_{a=1}+(1-A) Y_{a=0}\)
Unconfoundedness: \(A \perp Y_a \mid X \text { for } a=0,1\)
Positivity: \(0<\pi(x)<1\)

Definitions

Average Treatment Effect (ATE): \(\theta^{A T E}=\mathrm{E}\left[Y_{a=1}-Y_{a=0}\right]\)
Heterogeneous Treatment Effect (HTE): \(\theta^{H T E}(x)=\mathrm{E}\left[Y_{a=1}-Y_{a=0} \mid X=x\right]\)
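
As a rough illustration of how the propensity score enters under these assumptions, the simulation below checks numerically that the transformed outcome \(Y^{*}=Y\,(A-\pi(X))/\{\pi(X)(1-\pi(X))\}\) averages to the ATE, while a naive difference in means does not. The data-generating process, the function names, and the use of a known propensity score are all assumptions made for this sketch; it is not the CT-TO algorithm itself.

```python
import random

def simulate(n, seed=0):
    """Draw (x, pi, a, y) rows with a known propensity score and effect tau(x) = 1 + x."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()                       # covariate in [0, 1)
        pi = 0.2 + 0.6 * x                     # propensity score, bounded away from 0 and 1
        a = 1 if rng.random() < pi else 0      # treatment depends on x only (unconfoundedness)
        y0 = 2.0 * x + rng.gauss(0.0, 1.0)     # potential outcome without treatment
        y1 = y0 + 1.0 + x                      # potential outcome with treatment
        y = a * y1 + (1 - a) * y0              # consistency
        rows.append((x, pi, a, y))
    return rows

def transformed_outcome(pi, a, y):
    """Y* = Y (A - pi) / (pi (1 - pi)); its conditional mean given X is the HTE."""
    return y * (a - pi) / (pi * (1.0 - pi))

rows = simulate(200_000)

# Average of the transformed outcome: an estimate of the ATE.
ate_to = sum(transformed_outcome(pi, a, y) for _, pi, a, y in rows) / len(rows)

# Naive difference in means, which ignores that treatment is more likely for large x.
n_treated = sum(a for _, _, a, _ in rows)
n_control = len(rows) - n_treated
naive = (sum(y for _, _, a, y in rows if a == 1) / n_treated
         - sum(y for _, _, a, y in rows if a == 0) / n_control)

print(f"transformed-outcome ATE: {ate_to:.3f}, naive difference: {naive:.3f}")
```

In this setup the true ATE is \(\mathrm{E}[1+X]=1.5\); the transformed-outcome average should land close to it, whereas the naive difference is pulled upward by confounding.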

Honest#

Causal Trees estimate the Heterogeneous Treatment Effect by recursive partitioning.
The notion of honesty plays an important role in the proofs for causal forests and Generalized Random Forests.
Honest trees are also less prone to overfitting than CART.

honest

A tree is called honest when one sample is used to choose the splits (the partitioning) and a separate sample is used to compute the estimate in each leaf, so that the partition \(\Pi\) and the estimator \(\hat{\mu}\) are independent.
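
The idea can be sketched with a one-split tree (a stump): the split point is chosen on one half of the data, and the leaf means are then recomputed on the other half. The helper names `best_split` and `leaf_means`, the step-function data, and the 50/50 split are assumptions made for illustration only.

```python
import random
from statistics import mean

def best_split(sample):
    """Return the threshold on x with the smallest within-leaf squared error on `sample`."""
    xs = sorted({x for x, _ in sample})
    best_sse, best_t = None, None
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2.0                       # candidate threshold between neighbours
        left = [y for x, y in sample if x <= t]
        right = [y for x, y in sample if x > t]
        m_left, m_right = mean(left), mean(right)
        sse = (sum((y - m_left) ** 2 for y in left)
               + sum((y - m_right) ** 2 for y in right))
        if best_sse is None or sse < best_sse:
            best_sse, best_t = sse, t
    return best_t

def leaf_means(sample, threshold):
    """Mean outcome in the left (x <= threshold) and right (x > threshold) leaf."""
    left = [y for x, y in sample if x <= threshold]
    right = [y for x, y in sample if x > threshold]
    return mean(left), mean(right)

rng = random.Random(0)
data = []
for _ in range(400):
    x = rng.random()
    y = (1.0 if x > 0.5 else 0.0) + rng.gauss(0.0, 0.5)   # step function plus noise
    data.append((x, y))

train, est = data[:200], data[200:]      # partitioning sample / estimation sample
threshold = best_split(train)            # the partition uses the training half only
adaptive = leaf_means(train, threshold)  # CART-style: the same sample is reused
honest = leaf_means(est, threshold)      # honest: leaf means come from the held-out half

print(threshold, adaptive, honest)
```

Because `est` played no part in choosing `threshold`, the honest leaf means are not tuned to the noise that drove the split, at the cost of using only half of the data for each task.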

Honest trees have a different objective function from CART#

Given a partition \(\Pi\), an honest tree scores the conditional mean \(\hat{\mu}\left(X_i ; \mathcal{S}^{\text{est}}, \Pi\right)\), estimated on the estimation sample \(\mathcal{S}^{\text{est}}\), against test data \(\mathcal{S}^{\text{te}}\) using the mean squared error

\[ \operatorname{MSE}\left(\mathcal{S}^{\text{te}}, \mathcal{S}^{\text{est}}, \Pi\right)=\frac{1}{\#\left(\mathcal{S}^{\text{te}}\right)} \sum_{i \in \mathcal{S}^{\text{te}}} \left\{\left(Y_i-\hat{\mu}\left(X_i ; \mathcal{S}^{\text{est}}, \Pi\right)\right)^2-Y_i^2\right\} \]

and minimizes its expectation. The \(-Y_i^2\) term does not depend on the estimator, so it does not change which partition is preferred.

\[ \Pi^{\text{honest}}= \arg\min_\Pi \mathrm{E}_{\mathcal{S}^{\text{te}}, \mathcal{S}^{\text{est}}, \mathcal{S}^{\text{tr}}} \left[\operatorname{MSE}\left(\mathcal{S}^{\text{te}}, \mathcal{S}^{\text{est}}, \Pi(\mathcal{S}^{\text{tr}})\right)\right] \]

Standard CART, in contrast, uses the same training sample \(\mathcal{S}^{\text{tr}}\) both to build the partition \(\Pi\) and to compute the estimator \(\hat{\mu}\), and minimizes the resulting error:

\[ \Pi^{\text{CART}}= \arg\min_\Pi \mathrm{E}_{\mathcal{S}^{\text{te}}, \mathcal{S}^{\text{tr}}} \left[\operatorname{MSE}\left(\mathcal{S}^{\text{te}}, \mathcal{S}^{\text{tr}}, \Pi(\mathcal{S}^{\text{tr}})\right)\right] \]

Honest trees are less prone to overfitting#

Define the EMSE as the expectation of the MSE for a fixed partition:

\[ \operatorname{EMSE}(\Pi) := \mathrm{E}_{\mathcal{S}^{\text{te}}, \mathcal{S}^{\text{est}}}\left[\operatorname{MSE}\left(\mathcal{S}^{\text{te}}, \mathcal{S}^{\text{est}}, \Pi\right)\right] \]

Honest trees use this as their objective function.

Expanding the negative EMSE gives

\[\begin{split} \begin{aligned} -\operatorname{EMSE}(\Pi) & =-\mathrm{E}_{\left(Y_i, X_i\right), \mathcal{S}^{\text{est}}}\left[\left(Y_i-\mu\left(X_i ; \Pi\right)\right)^2-Y_i^2\right] \\ & \quad -\mathrm{E}_{X_i, \mathcal{S}^{\text{est}}}\left[\left(\hat{\mu}\left(X_i ; \mathcal{S}^{\text{est}}, \Pi\right)-\mu\left(X_i ; \Pi\right)\right)^2\right] \\ & =\mathrm{E}_{X_i}\left[\mu^2\left(X_i ; \Pi\right)\right]-\mathrm{E}_{\mathcal{S}^{\text{est}}, X_i}\left[\operatorname{Var}\left(\hat{\mu}\left(X_i ; \mathcal{S}^{\text{est}}, \Pi\right)\right)\right] \end{aligned} \end{split}\]

where \(\mu(x ; \Pi)=\mathrm{E}\left[Y_i \mid X_i \in \ell(x ; \Pi)\right]\) is the population mean of the outcome in the leaf \(\ell(x ; \Pi)\) containing \(x\).
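
The intermediate step can be sketched as follows, assuming that \(\mathcal{S}^{\text{est}}\) is independent of the test point \((Y_i, X_i)\) (which honesty is meant to ensure) and that \(\hat{\mu}\) is unbiased for \(\mu\) within each leaf. Using the shorthand \(\mu=\mu(X_i ; \Pi)\) and \(\hat{\mu}=\hat{\mu}(X_i ; \mathcal{S}^{\text{est}}, \Pi)\):

\[ \left(Y_i-\hat{\mu}\right)^2=\left(Y_i-\mu\right)^2+2\left(Y_i-\mu\right)\left(\mu-\hat{\mu}\right)+\left(\mu-\hat{\mu}\right)^2 \]

The cross term has expectation zero because \(\mathrm{E}_{\mathcal{S}^{\text{est}}}\left[\hat{\mu} \mid X_i\right]=\mu\) and \(\hat{\mu}\) is independent of \(Y_i\). For the first term, \(\mu\) is the leaf-wise conditional mean of \(Y_i\), so \(\mathrm{E}\left[Y_i \mu\right]=\mathrm{E}\left[\mu^2\right]\) and

\[ \mathrm{E}\left[\left(Y_i-\mu\right)^2-Y_i^2\right]=\mathrm{E}\left[\mu^2-2 Y_i \mu\right]=-\mathrm{E}\left[\mu^2\right] \]

For the last term, unbiasedness gives \(\mathrm{E}\left[\left(\hat{\mu}-\mu\right)^2\right]=\mathrm{E}\left[\operatorname{Var}\left(\hat{\mu}\right)\right]\). Combining the two and flipping the sign yields the final line of the expansion.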

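An (approximately) unbiased estimator of this quantity can be built from the training sample \(\mathcal{S}^{\text{tr}}\) alone. Taking the estimation sample to be the same size as the training sample (\(N^{\text{est}}=N^{\text{tr}}\)), it is

\[ -\widehat{\operatorname{EMSE}}\left(\mathcal{S}^{\text{tr}}, \Pi\right) =\frac{1}{N^{\text{tr}}} \sum_{i \in \mathcal{S}^{\text{tr}}} \hat{\mu}^2\left(X_i ; \mathcal{S}^{\text{tr}}, \Pi\right) -\underbrace{ \frac{2}{N^{\text{tr}}} \cdot \sum_{\ell \in \Pi} S_{\mathcal{S}^{\text{tr}}}^2(\ell) }_{\text{penalty}} \]

where \(S_{\mathcal{S}^{\text{tr}}}^2(\ell)\) is the within-leaf sample variance of the outcome in leaf \(\ell \in \Pi\), computed on the training sample.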

CART, in contrast, has no penalty term: \(-\operatorname{MSE}\) improves (weakly) with every additional split, so pruning is required.

\[ -\operatorname{MSE}\left(\mathcal{S}^{t r}, \mathcal{S}^{t r}, \Pi\right)=\frac{1}{N^{t r}} \sum_{i \in \mathcal{S}^{t r}} \hat{\mu}^2\left(X_i ; \mathcal{S}^{t r}, \Pi\right) \]

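The penalty is small while leaves contain many samples (so CART and the honest tree behave similarly), but it grows as the partition is refined and leaves become small, because every additional leaf contributes its own within-leaf variance. This pushes the honest criterion toward stopping further splits.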
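
To make the contrast concrete, the sketch below evaluates both criteria on an outcome that is pure noise, for a sequence of nested partitions with 1, 2, 4, ..., 64 equal-width leaves. The function names, the evenly spaced covariate, and the equal-width partitions are assumptions chosen to keep the example short; the penalty uses the \(2/N^{\text{tr}}\) form, i.e. it assumes equal training and estimation sample sizes.

```python
import random
from statistics import mean, variance

def criteria(leaves):
    """Return (CART criterion, honest criterion) for leaves given as lists of outcomes."""
    n = sum(len(ys) for ys in leaves)
    # (1/N) * sum_i mu_hat(X_i)^2: every unit in a leaf shares that leaf's mean.
    fit = sum(len(ys) * mean(ys) ** 2 for ys in leaves) / n
    # Penalty (2/N) * sum over leaves of the within-leaf sample variance
    # (assumes N_est == N_tr and at least two observations per leaf).
    penalty = 2.0 / n * sum(variance(ys) for ys in leaves)
    return fit, fit - penalty

def equal_width_leaves(data, k):
    """Split x in [0, 1) into k equal-width leaves and collect the outcomes in each."""
    leaves = [[] for _ in range(k)]
    for x, y in data:
        leaves[min(int(x * k), k - 1)].append(y)
    return leaves

rng = random.Random(0)
n = 512
# Evenly spaced x so every leaf has the same size; the outcome has no signal at all.
data = [(i / n, rng.gauss(0.0, 1.0)) for i in range(n)]

results = {}
for k in (1, 2, 4, 8, 16, 32, 64):
    results[k] = criteria(equal_width_leaves(data, k))
    print(k, results[k])
```

Since there is nothing to learn here, any gain in the CART criterion from finer partitions is overfitting; the honest criterion is expected to fall as the number of leaves grows, which is the behaviour the penalty term is meant to produce.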

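Estimating the HTE#

Problem: given observed data \(\left(Y_i, X_i, A_i\right) \in \mathbb{R} \times \mathbb{R}^p \times\{0,1\}\), estimate \(\theta^{H T E}(x)=\mathrm{E}\left[Y_{a=1}-Y_{a=0} \mid X=x\right]\). For a partition \(\Pi\), with \(\ell(x ; \Pi)\) the leaf containing \(x\), define the leaf-level treatment effect and the leaf-level mean potential outcome as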

\[ \tau(x ; \Pi) \equiv \mathrm{E}\left[Y_{a=1}-Y_{a=0} \mid X \in \ell(x ; \Pi)\right] \]

\[ \mu(a, x ; \Pi) \equiv \mathrm{E}\left[Y_a \mid X \in \ell(x ; \Pi)\right] \]
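
A sample analogue of these leaf-level quantities can be sketched as the difference between the treated and control mean outcomes within each leaf. The example below assumes a randomized experiment with \(\pi(x)=0.5\) and a partition fixed in advance at a hypothetical threshold of 0.5; with observational data the within-leaf means would additionally need propensity-score weighting, which is not shown here.

```python
import random
from statistics import mean

def leaf_effects(sample, threshold):
    """Difference in mean outcomes between treated and control units within each leaf."""
    effects = {}
    for name, in_leaf in (("left", lambda x: x <= threshold),
                          ("right", lambda x: x > threshold)):
        treated = [y for x, a, y in sample if in_leaf(x) and a == 1]   # estimates mu(1, x; Pi)
        control = [y for x, a, y in sample if in_leaf(x) and a == 0]   # estimates mu(0, x; Pi)
        effects[name] = mean(treated) - mean(control)                  # estimates tau(x; Pi)
    return effects

rng = random.Random(0)
sample = []
for _ in range(4000):
    x = rng.random()
    a = 1 if rng.random() < 0.5 else 0          # randomized treatment, pi(x) = 0.5
    tau = 2.0 if x > 0.5 else 0.0               # the effect exists only for x > 0.5
    y = x + a * tau + rng.gauss(0.0, 1.0)
    sample.append((x, a, y))

effects = leaf_effects(sample, threshold=0.5)
print(effects)
```

Because the partition here is fixed rather than learned from `sample`, the leaf estimates are honest in the sense defined earlier; a learned partition would call for a separate estimation sample.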

Criticism of Causal Trees#

[2509.11381] The Honest Truth About Causal Trees: Accuracy Limits for Heterogeneous Treatment Effect Estimation

A theoretical analysis finds that the estimator converges slowly, which suggests it is not a particularly good estimator.

Double Sample Trees#

References#

Explanatory articles

CATE推定のためのCausal Treeの仕組み｜Dentsu Digital Tech Blog