{ "cells": [ { "cell_type": "markdown", "id": "a5c186cc-e431-41cb-9033-8ee8264506da", "metadata": {}, "source": [ "# Omitted Variable Bias\n", "\n", "Suppose we have an outcome variable $Y$, a treatment variable $D$, and a covariate (confounder) $X$ that affects both $Y$ and $D$, and suppose the relationships among these variables are linear.\n", "\n", "Estimating the treatment effect correctly requires including the covariate in the model, so the correct model takes the following form (the superscript $L$ marks it as the \"long model\").\n", "\n", "$$\n", "Y = \\alpha^L + \\beta^L D + \\gamma^L X + \\varepsilon^L\n", "$$\n", "\n", "Now suppose we instead fit the \"short model\", which leaves $X$ out of the explanatory variables:\n", "\n", "$$\n", "Y = \\alpha^S + \\beta^S D + \\varepsilon^S\n", "$$\n", "\n", "What does the treatment-effect coefficient $\\beta^S$ estimate in that case?" ] }, { "cell_type": "markdown", "id": "f7463dcb-24c4-4827-9f54-a0fcce9c114f", "metadata": {}, "source": [ "Consider an auxiliary model that regresses $X$ on $D$ (that is, explains the variation in $X$ with $D$); its coefficients carry no superscript.\n", "\n", "$$\n", "X = \\alpha + \\beta D + \\varepsilon\n", "$$\n", "\n", "Substituting this into the long model and rearranging reveals the correspondence with the short model:\n", "\n", "$$\n", "\\begin{align}\n", "Y &= \\alpha^L + \\beta^L D + \\gamma^L (\\alpha + \\beta D + \\varepsilon) + \\varepsilon^L \\\\\n", "&= \\alpha^L + \\gamma^L \\alpha + (\\beta^L + \\gamma^L \\beta) D + \\gamma^L \\varepsilon + \\varepsilon^L \\\\\n", "&= \\underbrace{\\alpha^L + \\gamma^L \\alpha}_{\\alpha^S}\n", " + \\underbrace{(\\beta^L + \\gamma^L \\beta)}_{\\beta^S} D\n", " + \\underbrace{\\gamma^L \\varepsilon + \\varepsilon^L}_{\\varepsilon^S} \\\\\n", "\\end{align}\n", "$$\n", "\n", "The short model's $\\beta^S$ is\n", "\n", "$$\n", "\\beta^S = \\beta^L + \\gamma^L \\beta\n", "$$\n", "\n", "so it deviates from the correct model's coefficient $\\beta^L$ by $\\gamma^L \\beta$. The bias vanishes only when $\\gamma^L = 0$ ($X$ has no effect on $Y$) or $\\beta = 0$ ($D$ does not predict $X$)." ] }, { "cell_type": "markdown", "id": "4389bb9a-c9da-46e8-b116-fdfc58abc155", "metadata": {}, "source": [ "## Experiment with generated data\n", "\n", "Let us actually run the regressions on data generated from random numbers." ] }, { "cell_type": "code", "execution_count": 1, "id": "f079993c-f12e-417f-a8dc-5d0de20fbc9a", "metadata": { "tags": [ "hide-input" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True coefficients: α_L=3, β_L=5, γ_L=7\n" ] }, { "data": { "text/html": [ "
|   | Y | D | X |\n", "
|---|---|---|---|\n", "
| 0 | 12.968330 | 1 | 0.548814 |\n", "
| 1 | 11.926394 | 1 | 0.715189 |\n", "
| 2 | 11.071875 | 1 | 0.602763 |\n", "
| 3 | 11.376362 | 1 | 0.544883 |\n", "
| 4 | 5.467551 | 0 | 0.423655 |\n", "