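RDD
Regression discontinuity design (RDD) exploits settings in which treatment assignment changes at a threshold (cutoff point) on a continuous variable, the running variable. It takes the difference in the outcome just below and just above the threshold as the local average treatment effect for units near the threshold.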
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
n = 1000
np.random.seed(0)
ate = 4  # true treatment effect: size of the jump at the cutoff
x = np.random.uniform(0, 1, size=n)  # running variable
e = np.random.normal(size=n)  # noise
cutoff = 0.5
d = 1 * (x >= cutoff)  # sharp assignment: treated if and only if x >= cutoff
y = 100 + 10 * x + -5 * x**2 + ate * d + e
df = pd.DataFrame(dict(y=y, x=x, d=d))
fig, ax = plt.subplots()
ax.scatter(x, y, alpha=.5, color="steelblue")
ax.set(xlabel="running variable", ylabel="outcome")
# sns.scatterplot(x="x", y="y", hue="d", data=df, ax=ax)
ax.axvline(x=cutoff, linestyle=":", color="gray")
ax.text(x=cutoff * 1.01, y=y.min(), s="Cutoff", color="gray")
import statsmodels.api as sm
# fit a LOWESS curve separately on each side of the cutoff
for i in [0, 1]:
    preds = sm.nonparametric.lowess(y[d == i], x[d == i])  # sorted (x, fitted y) pairs
    ax.plot(preds[:, 0], preds[:, 1], linewidth=2, color="darkorange")
plt.show()
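Types of RDD
Two frameworks
continuity-based framework: assumes the conditional expectations of the potential outcomes are continuous in the running variable at the cutoff, and estimates the effect as the jump in the outcome at the cutoff point
local randomization framework: regards whether a unit near the cutoff happened to fall just above or just below it as random, and compares treated and control units within a small window around the cutoff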
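A minimal sketch of the local randomization idea on simulated data: units within a window of half-width \(w\) around the cutoff are treated as if randomly assigned, and the estimate is the difference in mean outcomes. The data-generating process (the same shape as the simulation at the top, with a true effect of 4) and the window \(w = 0.02\) are hypothetical choices; in practice the window is chosen from the data, for example by checking covariate balance, and inference is usually randomization-based, neither of which is shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
cutoff = 0.5
tau_true = 4.0  # hypothetical true effect at the cutoff

x = rng.uniform(0, 1, size=n)        # running variable
d = (x >= cutoff).astype(int)        # sharp assignment
y = 100 + 10 * x - 5 * x**2 + tau_true * d + rng.normal(size=n)

w = 0.02  # hypothetical window half-width around the cutoff
in_window = np.abs(x - cutoff) <= w
treated = in_window & (d == 1)
control = in_window & (d == 0)

# difference in means inside the window, treating assignment there as as-if random
diff_in_means = y[treated].mean() - y[control].mean()
print(f"units in window: {in_window.sum()}, estimate: {diff_in_means:.3f}")
```

Any slope of the outcome inside the window leaks into the estimate as bias (roughly slope × w here), which is why the window has to be narrow.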
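Two designs
Sharp RD: units with \(X \geq c\) receive treatment with probability 1 (and units with \(X < c\) with probability 0)
Fuzzy RD: the probability of treatment jumps discontinuously at \(X = c\), but not from 0 to 1 (units may not comply with their assignment)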
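The difference between the two designs shows up in the share of treated units on each side of the cutoff. The simulation below uses hypothetical take-up probabilities (0.1 below and 0.7 above the cutoff) for the fuzzy case: the sharp share jumps from 0 to 1, while the fuzzy share jumps by only about 0.6.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
cutoff = 0.5
x = rng.uniform(0, 1, size=n)
t = (x >= cutoff).astype(int)  # assignment T = 1(X >= c)

# sharp: treatment received equals assignment
d_sharp = t
# fuzzy: hypothetical take-up probabilities, 0.1 below and 0.7 above the cutoff
p_take = np.where(t == 1, 0.7, 0.1)
d_fuzzy = rng.binomial(1, p_take)

def share_treated(d, lo, hi):
    """Share of units with lo <= x < hi that are treated."""
    mask = (x >= lo) & (x < hi)
    return d[mask].mean()

h = 0.05  # width of the bins just below / just above the cutoff
results = {}
for name, d in [("sharp", d_sharp), ("fuzzy", d_fuzzy)]:
    below = share_treated(d, cutoff - h, cutoff)
    above = share_treated(d, cutoff, cutoff + h)
    results[name] = (below, above)
    print(f"{name}: P(D=1) just below = {below:.2f}, just above = {above:.2f}")
```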
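Linear RD
If the conditional expectations \(E[Y(1)|X]\) and \(E[Y(0)|X]\) are both linear in \(X\), then, centering the running variable at the cutoff as \(\tilde{X} = X - c\), they can be written as
\[
E[Y(0)|X] = \alpha_0 + \beta_0 \tilde{X}, \quad E[Y(1)|X] = \alpha_1 + \beta_1 \tilde{X}.
\]
In a sharp design \(D = \mathbb{1}(X \geq c)\) is a function of \(X\) and \(Y = D\,Y(1) + (1 - D)\,Y(0)\), so defining \(\tau = \alpha_1 - \alpha_0 = E[Y(1) - Y(0)|X = c]\), the conditional expectation of the observed outcome can be rewritten as
\[
E[Y|X] = \alpha_0 + \tau D + \beta_0 \tilde{X} + (\beta_1 - \beta_0) D \tilde{X}.
\]
The RD estimator can therefore be computed by linear regression, as the OLS coefficient on \(D_i\) in
\[
Y_i = \alpha_0 + \tau D_i + \beta_0 \tilde{X}_i + (\beta_1 - \beta_0) D_i \tilde{X}_i + \varepsilon_i .
\]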
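A minimal sketch of linear RD by OLS on simulated data, assuming the linear specification holds on each side of the cutoff (the data-generating process and the true jump of 4 are hypothetical). The regression of \(Y\) on \(D\), \(X - c\), and \(D(X - c)\) is solved with numpy.linalg.lstsq, and the coefficient on \(D\) is the RD estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000
c = 0.5
tau_true = 4.0  # hypothetical true jump at the cutoff

x = rng.uniform(0, 1, size=n)
d = (x >= c).astype(float)
x_tilde = x - c  # centered running variable
# linear on each side, with a different slope above the cutoff
y = 100 + 10 * x_tilde + tau_true * d + 3 * d * x_tilde + rng.normal(size=n)

# design matrix for Y = alpha_0 + tau*D + beta_0*X~ + (beta_1 - beta_0)*D*X~ + e
X = np.column_stack([np.ones(n), d, x_tilde, d * x_tilde])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha0_hat, tau_hat, beta0_hat, dbeta_hat = coef
print(f"tau_hat = {tau_hat:.3f}")
```

Only the point estimate is computed here; for standard errors, the same design matrix can be passed to statsmodels (sm.OLS(y, X).fit()).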
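Polynomial RD
If the conditional expectations are nonlinear but you still want to work with linear regression, one option is to model them with polynomials in \(X - c\) on each side of the cutoff.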
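A sketch of global polynomial RD, assuming a fully interacted polynomial of order \(p\) in \(X - c\) (that is, a separate polynomial on each side of the cutoff). The helper poly_rd is hypothetical, written for this illustration, and the data reuse the quadratic shape of the simulation at the top with a true jump of 4.

```python
import numpy as np

def poly_rd(y, x, c, p):
    """Global polynomial RD estimate (hypothetical helper): an order-p polynomial
    in (x - c), fully interacted with the indicator D = 1(x >= c)."""
    x_tilde = x - c
    d = (x >= c).astype(float)
    cols = [np.ones_like(x_tilde), d]
    for k in range(1, p + 1):
        cols.append(x_tilde**k)        # polynomial term below the cutoff
        cols.append(d * x_tilde**k)    # change in that term above the cutoff
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]  # coefficient on D = jump at x = c

rng = np.random.default_rng(0)
n = 2_000
c = 0.5
x = rng.uniform(0, 1, size=n)
d = (x >= c).astype(float)
y = 100 + 10 * x - 5 * x**2 + 4 * d + rng.normal(size=n)  # true jump = 4

estimates = {p: poly_rd(y, x, c, p) for p in (1, 2, 3)}
for p, est in estimates.items():
    print(f"p = {p}: tau_hat = {est:.3f}")
```

Raising \(p\) makes the estimate at the boundary noisier, which is one reason local regression is usually preferred.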
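Local regression
Local regression, which fits a low-order (typically linear) polynomial using only observations within a bandwidth \(h\) of the cutoff, weighted by a kernel, is recommended over global polynomials: high-order global polynomials can give noisy estimates that are sensitive to observations far from the cutoff.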
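A minimal local linear RD sketch: a weighted linear fit on each side of the cutoff, using only observations within bandwidth \(h\) and triangular kernel weights \(1 - |X - c|/h\); the estimate is the difference of the two fitted values at \(c\). The function local_linear_rd and the bandwidths tried are hypothetical. rdrobust's defaults (local linear, triangular kernel) follow the same idea but also select \(h\) from the data and report bias-corrected robust inference, which this sketch omits.

```python
import numpy as np

def local_linear_rd(y, x, c, h):
    """Sharp RD estimate by local linear regression with a triangular kernel
    (hypothetical helper). The bandwidth h is supplied by the user."""
    def intercept_at_c(mask):
        xt = x[mask] - c
        w = 1 - np.abs(xt) / h            # triangular kernel weights
        sw = np.sqrt(w)                   # weighted LS via sqrt-weight scaling
        X = np.column_stack([np.ones_like(xt), xt])
        coef, *_ = np.linalg.lstsq(X * sw[:, None], y[mask] * sw, rcond=None)
        return coef[0]                    # fitted value at x = c
    near = np.abs(x - c) < h
    return intercept_at_c(near & (x >= c)) - intercept_at_c(near & (x < c))

rng = np.random.default_rng(0)
n = 5_000
c = 0.5
x = rng.uniform(0, 1, size=n)
d = (x >= c).astype(float)
y = 100 + 10 * x - 5 * x**2 + 4 * d + rng.normal(size=n)  # true jump = 4

estimates = {h: local_linear_rd(y, x, c, h) for h in (0.1, 0.2, 0.3)}
for h, est in estimates.items():
    print(f"h = {h}: tau_hat = {est:.3f}")
```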
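Fuzzy RD
Let \(D(T) \in \{0, 1\}\) denote whether a unit actually receives treatment, given the assignment \(T = \mathbb{1}(X \geq c)\).
Even when treatment is not enforced, RD estimation is still possible as a Fuzzy RD, provided the probability of receiving treatment \(P(D=1|X)\) is discontinuous at \(X=c\).
In the continuity-based framework, the Fuzzy RD estimand is the jump in the outcome at the cutoff divided by the jump in the treatment probability:
\[
\tau_{\text{FRD}} = \frac{\lim_{x \downarrow c} E[Y|X=x] - \lim_{x \uparrow c} E[Y|X=x]}{\lim_{x \downarrow c} E[D|X=x] - \lim_{x \uparrow c} E[D|X=x]}
\]
In a sharp design the denominator equals 1, so this reduces to the Sharp RD estimand.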
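The Fuzzy RD estimand is a ratio of two jumps at the cutoff: in the outcome and in the treatment probability. A minimal sketch estimates each jump with the same local linear, triangular-kernel fit and divides them. The helper local_linear_jump, the take-up probabilities (0.2 below and 0.8 above the cutoff), the bandwidth, and the constant true effect of 4 are all hypothetical.

```python
import numpy as np

def local_linear_jump(v, x, c, h):
    """Jump in E[v | X = x] at x = c via local linear regression with a
    triangular kernel and user-chosen bandwidth h (hypothetical helper)."""
    def intercept(mask):
        xt = x[mask] - c
        sw = np.sqrt(1 - np.abs(xt) / h)  # sqrt of triangular kernel weights
        X = np.column_stack([np.ones_like(xt), xt])
        coef, *_ = np.linalg.lstsq(X * sw[:, None], v[mask] * sw, rcond=None)
        return coef[0]
    near = np.abs(x - c) < h
    return intercept(near & (x >= c)) - intercept(near & (x < c))

rng = np.random.default_rng(0)
n = 20_000
c = 0.5
tau_true = 4.0
x = rng.uniform(0, 1, size=n)
t = (x >= c).astype(int)  # assignment T = 1(X >= c)
# hypothetical take-up: 20% below the cutoff, 80% above
d = rng.binomial(1, np.where(t == 1, 0.8, 0.2)).astype(float)
y = 100 + 10 * x - 5 * x**2 + tau_true * d + rng.normal(size=n)

h = 0.2
jump_y = local_linear_jump(y, x, c, h)  # numerator: jump in the outcome
jump_d = local_linear_jump(d, x, c, h)  # denominator: jump in P(D = 1)
tau_frd = jump_y / jump_d
print(f"jump in Y: {jump_y:.3f}, jump in P(D=1): {jump_d:.3f}, fuzzy RD: {tau_frd:.3f}")
```

rdrobust also handles this case when the treatment actually received is passed through its fuzzy argument.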
Estimation in Python
The {rdrobust} package is available for Python as well as R (RDROBUST · RD Packages), so we use it here.
from rdrobust import rdrobust, rdbwselect, rdplot
import pandas as pd
# Load the U.S. Senate election example data from the rdrobust repository
rdrobust_senate = pd.read_csv("https://raw.githubusercontent.com/rdpackages/rdrobust/master/Python/rdrobust_senate.csv")
# Define the variables
margin = rdrobust_senate.margin
vote = rdrobust_senate.vote
# rdplot with 95% confidence intervals
rdplot(y=vote, x=margin, binselect="es", ci=95,
       title="RD Plot: U.S. Senate Election Data",
       y_label="Vote Share in Election at time t+2",
       x_label="Vote Share in Election at time t")
Call: rdplot
Number of Observations: 1297
Kernel: Uniform
Polynomial Order Est. (p): 4
                                   Left    Right
------------------------------------------------
Number of Observations              595      702
Number of Effective Obs             595      702
Bandwith poly. fit (h)            100.0    100.0
Number of bins scale                  1        1
Bins Selected                         8        9
Average Bin Length                 12.5   11.111
Median Bin Length                  12.5   11.111
IMSE-optimal bins                   8.0      9.0
Mimicking Variance bins            15.0     35.0
Relative to IMSE-optimal:
Implied scale                       1.0      1.0
WIMSE variance weight               0.5      0.5
WIMSE bias weight                   0.5      0.5
Interrupted Time-Series
Interrupted time-series (ITS) is RDD with time as the running variable. Time-series-specific patterns such as autocorrelation can bias the analysis, so they need to be accounted for.
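A sketch of interrupted time-series as segmented regression, with time as the running variable centered at a hypothetical intervention period \(t_0\): the coefficient on the post-period dummy is the level change, and the coefficient on its interaction with time is the slope change. The series, the intervention date, and the AR(1) errors (\(\rho = 0.5\)) are simulated for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_periods = 120          # e.g. 120 monthly observations (hypothetical)
t0 = 60                  # hypothetical intervention period
level_change_true = 5.0  # hypothetical jump in level at t0

t = np.arange(n_periods, dtype=float)
post = (t >= t0).astype(float)

# AR(1) errors to mimic autocorrelation in a time series
rho = 0.5
e = np.zeros(n_periods)
for s in range(1, n_periods):
    e[s] = rho * e[s - 1] + rng.normal()

y = 50 + 0.2 * t + level_change_true * post + 0.1 * post * (t - t0) + e

# segmented regression: level and slope change at t0, time centered at t0
t_c = t - t0
X = np.column_stack([np.ones(n_periods), post, t_c, post * t_c])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
level_change_hat, slope_change_hat = coef[1], coef[3]
print(f"level change: {level_change_hat:.2f}, slope change: {slope_change_hat:.3f}")
```

This only addresses point estimation. With positively autocorrelated errors the naive OLS standard errors are typically too small; one common remedy is HAC (Newey-West) standard errors, e.g. sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 1}) in statsmodels, with the lag length left as a modeling choice.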