Relationship Between Confidence Intervals and Hypothesis Tests


Confidence Intervals

Consider interval estimation of a population mean.

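Let $μ$ be the population mean and $\bar{X}$ the sample mean of $n$ observations. When the population is normal with known variance $σ^{2}$, the statistic $Z = \frac{\sqrt{n} (\bar{X} - μ)}{σ}$ follows the standard normal distribution, so its probabilities can be computed. With $Z_{α / 2}$ denoting the upper $α / 2$ point of the standard normal distribution, we choose the interval whose probability equals the confidence coefficient $1 - α$: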

$$P (- Z_{α / 2} \leq \frac{\sqrt{n} (\bar{X} - μ)}{σ} \leq Z_{α / 2}) = 1 - α$$

Solving the inequality inside the probability for $μ$ gives

$$P (\bar{X} - Z_{α / 2} \times \frac{σ}{\sqrt{n}} \leq μ \leq \bar{X} + Z_{α / 2} \times \frac{σ}{\sqrt{n}}) = 1 - α$$

that is, this interval (random, because $\bar{X}$ is random) contains the population mean $μ$ with probability $1 - α$.

Taking out just the interval, the confidence interval is

$$[\bar{X} - Z_{α / 2} \times \frac{σ}{\sqrt{n}}, \bar{X} + Z_{α / 2} \times \frac{σ}{\sqrt{n}}]$$
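A minimal Python sketch of this formula, assuming NumPy and SciPy are available. The helper `z_confidence_interval` is a hypothetical name introduced for illustration, `norm.ppf(1 - alpha / 2)` supplies the upper $α/2$ point $Z_{α/2}$, and the data are simulated with illustrative values (mean 10, σ = 2, n = 200).

```python
import numpy as np
from scipy.stats import norm


def z_confidence_interval(x, sigma, alpha=0.05):
    """1 - alpha confidence interval [L, U] for the mean, with sigma assumed known."""
    x = np.asarray(x, dtype=float)
    n = x.size
    x_bar = x.mean()
    # Upper alpha/2 point of N(0, 1), i.e. P(Z > z_half) = alpha / 2
    z_half = norm.ppf(1 - alpha / 2)
    half_width = z_half * sigma / np.sqrt(n)
    return x_bar - half_width, x_bar + half_width


# P(-Z_{alpha/2} <= Z <= Z_{alpha/2}) equals 1 - alpha
z_half = norm.ppf(1 - 0.05 / 2)
print(f"Z_(alpha/2) = {z_half:.4f}")                             # Z_(alpha/2) = 1.9600
print(f"coverage = {norm.cdf(z_half) - norm.cdf(-z_half):.4f}")  # coverage = 0.9500

# Usage on simulated data (mean 10, sigma 2, n = 200: illustrative values)
x = norm.rvs(loc=10, scale=2, size=200, random_state=0)
lower, upper = z_confidence_interval(x, sigma=2)
print(f"[L, U] = [{lower:.3f}, {upper:.3f}]")
```

The interval is centered at $\bar{X}$ and its half-width $Z_{α/2} \times σ/\sqrt{n}$ does not depend on the data, which is why only the sample mean changes from sample to sample.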


import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

n = 200
sigma = 2
x = norm.rvs(loc=10, scale=sigma, size=n, random_state=0)
x_bar = x.mean()
alpha = 0.05                      # significance level; confidence coefficient is 1 - alpha
z_half = norm.ppf(1 - alpha / 2)  # upper alpha/2 point Z_{alpha/2}

fig, axes = plt.subplots(dpi=100, figsize=[11, 2], ncols=3)

# Histogram of the sample, with the sample mean marked
axes[0].set(title=f"Sample (n={n}, σ={sigma})", xlabel="$x$")
axes[0].hist(x, bins=10)
axes[0].axvline(x=x_bar, color="darkorange")
axes[0].text(x_bar + 0.1, n * 0.08, r"$\bar{X}$" + f"={x_bar:.1f}", color="darkorange", horizontalalignment="left")


# Z and the standard normal distribution: a tail of probability alpha/2 on each side
z = np.linspace(-4, 4, 300)
y = norm.pdf(z)
axes[1].set(title="Standard Normal Distribution", xlabel="Z", xlim=(-4, 4))
axes[1].plot(z, y, color="dimgray")
axes[1].axhline(y=0, color="dimgray", linewidth=1)
for z_ in [-z_half, z_half]:
    axes[1].axvline(x=z_, color="steelblue")
    if z_ < 0:
        axes[1].text(z_ - 0.1, norm.pdf(z_) + 0.01, r"$-Z_{\alpha/2}$", color="steelblue", horizontalalignment="right")
        axes[1].fill_between(z, 0, y, where=z <= z_, color="steelblue")
    else:
        axes[1].text(z_ + 0.1, norm.pdf(z_) + 0.01, r"$Z_{\alpha/2}$", color="steelblue", horizontalalignment="left")
        axes[1].fill_between(z, 0, y, where=z >= z_, color="steelblue")

# Central region with probability 1 - alpha
is_accept = np.abs(z) <= z_half
axes[1].fill_between(z[is_accept], 0, y[is_accept], color="honeydew")
axes[1].text(0, norm.pdf(0) * 0.3, "Coverage Probability\n" + r"$P(\mu \in [L, U])$", color="green", horizontalalignment="center")


# Confidence interval for mu
xlim = (x.min(), x.max())
axes[2].set(title="Confidence Interval", xlabel=r"$\mu$", xlim=xlim)
axes[2].axhline(y=0, color="dimgray", linewidth=1)

for z_ in [-z_half, z_half]:
    lower_or_upper = x_bar + z_ * (sigma / np.sqrt(n))
    axes[2].axvline(x=lower_or_upper, color="steelblue")
    if z_ < 0:
        axes[2].text(lower_or_upper - 0.2, 0.1, r"$L=\bar{X} - Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$" + f"\n={lower_or_upper:.1f}", color="steelblue", horizontalalignment="right")
    else:
        axes[2].text(lower_or_upper + 0.2, 0.3, r"$U=\bar{X} + Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$" + f"\n={lower_or_upper:.1f}", color="steelblue", horizontalalignment="left")

plt.show()

[Figure: output of the code above, with three panels: "Sample (n=200, σ=2)", "Standard Normal Distribution", and "Confidence Interval"]
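The statement "contains $μ$ with probability $1 - α$" describes the interval over repeated sampling. A small Monte Carlo sketch makes this concrete. It assumes a true $μ = 10$, σ = 2 and n = 200 (mirroring the plot) and 10,000 repetitions, builds one interval per simulated sample, and counts how often the interval covers $μ$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu, sigma, n = 10.0, 2.0, 200  # true parameters (assumed for the simulation)
alpha = 0.05
n_trials = 10_000

z_half = norm.ppf(1 - alpha / 2)
# One simulated sample of size n per row
samples = rng.normal(loc=mu, scale=sigma, size=(n_trials, n))
x_bar = samples.mean(axis=1)
half_width = z_half * sigma / np.sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width

# Fraction of intervals [L, U] that contain the true mu
coverage = np.mean((lower <= mu) & (mu <= upper))
print(f"empirical coverage: {coverage:.3f}")  # should be close to 1 - alpha = 0.95
```

Each individual interval either contains $μ$ or not; the probability $1 - α$ is the long-run fraction of intervals that do.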

Hypothesis Test

For a normal population with mean $μ$ and known variance $σ^{2}$, consider the testing problem

$$H_{0} : μ = μ_{0} \quad \text{vs} \quad H_{1} : μ \neq μ_{0}$$

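If the null hypothesis is true, the sample mean follows the normal distribution $N (μ_{0}, σ^{2} / n)$ (exactly, because the population is normal; for a non-normal population this holds approximately for large $n$ by the central limit theorem). The standardized statistic $Z = \frac{\sqrt{n} (\bar{X} - μ_{0})}{σ}$ therefore follows the standard normal distribution $N (0, 1)$. Comparing it with the upper percentage point $Z_{α / 2}$ of the standard normal distribution gives the decision rule

$$
\begin{array}{rl} | Z | > Z_{α / 2} & \Longrightarrow \text{reject } H_{0} \\ | Z | \leq Z_{α / 2} & \Longrightarrow \text{accept } H_{0} \end{array}
$$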
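The decision rule can be sketched in Python as follows. `z_test` is a hypothetical helper introduced for illustration; σ is assumed known, and the simulated data and the candidate values of $μ_0$ are illustrative.

```python
import numpy as np
from scipy.stats import norm


def z_test(x, mu0, sigma, alpha=0.05):
    """Two-sided z test of H0: mu = mu0 with known sigma. Returns (Z, reject)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    z = np.sqrt(n) * (x.mean() - mu0) / sigma  # standardized statistic under H0
    z_half = norm.ppf(1 - alpha / 2)           # upper alpha/2 point Z_{alpha/2}
    return z, bool(abs(z) > z_half)            # reject when |Z| > Z_{alpha/2}


# Usage on simulated data (mean 10, sigma 2, n = 200: illustrative values)
x = norm.rvs(loc=10, scale=2, size=200, random_state=0)
for mu0 in (10.0, 11.0):
    z, reject = z_test(x, mu0=mu0, sigma=2)
    print(f"mu0={mu0}: Z={z:.2f}, reject H0: {reject}")
```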

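The rejection region is determined by the significance level $α$, that is, the probability of rejecting $H_{0}$ by mistake even though $H_{0}$ is true. $Z_{α / 2}$ is set so that

$$P_{μ = μ_{0}} (| Z | > Z_{α / 2}) = α$$

(the subscript $μ = μ_{0}$ means that the probability is computed under the null hypothesis).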
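The meaning of $α$ can be checked by simulation: generate many samples with $H_0$ true, apply the rule $|Z| > Z_{α/2}$, and the fraction of (wrong) rejections should be close to $α$. A sketch under assumed values $μ_0 = 10$, σ = 2, n = 50, α = 0.05 and 20,000 repetitions:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
mu0, sigma, n = 10.0, 2.0, 50  # data generated under H0 (assumed values)
alpha = 0.05
n_trials = 20_000

z_half = norm.ppf(1 - alpha / 2)
samples = rng.normal(loc=mu0, scale=sigma, size=(n_trials, n))
# Test statistic for every simulated sample
z = np.sqrt(n) * (samples.mean(axis=1) - mu0) / sigma
rejection_rate = np.mean(np.abs(z) > z_half)
print(f"rejection rate under H0: {rejection_rate:.3f}")  # should be close to alpha
```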

Conversely, the acceptance region has probability

$$P_{μ = μ_{0}} (| Z | \leq Z_{α / 2}) = 1 - α$$

[Figure]
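Solving the acceptance condition for $μ_0$, in the same way the confidence interval was derived, makes the relationship explicit: $H_0 : μ = μ_0$ is accepted at level $α$ exactly when $μ_0$ lies in the $1 - α$ confidence interval.

$$
\begin{aligned}
| Z | \leq Z_{α / 2}
&\iff - Z_{α / 2} \leq \frac{\sqrt{n} (\bar{X} - μ_{0})}{σ} \leq Z_{α / 2} \\
&\iff \bar{X} - Z_{α / 2} \times \frac{σ}{\sqrt{n}} \leq μ_{0} \leq \bar{X} + Z_{α / 2} \times \frac{σ}{\sqrt{n}}
\end{aligned}
$$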

Summary

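A confidence interval is an interval $[L, U]$ that contains the parameter $μ$ with probability $1 - α$: $P_{μ} (μ \in [L, U]) = 1 - α$.
A hypothesis test takes the acceptance region $A = \{ X \in \mathcal{X} \mid | Z | \leq Z_{α / 2} \}$ (with $\mathcal{X}$ the sample space), chosen so that under the null hypothesis the sample falls in it with probability $P_{μ = μ_{0}} (X \in A) = 1 - α$, and decides whether the test statistic (here $Z$) lands inside that region.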
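As a numerical check of this correspondence, the sketch below tests $H_0 : μ = μ_0$ on a grid of candidate values $μ_0$ and compares the accepted values with the confidence interval computed from the same sample. The simulated data, σ = 2, and the grid range are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

alpha, sigma = 0.05, 2.0
x = norm.rvs(loc=10, scale=sigma, size=200, random_state=0)
n, x_bar = x.size, x.mean()
z_half = norm.ppf(1 - alpha / 2)

# 1 - alpha confidence interval [L, U]
half_width = z_half * sigma / np.sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width

# Test H0: mu = mu0 for a grid of candidate mu0 values
mu0_grid = np.linspace(9.0, 11.0, 2001)
z = np.sqrt(n) * (x_bar - mu0_grid) / sigma
accepted = np.abs(z) <= z_half
in_interval = (lower <= mu0_grid) & (mu0_grid <= upper)

# The two boolean arrays should agree (barring rounding exactly at L or U)
print("test and interval agree:", np.array_equal(accepted, in_interval))
```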

