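Pseudo R-squared
The ordinary R-squared, which is defined through the residual sum of squares of a least-squares fit, does not carry over to GLMs, so likelihood-based pseudo R-squared measures are used instead.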
Types of pseudo R-squared
McFadden's pseudo \(R^2\)
\[
R^2_{\text{McF}} = 1 - \frac{ \ln(L) }{ \ln(L_{\text{null}}) }
\]
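\(L\): likelihood of the model being evaluated
\(L_{\text{null}}\): likelihood of the null model (a model with no paths drawn, i.e. intercept only)
McFadden's pseudo \(R^2\) can be used only with discrete response variables. For a discrete response each likelihood is a probability of at most 1, so both log-likelihoods are non-positive and the ratio stays between 0 and 1. A continuous density can exceed 1, which can make a log-likelihood positive and the ratio hard to interpret.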
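The definition above can be sketched for a binary response. This is a minimal illustration rather than a reference implementation: the data, the fitted probabilities, and the helper names `bernoulli_loglik` and `mcfadden_r2` are all hypothetical, and the null model is taken to be intercept-only (every observation is assigned the sample mean).

```python
import math

def bernoulli_loglik(y, p):
    """Log-likelihood of 0/1 outcomes y under predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))

def mcfadden_r2(llf, llnull):
    """McFadden's pseudo R^2 from the model and null log-likelihoods."""
    return 1.0 - llf / llnull

# Hypothetical toy data: observed outcomes and probabilities from some fitted model.
y = [1, 0, 1, 1, 0, 0, 1, 0]
p_model = [0.9, 0.2, 0.8, 0.7, 0.3, 0.1, 0.6, 0.4]

# Null model (assumed intercept-only): every observation gets the sample mean.
p_bar = sum(y) / len(y)
p_null = [p_bar] * len(y)

llf = bernoulli_loglik(y, p_model)
llnull = bernoulli_loglik(y, p_null)
r2_mcf = mcfadden_r2(llf, llnull)
print(f"ln(L)={llf:.3f}, ln(L_null)={llnull:.3f}, R2_McF={r2_mcf:.3f}")
```

Both log-likelihoods are negative here because every term is the log of a probability, which is what keeps the ratio between 0 and 1.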
Cox-Snell's pseudo \(R^2\)
\[\begin{split}
\begin{aligned}
R^2_{\text{CS}}
&= 1 - \left( \frac{L_{\text{null}}}{L} \right)^{2 / n_{\text{obs}}}\\
&= 1 - \exp \left(
\frac{2}{n_{\text{obs}}} \big( \ln(L_{\text{null}}) - \ln(L) \big)
\right)
\end{aligned}
\end{split}\]
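\(L\): likelihood of the model being evaluated
\(L_{\text{null}}\): likelihood of the null model (a model with no paths drawn, i.e. intercept only)
\(n_{\text{obs}}\): number of observations
Cox-Snell's pseudo \(R^2\) can be used with both discrete and continuous response variables, since it depends only on the likelihood ratio \(L_{\text{null}} / L\).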
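The two forms of the Cox-Snell formula can be checked against each other numerically. The sketch below uses hypothetical log-likelihood values and hypothetical helper names; it is only meant to show that the ratio form and the exponential form give the same number.

```python
import math

def cox_snell_r2(llf, llnull, nobs):
    """Cox-Snell pseudo R^2 from log-likelihoods (exponential form)."""
    return 1.0 - math.exp((2.0 / nobs) * (llnull - llf))

def cox_snell_r2_from_likelihoods(lik, lik_null, nobs):
    """Same quantity from raw likelihoods (ratio form)."""
    return 1.0 - (lik_null / lik) ** (2.0 / nobs)

# Hypothetical log-likelihoods of a fitted model and its null model.
llf, llnull, nobs = -40.0, -55.0, 50

r2_log = cox_snell_r2(llf, llnull, nobs)
r2_raw = cox_snell_r2_from_likelihoods(math.exp(llf), math.exp(llnull), nobs)
print(f"exponential form: {r2_log:.6f}, ratio form: {r2_raw:.6f}")
```

In practice the exponential form is usually the safer choice, because raw likelihoods underflow to zero quickly as the number of observations grows.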
Example
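import statsmodels.api as sm

# Load the Scotland dataset bundled with statsmodels
data = sm.datasets.scotland.load()
# Add an intercept column to the explanatory variables
data.exog = sm.add_constant(data.exog)
# Gamma GLM with the family's default link (InversePower)
model = sm.GLM(data.endog, data.exog, family=sm.families.Gamma())
# Fit with heteroskedasticity-robust (HC1) standard errors
result = model.fit(cov_type="HC1")
print(result.summary())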
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                     YES   No. Observations:                  32
Model:                             GLM   Df Residuals:                      24
Model Family:                    Gamma   Df Model:                           7
Link Function:            InversePower   Scale:                      0.0035843
Method:                           IRLS   Log-Likelihood:               -83.017
Date:                 Fri, 29 Nov 2024   Deviance:                    0.087389
Time:                         04:54:33   Pearson chi2:                  0.0860
No. Iterations:                      6   Pseudo R-squ. (CS):            0.9800
Covariance Type:                   HC1
======================================================================================
                         coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------------
const                 -0.0178      0.007     -2.391      0.017      -0.032      -0.003
COUTAX              4.962e-05   9.02e-06      5.501      0.000    3.19e-05    6.73e-05
UNEMPF                 0.0020      0.000      6.680      0.000       0.001       0.003
MOR                -7.181e-05   2.18e-05     -3.288      0.001      -0.000    -2.9e-05
ACT                    0.0001   2.73e-05      4.097      0.000    5.83e-05       0.000
GDP                -1.468e-07    7.9e-08     -1.858      0.063   -3.02e-07    8.08e-09
AGE                   -0.0005      0.000     -3.336      0.001      -0.001      -0.000
COUTAX_FEMALEUNEMP -2.427e-06   4.15e-07     -5.852      0.000   -3.24e-06   -1.61e-06
======================================================================================
/usr/local/lib/python3.10/site-packages/statsmodels/genmod/generalized_linear_model.py:308: DomainWarning: The InversePower link function does not respect the domain of the Gamma family.
warnings.warn((f"The {type(family.link).__name__} link function "
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
y_train = data.endog
y_pred = result.predict(data.exog)
# Scatter of observed vs. fitted values, with the y = x line as a reference
ax.scatter(y_train, y_pred)
ax.plot(y_train, y_train)
# Pseudo R^2 values: Cox-Snell ("cs") and McFadden ("mcf")
r2_cs = result.pseudo_rsquared(kind="cs")
r2_mcf = result.pseudo_rsquared(kind="mcf")
ax.set(xlabel="Actual", ylabel="Predicted",
       title="Actual vs. Predicted\n" + r"$R^2_{CS}$=" + f"{r2_cs:.3f}" + r", $R^2_{McF}$=" + f"{r2_mcf:.3f}")
fig.show()
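To connect the fitted result back to the two formulas, both values can be recomputed by hand from the log-likelihoods. The sketch below assumes the result object exposes `llf`, `llnull` and `nobs`, as statsmodels GLM results do. The helper `pseudo_r2_by_hand` and the stand-in object with made-up numbers are hypothetical and exist only so that the snippet runs on its own.

```python
import math
from types import SimpleNamespace

def pseudo_r2_by_hand(res):
    """Recompute both pseudo R^2 values from a results-like object.

    Assumes `res` exposes `llf`, `llnull` and `nobs`.
    """
    r2_mcf = 1.0 - res.llf / res.llnull
    r2_cs = 1.0 - math.exp((2.0 / res.nobs) * (res.llnull - res.llf))
    return {"mcf": r2_mcf, "cs": r2_cs}

# Hypothetical stand-in with made-up numbers so the sketch runs without a fit;
# with the model fitted above one would call pseudo_r2_by_hand(result) instead.
fake_result = SimpleNamespace(llf=-120.0, llnull=-150.0, nobs=40)
by_hand = pseudo_r2_by_hand(fake_result)
print(by_hand)
```

If those attributes behave as assumed, calling the helper on `result` should agree with `result.pseudo_rsquared(kind="mcf")` and `result.pseudo_rsquared(kind="cs")` up to floating-point error.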