Pseudo R-squared

The ordinary R-squared cannot be used with GLMs, so a pseudo R-squared is used instead.

Types of pseudo R-squared

McFadden's pseudo \(R^2\)

\[ R^2_{\text{McF}} = 1 - \frac{ \ln(L) }{ \ln(L_{\text{null}}) } \]
  • \(L\): likelihood of the model being evaluated

  • \(L_{\text{null}}\): likelihood of the null model (the model with no paths, i.e. no explanatory variables)

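McFadden's pseudo \(R^2\) can only be used with discrete response variables. For a discrete response the likelihood is a probability, so \(\ln(L) \le 0\) and the ratio stays within \([0, 1]\); for a continuous response the likelihood is a density, \(\ln(L)\) can be positive, and that guarantee is lost.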
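As a minimal sketch of the McFadden formula above, the function below takes the two log-likelihoods directly. The function name `mcfadden_r2` and all numbers are hypothetical and chosen only for illustration; they do not come from any fitted model.

```python
def mcfadden_r2(llf: float, llnull: float) -> float:
    """McFadden pseudo R^2 = 1 - ln(L) / ln(L_null)."""
    return 1.0 - llf / llnull


# Hypothetical discrete-response case: both log-likelihoods are negative
r2_discrete = mcfadden_r2(llf=-50.0, llnull=-100.0)
print(r2_discrete)  # 0.5

# Hypothetical continuous-response case: a density can exceed 1,
# so ln(L) may be positive and the result leaves the [0, 1] range
r2_continuous = mcfadden_r2(llf=20.0, llnull=-10.0)
print(r2_continuous)  # 3.0
```

The second call illustrates why the measure is reserved for discrete responses: nothing in the formula prevents a value above 1 once the log-likelihood changes sign.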

Cox-Snell's pseudo \(R^2\)

\[\begin{split} \begin{aligned} R^2_{\text{CS}} &= 1 - \left( \frac{L_{\text{null}}}{L} \right)^{2 / n_{\text{obs}}}\\ &= 1 - \exp \left( \frac{2}{n_{\text{obs}}} \big( \ln(L_{\text{null}}) - \ln(L) \big) \right) \end{aligned} \end{split}\]
  • \(L\): likelihood of the model being evaluated

  • \(L_{\text{null}}\): likelihood of the null model (the model with no paths, i.e. no explanatory variables)

Cox-Snell's pseudo \(R^2\) can be used with both discrete and continuous response variables.
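The two forms of the Cox-Snell formula above can be checked against each other numerically. This is a rough sketch: the function names and the input values are hypothetical, picked only so that the raw likelihoods remain representable as floats.

```python
import math


def cox_snell_r2(llf: float, llnull: float, nobs: int) -> float:
    """Log-scale form: 1 - exp((2 / n_obs) * (ln(L_null) - ln(L)))."""
    return 1.0 - math.exp((2.0 / nobs) * (llnull - llf))


def cox_snell_r2_ratio(lik: float, lik_null: float, nobs: int) -> float:
    """Ratio form: 1 - (L_null / L) ** (2 / n_obs)."""
    return 1.0 - (lik_null / lik) ** (2.0 / nobs)


# Hypothetical values for illustration only
llf, llnull, nobs = -50.0, -100.0, 100

r2_log = cox_snell_r2(llf, llnull, nobs)
r2_ratio = cox_snell_r2_ratio(math.exp(llf), math.exp(llnull), nobs)

print(r2_log)    # approximately 0.632, i.e. 1 - exp(-1)
print(r2_ratio)  # agrees with r2_log up to floating-point error
```

In practice the log-scale form is the safer one to compute, since raw likelihoods underflow to zero quickly as the number of observations grows.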


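import statsmodels.api as sm

# Scotland dataset bundled with statsmodels
data = sm.datasets.scotland.load()
data.exog = sm.add_constant(data.exog)  # add an intercept column

# Gamma GLM with the family's default link, fitted with HC1 robust standard errors
model = sm.GLM(data.endog, data.exog, family=sm.families.Gamma())
result = model.fit(cov_type="HC1")
print(result.summary())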
                 Generalized Linear Model Regression Results                  
==============================================================================
Dep. Variable:                    YES   No. Observations:                   32
Model:                            GLM   Df Residuals:                       24
Model Family:                   Gamma   Df Model:                            7
Link Function:           InversePower   Scale:                       0.0035843
Method:                          IRLS   Log-Likelihood:                -83.017
Date:                Fri, 29 Nov 2024   Deviance:                     0.087389
Time:                        04:54:33   Pearson chi2:                   0.0860
No. Iterations:                     6   Pseudo R-squ. (CS):             0.9800
Covariance Type:                  HC1                                         
======================================================================================
                         coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------------
const                 -0.0178      0.007     -2.391      0.017      -0.032      -0.003
COUTAX              4.962e-05   9.02e-06      5.501      0.000    3.19e-05    6.73e-05
UNEMPF                 0.0020      0.000      6.680      0.000       0.001       0.003
MOR                -7.181e-05   2.18e-05     -3.288      0.001      -0.000    -2.9e-05
ACT                    0.0001   2.73e-05      4.097      0.000    5.83e-05       0.000
GDP                -1.468e-07    7.9e-08     -1.858      0.063   -3.02e-07    8.08e-09
AGE                   -0.0005      0.000     -3.336      0.001      -0.001      -0.000
COUTAX_FEMALEUNEMP -2.427e-06   4.15e-07     -5.852      0.000   -3.24e-06   -1.61e-06
======================================================================================
/usr/local/lib/python3.10/site-packages/statsmodels/genmod/generalized_linear_model.py:308: DomainWarning: The InversePower link function does not respect the domain of the Gamma family.
  warnings.warn((f"The {type(family.link).__name__} link function "
import matplotlib.pyplot as plt

# Scatter the fitted values against the observed response
fig, ax = plt.subplots()
y_train = data.endog
y_pred = result.predict(data.exog)
ax.scatter(y_train, y_pred)
ax.plot(y_train, y_train)  # 45-degree reference line (perfect prediction)

# Pseudo R-squared values reported by statsmodels
r2_cs = result.pseudo_rsquared(kind="cs")
r2_mcf = result.pseudo_rsquared(kind="mcf")
ax.set(xlabel="Actual", ylabel="Predicted",
       title="Actual vs. Predicted\n" + r"$R^2_{CS}$=" + f"{r2_cs:.3f}" + r", $R^2_{McF}$=" + f"{r2_mcf:.3f}")
plt.show()
[Figure: scatter plot of actual vs. predicted values with a 45-degree reference line; the title reports \(R^2_{CS}\) and \(R^2_{McF}\)]

References