Quantile Regression#
The τ-th quantile of a random variable $Y$ is written

$$
Q_\tau(Y) = \inf\{\, y : F_Y(y) \geq \tau \,\}
$$

where $F_Y$ is the cumulative distribution function of $Y$ and $\tau \in (0, 1)$. For example, the $\tau = 0.5$ quantile is the median.
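As a quick sanity check of the definition, sample quantiles can be computed with NumPy (the standard-normal sample below is an illustrative assumption, not data from this document):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=100_000)

# the 0.5 quantile of a standard-normal sample is close to 0 (the median),
# and the 0.9 quantile is close to the theoretical value 1.2816
print(np.quantile(y, 0.5))
print(np.quantile(y, 0.9))
```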
A standard regression model minimizes the squared error

$$
(y - f(x))^2
$$

and therefore estimates the conditional mean $E[Y \mid X = x]$. A quantile regression model instead minimizes the pinball loss, and estimates the conditional $\tau$-th quantile.
The pinball loss is written

$$
\rho_\tau(x) = (\tau - 1(x \leq 0))\, x
$$

or

$$
\rho_\tau(x) = \max(\tau x,\ (\tau - 1) x)
$$

or

$$
\rho_\tau(x) = \tau \max(x, 0) + (1 - \tau) \max(-x, 0)
$$
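The three forms can be checked numerically to coincide (a small verification sketch, not part of the original text):

```python
import numpy as np

x = np.linspace(-1, 1, 201)
tau = 0.3

a = (tau - (x <= 0)) * x                                     # indicator form
b = np.maximum(tau * x, (tau - 1) * x)                       # max form
c = tau * np.maximum(x, 0) + (1 - tau) * np.maximum(-x, 0)   # positive/negative-part form

print(np.allclose(a, b) and np.allclose(a, c))  # True
```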
Show code cell source
import numpy as np
import matplotlib.pyplot as plt

def pinball_loss(x, tau):
    return (tau - 1 * (x <= 0)) * x

x = np.linspace(-1, 1, 100)
fig, axes = plt.subplots(figsize=[10, 2], ncols=3)
for i, tau in enumerate([0.1, 0.5, 0.9]):
    y = pinball_loss(x, tau=tau)
    axes[i].plot(x, y)
    if i == 0:
        axes[i].set(title=f"τ={tau}", xlabel=r"$x$", ylabel=r"$y = (\tau - 1(x \leq 0))\, x$")
    else:
        axes[i].set(title=f"τ={tau}", xlabel=r"$x$")
fig.show()

Note that at $\tau = 0.5$ the pinball loss becomes

$$
\rho_{0.5}(x) = (0.5 - 1(x \leq 0))\, x = \frac{1}{2} |x|
$$

i.e. proportional to the absolute error.
A linear model that takes the sum of absolute errors as its objective function is known in statistics as least absolute deviations (LAD), and its solution is the conditional median.
Show code cell source
import numpy as np
import matplotlib.pyplot as plt

def pinball_loss(x, tau):
    return (tau - 1 * (x <= 0)) * x

x = np.linspace(-3, 3, 100)
fig, ax = plt.subplots(figsize=[4, 3])
ax.plot(x, pinball_loss(x, tau=0.5), label=r"$\rho_{0.5}(x)$")
ax.plot(x, abs(x), label=r"$|x|$")
ax.legend()
fig.show()
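To see why minimizing the pinball loss yields a quantile, one can minimize the empirical pinball risk over a constant prediction and compare the minimizer with the sample quantile (an illustrative sketch; the grid search and the normal sample are assumptions, not from the text):

```python
import numpy as np

def pinball_loss(x, tau):
    return (tau - 1 * (x <= 0)) * x

rng = np.random.default_rng(0)
y = rng.normal(size=10_000)
tau = 0.9

# brute-force search for the constant c that minimizes the mean pinball loss
grid = np.linspace(-3, 3, 601)
risk = [pinball_loss(y - c, tau).mean() for c in grid]
c_star = grid[np.argmin(risk)]

# the minimizer nearly coincides with the 0.9 sample quantile
print(c_star, np.quantile(y, tau))
```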

Quantile Regression Models in Practice#
Quantile regression with LightGBM#
Since all that is required is to use the pinball loss as the objective function, quantile regression can be performed with other algorithms as well.
Show code cell source
import numpy as np
import matplotlib.pyplot as plt
import lightgbm as lgb

# create data
np.random.seed(0)
n = 1000
x = np.random.uniform(-3, 3, size=n)
X = x.reshape(-1, 1)
y = np.sin(x) + np.random.normal(scale=0.5, size=n)

fig, ax = plt.subplots()
ax.scatter(x, y, alpha=0.7)

# quantile regression at two quantile levels
x_plot = np.linspace(-3, 3, 500)
X_test = x_plot.reshape(-1, 1)
taus = [0.1, 0.9]
colors = ["orange", "tomato"]
for tau, color in zip(taus, colors):
    model = lgb.LGBMRegressor(objective="quantile", alpha=tau, verbose=-1)
    model.fit(X, y)
    y_hat = model.predict(X_test)
    ax.plot(x_plot, y_hat, color=color, label=fr"$\tau = {tau}$")
ax.set(xlabel="x", ylabel="y", title="Quantile Regression with LightGBM")
ax.legend()
fig.show()
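As a cross-check of the claim that any algorithm accepting a pinball-loss objective can do quantile regression, scikit-learn's `GradientBoostingRegressor` supports the same loss via `loss="quantile"` (a sketch on synthetic data mirroring the cell above, not the author's code):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

np.random.seed(0)
n = 1000
x = np.random.uniform(-3, 3, size=n)
X = x.reshape(-1, 1)
y = np.sin(x) + np.random.normal(scale=0.5, size=n)

# pinball loss with alpha=0.9 -> estimates the conditional 0.9 quantile
model = GradientBoostingRegressor(loss="quantile", alpha=0.9, random_state=0)
model.fit(X, y)
y_hat = model.predict(X)

# roughly 90% of training targets should fall at or below the prediction
print((y <= y_hat).mean())
```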

The fit can be evaluated with scikit-learn's `d2_pinball_score`, the pinball-loss analogue of $R^2$ (1.0 is a perfect fit; 0.0 corresponds to always predicting the empirical α-quantile). Here it is applied to the last fitted model (τ = 0.9):

from sklearn.metrics import d2_pinball_score, make_scorer

d2_pinball_score_09 = make_scorer(d2_pinball_score, alpha=0.9)
d2_pinball_score_09(model, X, y)
0.48475107952573926