Naive Bayes

A model that naively assumes conditional independence between every pair of features, conditioned on the target variable.

Given features $x_1, \dots, x_n$, the conditional probability of the target $y$, $P(y \mid x_1, \dots, x_n)$, can be written as

$$
P(y \mid x_1, \dots, x_n) = \frac{P(y) \, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}
$$

Under the conditional independence assumption,

$$
P(x_i \mid y, x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = P(x_i \mid y)
$$

so the expression simplifies to

$$
P(y \mid x_1, \dots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)}
$$

Since $P(x_1, \dots, x_n)$ is constant given the input,

$$
P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)
\quad\Rightarrow\quad
\hat{y} = \arg\max_y P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$
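The decision rule above can be sketched from scratch for the Bernoulli case (binary features, Laplace smoothing), working in log space to avoid underflow. This is a minimal illustration, not scikit-learn's implementation; the function names are made up here.

```python
import numpy as np

def fit_bernoulli_nb(X, y, alpha=1.0):
    """Estimate log P(y) and P(x_i = 1 | y) with Laplace smoothing (a sketch)."""
    classes = np.unique(y)
    log_prior = np.log(np.array([np.mean(y == c) for c in classes]))
    theta = np.array([
        (X[y == c].sum(axis=0) + alpha) / ((y == c).sum() + 2 * alpha)
        for c in classes
    ])
    return classes, log_prior, theta

def predict_bernoulli_nb(model, X):
    """Pick argmax_y of log P(y) + sum_i log P(x_i | y)."""
    classes, log_prior, theta = model
    log_lik = X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
    return classes[np.argmax(log_prior + log_lik, axis=1)]

# Tiny example: the first feature fires for class 1, the second for class 0
X_demo = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
y_demo = np.array([1, 1, 0, 0])
model = fit_bernoulli_nb(X_demo, y_demo)
print(predict_bernoulli_nb(model, X_demo))  # → [1 1 0 0]
```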

Demonstration: failure on a linearly non-separable problem

# Prepare the data
import numpy as np
from sklearn.datasets import make_blobs
centers = [(1, 1), (1, -1), (-1, 1),  (-1, -1)]
X, y = make_blobs(n_samples=10000, n_features=2, centers=centers, cluster_std=[0.5, 0.5, 0.5, 0.5], random_state=0)

def replace_label(y):
    """Merge the four cluster ids into two labels in an XOR pattern (2 -> 1, 3 -> 0)."""
    if y == 2:
        return 1
    if y == 3:
        return 0
    return y

y = np.array(list(map(replace_label, y)))
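After this relabeling, a point's label is (up to cluster noise) the exclusive-or of its coordinate signs, which is what makes the problem linearly non-separable. A quick check of that claim, regenerating the same data so the snippet is self-contained (the relabeling is done here with `np.where` instead of `replace_label`, for brevity):

```python
import numpy as np
from sklearn.datasets import make_blobs

centers = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
X, y = make_blobs(n_samples=10000, n_features=2, centers=centers,
                  cluster_std=[0.5, 0.5, 0.5, 0.5], random_state=0)
# Same relabeling as replace_label, vectorized: 2 -> 1, 3 -> 0
y = np.where(y == 2, 1, np.where(y == 3, 0, y))

# The label should mostly equal the XOR of the coordinate signs;
# the disagreements come from points that cluster_std noise pushed
# across an axis into a neighboring quadrant.
xor_label = ((X[:, 0] > 0) != (X[:, 1] > 0)).astype(int)
agreement = np.mean(xor_label == y)
print(agreement)
```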

import matplotlib.pyplot as plt
fig, ax = plt.subplots()
for y_val in set(y):
    idx = y == y_val
    ax.scatter(X[idx, 0], X[idx, 1], label=f"y == {y_val}", alpha=0.3)
ax.legend()
fig.show()
[Figure: scatter plot of the relabeled dataset, four blobs forming an XOR pattern]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
from sklearn.naive_bayes import BernoulliNB
clf = BernoulliNB()
clf.fit(X_train, y_train)
BernoulliNB()
from sklearn.metrics import ConfusionMatrixDisplay
ConfusionMatrixDisplay.from_estimator(clf, X_test, y_test)
<sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay at 0x7fd6fc112a10>
[Figure: confusion matrix on the test set]
from sklearn.inspection import DecisionBoundaryDisplay
fig, ax = plt.subplots()
disp = DecisionBoundaryDisplay.from_estimator(
    clf,
    X_test,
    response_method="predict",
    cmap=plt.cm.coolwarm,
    alpha=0.8,
    ax=ax,
    xlabel="x1",
    ylabel="x2",
)
ax.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm, s=20, edgecolors="k")
ax.set(title="Decision Boundary of Naive Bayes")
fig.show()
[Figure: decision boundary of Naive Bayes over the XOR-shaped data]
