Naive Bayes

A model that naively assumes every pair of features is conditionally independent given the target variable.

By Bayes' theorem, the probability of the target variable $y$ given the features $x_1, \dots, x_n$ can be written as

$$
P(y \mid x_1, \dots, x_n) = \frac{P(y) P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}
$$
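A small numeric check of this formula, assuming a made-up joint distribution over one binary feature $x$ and a binary target $y$ (the numbers are purely illustrative). `Fraction` keeps the arithmetic exact, so Bayes' theorem and the direct definition of conditional probability can be compared for equality.

```python
from fractions import Fraction as F

# Hypothetical joint distribution P(y, x): binary target y, one binary feature x
joint = {
    (0, 0): F(3, 10), (0, 1): F(1, 10),
    (1, 0): F(2, 10), (1, 1): F(4, 10),
}

def p_y(y):
    return sum(p for (yy, _), p in joint.items() if yy == y)

def p_x(x):
    return sum(p for (_, xx), p in joint.items() if xx == x)

def p_x_given_y(x, y):
    return joint[(y, x)] / p_y(y)

def posterior_bayes(y, x):
    # Bayes' theorem: P(y | x) = P(y) P(x | y) / P(x)
    return p_y(y) * p_x_given_y(x, y) / p_x(x)

def posterior_direct(y, x):
    # Definition of conditional probability: P(y | x) = P(y, x) / P(x)
    return joint[(y, x)] / p_x(x)

print(posterior_bayes(1, 1), posterior_direct(1, 1))  # 4/5 4/5
```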

Under the conditional independence assumption,

$$
P(x_i \mid y, x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = P(x_i \mid y)
$$

for every $i$, so the expression simplifies to

$$
P(y \mid x_1, \dots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)}
$$
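The assumption only says the features are independent once $y$ is fixed. The sketch below uses hypothetical parameters to build a joint distribution as $P(y) P(x_1 \mid y) P(x_2 \mid y)$, checks that $P(x_1 \mid y, x_2) = P(x_1 \mid y)$ holds, and shows that $x_1$ and $x_2$ are still dependent when $y$ is not given.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical parameters: prior P(y) and Bernoulli parameters P(x_i = 1 | y)
prior = {0: F(1, 2), 1: F(1, 2)}
theta = {0: [F(1, 5), F(1, 5)], 1: [F(4, 5), F(4, 5)]}

def p_feature(i, xi, y):
    """P(x_i = xi | y) for a binary feature."""
    return theta[y][i] if xi == 1 else 1 - theta[y][i]

# Joint distribution built under the naive assumption: P(y) P(x_1 | y) P(x_2 | y)
joint = {
    (y, x1, x2): prior[y] * p_feature(0, x1, y) * p_feature(1, x2, y)
    for y, x1, x2 in product((0, 1), repeat=3)
}

def prob(event):
    """Probability of an event given as a predicate over (y, x1, x2)."""
    return sum(p for key, p in joint.items() if event(*key))

# Given y = 1: P(x1 = 1 | y = 1, x2 = 1) versus P(x1 = 1 | y = 1)
lhs = (prob(lambda y, x1, x2: y == 1 and x1 == 1 and x2 == 1)
       / prob(lambda y, x1, x2: y == 1 and x2 == 1))
rhs = prob(lambda y, x1, x2: y == 1 and x1 == 1) / prob(lambda y, x1, x2: y == 1)

# With y unknown, observing x2 = 1 changes the probability of x1 = 1
p_x1 = prob(lambda y, x1, x2: x1 == 1)
p_x1_given_x2 = (prob(lambda y, x1, x2: x1 == 1 and x2 == 1)
                 / prob(lambda y, x1, x2: x2 == 1))

print(lhs, rhs)             # 4/5 4/5
print(p_x1, p_x1_given_x2)  # 1/2 17/25
```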

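Since $P(x_1, \dots, x_n)$ is constant once the input is given, it can be dropped, and the prediction $\hat{y}$ is the class that maximizes the remaining product: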

$$
\begin{gathered}
P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y) \\
\Downarrow \\
\hat{y} = \arg\max_{y} P(y) \prod_{i=1}^{n} P(x_i \mid y)
\end{gathered}
$$
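A minimal from-scratch sketch of this decision rule for 0/1 features, working in log space so that long products of small probabilities do not underflow. The function names `fit_bernoulli_nb` and `predict` and the additive smoothing parameter `alpha` are assumptions for illustration; this is not scikit-learn's implementation.

```python
import math
from collections import Counter

def fit_bernoulli_nb(X, y, alpha=1.0):
    """Estimate log P(y) and theta[c][i] = P(x_i = 1 | y = c) from 0/1 features.

    alpha is additive (Laplace) smoothing, an assumption added for this sketch.
    """
    n_features = len(X[0])
    class_counts = Counter(y)
    log_prior = {c: math.log(n_c / len(y)) for c, n_c in class_counts.items()}
    theta = {}
    for c, n_c in class_counts.items():
        ones = [0] * n_features  # how often x_i = 1 within class c
        for row, label in zip(X, y):
            if label == c:
                for i, xi in enumerate(row):
                    ones[i] += xi
        theta[c] = [(k + alpha) / (n_c + 2 * alpha) for k in ones]
    return log_prior, theta

def predict(x, log_prior, theta):
    """y_hat = argmax_y [log P(y) + sum_i log P(x_i | y)]; P(x) is never computed."""
    def score(c):
        return log_prior[c] + sum(
            math.log(t if xi == 1 else 1 - t) for xi, t in zip(x, theta[c])
        )
    return max(log_prior, key=score)

X = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 1], [0, 1, 1], [0, 0, 0]]
y = [1, 1, 1, 0, 0, 0]
log_prior, theta = fit_bernoulli_nb(X, y)
print(predict([1, 1, 0], log_prior, theta))  # 1
print(predict([0, 0, 1], log_prior, theta))  # 0
```

Taking the logarithm does not change which class attains the maximum, because the logarithm is monotonically increasing.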

Example

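The example below shows naive Bayes failing on a problem that is not linearly separable: four clusters whose labels follow an XOR pattern.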
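Why it fails can be sketched for binary features, which is what `BernoulliNB` sees after binarizing each input at its threshold (`binarize=0.0` by default). Writing $\theta_{ci} = P(x_i = 1 \mid y = c)$, each factor is $P(x_i \mid y = c) = \theta_{ci}^{x_i} (1 - \theta_{ci})^{1 - x_i}$. Taking the log of the ratio of the two class scores (the denominator $P(x_1, \dots, x_n)$ cancels) gives a log-odds that is linear in $x$:

$$
\log \frac{P(y = 1 \mid x)}{P(y = 0 \mid x)} = b + \sum_{i=1}^{n} w_i x_i,
\qquad
w_i = \log \frac{\theta_{1i} (1 - \theta_{0i})}{\theta_{0i} (1 - \theta_{1i})},
\qquad
b = \log \frac{P(y = 1)}{P(y = 0)} + \sum_{i=1}^{n} \log \frac{1 - \theta_{1i}}{1 - \theta_{0i}}
$$

The decision boundary $b + \sum_i w_i x_i = 0$ is therefore a hyperplane. After binarizing at 0, the data below has the XOR pattern $(1, 1), (0, 0) \mapsto 0$ and $(1, 0), (0, 1) \mapsto 1$. Classifying all four points correctly would need $b + w_1 + w_2 < 0$ and $b < 0$ (summing: $2b + w_1 + w_2 < 0$) as well as $b + w_1 > 0$ and $b + w_2 > 0$ (summing: $2b + w_1 + w_2 > 0$), which is a contradiction.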

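# Prepare the data: four Gaussian clusters centered at (±1, ±1)
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

centers = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
X, y = make_blobs(n_samples=10000, n_features=2, centers=centers,
                  cluster_std=[0.5, 0.5, 0.5, 0.5], random_state=0)

# Merge the four clusters into two classes with an XOR pattern:
# (1, 1) and (-1, -1) -> class 0, (1, -1) and (-1, 1) -> class 1
def replace_label(label):
    if label == 2:
        return 1
    if label == 3:
        return 0
    return label

y = np.array([replace_label(label) for label in y])

# Scatter plot of the data, one color per class
fig, ax = plt.subplots()
for y_val in sorted(set(y)):
    idx = y == y_val
    ax.scatter(X[idx, 0], X[idx, 1], label=f"y == {y_val}", alpha=0.3)
ax.legend()
plt.show()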

[Figure: scatter plot of the data, colored by class label]

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

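from sklearn.naive_bayes import BernoulliNB

# BernoulliNB binarizes each feature at 0.0 by default (binarize=0.0),
# so the model only sees whether x_1 > 0 and whether x_2 > 0
clf = BernoulliNB()
clf.fit(X_train, y_train)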

BernoulliNB()

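To see what `BernoulliNB` has to work with here, the sketch below re-creates the same four-cluster layout with Python's `random` module. The helper `make_xor_blobs` is hypothetical and does not reproduce the exact samples of `make_blobs(random_state=0)`. It binarizes at 0 like the default `binarize=0.0` and estimates per-class frequencies. Each $P(x_i = 1 \mid y)$ should come out close to 0.5 for both classes, so the per-feature likelihoods barely distinguish the classes. Meanwhile $x_1$ and $x_2$ are strongly dependent within each class, which violates the naive assumption.

```python
import random

def make_xor_blobs(n_per_cluster=2500, std=0.5, seed=0):
    """Hypothetical helper: four Gaussian clusters at (±1, ±1) with XOR labels.

    Mirrors the layout of the make_blobs data above, not its exact samples.
    """
    rng = random.Random(seed)
    # (center, label): (1, 1) and (-1, -1) -> 0, (1, -1) and (-1, 1) -> 1
    clusters = [((1, 1), 0), ((1, -1), 1), ((-1, 1), 1), ((-1, -1), 0)]
    X, y = [], []
    for (cx, cy), label in clusters:
        for _ in range(n_per_cluster):
            X.append((rng.gauss(cx, std), rng.gauss(cy, std)))
            y.append(label)
    return X, y

X, y = make_xor_blobs()
# Binarize at 0.0, mirroring BernoulliNB's default binarize=0.0 (x > 0 -> 1)
binary = [(int(a > 0), int(b > 0)) for a, b in X]

stats = {}
for c in (0, 1):
    rows = [r for r, label in zip(binary, y) if label == c]
    p1 = sum(r[0] for r in rows) / len(rows)            # P(x_1 = 1 | y = c)
    p2 = sum(r[1] for r in rows) / len(rows)            # P(x_2 = 1 | y = c)
    same = sum(r[0] == r[1] for r in rows) / len(rows)  # P(x_1 = x_2 | y = c)
    stats[c] = (p1, p2, same)
    print(f"y={c}: P(x1=1|y)={p1:.3f}  P(x2=1|y)={p2:.3f}  P(x1=x2|y)={same:.3f}")
```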

from sklearn.metrics import ConfusionMatrixDisplay
ConfusionMatrixDisplay.from_estimator(clf, X_test, y_test)


[Figure: confusion matrix on the test set]

../_images/b1b5a367e122f7ed0a26904e08c759846a8d98e04aab8f160edcd034c3b86a1a.png
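When reading the plot, note that `ConfusionMatrixDisplay` puts true labels on the rows and predicted labels on the columns, so correct predictions lie on the diagonal. Below is a hand-rolled sketch of the same counts and the resulting accuracy. The function names are illustrative, not scikit-learn's API. For two roughly balanced classes, an accuracy near 0.5 means the classifier does no better than chance.

```python
def confusion_matrix(y_true, y_pred, labels):
    """cm[i][j] counts samples with true label labels[i] predicted as labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm

def accuracy(cm):
    """Fraction of samples on the diagonal, i.e. predicted correctly."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

cm = confusion_matrix([0, 0, 1, 1, 1], [0, 1, 1, 0, 1], labels=[0, 1])
print(cm)            # [[1, 1], [1, 2]]
print(accuracy(cm))  # 0.6
```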

References

1.9. Naive Bayes — scikit-learn 1.4.1 documentation
