点推定#
ある確率分布\(f(x|\boldsymbol{\theta})\)を想定し、その未知の母数\(\boldsymbol{\theta}=(\theta_1, ..., \theta_k)\)を、その確率分布に従うランダムに抽出した\(n\)個の標本\(\boldsymbol{X}=(X_1,...,X_n)\)にもとづいて推定する問題を考える。
モーメント法#
のランダムサンプルについて、モーメント \(E[X^r]\) を標本モーメント \(\frac{1}{n} \sum^n_{i=1} X_i^r\) で置き換え
の同時方程式を\(\theta_1,...,\theta_k\)について解くことによって推定量\(\hat{\boldsymbol{\theta}}=(\hat{\theta}_1, ..., \hat{\theta}_k)\)を得る。これをモーメント推定量(moment estimator)という。
例
\(X \sim N(\mu, \sigma^2)\)とすると、
より、
が\(\mu, \sigma^2\)のモーメント推定量となる。
最尤推定法#
「得られた標本は確率が最大のもの(最も尤もらしいもの)が実現した」という仮定に基づき、もっともらしさの関数(尤度関数)を最大にするパラメータを推定する方法。
尤度関数(likelihood function)とは\(X_1,...,X_n\)の確率関数の積
で、サンプルのもとでその\(\theta\)のもっともらしさを示す関数である。
確率の積は数学的には扱いにくいので、通常はその対数をとった対数尤度
を扱う。
例:コイントス
コインを5回投げて2回表がでた(表を1とすると、\(\boldsymbol{X}=(0,1,1,0,0)\))とする。
確率\(p\)で1をとるベルヌーイ分布の確率質量関数は
であるため、尤度関数は
となり、尤度関数にいれる\(p\)の値を変えていくと
のようになる。これを繰り返すと次の図のように描くことができる。
そして、尤度関数を最大化するパラメータを点推定量として採用する。
Show code cell source
# 尤度関数のプロットを描いたコード
import numpy as np
import matplotlib.pyplot as plt
import japanize_matplotlib
x = np.array([0, 1, 1, 0, 0])
def bernoulli(p, r, n):
return p ** r * (1 - p) ** (n - r)
theta = np.linspace(0, 1, 100)
likelihood = [bernoulli(p, r = sum(x), n = len(x)) for p in theta]
fig, ax = plt.subplots()
ax.plot(theta, likelihood)
ax.set(xlabel='p', ylabel='尤度', title='尤度関数 L(p)=p^2*(1-p)^3')
from myst_nb import glue
glue("cointoss_likelihood", fig, display=False)
Show code cell output
ベイズ法#
同時確率密度関数\(f(\boldsymbol{x}|\boldsymbol{\theta})\)の\(\boldsymbol{\theta}\)を確率変数とみなして確率分布を仮定する。これを事前分布(prior distribution)といい、\(\pi(\boldsymbol{\theta}|\boldsymbol{\xi})\)と書く。\(\boldsymbol{\xi}\)は事前分布の母数であり、超母数(hyperparameter)と呼ばれる。
このモデルは次のように表される。
\(\boldsymbol{X}=\boldsymbol{x}\)を与えたときの\(\boldsymbol{\theta}\)の条件付き分布を\(\boldsymbol{\theta}\)の事後分布(posterior distribution)といい、
で与えられる。
ここで\(f_\pi (\boldsymbol{x | \xi})\)は\(\boldsymbol{X}\)の周辺分布で、\(\boldsymbol{\theta}\)が連続型確率変数のとき
である。
ベイズ法とは事後分布から推定量を導く方法である。事後分布の平均\(E[\boldsymbol{\theta|X}]\)は事後期待値(expected a posteriori: EAP)と呼ばれる。 事後分布の最頻値は事後確率最大値(maximum a posteriori: MAP)やベイズ的最尤推定量(Bayesian maximum likelihood estimator)と呼ばれる。こうした分布の代表値を使用して点推定を行うことができる。
MCMCによる推定#
Show code cell source
# nest_asyncio: asyncioを使うstanをjupyterで使うための対処
# [Stanによる推定例:ベルヌーイ分布のパラメータ - The One with ...](https://hamada.hatenablog.jp/entry/2017/06/28/100815)
import nest_asyncio
nest_asyncio.apply()
import stan
stan_code = """
data {
int N;
array[N] int X;
}
parameters {
real<lower=0, upper=1> p;
}
model {
for (i in 1:N)
X[i] ~ bernoulli(p);
}
"""
x = np.array([0, 1, 1, 0, 0])
data = {
"N": len(x),
"X": x,
}
posterior = stan.build(stan_code, data=data, random_seed=1)
fit = posterior.sample(num_chains=10, num_samples=10000)
df = fit.to_frame()
df
Show code cell output
Building...
In file included from /usr/local/lib/python3.10/site-packages/httpstan/include/tbb/concurrent_unordered_map.h:26,
from /usr/local/lib/python3.10/site-packages/httpstan/include/stan/math/rev/core/profiling.hpp:10,
from /usr/local/lib/python3.10/site-packages/httpstan/include/stan/math/rev/core.hpp:53,
from /usr/local/lib/python3.10/site-packages/httpstan/include/stan/math/rev.hpp:10,
from /usr/local/lib/python3.10/site-packages/httpstan/include/stan/math.hpp:19,
from /usr/local/lib/python3.10/site-packages/httpstan/include/stan/model/model_header.hpp:4,
from /github/home/.cache/httpstan/4.13.0/models/j444rpnr/model_j444rpnr.cpp:2:
/usr/local/lib/python3.10/site-packages/httpstan/include/tbb/internal/_concurrent_unordered_impl.h: In instantiation of ‘void tbb::interface5::internal::concurrent_unordered_base<Traits>::internal_init() [with Traits = tbb::interface5::concurrent_unordered_map_traits<std::pair<std::basic_string<char>, std::thread::id>, stan::math::profile_info, tbb::interface5::internal::hash_compare<std::pair<std::basic_string<char>, std::thread::id>, stan::math::internal::hash_profile_key, stan::math::internal::equal_profile_key>, tbb::tbb_allocator<std::pair<const std::pair<std::basic_string<char>, std::thread::id>, stan::math::profile_info> >, false>]’:
/usr/local/lib/python3.10/site-packages/httpstan/include/tbb/internal/_concurrent_unordered_impl.h:773:9: required from ‘tbb::interface5::internal::concurrent_unordered_base<Traits>::concurrent_unordered_base(size_type, const hash_compare&, const allocator_type&) [with Traits = tbb::interface5::concurrent_unordered_map_traits<std::pair<std::basic_string<char>, std::thread::id>, stan::math::profile_info, tbb::interface5::internal::hash_compare<std::pair<std::basic_string<char>, std::thread::id>, stan::math::internal::hash_profile_key, stan::math::internal::equal_profile_key>, tbb::tbb_allocator<std::pair<const std::pair<std::basic_string<char>, std::thread::id>, stan::math::profile_info> >, false>; size_type = long unsigned int; hash_compare = tbb::interface5::internal::hash_compare<std::pair<std::basic_string<char>, std::thread::id>, stan::math::internal::hash_profile_key, stan::math::internal::equal_profile_key>; allocator_type = std::allocator_traits<tbb::tbb_allocator<std::pair<const std::pair<std::basic_string<char>, std::thread::id>, stan::math::profile_info> > >::rebind_alloc<std::pair<const std::pair<std::basic_string<char>, std::thread::id>, stan::math::profile_info> >]’
/usr/local/lib/python3.10/site-packages/httpstan/include/tbb/concurrent_unordered_map.h:112:68: required from ‘tbb::interface5::concurrent_unordered_map<Key, T, Hasher, Key_equality, Allocator>::concurrent_unordered_map(size_type, const hasher&, const key_equal&, const allocator_type&) [with Key = std::pair<std::basic_string<char>, std::thread::id>; T = stan::math::profile_info; Hasher = stan::math::internal::hash_profile_key; Key_equality = stan::math::internal::equal_profile_key; Allocator = tbb::tbb_allocator<std::pair<const std::pair<std::basic_string<char>, std::thread::id>, stan::math::profile_info> >; size_type = long unsigned int; hasher = stan::math::internal::hash_profile_key; key_equal = stan::math::internal::equal_profile_key; allocator_type = std::allocator_traits<tbb::tbb_allocator<std::pair<const std::pair<std::basic_string<char>, std::thread::id>, stan::math::profile_info> > >::rebind_alloc<std::pair<const std::pair<std::basic_string<char>, std::thread::id>, stan::math::profile_info> >]’
/github/home/.cache/httpstan/4.13.0/models/j444rpnr/model_j444rpnr.cpp:6:25: required from here
/usr/local/lib/python3.10/site-packages/httpstan/include/tbb/internal/_concurrent_unordered_impl.h:1345:15: warning: ‘void* memset(void*, int, size_t)’ clearing an object of type ‘struct tbb::atomic<tbb::interface5::internal::flist_iterator<tbb::interface5::internal::split_ordered_list<std::pair<const std::pair<std::basic_string<char>, std::thread::id>, stan::math::profile_info>, tbb::tbb_allocator<std::pair<const std::pair<std::basic_string<char>, std::thread::id>, stan::math::profile_info> > >, std::pair<const std::pair<std::basic_string<char>, std::thread::id>, stan::math::profile_info> >*>’ with no trivial copy-assignment; use assignment or value-initialization instead [-Wclass-memaccess]
1345 | memset(my_buckets, 0, sizeof(my_buckets));
| ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/local/lib/python3.10/site-packages/httpstan/include/tbb/tbb_profiling.h:123,
from /usr/local/lib/python3.10/site-packages/httpstan/include/tbb/task.h:36,
from /usr/local/lib/python3.10/site-packages/httpstan/include/tbb/task_arena.h:23,
from /usr/local/lib/python3.10/site-packages/httpstan/include/stan/math/prim/core/init_threadpool_tbb.hpp:18,
from /usr/local/lib/python3.10/site-packages/httpstan/include/stan/math/prim/core.hpp:4,
from /usr/local/lib/python3.10/site-packages/httpstan/include/stan/math/rev/core/Eigen_NumTraits.hpp:5,
from /usr/local/lib/python3.10/site-packages/httpstan/include/stan/math/rev/core/typedefs.hpp:7,
from /usr/local/lib/python3.10/site-packages/httpstan/include/stan/math/rev/core/chainable_object.hpp:6,
from /usr/local/lib/python3.10/site-packages/httpstan/include/stan/math/rev/core.hpp:10:
/usr/local/lib/python3.10/site-packages/httpstan/include/tbb/atomic.h:507:1: note: ‘struct tbb::atomic<tbb::interface5::internal::flist_iterator<tbb::interface5::internal::split_ordered_list<std::pair<const std::pair<std::basic_string<char>, std::thread::id>, stan::math::profile_info>, tbb::tbb_allocator<std::pair<const std::pair<std::basic_string<char>, std::thread::id>, stan::math::profile_info> > >, std::pair<const std::pair<std::basic_string<char>, std::thread::id>, stan::math::profile_info> >*>’ declared here
507 | atomic<T*>: internal::atomic_impl_with_arithmetic<T*,ptrdiff_t,T> {
| ^~~~~~~~~~
Building: 20.8s, done.
Messages from stanc:
Warning: The parameter p has no priors. This means either no prior is
provided, or the prior(s) depend on data variables. In the later case,
this may be a false positive.
Sampling: 0%
Sampling: 10% (11000/110000)
Sampling: 20% (22000/110000)
Sampling: 30% (33000/110000)
Sampling: 40% (44000/110000)
Sampling: 50% (55000/110000)
Sampling: 60% (66000/110000)
Sampling: 70% (77000/110000)
Sampling: 80% (88000/110000)
Sampling: 90% (99000/110000)
Sampling: 100% (110000/110000)
Sampling: 100% (110000/110000), done.
Messages received during sampling:
Gradient evaluation took 1.7e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.17 seconds.
Adjust your expectations accordingly!
Gradient evaluation took 1e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.1 seconds.
Adjust your expectations accordingly!
Gradient evaluation took 1.2e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.12 seconds.
Adjust your expectations accordingly!
Gradient evaluation took 1e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.1 seconds.
Adjust your expectations accordingly!
Gradient evaluation took 3e-06 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.03 seconds.
Adjust your expectations accordingly!
Gradient evaluation took 5e-06 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.05 seconds.
Adjust your expectations accordingly!
Gradient evaluation took 2e-06 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.02 seconds.
Adjust your expectations accordingly!
Gradient evaluation took 3e-06 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.03 seconds.
Adjust your expectations accordingly!
Gradient evaluation took 3e-06 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.03 seconds.
Adjust your expectations accordingly!
Gradient evaluation took 3e-06 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.03 seconds.
Adjust your expectations accordingly!
parameters | lp__ | accept_stat__ | stepsize__ | treedepth__ | n_leapfrog__ | divergent__ | energy__ | p |
---|---|---|---|---|---|---|---|---|
draws | ||||||||
0 | -4.916791 | 0.878296 | 0.973668 | 2.0 | 3.0 | 0.0 | 5.638045 | 0.527176 |
1 | -4.789036 | 0.983342 | 1.055691 | 2.0 | 3.0 | 0.0 | 4.865028 | 0.453318 |
2 | -4.787570 | 1.000000 | 1.063648 | 1.0 | 3.0 | 0.0 | 4.929972 | 0.451125 |
3 | -4.805306 | 0.959883 | 1.121603 | 1.0 | 3.0 | 0.0 | 6.513737 | 0.470618 |
4 | -5.652753 | 1.000000 | 0.983822 | 1.0 | 1.0 | 0.0 | 5.940693 | 0.672167 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
99995 | -5.967900 | 0.724627 | 1.014034 | 1.0 | 3.0 | 0.0 | 6.733939 | 0.708839 |
99996 | -4.929901 | 0.980139 | 1.146752 | 1.0 | 1.0 | 0.0 | 4.949786 | 0.531799 |
99997 | -5.340393 | 0.849165 | 0.929210 | 1.0 | 3.0 | 0.0 | 6.164319 | 0.626200 |
99998 | -4.804075 | 0.801233 | 0.934480 | 2.0 | 3.0 | 0.0 | 5.923040 | 0.388224 |
99999 | -4.780478 | 0.949646 | 0.955647 | 2.0 | 3.0 | 0.0 | 5.171340 | 0.431489 |
100000 rows × 8 columns
MCMCで生成した乱数の分布は次のようになった
Show code cell source
import matplotlib.pyplot as plt
import seaborn as sns
burn_in = 10000
ax = sns.displot(x="p", data=df.iloc[burn_in:, :], kde=True, bins=20)
Show code cell output
Show code cell source
# カーネル密度推定によりMAP推定値を取得する
from scipy.stats import gaussian_kde
kernel = gaussian_kde(df.iloc[burn_in:, :]["p"])
x_values = np.linspace(0, 1, 1000)
estimated_density = kernel(x_values)
max_index = np.argmax(estimated_density)
max_a_posteriori = x_values[max_index] # MAP estimate
fig, ax = plt.subplots()
ax.plot(x_values, estimated_density)
ax.set(xlabel="p", ylabel="density")
ax.axvline(max_a_posteriori, color="gray")
ax.text(max_a_posteriori * 1.1, 0.2, f"MAP estimate = {max_a_posteriori:.3f}", color="gray")
fig.show()
例#
あるゲームのガチャを1000回引いた結果、「外れ」と「当たり」の回数が以下のようになった。このガチャの「当たり」の確率はいくつか。
Show code cell source
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(0)
x = np.random.binomial(n=1, p=0.1, size=1000)
x = pd.Series(x)
table = x.map({0: "外れ", 1: "当たり"}).value_counts().to_frame()
table.columns = ['回数']
table.T
外れ | 当たり | |
---|---|---|
回数 | 892 | 108 |
モーメント法による推定#
ベルヌーイ分布\(Ber(p)\)の平均が成功確率\(p\)なので、\(\hat{p} = \sum^n_{i=1} X_i / n\)
# モーメント法による推定
n = len(x)
sum(x) / n
0.108
最尤法による推定#
ベルヌーイ分布
\(X_i = 1\)となる確率が\(p\)のベルヌーイ分布\(Ber(p)\)に従った確率変数の実現値がサンプルだとしたとき、\(r=\sum X_i\)とすると、その尤度関数は
であり、対数尤度関数は
となる。
今回のサンプルのもとでは次の図のような曲線になる。(グレーの縦線は最尤推定値のpを示す)
Show code cell source
fig, axes = plt.subplots(ncols=2, figsize=(12, 2))
# 尤度関数
p_candidates = np.linspace(0.0001, 0.5, 1000)
likelihood = np.array([bernoulli(p, n = len(x), r = sum(x)) for p in p_candidates])
max_p = p_candidates[np.argmax(likelihood)] # 最尤推定値
axes[0].plot(p_candidates, likelihood)
axes[0].set(xlabel='p', ylabel='尤度', title='尤度')
axes[0].axvline(max_p, color="gray")
# 対数尤度関数
def log_bernoulli(p, n, r):
return r * np.log(p) + (n - r) * np.log(1 - p)
likelihood = np.array([log_bernoulli(p, n = len(x), r = sum(x)) for p in p_candidates])
max_p = p_candidates[np.argmax(likelihood)] # 最尤推定値
axes[1].plot(p_candidates, likelihood)
axes[1].set(xlabel='p', ylabel='対数尤度', title='対数尤度')
axes[1].axvline(max_p, color="gray")
fig.show()
対数尤度の導関数は
で、これを0とおいて\(p\)について解くと
となり、モーメント法と同じ結果になる。
(参考)式展開
なので
から
なので
これは
なので
ゆえに
解析的に解くことができない場合は勾配降下法などで数値的に解く。
ベイズ推定#
ベイズ推定ではパラメータの確率分布を推定するため、点推定を行いたい場合はその分布の何らかの代表値(期待値や中央値や最頻値)を推定することになる。
以下ではStanを用い、無情報事前分布を使用して推定を行う。
# nest_asyncio: asyncioを使うstanをjupyterで使うための対処
# [Stanによる推定例:ベルヌーイ分布のパラメータ - The One with ...](https://hamada.hatenablog.jp/entry/2017/06/28/100815)
import nest_asyncio
nest_asyncio.apply()
import stan
stan_code = """
data {
int N;
array[N] int X;
}
parameters {
real<lower=0, upper=1> p;
}
model {
for (i in 1:N)
X[i] ~ bernoulli(p);
}
"""
data = {
"N": len(x),
"X": list(x),
}
posterior = stan.build(stan_code, data=data, random_seed=1)
fit = posterior.sample(num_chains=10, num_samples=10000)
df = fit.to_frame()
Show code cell source
# nest_asyncio: asyncioを使うstanをjupyterで使うための対処
# [Stanによる推定例:ベルヌーイ分布のパラメータ - The One with ...](https://hamada.hatenablog.jp/entry/2017/06/28/100815)
import nest_asyncio
nest_asyncio.apply()
import stan
stan_code = """
data {
int N;
array[N] int X;
}
parameters {
real<lower=0, upper=1> p;
}
model {
for (i in 1:N)
X[i] ~ bernoulli(p);
}
"""
data = {
"N": len(x),
"X": list(x),
}
posterior = stan.build(stan_code, data=data, random_seed=1)
fit = posterior.sample(num_chains=10, num_samples=10000)
df = fit.to_frame()
df
Show code cell output
Building...
Building: found in cache, done.
Messages from stanc:
Warning: The parameter p has no priors. This means either no prior is
provided, or the prior(s) depend on data variables. In the later case,
this may be a false positive.
Sampling: 0%
Sampling: 10% (11000/110000)
Sampling: 20% (22000/110000)
Sampling: 30% (33000/110000)
Sampling: 40% (44000/110000)
Sampling: 50% (55000/110000)
Sampling: 60% (66000/110000)
Sampling: 70% (77000/110000)
Sampling: 80% (88000/110000)
Sampling: 90% (99000/110000)
Sampling: 100% (110000/110000)
Sampling: 100% (110000/110000), done.
Messages received during sampling:
Gradient evaluation took 0.000105 seconds
1000 transitions using 10 leapfrog steps per transition would take 1.05 seconds.
Adjust your expectations accordingly!
Gradient evaluation took 9.8e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.98 seconds.
Adjust your expectations accordingly!
Gradient evaluation took 0.000117 seconds
1000 transitions using 10 leapfrog steps per transition would take 1.17 seconds.
Adjust your expectations accordingly!
Gradient evaluation took 0.000124 seconds
1000 transitions using 10 leapfrog steps per transition would take 1.24 seconds.
Adjust your expectations accordingly!
Gradient evaluation took 4.8e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.48 seconds.
Adjust your expectations accordingly!
Gradient evaluation took 4.8e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.48 seconds.
Adjust your expectations accordingly!
Gradient evaluation took 5.9e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.59 seconds.
Adjust your expectations accordingly!
Gradient evaluation took 6e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.6 seconds.
Adjust your expectations accordingly!
Gradient evaluation took 4.8e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.48 seconds.
Adjust your expectations accordingly!
Gradient evaluation took 4e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.4 seconds.
Adjust your expectations accordingly!
parameters | lp__ | accept_stat__ | stepsize__ | treedepth__ | n_leapfrog__ | divergent__ | energy__ | p |
---|---|---|---|---|---|---|---|---|
draws | ||||||||
0 | -346.425864 | 0.954325 | 1.024886 | 1.0 | 1.0 | 0.0 | 346.705203 | 0.091173 |
1 | -345.584044 | 0.982794 | 1.072204 | 2.0 | 3.0 | 0.0 | 345.715281 | 0.095826 |
2 | -344.683203 | 0.999978 | 0.948500 | 1.0 | 3.0 | 0.0 | 344.898082 | 0.106267 |
3 | -345.856302 | 0.728626 | 1.186335 | 1.0 | 3.0 | 0.0 | 346.355623 | 0.094134 |
4 | -344.948383 | 0.848318 | 0.905097 | 2.0 | 3.0 | 0.0 | 347.119909 | 0.116535 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
99995 | -345.235547 | 0.997186 | 0.884190 | 2.0 | 3.0 | 0.0 | 345.348532 | 0.119730 |
99996 | -346.850363 | 0.944274 | 1.031342 | 1.0 | 1.0 | 0.0 | 347.126154 | 0.089298 |
99997 | -344.827985 | 1.000000 | 0.926327 | 1.0 | 1.0 | 0.0 | 344.969987 | 0.103007 |
99998 | -344.852168 | 1.000000 | 0.955250 | 1.0 | 1.0 | 0.0 | 344.946773 | 0.115141 |
99999 | -344.852602 | 0.918100 | 0.960718 | 1.0 | 3.0 | 0.0 | 345.642312 | 0.102627 |
100000 rows × 8 columns
# カーネル密度推定によりMAP推定値を取得する
from scipy.stats import gaussian_kde
kernel = gaussian_kde(df.iloc[burn_in:, :]["p"])
x_values = np.linspace(0, 0.5, 1000)
estimated_density = kernel(x_values)
max_index = np.argmax(estimated_density)
max_a_posteriori = x_values[max_index] # MAP estimate
fig, ax = plt.subplots()
ax.plot(x_values, estimated_density)
ax.set(xlabel="p", ylabel="density")
ax.axvline(max_a_posteriori, color="gray")
ax.text(max_a_posteriori + 0.05, 3, f"MAP estimate = {max_a_posteriori:.3f}", color="gray")
fig.show()