Point Estimation

Point Estimation#

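Assume a probability distribution $f(x \mid \theta)$, and consider the problem of estimating its unknown parameter $\theta = (\theta_{1}, \ldots, \theta_{k})$ from a random sample $X = (X_{1}, \ldots, X_{n})$ of size $n$ drawn from that distribution.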

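Method of Moments#

For a random sample

$$
X_{1}, \ldots, X_{n} \overset{\text{i.i.d.}}{\sim} f(x \mid \theta),
$$

replace each population moment $E[X^{r}]$ with the sample moment $\frac{1}{n} \sum_{i=1}^{n} X_{i}^{r}$, which gives the simultaneous equations

$$
\begin{cases}
\frac{1}{n} \sum_{i=1}^{n} X_{i} = \mu_{1}'(\theta_{1}, \ldots, \theta_{k}) \\
\frac{1}{n} \sum_{i=1}^{n} X_{i}^{2} = \mu_{2}'(\theta_{1}, \ldots, \theta_{k}) \\
\quad \vdots \\
\frac{1}{n} \sum_{i=1}^{n} X_{i}^{k} = \mu_{k}'(\theta_{1}, \ldots, \theta_{k})
\end{cases}
$$

where $\mu_{r}'(\theta_{1}, \ldots, \theta_{k}) = E[X^{r}]$ is the $r$-th population moment written as a function of the parameters. Solving these equations for $\theta_{1}, \ldots, \theta_{k}$ yields the estimator $\hat{\theta} = (\hat{\theta}_{1}, \ldots, \hat{\theta}_{k})$, which is called the moment estimator.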

Example

Let $X \sim N(\mu, \sigma^{2})$. Since $E[X] = \mu$ and $E[X^{2}] = \sigma^{2} + \mu^{2}$, the moment equations are

$$
\begin{cases}
\frac{1}{n} \sum_{i=1}^{n} X_{i} = \mu \\
\frac{1}{n} \sum_{i=1}^{n} X_{i}^{2} = \sigma^{2} + \mu^{2}
\end{cases}
$$

Solving them,

$$
\begin{aligned}
\hat{\mu} &= \frac{1}{n} \sum_{i=1}^{n} X_{i} \\
\hat{\sigma}^{2} &= \frac{1}{n} \sum_{i=1}^{n} (X_{i} - \hat{\mu})^{2}
\end{aligned}
$$

are the moment estimators of $\mu$ and $\sigma^{2}$.
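As a rough numerical check of the normal example, the sketch below computes the first two sample moments of simulated data and solves the two moment equations. The simulated sample, the seed, and the names `mu_true` and `sigma_true` are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical "true" parameters, used only to simulate data
mu_true, sigma_true = 2.0, 3.0
rng = np.random.default_rng(0)
x = rng.normal(loc=mu_true, scale=sigma_true, size=10_000)

m1 = np.mean(x)        # first sample moment:  (1/n) * sum(X_i)
m2 = np.mean(x**2)     # second sample moment: (1/n) * sum(X_i^2)

# Solve  m1 = mu  and  m2 = sigma^2 + mu^2  for (mu, sigma^2)
mu_hat = m1
sigma2_hat = m2 - m1**2

print(mu_hat, sigma2_hat)
```

Note that `sigma2_hat` is algebraically the same as `np.mean((x - mu_hat)**2)`, the variance with divisor $n$ rather than $n - 1$.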

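Maximum Likelihood Estimation#

This method rests on the premise that the observed sample is the one that was most probable (most plausible) to occur, and estimates the parameter as the value that maximizes a function measuring plausibility, the likelihood function.

The likelihood function is the product of the probability (density) functions of $X_{1}, \ldots, X_{n}$,

$$
L(\theta \mid X) = \prod_{i=1}^{n} f(X_{i} \mid \theta),
$$

viewed as a function of $\theta$: it expresses how plausible a given $\theta$ is in light of the sample.

Products of probabilities are awkward to handle mathematically, so one usually works with the logarithm instead, the log-likelihood

$$
\log L(\theta \mid X) = \sum_{i=1}^{n} \log f(X_{i} \mid \theta).
$$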
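Besides the algebraic convenience, there is also a practical numerical reason to prefer the log-likelihood: a product of many probabilities can underflow in floating-point arithmetic. The sketch below illustrates this with a simulated Bernoulli sample; the sample size, the seed, and the helper `bernoulli_pmf` are assumptions made for this illustration.

```python
import numpy as np

# Hypothetical Bernoulli sample of size 5000 with success probability 0.5
rng = np.random.default_rng(0)
p = 0.5
x = rng.binomial(1, p, size=5000)

def bernoulli_pmf(x, p):
    """f(x | p) = p^x (1 - p)^(1 - x) for x in {0, 1}."""
    return p**x * (1 - p) ** (1 - x)

# Likelihood: product of 5000 factors, each equal to 0.5 here
likelihood = np.prod(bernoulli_pmf(x, p))

# Log-likelihood: sum of 5000 logs, i.e. 5000 * log(0.5)
log_likelihood = np.sum(np.log(bernoulli_pmf(x, p)))

print(likelihood)      # underflows to 0.0 in double precision
print(log_likelihood)  # stays finite
```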

Example: coin toss

Suppose a coin is tossed 5 times and lands heads twice (coding heads as 1, $X = (0, 1, 1, 0, 0)$).

The probability mass function of the Bernoulli distribution that takes the value 1 with probability $p$ is

$$
P(X = 1) = p, \quad P(X = 0) = 1 - p,
$$

so the likelihood function is

$$
\begin{aligned}
L(p \mid X) &= (1 - p) \times p \times p \times (1 - p) \times (1 - p) \\
&= p^{2} (1 - p)^{5 - 2}.
\end{aligned}
$$

Plugging different values of $p$ into the likelihood function gives

$$
\begin{aligned}
L(p = 0.1 \mid X) &= 0.1^{2} \times 0.9^{3} = 0.00729 \\
L(p = 0.5 \mid X) &= 0.5^{2} \times 0.5^{3} = 0.03125 \\
L(p = 0.9 \mid X) &= 0.9^{2} \times 0.1^{3} = 0.00081
\end{aligned}
$$

and so on. Repeating this over a range of values of $p$ traces out the curve in the following figure.

../../_images/e679e4b3f2a29c8f7b55ba52cc8622b14e90a2e0dcb8af7206d36834127a30ad.png

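The parameter value that maximizes the likelihood function is then adopted as the point estimate. For this sample the maximum is at $\hat{p} = 2/5 = 0.4$.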
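The coin-toss calculation can be reproduced with a simple grid search. This is a minimal sketch rather than the code behind the figure: the function name `likelihood` and the grid resolution are choices made here for illustration.

```python
import numpy as np

x = np.array([0, 1, 1, 0, 0])   # the five tosses from the example (1 = heads)
r, n = x.sum(), len(x)          # r = number of heads, n = number of tosses

def likelihood(p):
    """L(p | X) = p^r (1 - p)^(n - r) for the Bernoulli sample above."""
    return p**r * (1 - p) ** (n - r)

# Evaluate the likelihood at the three values used in the text
for p in (0.1, 0.5, 0.9):
    print(p, likelihood(p))

# Grid search: evaluate L on candidate values of p and take the maximizer
grid = np.linspace(0, 1, 1001)
p_hat = grid[np.argmax(likelihood(grid))]
print(p_hat)
```

The maximizer on the grid coincides with the sample proportion of heads, $r / n$.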

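Bayesian Method#

Treat the parameter $\theta$ of the joint probability density function $f(x \mid \theta)$ as a random variable and assume a probability distribution for it. This distribution is called the prior distribution and is written $\pi(\theta \mid \xi)$. Here $\xi$ is the parameter of the prior distribution and is called a hyperparameter.

The model is expressed as follows.

$$
\begin{cases}
X \mid \theta \sim f(x \mid \theta) \\
\theta \sim \pi(\theta \mid \xi)
\end{cases}
$$

The conditional distribution of $\theta$ given $X = x$ is called the posterior distribution of $\theta$, and is given by

$$
\pi(\theta \mid x, \xi) = \frac{f(x \mid \theta) \, \pi(\theta \mid \xi)}{f_{\pi}(x \mid \xi)}.
$$

Here $f_{\pi}(x \mid \xi)$ is the marginal distribution of $X$; when $\theta$ is a continuous random variable,

$$
f_{\pi}(x \mid \xi) = \int f(x \mid \theta) \, \pi(\theta \mid \xi) \, d\theta.
$$

The Bayesian method derives estimators from the posterior distribution. The posterior mean $E[\theta \mid X]$ is called the expected a posteriori (EAP) estimate. The posterior mode is called the maximum a posteriori (MAP) estimate, or the Bayesian maximum likelihood estimator. Point estimation can be carried out using summary values of the distribution such as these.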
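To make the posterior formula concrete, the sketch below approximates it on a grid for the coin-toss data $X = (0, 1, 1, 0, 0)$. The flat prior on $[0, 1]$, the grid size, and the Riemann-sum approximation of the integral are all assumptions of this illustration, not something prescribed by the text.

```python
import numpy as np

x = np.array([0, 1, 1, 0, 0])       # coin-toss data (1 = heads)
r, n = x.sum(), len(x)

theta = np.linspace(0, 1, 10001)    # grid over the parameter
d_theta = theta[1] - theta[0]

prior = np.ones_like(theta)                   # assumed flat prior pi(theta)
lik = theta**r * (1 - theta) ** (n - r)       # f(x | theta)
marginal = np.sum(lik * prior) * d_theta      # Riemann-sum approximation of f_pi(x)
posterior = lik * prior / marginal            # pi(theta | x) on the grid

eap = np.sum(theta * posterior) * d_theta     # posterior mean (EAP)
map_ = theta[np.argmax(posterior)]            # posterior mode (MAP)
print(eap, map_)
```

Under this flat prior the posterior is Beta(3, 4), so the exact values are EAP $= 3/7$ and MAP $= 2/5$. With a flat prior the MAP estimate coincides with the maximum likelihood estimate, while the EAP estimate is pulled slightly toward $1/2$.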

Estimation with MCMC#

parameters	lp__	accept_stat__	stepsize__	treedepth__	n_leapfrog__	divergent__	energy__	p
draws
0	-4.916791	0.878296	0.973668	2.0	3.0	0.0	5.638045	0.527176
1	-4.788788	0.983347	1.055563	2.0	3.0	0.0	4.864940	0.452960
2	-4.787570	1.000000	1.063648	1.0	3.0	0.0	4.929972	0.451125
3	-4.805306	0.959883	1.121603	1.0	3.0	0.0	6.513737	0.470618
4	-5.652753	1.000000	0.983822	1.0	1.0	0.0	5.940693	0.672167
...	...	...	...	...	...	...	...	...
99995	-5.967900	0.724627	1.014034	1.0	3.0	0.0	6.733939	0.708839
99996	-4.929901	0.980139	1.146752	1.0	1.0	0.0	4.949786	0.531799
99997	-5.340393	0.849165	0.929210	1.0	3.0	0.0	6.164319	0.626200
99998	-4.804075	0.801233	0.934480	2.0	3.0	0.0	5.923040	0.388224
99999	-4.780478	0.949646	0.955647	2.0	3.0	0.0	5.171340	0.431489

100000 rows × 8 columns

The distribution of the draws generated by MCMC is shown below.

../../_images/c22e87b73f03eef3d1e770c3f2e72fa74563f40147f2f1dce9beccefe6dca5b8.png

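Example#

A gacha in a certain game was pulled 1000 times, and the numbers of "misses" and "hits" were as follows. What is the probability of a "hit"?

	Miss	Hit
Count	892	108

Estimation by the Method of Moments#

The mean of the Bernoulli distribution $\mathrm{Ber}(p)$ is the success probability $p$, so $\hat{p} = \sum_{i=1}^{n} X_{i} / n$.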

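# Method-of-moments estimate
# Assumption: x holds the 0/1 outcomes rebuilt from the table (1 = hit); order is arbitrary
x = [0] * 892 + [1] * 108
n = len(x)
sum(x) / n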

0.108

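Estimation by Maximum Likelihood#

Bernoulli distribution

Suppose the sample consists of realizations of random variables following the Bernoulli distribution $\mathrm{Ber}(p)$, in which $X_{i} = 1$ with probability $p$. Writing $r = \sum X_{i}$, the likelihood function is

$$
L(p \mid X) = p^{r} (1 - p)^{n - r}
$$

and the log-likelihood function is

$$
\log L(p \mid X) = r \log p + (n - r) \log (1 - p).
$$

For this sample the log-likelihood is the curve shown in the following figure. (The grey vertical line marks the maximum likelihood estimate of $p$.)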

../../_images/20c35bfddbb434e23d7595ac85f1813b16183ab6725ca2ff3090e32a751354af.png

The derivative of the log-likelihood is

$$
\frac{\partial \log L(p \mid X)}{\partial p} = \frac{r}{p} - \frac{n - r}{1 - p}.
$$

Setting this to 0 and solving for $p$ gives

$$
p = \frac{r}{n} = \frac{\sum X_{i}}{n},
$$

which is the same result as the method of moments.

(Reference) Derivation

Since

$$
\frac{\partial \log L(p \mid X)}{\partial p} = \frac{r}{p} - \frac{n - r}{1 - p} = 0,
$$

we have

$$
\frac{r}{p} = \frac{n - r}{1 - p},
$$

and hence

$$
\frac{p}{1 - p} = \frac{r}{n - r}.
$$

Therefore

$$
p = \frac{r (1 - p)}{n - r} = \frac{r (1 - p)}{r \left( \frac{n}{r} - 1 \right)} = \frac{1 - p}{\frac{n}{r} - 1}.
$$

This means

$$
p \left( \frac{n}{r} - 1 \right) = 1 - p,
$$

so

$$
p \, \frac{n}{r} = 1,
$$

and finally

$$
p = \frac{r}{n} = \frac{\sum X_{i}}{n}.
$$

When the maximizer cannot be found analytically, it is computed numerically, for example by gradient descent.
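As an illustration of the numerical route, the sketch below runs plain gradient ascent on the Bernoulli log-likelihood (equivalently, gradient descent on its negative) for the gacha data. The starting value, the step size, the number of iterations, and the function name `score` are hand-picked assumptions; in practice a library optimizer would usually be the better choice.

```python
def score(p, r, n):
    """Derivative of the log-likelihood: r/p - (n - r)/(1 - p)."""
    return r / p - (n - r) / (1 - p)

r, n = 108, 1000        # hits and total pulls from the example table
p = 0.5                 # arbitrary starting value
learning_rate = 0.01    # hand-picked step size (an assumption, not tuned)

for _ in range(2000):
    # Ascend the average log-likelihood; dividing by n keeps the step scale stable
    p += learning_rate * score(p, r, n) / n
    # Keep p strictly inside (0, 1) so the gradient stays defined
    p = min(max(p, 1e-6), 1 - 1e-6)

print(p)
```

The iterate converges to the closed-form solution $r / n = 0.108$, which serves as a check on the numerical procedure.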

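Bayesian Estimation#

Bayesian estimation infers a probability distribution for the parameter, so to obtain a point estimate one reports some summary value of that distribution (the mean, the median, or the mode).

Below, the estimation is carried out with Stan using a noninformative prior.

References

Stan Modeling Language: User's Guide and Reference Manual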

# nest_asyncio: workaround so that Stan, which uses asyncio, can run inside Jupyter
# [Stanによる推定例：ベルヌーイ分布のパラメータ - The One with ...](https://hamada.hatenablog.jp/entry/2017/06/28/100815)
import nest_asyncio
nest_asyncio.apply()

import stan


stan_code = """
data {
    int N;
    array[N] int X;
}
parameters {
    real<lower=0, upper=1> p;
}
model {
    // No explicit prior on p, so it is implicitly flat over its declared range [0, 1]
    for (i in 1:N) 
        X[i] ~ bernoulli(p);
}
"""

data = {
    "N": len(x),
    "X": list(x),
}

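# Compile the model, then run 10 chains with 10,000 post-warmup draws each
posterior = stan.build(stan_code, data=data, random_seed=1)
fit = posterior.sample(num_chains=10, num_samples=10000)
df = fit.to_frame()  # one row per draw; the column "p" holds the posterior draws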
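Once posterior draws are available, point estimates are just summaries of them. The sketch below is self-contained and does not call Stan: as a stand-in for `df["p"]` it samples directly from Beta($r + 1$, $n - r + 1$), which is the exact posterior for Bernoulli data under a flat prior. The seed, the histogram-based mode estimate, and the bin count are assumptions made for illustration.

```python
import numpy as np

r, n = 108, 1000                      # hits and total pulls from the example
rng = np.random.default_rng(1)

# Stand-in for df["p"]: exact posterior under a flat prior is Beta(r + 1, n - r + 1)
draws = rng.beta(r + 1, n - r + 1, size=100_000)

eap = draws.mean()                    # posterior mean (EAP)
median = np.median(draws)             # posterior median

# Crude mode (MAP) estimate: midpoint of the tallest histogram bin
counts, edges = np.histogram(draws, bins=100)
k = np.argmax(counts)
map_est = 0.5 * (edges[k] + edges[k + 1])

print(eap, median, map_est)
```

With real MCMC output, replacing `draws` with `df["p"].to_numpy()` would give the same three summaries. For this posterior the exact mean is $109/1002$ and the exact mode is $108/1000$, so all three estimates land close to the maximum likelihood value of 0.108.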