ordinalcorr.biserial

ordinalcorr.biserial(x: Sequence[float | int] | ndarray[float | int], y: Sequence[int] | ndarray[int]) → float

Compute the biserial correlation coefficient between a continuous variable x and a dichotomous variable y (coded 0 or 1), assuming y was obtained by thresholding a latent, normally distributed continuous variable.

Parameters:

x (array-like) – Continuous variable.
y (array-like) – Dichotomous variable (0 and 1), assumed to be derived from a latent continuous variable.

Returns:

Biserial correlation coefficient.

Return type:

float

Examples

>>> from ordinalcorr import biserial
>>> x = [0.1, 0.2, 0.3, 0.4, 0.5]
>>> y = [0, 0, 1, 1, 1]
>>> biserial(x, y)

Details:

The biserial correlation coefficient is defined as:

\[r_{b} = r_{pb} \cdot \frac{\sqrt{p (1 - p)}}{\phi(z)} = \frac{\bar{X}_1 - \bar{X}_0}{s_X} \cdot \frac{p (1 - p)}{\phi(z)}\]

where

\(r_{pb}\) is the point-biserial correlation coefficient
\(\bar{X}_1\) and \(\bar{X}_0\) are the means of the continuous variable for the two categories of the dichotomous variable
- \(\bar{X}_1 = \frac{1}{n_1} \sum_{i:Y_i = 1} X_i, \quad n_1 = |\{i: Y_i = 1\}|\)
- \(\bar{X}_0 = \frac{1}{n_0} \sum_{i:Y_i = 0} X_i, \quad n_0 = |\{i: Y_i = 0\}|\)
\(s_X\) is the population standard deviation of the continuous variable (the sum of squares is divided by \(n\), not \(n - 1\))
- \(s_X = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2}, \quad n = n_1 + n_0\)
\(p\) is the proportion of observations for which the dichotomous variable equals 1
- \(p = \frac{n_1}{n}\)
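\(\phi(z)\) is the probability density function of the standard normal distribution, evaluated at \(z\)
\(z\) is the quantile of the standard normal distribution at probability \(p\)
- \(z = \Phi^{-1}(p)\), where \(\Phi\) is the standard normal cumulative distribution function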
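
The formula above can be traced step by step with a small standard-library sketch. This is an illustration of the definition only, not the ordinalcorr implementation: the name `biserial_sketch` is hypothetical, it assumes \(z = \Phi^{-1}(p)\), and it does no input validation (both categories must be present and x must not be constant).

```python
from math import sqrt
from statistics import NormalDist, fmean


def biserial_sketch(x, y):
    """Evaluate r_b = (mean_1 - mean_0) / s_X * p * (1 - p) / phi(z).

    Hypothetical helper that follows the formula term by term;
    it is not the library's code.
    """
    n = len(x)

    # Split x by the category of y.
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]

    # p: proportion of observations with y == 1.
    p = len(x1) / n

    # s_X: population standard deviation (divides by n).
    mean_x = fmean(x)
    s_x = sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)

    # z: standard normal quantile at p (assumed convention);
    # phi(z): standard normal density at z.
    std_normal = NormalDist()
    z = std_normal.inv_cdf(p)
    phi_z = std_normal.pdf(z)

    return (fmean(x1) - fmean(x0)) / s_x * p * (1 - p) / phi_z


x = [0.1, 0.2, 0.3, 0.4, 0.5]
y = [0, 0, 1, 1, 1]
r_b = biserial_sketch(x, y)
```

Because \(\phi\) is symmetric, swapping the 0 and 1 labels only flips the sign of the result. The sketch does not clip its output, and the raw formula can yield values outside \([-1, 1]\) for small samples; whether the library bounds its return value is not stated here.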