ordinalcorr.biserial

ordinalcorr.biserial(x: Sequence[float | int] | ndarray[float | int], y: Sequence[int] | ndarray[int]) → float

Compute the biserial correlation coefficient between a continuous variable x and a dichotomized variable y (0 or 1), assuming y was split from a latent continuous variable.

Parameters:
  • x (array-like) – Continuous variable.

  • y (array-like) – Dichotomous variable (0 and 1), assumed to be derived from a latent continuous variable.

Returns:

Biserial correlation coefficient.

Return type:

float

Examples

>>> from ordinalcorr import biserial
>>> x = [0.1, 0.2, 0.3, 0.4, 0.5]
>>> y = [0, 0, 1, 1, 1]
>>> biserial(x, y)

Details:

The biserial correlation coefficient is defined as:

\[r_{b} = r_{pb} \cdot \frac{\sqrt{p (1 - p)}}{\phi(z)} = \frac{\bar{X}_1 - \bar{X}_0}{s_X} \cdot \frac{p (1 - p)}{\phi(z)}\]

where

  • \(r_{pb}\) is the point-biserial correlation coefficient, \(r_{pb} = \frac{\bar{X}_1 - \bar{X}_0}{s_X} \sqrt{p (1 - p)}\)

  • \(\bar{X}_1\) and \(\bar{X}_0\) are the means of the continuous variable for the two categories of the dichotomous variable
    • \(\bar{X}_1 = \frac{1}{n_1} \sum_{i:Y_i = 1} X_i, \quad n_1 = |\{i: Y_i = 1\}|\)

    • \(\bar{X}_0 = \frac{1}{n_0} \sum_{i:Y_i = 0} X_i, \quad n_0 = |\{i: Y_i = 0\}|\)

  • \(s_X\) is the standard deviation of the continuous variable
    • \(s_X = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2}, \quad n = n_1 + n_0\)

  • \(p\) is the proportion of observations for which the dichotomous variable equals 1
    • \(p = \frac{n_1}{n}\)

  • \(\phi(z)\) is the probability density function of the standard normal distribution

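  • \(z\) is the quantile of the standard normal distribution at probability \(p\), i.e. \(z = \Phi^{-1}(p)\), where \(\Phi\) is the standard normal cumulative distribution function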
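
The formula above can be sketched directly with the Python standard library. This is an illustrative re-implementation, not the code that ordinalcorr itself runs: the name biserial_sketch is hypothetical, \(z\) is assumed to be the standard normal quantile at \(p\), and no input validation is done (both categories must be present and x must not be constant).

```python
from statistics import NormalDist, fmean, pstdev


def biserial_sketch(x, y):
    """Biserial correlation computed term by term from the formula above.

    Hypothetical helper for illustration only; not part of ordinalcorr.
    """
    # Split the continuous values by the category of the dichotomous variable.
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]

    n = len(x)
    p = len(x1) / n                 # proportion of observations with Y = 1
    s_x = pstdev(x)                 # population standard deviation (divides by n)

    std_normal = NormalDist()       # mean 0, standard deviation 1
    z = std_normal.inv_cdf(p)       # assumed reading of z: quantile at p
    phi_z = std_normal.pdf(z)       # standard normal density at z

    return (fmean(x1) - fmean(x0)) / s_x * p * (1 - p) / phi_z


x = [0.1, 0.2, 0.3, 0.4, 0.5]
y = [0, 0, 1, 1, 1]
r_b = biserial_sketch(x, y)
```

In small samples such as this five-point example, the raw formula can produce a value outside [-1, 1]. Whether ordinalcorr.biserial clips or otherwise adjusts such values is not stated on this page.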