ordinalcorr.point_biserial

ordinalcorr.point_biserial(x: Sequence[T] | ndarray[T], y: Sequence[T] | ndarray[T]) → float

Compute the point-biserial correlation between a continuous variable x and a dichotomous variable y coded as 0 or 1. y is assumed to be a true dichotomy rather than a dichotomized continuous variable.

Parameters:

x (array-like) – Continuous variable.
y (array-like) – Dichotomous variable (0 and 1).

Returns:

Point-biserial correlation coefficient.

Return type:

float

Examples

>>> from ordinalcorr import point_biserial
>>> x = [0.1, 0.2, 0.3, 0.4, 0.5]
>>> y = [0, 0, 1, 1, 1]
>>> point_biserial(x, y)

Details:

The point-biserial correlation coefficient is defined as:

\[r_{pb} = \frac{\bar{X}_1 - \bar{X}_0}{s_X} \sqrt{p \times (1 - p)}\]

where

- \(\bar{X}_1\) and \(\bar{X}_0\) are the means of the continuous variable within the two categories of the dichotomous variable:
  - \(\bar{X}_1 = \frac{1}{n_1} \sum_{i:Y_i = 1} X_i, \quad n_1 = |\{i: Y_i = 1\}|\)
  - \(\bar{X}_0 = \frac{1}{n_0} \sum_{i:Y_i = 0} X_i, \quad n_0 = |\{i: Y_i = 0\}|\)
- \(s_X\) is the population standard deviation (divisor \(n\)) of the continuous variable:
  - \(s_X = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2}, \quad n = n_1 + n_0\)
- \(p\) is the proportion of observations with \(Y = 1\):
  - \(p = \frac{n_1}{n}\)
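
The definition above can be traced step by step in a few lines of NumPy. This is a minimal sketch for illustration only, not the library's implementation: the helper name `point_biserial_sketch` is hypothetical, and it assumes y contains only the values 0 and 1 with both groups non-empty.

```python
import numpy as np


def point_biserial_sketch(x, y):
    """Evaluate the r_pb formula directly (hypothetical helper, not ordinalcorr code)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)

    mean_1 = x[y == 1].mean()  # mean of X where Y = 1
    mean_0 = x[y == 0].mean()  # mean of X where Y = 0
    s_x = x.std()              # population SD (NumPy default ddof=0, divisor n)
    p = np.mean(y == 1)        # proportion of observations with Y = 1

    return (mean_1 - mean_0) / s_x * np.sqrt(p * (1 - p))


x = [0.1, 0.2, 0.3, 0.4, 0.5]
y = [0, 0, 1, 1, 1]
r_pb = point_biserial_sketch(x, y)
print(round(r_pb, 4))  # 0.866
```

For these data the group means are 0.4 and 0.15, \(s_X = \sqrt{0.02}\), and \(p = 0.6\), so the formula gives \(\sqrt{3}/2 \approx 0.866\).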

Note that the point-biserial correlation coefficient is numerically equal to the Pearson correlation coefficient between the continuous variable and the dichotomous variable coded as 0 and 1.
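
This relationship to the Pearson coefficient can be checked with a short derivation. It is sketched here under the definitions above; \(s_Y\) and \(\operatorname{cov}(X, Y)\) are notation introduced only for this sketch and denote the population (divisor \(n\)) standard deviation of \(Y\) and covariance of \(X\) and \(Y\). Because \(Y_i \in \{0, 1\}\), we have \(Y_i^2 = Y_i\) and \(\bar{Y} = p\), so:

\[s_Y^2 = \frac{1}{n} \sum_{i=1}^{n} Y_i^2 - \bar{Y}^2 = p - p^2 = p (1 - p)\]

Using \(\bar{X} = p \bar{X}_1 + (1 - p) \bar{X}_0\), the covariance becomes:

\[\operatorname{cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} X_i Y_i - \bar{X} \bar{Y} = p \bar{X}_1 - p \left(p \bar{X}_1 + (1 - p) \bar{X}_0\right) = p (1 - p) (\bar{X}_1 - \bar{X}_0)\]

Dividing by the two standard deviations recovers the formula for \(r_{pb}\):

\[\frac{\operatorname{cov}(X, Y)}{s_X s_Y} = \frac{p (1 - p) (\bar{X}_1 - \bar{X}_0)}{s_X \sqrt{p (1 - p)}} = \frac{\bar{X}_1 - \bar{X}_0}{s_X} \sqrt{p \times (1 - p)} = r_{pb}\]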

References