ordinalcorr.biserial
- ordinalcorr.biserial(x: Sequence[float | int] | ndarray[float | int], y: Sequence[int] | ndarray[int]) -> float
Compute the biserial correlation coefficient between a continuous variable x and a dichotomous variable y (coded 0 or 1), assuming y was obtained by dichotomizing a latent, normally distributed continuous variable.
- Parameters:
x (array-like) – Continuous variable.
y (array-like) – Dichotomous variable (0 and 1), assumed to be derived from a latent continuous variable.
- Returns:
Biserial correlation coefficient.
- Return type:
float
Examples
>>> from ordinalcorr import biserial
>>> x = [0.1, 0.2, 0.3, 0.4, 0.5]
>>> y = [0, 0, 1, 1, 1]
>>> biserial(x, y)
- Details:
The biserial correlation coefficient is defined as:
\[r_{b} = r_{pb} \cdot \frac{\sqrt{p (1 - p)}}{\phi(z)} = \frac{\bar{X}_1 - \bar{X}_0}{s_X} \cdot \frac{p (1 - p)}{\phi(z)}\]
where
- \(r_{pb}\) is the point-biserial correlation coefficient
- \(\bar{X}_1\) and \(\bar{X}_0\) are the means of the continuous variable for the two categories of the dichotomous variable
\(\bar{X}_1 = \frac{1}{n_1} \sum_{i:Y_i = 1} X_i, \quad n_1 = |\{i: Y_i = 1\}|\)
\(\bar{X}_0 = \frac{1}{n_0} \sum_{i:Y_i = 0} X_i, \quad n_0 = |\{i: Y_i = 0\}|\)
- \(s_X\) is the standard deviation of the continuous variable
\(s_X = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2}, \quad n = n_1 + n_0\)
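- \(p\) is the proportion of observations with \(Y = 1\)
\(p = \frac{n_1}{n}\)
- \(\phi(z)\) is the probability density function of the standard normal distribution, evaluated at \(z\)
- \(z\) is the quantile of the standard normal distribution at probability \(p\), that is, \(z = \Phi^{-1}(p)\), where \(\Phi\) is the standard normal cumulative distribution function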
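The definition above can be traced step by step with a minimal sketch that uses only the Python standard library. The function name `biserial_sketch` is hypothetical and is not part of `ordinalcorr`. The code follows the formula as written here (population standard deviation, \(z\) taken as the standard normal quantile at \(p\)) and may differ from the library's actual implementation, for example in input validation or edge-case handling.

```python
from math import sqrt
from statistics import NormalDist, fmean


def biserial_sketch(x, y):
    """Hypothetical helper: biserial correlation computed from the formula above.

    Assumes y contains only 0 and 1, and that both groups are non-empty.
    """
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]  # X values where Y = 1
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]  # X values where Y = 0
    n = len(x)
    p = len(x1) / n  # proportion of Y = 1

    # Population standard deviation of X (divides by n, as in the definition).
    mean_x = fmean(x)
    s_x = sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)

    # z is the standard normal quantile at p; phi(z) is the density at z.
    std_normal = NormalDist()
    z = std_normal.inv_cdf(p)
    phi_z = std_normal.pdf(z)

    return (fmean(x1) - fmean(x0)) / s_x * p * (1 - p) / phi_z


x = [0.1, 0.2, 0.3, 0.4, 0.5]
y = [0, 0, 1, 1, 1]
r_b = biserial_sketch(x, y)
```

Because \(\phi\) is symmetric, swapping the 0 and 1 labels only flips the sign of the result. This raw formula is not bounded to \([-1, 1]\), so the sketch can return values outside that range for small or strongly separated samples.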