ordinalcorr.point_biserial
- ordinalcorr.point_biserial(x: Sequence[T] | ndarray[T], y: Sequence[T] | ndarray[T]) -> float
Compute the point-biserial correlation between a continuous variable x and a dichotomous variable y coded as 0 or 1. The coefficient assumes that y is a true dichotomy rather than a dichotomized continuous variable.
- Parameters:
x (array-like) – Continuous variable.
y (array-like) – Dichotomous variable (0 and 1).
- Returns:
Point-biserial correlation coefficient.
- Return type:
float
Examples
>>> from ordinalcorr import point_biserial
>>> x = [0.1, 0.2, 0.3, 0.4, 0.5]
>>> y = [0, 0, 1, 1, 1]
>>> point_biserial(x, y)
- Details:
The point-biserial correlation coefficient is defined as:
\[r_{pb} = \frac{\bar{X}_1 - \bar{X}_0}{s_X} \sqrt{p \times (1 - p)}\]
where
- \(\bar{X}_1\) and \(\bar{X}_0\) are the means of the continuous variable for the two categories of the dichotomous variable
\(\bar{X}_1 = \frac{1}{n_1} \sum_{i:Y_i = 1} X_i, \quad n_1 = |\{i: Y_i = 1\}|\)
\(\bar{X}_0 = \frac{1}{n_0} \sum_{i:Y_i = 0} X_i, \quad n_0 = |\{i: Y_i = 0\}|\)
- \(s_X\) is the population standard deviation (divisor \(n\)) of the continuous variable
\(s_X = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2}, \quad n = n_1 + n_0\)
- \(p\) is the proportion of observations for which \(Y = 1\)
\(p = \frac{n_1}{n}\)
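The definition above can be evaluated term by term. The following is a minimal sketch in plain Python, not the library's implementation: `point_biserial_sketch` is a hypothetical helper name, and it assumes that both categories are non-empty and that x is not constant.

```python
import math


def point_biserial_sketch(x, y):
    """Hypothetical helper: evaluate the r_pb formula directly.

    Assumes y contains only 0 and 1, both groups are non-empty,
    and x is not constant (otherwise s_X would be zero).
    """
    n = len(x)
    # Split the continuous variable by category of y.
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    n1, n0 = len(x1), len(x0)

    mean1 = sum(x1) / n1  # mean of X where Y = 1
    mean0 = sum(x0) / n0  # mean of X where Y = 0

    # Population standard deviation of X (divisor n, not n - 1).
    mean_x = sum(x) / n
    s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)

    p = n1 / n  # proportion of observations with Y = 1
    return (mean1 - mean0) / s_x * math.sqrt(p * (1 - p))


x = [0.1, 0.2, 0.3, 0.4, 0.5]
y = [0, 0, 1, 1, 1]
r_pb = point_biserial_sketch(x, y)
print(round(r_pb, 4))  # 0.866
```

For this data, \(\bar{X}_1 = 0.4\), \(\bar{X}_0 = 0.15\), \(s_X = \sqrt{0.02}\) and \(p = 0.6\), which gives \(r_{pb} = \sqrt{3}/2 \approx 0.866\).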
Note that the point-biserial correlation coefficient is equivalent to the Pearson correlation coefficient between the continuous variable and the dichotomous variable.
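The equivalence can be checked in a few lines. This is a sketch using the symbols defined above, with \(s_Y\) introduced here as the population standard deviation of \(Y\) and \(r_{XY}\) as the Pearson coefficient (both added for illustration). Since \(\bar{Y} = p\) and \(\bar{X} = p\bar{X}_1 + (1 - p)\bar{X}_0\), the covariance and \(s_Y\) are
\[\frac{1}{n}\sum_{i=1}^{n} X_i Y_i - \bar{X}\bar{Y} = p\bar{X}_1 - p\left(p\bar{X}_1 + (1 - p)\bar{X}_0\right) = p(1 - p)(\bar{X}_1 - \bar{X}_0), \quad s_Y = \sqrt{p - p^2} = \sqrt{p(1 - p)}\]
so that
\[r_{XY} = \frac{p(1 - p)(\bar{X}_1 - \bar{X}_0)}{s_X \sqrt{p(1 - p)}} = \frac{\bar{X}_1 - \bar{X}_0}{s_X} \sqrt{p(1 - p)} = r_{pb}\]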
References