ordinalcorr.hetcor

ordinalcorr.hetcor(data: DataFrame, n_unique: int = 20) → DataFrame

Estimate the heterogeneous correlation matrix.

The heterogeneous correlation matrix includes:

  • Pearson product-moment correlations between continuous variables

  • Polychoric correlations between ordinal variables

  • Polyserial correlations between continuous and ordinal variables
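
The pairing above can be sketched as a small lookup. This is an illustration only: `coefficient_for` is a hypothetical helper, not part of ordinalcorr, and the labels "continuous" and "ordinal" are assumed names for the two variable types.

```python
from itertools import combinations_with_replacement


# Hypothetical helper (not part of ordinalcorr): names the coefficient that
# the list above associates with each pair of variable types.
def coefficient_for(type_a: str, type_b: str) -> str:
    kinds = {type_a, type_b}
    if not kinds <= {"continuous", "ordinal"}:
        raise ValueError(f"unknown variable type in {sorted(kinds)}")
    if kinds == {"continuous"}:
        return "pearson"      # continuous vs. continuous
    if kinds == {"ordinal"}:
        return "polychoric"   # ordinal vs. ordinal
    return "polyserial"       # one continuous, one ordinal


# Print the coefficient for every unordered pair of variable types.
for a, b in combinations_with_replacement(["continuous", "ordinal"], 2):
    print(f"{a} x {b}: {coefficient_for(a, b)}")
```

The order of the arguments does not matter in this sketch, so a continuous/ordinal pair and an ordinal/continuous pair both map to the polyserial coefficient.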

Parameters:
  • data (pd.DataFrame) –

    A DataFrame containing continuous and/or ordinal variables. The correlation coefficient for each pair of columns is selected automatically from the column types:

    • Columns with dtype float are treated as continuous variables.

    • Columns with dtype int that have at most n_unique unique values are treated as ordinal variables; integer columns with more unique values are treated as continuous.

    • Columns with dtype category are treated as ordinal variables if they are ordered.

  • n_unique (int, default=20) – The maximum number of unique values for an integer column to be considered ordinal. If the number of unique values exceeds n_unique, the column is treated as continuous.
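
To see how the documented rules would sort the columns of a DataFrame, the following sketch applies them with plain pandas. It is an approximation written from the description above, not the library's actual implementation: `classify_column` is a hypothetical helper, the column names are made up, and the label "undocumented" marks cases the description does not cover (for example, unordered categoricals).

```python
import pandas as pd


# Hypothetical helper (not part of ordinalcorr): mirrors the documented
# dtype rules as closely as the description allows.
def classify_column(col: pd.Series, n_unique: int = 20) -> str:
    if isinstance(col.dtype, pd.CategoricalDtype):
        # Only ordered categoricals are documented as ordinal.
        return "ordinal" if col.dtype.ordered else "undocumented"
    if pd.api.types.is_float_dtype(col):
        return "continuous"
    if pd.api.types.is_integer_dtype(col):
        # At most n_unique distinct values: ordinal; otherwise continuous.
        return "ordinal" if col.nunique() <= n_unique else "continuous"
    return "undocumented"


data = pd.DataFrame({
    "height": [1.62, 1.75, 1.80, 1.68],   # float
    "rating": [1, 2, 2, 3],               # int, 3 unique values
    "user_id": range(100, 104),           # int, 4 unique values
    "size": pd.Categorical(
        ["S", "M", "L", "M"], categories=["S", "M", "L"], ordered=True
    ),
})

# A small n_unique is used here so that the integer rule is visible
# on a four-row example.
kinds = {name: classify_column(data[name], n_unique=3) for name in data.columns}
print(kinds)
```

With n_unique=3, the sketch labels "rating" as ordinal and "user_id" as continuous, which is the behaviour the n_unique parameter describes. Casting an integer column to float, or to an ordered categorical, is one way to make the intended treatment explicit.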

Returns:

Estimated heterogeneous correlation matrix.

Return type:

pd.DataFrame

Examples

>>> from ordinalcorr import hetcor
>>> import pandas as pd
>>> data = pd.DataFrame({
...     "continuous": [0.1, 0.1, 0.2, 0.2, 0.3, 0.3],
...     "ordinal": [0, 0, 0, 1, 1, 2],
... })
>>> hetcor(data)