Weighted predictions (xcolumns.weighted_prediction)

The xcolumns.weighted_prediction module provides methods for calculating a weighted prediction for each instance from the conditional probabilities of labels. The main function of the module is predict_weighted_per_instance().

xcolumns.weighted_prediction.predict_weighted_per_instance(y_proba: ndarray | csr_matrix, k: int, th: float = 0.0, a: ndarray | None = None, b: ndarray | None = None, dtype: dtype | None = None, return_meta: bool = False) -> ndarray | csr_matrix | Tuple[ndarray | csr_matrix, dict]

Returns the weighted prediction for each instance (row) in the provided matrix of conditional probability estimates of labels \(\boldsymbol{H}\) (y_proba), where each element \(\eta_{ij} = P(y_j|x_i)\) is the probability of label \(j\) for instance \(i\). The gains vector \(\boldsymbol{g}\) is calculated for each instance \(i\) as follows:

\[\boldsymbol{g} = \boldsymbol{a} \odot \boldsymbol{\eta}_i + \boldsymbol{b}\]

If k is larger than 0, the k labels with the highest gains are selected for the instance. If k is 0, the labels with gains higher than th are selected instead.

Parameters:
  • y_proba – A 2D matrix of conditional probabilities for each label of shape (n, m).

  • k – The number of labels to predict for each instance.

  • th – A single threshold or a vector of thresholds for the gains. Only used if k is 0. If a vector, its size must equal the number of columns of y_proba (m).

  • a – The vector of slopes (coefficients) \(\boldsymbol{a}\) used for calculating gains. Its size must equal the number of columns of y_proba (m). If equal to None, then \(\boldsymbol{a} = \boldsymbol{1}\).

  • b – The vector of intercepts (constants) \(\boldsymbol{b}\) used for calculating gains. Its size must equal the number of columns of y_proba (m). If equal to None, then \(\boldsymbol{b} = \boldsymbol{0}\).

  • dtype – The data type for the output matrix. If equal to None, the data type of y_proba will be used.

  • return_meta – Whether to return metadata.

Returns:

The binary prediction matrix, with the same shape and type as y_proba. If return_meta is True, a dictionary containing the time taken to calculate the prediction is returned as well.
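The gain computation and the two selection modes described for predict_weighted_per_instance() can be sketched in plain NumPy. This is a minimal illustration for dense inputs only, not the library's implementation: the helper name weighted_prediction_sketch and the example values are hypothetical, and tie-breaking may differ from xcolumns.

```python
import numpy as np


def weighted_prediction_sketch(y_proba, k, th=0.0, a=None, b=None):
    """Hypothetical dense re-implementation of the rule described above."""
    y_proba = np.asarray(y_proba, dtype=float)
    n, m = y_proba.shape
    a = np.ones(m) if a is None else np.asarray(a, dtype=float)
    b = np.zeros(m) if b is None else np.asarray(b, dtype=float)

    # Gains g = a * eta_i + b, broadcast over all rows at once.
    gains = y_proba * a + b

    y_pred = np.zeros_like(y_proba)
    if k > 0:
        # Indices of the k largest gains in each row (their order is irrelevant).
        top_k = np.argpartition(-gains, k - 1, axis=1)[:, :k]
        np.put_along_axis(y_pred, top_k, 1.0, axis=1)
    else:
        # k == 0: keep every label whose gain exceeds th
        # (th may be a scalar or a vector of length m).
        y_pred[gains > th] = 1.0
    return y_pred


y_proba = np.array([[0.9, 0.2, 0.4],
                    [0.1, 0.6, 0.5]])
a = np.array([1.0, 1.0, 3.0])  # made-up slopes that upweight the third label

plain = weighted_prediction_sketch(y_proba, k=1)             # labels 0 and 1
weighted = weighted_prediction_sketch(y_proba, k=1, a=a)     # label 2 in both rows
thresholded = weighted_prediction_sketch(y_proba, k=0, th=0.45)
print(plain, weighted, thresholded, sep="\n\n")
```

With the slopes above, the third label wins in both rows even though it never has the highest probability, which is the intended effect of the weighting.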

Prediction strategies based on weighted predictions

Based on the predict_weighted_per_instance() function, the module provides a few additional functions for calculating predictions that are optimal for specific metrics or that arbitrarily upweight labels with smaller prior probabilities.

xcolumns.weighted_prediction.predict_log_weighted_per_instance(y_proba: ndarray | csr_matrix, k: int, priors: ndarray, epsilon: float = 1e-06, dtype: dtype | None = None, return_meta: bool = False) -> ndarray | csr_matrix | Tuple[ndarray | csr_matrix, dict]

Predicts the top k labels for each instance (row) in the provided matrix of conditional probability estimates of labels \(\eta\) (y_proba) according to the log law weighting scheme:

\[a = -\log(\pi + \epsilon)\]

where \(\pi\) (priors) is the prior probability of each label and \(\epsilon\) (epsilon) is a small value to avoid a domain error when a prior is zero.

It is equivalent to calling predict_weighted_per_instance(y_proba, k=k, a=-log(priors + epsilon), return_meta=return_meta).

Parameters:
  • y_proba – A 2D matrix of conditional probabilities for each label.

  • k – The number of labels to predict for each instance.

  • priors – The prior probabilities for each label.

  • epsilon – A small value to avoid domain error.

  • dtype – The data type for the output matrix. If equal to None, the data type of y_proba will be used.

  • return_meta – Whether to return metadata.

Returns:

The binary prediction matrix, with exactly k labels in each row and the same shape and type as y_proba. If return_meta is True, a dictionary containing the time taken to calculate the prediction is returned as well.
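To make the log law weighting concrete, the following NumPy sketch computes the slopes a = -log(priors + epsilon) and applies them to a small matrix. The priors and probabilities are made up for illustration, and the top-k selection is written out by hand rather than taken from xcolumns.

```python
import numpy as np

priors = np.array([0.5, 0.05, 0.001])  # made-up label priors
epsilon = 1e-6

# Log law slopes: the rarer the label, the larger its weight.
a = -np.log(priors + epsilon)

y_proba = np.array([[0.60, 0.30, 0.10],
                    [0.90, 0.05, 0.02]])
gains = y_proba * a  # b is left at its default of 0

k = 1
top_k = np.argsort(-gains, axis=1)[:, :k]
y_pred = np.zeros_like(y_proba)
np.put_along_axis(y_pred, top_k, 1.0, axis=1)

print(a)       # slopes increase as priors decrease
print(y_pred)  # row 0 switches to label 1, row 1 keeps label 0

# epsilon keeps the slope finite even for a label with a zero prior.
print(-np.log(0.0 + epsilon))
```

Because the logarithm grows slowly, this scheme boosts rare labels more gently than an inverse-prior weighting would.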

xcolumns.weighted_prediction.predict_optimizing_instance_precision(y_proba: ndarray | csr_matrix, k: int, return_meta: bool = False) -> ndarray | csr_matrix | Tuple[ndarray | csr_matrix, dict]

Predicts the top k labels for each instance (row) in the provided matrix of conditional probability estimates of labels \(\eta\) (y_proba). This is the optimal inference strategy for precision at k and nDCG at k.

Parameters:
  • y_proba – A 2D matrix of conditional probabilities for each label.

  • k – The number of labels to predict for each instance.

  • return_meta – Whether to return metadata.

Returns:

The binary prediction matrix, with exactly k labels in each row and the same shape and type as y_proba. If return_meta is True, a dictionary containing the time taken to calculate the prediction is returned as well.
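The optimality claim for precision at k can be checked by brute force on a toy instance. Assuming y_proba holds the true conditional probabilities, the expected precision at k of a set of k labels is the mean of their probabilities, so the k most probable labels maximize it. The sketch below uses made-up values and a hypothetical helper, expected_precision_at_k, and does not cover nDCG at k.

```python
from itertools import combinations

import numpy as np

eta = np.array([0.7, 0.1, 0.4, 0.55, 0.2])  # made-up probabilities for one instance
k = 2


def expected_precision_at_k(eta, labels):
    # E[precision@k] = (1 / k) * sum of P(y_j = 1 | x) over the predicted labels.
    return eta[list(labels)].sum() / len(labels)


# Exhaustively score every possible set of k labels.
best_set = max(combinations(range(len(eta)), k),
               key=lambda labels: expected_precision_at_k(eta, labels))

# Top-k selection by probability.
top_k = np.argsort(-eta)[:k]

print(sorted(best_set), sorted(top_k.tolist()))  # both are [0, 3]
```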

xcolumns.weighted_prediction.predict_optimizing_instance_propensity_scored_precision(y_proba: ndarray | csr_matrix, k: int, inverse_propensities: ndarray | None = None, propensities: ndarray | None = None, return_meta: bool = False) -> ndarray | csr_matrix | Tuple[ndarray | csr_matrix, dict]

xcolumns.weighted_prediction.predict_optimizing_macro_balanced_accuracy(y_proba: ndarray | csr_matrix, k: int, priors: ndarray, epsilon: float = 1e-06, dtype: dtype | None = None, return_meta: bool = False) -> ndarray | csr_matrix | Tuple[ndarray | csr_matrix, dict]

Predicts k labels for each instance (row) in the provided matrix of conditional probability estimates of labels \(\eta\) (y_proba), such that the prediction at k optimizes macro-averaged balanced accuracy for the population with the given prior probabilities of labels (priors).

Parameters:
  • y_proba – A 2D matrix of conditional probabilities for each label.

  • k – The number of labels to predict for each instance.

  • priors – The prior probabilities for each label.

  • epsilon – A small value to avoid division by zero when calculating inverse of priors.

  • dtype – The data type for the output matrix. If equal to None, the data type of y_proba will be used.

  • return_meta – Whether to return metadata.

Returns:

The binary prediction matrix, with the same shape and type as y_proba. If return_meta is True, a dictionary containing the time taken to calculate the prediction is returned as well.

xcolumns.weighted_prediction.predict_optimizing_macro_recall(y_proba: ndarray | csr_matrix, k: int, priors: ndarray, epsilon: float = 1e-06, dtype: dtype | None = None, return_meta: bool = False) -> ndarray | csr_matrix | Tuple[ndarray | csr_matrix, dict]

Predicts k labels for each instance (row) in the provided matrix of conditional probability estimates of labels \(\eta\) (y_proba), such that the prediction optimizes macro-averaged recall for the population with the given prior probabilities of labels (priors). It is equivalent to calling predict_weighted_per_instance(y_proba, k=k, a=1.0 / (priors + epsilon), return_meta=return_meta).

Parameters:
  • y_proba – A 2D matrix of conditional probabilities for each label.

  • k – The number of labels to predict for each instance.

  • priors – The prior probabilities for each label.

  • epsilon – A small value to avoid division by zero when calculating inverse of priors.

  • dtype – The data type for the output matrix. If equal to None, the data type of y_proba will be used.

  • return_meta – Whether to return metadata.

Returns:

The binary prediction matrix, with exactly k labels in each row and the same shape and type as y_proba. If return_meta is True, a dictionary containing the time taken to calculate the prediction is returned as well.
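The inverse-prior weighting used for macro-averaged recall can be illustrated with a few lines of NumPy. The sketch below assumes, for illustration only, that priors are estimated as empirical label frequencies from a made-up training label matrix; the source does not prescribe how priors are obtained, and the top-k step is written by hand rather than taken from xcolumns.

```python
import numpy as np

# Made-up training labels (rows: instances, columns: labels).
y_train = np.array([[1, 0, 0],
                    [1, 1, 0],
                    [1, 0, 0],
                    [1, 0, 1]])
priors = y_train.mean(axis=0)  # assumption: priors = empirical label frequencies
epsilon = 1e-6

a = 1.0 / (priors + epsilon)  # inverse-prior slopes, roughly [1, 4, 4]

y_proba = np.array([[0.8, 0.3, 0.1],
                    [0.9, 0.1, 0.3]])
gains = y_proba * a

k = 1
top_k = np.argsort(-gains, axis=1)[:, :k]
y_pred = np.zeros_like(y_proba)
np.put_along_axis(y_pred, top_k, 1.0, axis=1)

print(y_pred)  # label 1 for row 0 and label 2 for row 1
```

Plain top-1 selection would pick the frequent first label in both rows, whereas the inverse-prior slopes move the prediction toward the rarer labels.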

xcolumns.weighted_prediction.predict_power_law_weighted_per_instance(y_proba: ndarray | csr_matrix, k: int, priors: ndarray, beta: float, epsilon: float = 1e-06, return_meta: bool = False) -> ndarray | csr_matrix | Tuple[ndarray | csr_matrix, dict]

Predicts the top k labels for each instance (row) in the provided matrix of conditional probability estimates of labels \(\eta\) (y_proba) according to the power law weighting scheme:

\[a = (\pi + \epsilon)^{-\beta}\]

where \(\pi\) (priors) is the prior probability of each label, \(\beta\) (beta) is the power parameter, and \(\epsilon\) (epsilon) is a small value to avoid division by zero.

It is equivalent to calling predict_weighted_per_instance(y_proba, k=k, a=(priors + epsilon) ** -beta, return_meta=return_meta).

Parameters:
  • y_proba – A 2D matrix of conditional probabilities for each label.

  • k – The number of labels to predict for each instance.

  • priors – The prior probabilities for each label.

  • beta – The power parameter.

  • epsilon – A small value to avoid division by zero when calculating inverse of priors.

  • return_meta – Whether to return metadata.

Returns:

The binary prediction matrix, with exactly k labels in each row and the same shape and type as y_proba. If return_meta is True, a dictionary containing the time taken to calculate the prediction is returned as well.
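The role of beta in the power law scheme is easiest to see by varying it on a single instance. In the sketch below, power_law_slopes is a hypothetical helper that evaluates the formula above, and the priors and probabilities are made up. With beta = 0 all slopes equal 1, so the selection reduces to plain top-k, while beta = 1 gives the inverse-prior weighting.

```python
import numpy as np


def power_law_slopes(priors, beta, epsilon=1e-6):
    # a = (pi + epsilon) ** -beta, evaluated element-wise.
    return (priors + epsilon) ** -beta


priors = np.array([0.5, 0.05, 0.001])       # made-up label priors
y_proba_row = np.array([0.60, 0.30, 0.01])  # made-up probabilities for one instance

for beta in (0.0, 0.5, 1.0):
    gains = power_law_slopes(priors, beta) * y_proba_row
    print(beta, int(np.argmax(gains)))
# The selected label moves from 0 to 1 to 2 as beta grows.
```

Intermediate values of beta therefore interpolate between favoring probable labels and favoring rare ones.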

xcolumns.weighted_prediction.predict_top_k(y_proba: ndarray | csr_matrix, k: int, return_meta: bool = False) -> ndarray | csr_matrix | Tuple[ndarray | csr_matrix, dict]

Predicts the top k labels for each instance (row) in the provided matrix of conditional probability estimates of labels \(\eta\) (y_proba). This is the optimal inference strategy for precision at k and nDCG at k. It is equivalent to calling predict_weighted_per_instance(y_proba, k=k, a=None, b=None, return_meta=return_meta).

Parameters:
  • y_proba – A 2D matrix of conditional probabilities for each label.

  • k – The number of labels to predict for each instance.

  • return_meta – Whether to return metadata. Defaults to False.

Returns:

The binary prediction matrix, with exactly k labels in each row and the same shape and type as y_proba. If return_meta is True, a dictionary containing the time taken to calculate the prediction is returned as well.