Weighted predictions (`xcolumns.weighted_prediction`)

The xcolumns.weighted_prediction module provides methods for calculating a weighted prediction for each instance based on the conditional probabilities of labels. The main function of the module is predict_weighted_per_instance().

xcolumns.weighted_prediction.predict_weighted_per_instance(y_proba: ndarray | csr_matrix, k: int, th: float = 0.0, a: ndarray | None = None, b: ndarray | None = None, dtype: dtype | None = None, keep_scores: bool = False, return_meta: bool = False, return_weights: bool = False) → ndarray | csr_matrix | Tuple[ndarray | csr_matrix, dict]

Returns the weighted prediction for each instance (row) in a provided matrix of conditional probability estimates of labels \(\boldsymbol{H}\) (y_proba), where each element \(\eta_{ij} = P(y_j|x_i)\) is the probability of label \(j\) for instance \(i\). The gains vector \(\boldsymbol{g}\) is calculated for each instance \(i\) as follows:

\[\boldsymbol{g} = \boldsymbol{a} \odot \boldsymbol{\eta}_i + \boldsymbol{b}\]

If k is larger than 0, the top k labels with the highest gains are selected for the instance. If k is 0, then the labels with gains higher than th are selected for the instance.

Parameters:

y_proba – A 2D matrix of conditional probabilities for each label of shape (n, m).
k – The number of labels to predict for each instance.
th – A single threshold or a vector of thresholds for the gains. Only used if k is 0. If a vector, its length must equal the number of columns of y_proba (m).
a – The vector of slopes (coefficients) \(\boldsymbol{a}\) used for calculating gains. Its length must equal the number of columns of y_proba (m). If equal to None, then \(\boldsymbol{a} = \boldsymbol{1}\).
b – The vector of intercepts (constants) \(\boldsymbol{b}\) used for calculating gains. Its length must equal the number of columns of y_proba (m). If equal to None, then \(\boldsymbol{b} = \boldsymbol{0}\).
keep_scores – Whether to keep the scores in the output prediction matrix instead of 1.
dtype – The data type for the output matrix. If equal to None, the data type of y_proba will be used.
return_meta – Whether to return metadata.

Returns:

The binary prediction matrix – the shape and type of the matrix are the same as y_proba. If return_meta is True, a dictionary containing the time taken to calculate the prediction is additionally returned.
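The gain formula and the two selection rules above can be sketched in a few lines of plain Python. This is an illustrative re-implementation on lists of lists, not the library's code: the helper name `weighted_prediction_sketch` is hypothetical, it accepts only a scalar `th`, and it ignores `dtype`, `keep_scores`, and sparse inputs.

```python
def weighted_prediction_sketch(y_proba, k, th=0.0, a=None, b=None):
    """Hypothetical helper: gains g = a * eta_i + b, then top-k or thresholding."""
    m = len(y_proba[0])
    a = [1.0] * m if a is None else a  # default slopes: all ones
    b = [0.0] * m if b is None else b  # default intercepts: all zeros
    y_pred = []
    for eta_i in y_proba:
        gains = [a[j] * eta_i[j] + b[j] for j in range(m)]
        if k > 0:
            # Indices of the k largest gains (ties go to the lower index).
            selected = set(sorted(range(m), key=lambda j: -gains[j])[:k])
        else:
            # k == 0: keep every label whose gain exceeds the threshold.
            selected = {j for j in range(m) if gains[j] > th}
        y_pred.append([1 if j in selected else 0 for j in range(m)])
    return y_pred


y_proba = [[0.9, 0.5, 0.1],
           [0.2, 0.3, 0.6]]

top1_plain = weighted_prediction_sketch(y_proba, k=1)
top1_weighted = weighted_prediction_sketch(y_proba, k=1, a=[1.0, 1.0, 10.0])
thresholded = weighted_prediction_sketch(y_proba, k=0, th=0.4)
```

With the slope of the third label raised to 10, the first row switches from the first label to the third, even though its probability there is only 0.1.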

Prediction strategies based on weighted predictions

Building on the predict_weighted_per_instance() function, the module provides a few additional functions for calculating predictions that are optimal for specific metrics or that arbitrarily upweight labels with smaller prior probabilities.

xcolumns.weighted_prediction.predict_log_weighted_per_instance(y_proba: ndarray | csr_matrix, k: int, priors: ndarray, epsilon: float = 1e-09, keep_scores: bool = False, dtype: dtype | None = None, return_meta: bool = False, return_weights: bool = False) → ndarray | csr_matrix | Tuple[ndarray | csr_matrix, dict]

Predicts the top k labels for each instance (row) in a provided matrix of conditional probability estimates of labels \(\eta\) (y_proba) according to the log law weighting scheme:

\[a = -\log (\pi + \epsilon)\]

where \(\pi\) (priors) is the prior probability of each label and \(\epsilon\) (epsilon) is a small value to avoid a domain error when a prior is zero.

It is equivalent to calling predict_weighted_per_instance(y_proba, k=k, a=-log(priors + epsilon), return_meta=return_meta).

Parameters:

y_proba – A 2D matrix of conditional probabilities for each label.
k – The number of labels to predict for each instance.
priors – The prior probabilities for each label.
epsilon – A small value to avoid domain error.
keep_scores – Whether to keep the scores in the output prediction matrix instead of 1.
dtype – The data type for the output matrix. If equal to None, the data type of y_proba will be used.
return_meta – Whether to return metadata.

Returns:

The binary prediction matrix – with exactly k labels in each row; the shape and type of the matrix are the same as y_proba. If return_meta is True, a dictionary containing the time taken to calculate the prediction is additionally returned.
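As a rough illustration of the log law weights, the sketch below computes \(a = -\log(\pi + \epsilon)\) with the standard `math` module and applies it to a single row of probabilities. The helper name `log_law_weights` and the example numbers are made up for illustration.

```python
import math


def log_law_weights(priors, epsilon=1e-9):
    """Hypothetical helper: a_j = -log(pi_j + epsilon) for every label j."""
    return [-math.log(p + epsilon) for p in priors]


priors = [0.5, 0.05, 0.0]   # the last label never occurred in the training data
eta_i = [0.6, 0.2, 0.01]    # conditional probabilities for one instance

a = log_law_weights(priors)
gains = [a_j * eta for a_j, eta in zip(a, eta_i)]

# Index of the label with the highest gain (the top-1 prediction).
best = max(range(len(gains)), key=gains.__getitem__)
```

Rarer labels receive larger weights, and epsilon keeps the weight finite for a label whose prior is exactly zero. In this example the second label wins, although the first has the highest raw probability.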

xcolumns.weighted_prediction.predict_optimizing_instance_precision(y_proba: ndarray | csr_matrix, k: int, keep_scores: bool = False, dtype: dtype | None = None, return_meta: bool = False) → ndarray | csr_matrix | Tuple[ndarray | csr_matrix, dict]

Predicts the top k labels for each instance (row) in a provided matrix of conditional probability estimates of labels \(\eta\) (y_proba). This is the optimal inference strategy for precision at k and nDCG at k. It is equivalent to calling predict_weighted_per_instance(y_proba, k=k, a=None, b=None, return_meta=return_meta).

Parameters:

y_proba – A 2D matrix of conditional probabilities for each label.
k – The number of labels to predict for each instance.
keep_scores – Whether to keep the scores in the output prediction matrix instead of 1.
dtype – The data type for the output matrix. If equal to None, the data type of y_proba will be used.
return_meta – Whether to return metadata. Defaults to False.

Returns:

The binary prediction matrix – with exactly k labels in each row; the shape and type of the matrix are the same as y_proba. If return_meta is True, a dictionary containing the time taken to calculate the prediction is additionally returned.
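To see why plain top-k selection suits precision at k, note that, by linearity of expectation, the expected precision at k of a label set is the mean of the conditional probabilities of its labels. The sketch below checks this by brute force on one made-up row; the helper name is hypothetical, and nDCG at k is not covered.

```python
from itertools import combinations


def expected_precision_at_k(eta_i, labels):
    """Expected fraction of relevant labels among the selected ones."""
    return sum(eta_i[j] for j in labels) / len(labels)


eta_i = [0.1, 0.7, 0.4, 0.9, 0.3]
k = 2

# Top-k selection on the raw probabilities.
top_k = tuple(sorted(sorted(range(len(eta_i)), key=lambda j: -eta_i[j])[:k]))

# Exhaustive search over every subset of k labels.
best_subset = max(
    combinations(range(len(eta_i)), k),
    key=lambda labels: expected_precision_at_k(eta_i, labels),
)
```

Both approaches pick the same two labels here, the ones with probabilities 0.9 and 0.7.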

xcolumns.weighted_prediction.predict_optimizing_instance_propensity_scored_precision(y_proba: ndarray | csr_matrix, k: int, inverse_propensities: ndarray | None = None, propensities: ndarray | None = None, keep_scores: bool = False, dtype: dtype | None = None, return_meta: bool = False, return_weights: bool = False) → ndarray | csr_matrix | Tuple[ndarray | csr_matrix, dict]

Predicts the top k labels for each instance (row) in a provided matrix of conditional probability estimates of labels \(\eta\) (y_proba), weighted by the provided inverse propensity scores (inverse_propensities).
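A minimal sketch of propensity-weighted gains follows. It assumes that, when only propensities are given, the weights are their element-wise reciprocals, and that passing neither argument is an error; the text above does not state how the library handles these cases, so treat both as assumptions. The helper name `propensity_scored_gains` is hypothetical.

```python
def propensity_scored_gains(eta_i, inverse_propensities=None, propensities=None):
    """Hypothetical helper: gains = inverse propensity * conditional probability."""
    if inverse_propensities is None:
        if propensities is None:
            # Assumption: at least one of the two vectors must be supplied.
            raise ValueError("pass inverse_propensities or propensities")
        # Assumption: inverse propensities are plain reciprocals.
        inverse_propensities = [1.0 / p for p in propensities]
    return [w * eta for w, eta in zip(inverse_propensities, eta_i)]


eta_i = [0.8, 0.3, 0.1]
propensities = [0.9, 0.25, 0.5]

gains = propensity_scored_gains(eta_i, propensities=propensities)

# Labels ordered from the highest gain to the lowest.
ranking = sorted(range(len(gains)), key=lambda j: -gains[j])
```

The second label has a low propensity, so its gain overtakes that of the first label despite a much lower probability.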

xcolumns.weighted_prediction.predict_optimizing_macro_balanced_accuracy(y_proba: ndarray | csr_matrix, k: int, priors: ndarray, epsilon: float = 1e-06, dtype: dtype | None = None, return_meta: bool = False) → ndarray | csr_matrix | Tuple[ndarray | csr_matrix, dict]

Predicts k labels for each instance (row) in a provided matrix of conditional probability estimates of labels \(\eta\) (y_proba), such that the prediction at k optimizes macro-averaged balanced accuracy for the population with the given prior probabilities of labels (priors).

Parameters:

y_proba – A 2D matrix of conditional probabilities for each label.
k – The number of labels to predict for each instance.
priors – The prior probabilities for each label.
epsilon – A small value to avoid division by zero when calculating inverse of priors.
dtype – The data type for the output matrix. If equal to None, the data type of y_proba will be used.
return_meta – Whether to return metadata.

Returns:

The binary prediction matrix – the shape and type of the matrix are the same as y_proba. If return_meta is True, a dictionary containing the time taken to calculate the prediction is additionally returned.
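The text above does not spell out the weights used for balanced accuracy. One way to arrive at gains of the form \(\boldsymbol{a} \odot \boldsymbol{\eta}_i + \boldsymbol{b}\) is the following derivation, offered as a sketch rather than as the library's formula. Balanced accuracy of a label averages its true positive rate and true negative rate. Predicting label \(j\) raises its expected true positive rate in proportion to \(\eta_{ij} / \pi_j\) and lowers its expected true negative rate in proportion to \((1 - \eta_{ij}) / (1 - \pi_j)\), with constant factors dropped. Where epsilon is added, and both helper names, are assumptions.

```python
def balanced_accuracy_gains(eta_i, priors, epsilon=1e-6):
    """Hypothetical helper: marginal gain of predicting each label."""
    gains = []
    for eta, pi in zip(eta_i, priors):
        tpr_gain = eta / (pi + epsilon)                # gain in true positive rate
        tnr_loss = (1.0 - eta) / (1.0 - pi + epsilon)  # loss in true negative rate
        gains.append(tpr_gain - tnr_loss)
    return gains


def affine_form(priors, epsilon=1e-6):
    """The same gains rewritten as slopes a and intercepts b."""
    a = [1.0 / (pi + epsilon) + 1.0 / (1.0 - pi + epsilon) for pi in priors]
    b = [-1.0 / (1.0 - pi + epsilon) for pi in priors]
    return a, b


priors = [0.5, 0.05]
eta_i = [0.5, 0.2]

gains = balanced_accuracy_gains(eta_i, priors)
a, b = affine_form(priors)
affine_gains = [a_j * eta + b_j for a_j, b_j, eta in zip(a, b, eta_i)]

best = max(range(len(gains)), key=gains.__getitem__)
```

Under these assumptions the rare second label is preferred: a probability of 0.2 is four times its prior, while 0.5 for the first label is no better than its prior.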

xcolumns.weighted_prediction.predict_optimizing_macro_recall(y_proba: ndarray | csr_matrix, k: int, priors: ndarray, epsilon: float = 1e-06, keep_scores: bool = False, dtype: dtype | None = None, return_meta: bool = False, return_weights: bool = False) → ndarray | csr_matrix | Tuple[ndarray | csr_matrix, dict]

Predicts k labels for each instance (row) in a provided matrix of conditional probability estimates of labels \(\eta\) (y_proba), such that the prediction optimizes macro-averaged recall for the population with the given prior probabilities of labels (priors). It is equivalent to calling predict_weighted_per_instance(y_proba, k=k, a=1.0 / (priors + epsilon), return_meta=return_meta).

Parameters:

y_proba – A 2D matrix of conditional probabilities for each label.
k – The number of labels to predict for each instance.
priors – The prior probabilities for each label.
epsilon – A small value to avoid division by zero when calculating inverse of priors.
keep_scores – Whether to keep the scores in the output prediction matrix instead of 1.
dtype – The data type for the output matrix. If equal to None, the data type of y_proba will be used.
return_meta – Whether to return metadata.

Returns:

The binary prediction matrix – with exactly k labels in each row; the shape and type of the matrix are the same as y_proba. If return_meta is True, a dictionary containing the time taken to calculate the prediction is additionally returned.
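The inverse-prior weights from the equivalence above can be illustrated on a single row. The helper name `macro_recall_weights` and the numbers are made up; only the formula 1.0 / (priors + epsilon) comes from the text.

```python
def macro_recall_weights(priors, epsilon=1e-6):
    """Hypothetical helper: a_j = 1 / (pi_j + epsilon)."""
    return [1.0 / (p + epsilon) for p in priors]


priors = [0.4, 0.01, 0.0]   # the last label has an empirical prior of zero
eta_i = [0.5, 0.05, 0.0]

a = macro_recall_weights(priors)
gains = [a_j * eta for a_j, eta in zip(a, eta_i)]

# Top-1 prediction under the macro-recall weighting.
best = max(range(len(gains)), key=gains.__getitem__)
```

The second label is selected because its probability is five times its prior, whereas the first label is only 1.25 times its prior. Epsilon keeps the weight of the zero-prior label finite.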

xcolumns.weighted_prediction.predict_power_law_weighted_per_instance(y_proba: ndarray | csr_matrix, k: int, priors: ndarray, beta: float, epsilon: float = 1e-09, keep_scores: bool = False, dtype: dtype | None = None, return_meta: bool = False, return_weights: bool = False) → ndarray | csr_matrix | Tuple[ndarray | csr_matrix, dict]

Predicts the top k labels for each instance (row) in a provided matrix of conditional probability estimates of labels \(\eta\) (y_proba) according to the power law weighting scheme:

\[a = (\pi + \epsilon)^{-\beta}\]

where \(\pi\) (priors) is the prior probability of each label, \(\beta\) (beta) is the power parameter, and \(\epsilon\) (epsilon) is a small value to avoid division by zero.

It is equivalent to calling predict_weighted_per_instance(y_proba, k=k, a=(priors + epsilon) ** -beta, return_meta=return_meta).

Parameters:

y_proba – A 2D matrix of conditional probabilities for each label.
k – The number of labels to predict for each instance.
priors – The prior probabilities for each label.
beta – The power parameter.
epsilon – A small value to avoid division by zero when calculating inverse of priors.
keep_scores – Whether to keep the scores in the output prediction matrix instead of 1.
dtype – The data type for the output matrix. If equal to None, the data type of y_proba will be used.
return_meta – Whether to return metadata.

Returns:

The binary prediction matrix – with exactly k labels in each row; the shape and type of the matrix are the same as y_proba. If return_meta is True, a dictionary containing the time taken to calculate the prediction is additionally returned.
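The effect of the power parameter can be sketched by computing \(a = (\pi + \epsilon)^{-\beta}\) for a few values of beta on one made-up row. The helper name `power_law_weights` is hypothetical. With beta equal to 0 every weight is 1 and the rule reduces to plain top-k. With beta equal to 1 the weights are the inverse priors.

```python
def power_law_weights(priors, beta, epsilon=1e-9):
    """Hypothetical helper: a_j = (pi_j + epsilon) ** -beta."""
    return [(p + epsilon) ** -beta for p in priors]


priors = [0.5, 0.02]
eta_i = [0.6, 0.1]

top_label = {}
for beta in (0.0, 0.5, 1.0):
    a = power_law_weights(priors, beta)
    gains = [a_j * eta for a_j, eta in zip(a, eta_i)]
    # Record which label would be the top-1 prediction for this beta.
    top_label[beta] = max(range(len(gains)), key=gains.__getitem__)
```

In this example the frequent first label stays on top for beta of 0 and 0.5, and the rare second label takes over at beta of 1, which shows how beta controls the strength of the upweighting.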
