Weighted predictions (xcolumns.weighted_prediction)
The xcolumns.weighted_prediction module provides methods for calculating the weighted prediction for each instance based on the conditional probabilities of labels.
The main function of the module is predict_weighted_per_instance().
- xcolumns.weighted_prediction.predict_weighted_per_instance(y_proba: ndarray | csr_matrix, k: int, th: float = 0.0, a: ndarray | None = None, b: ndarray | None = None, dtype: dtype | None = None, return_meta: bool = False) -> ndarray | csr_matrix | Tuple[ndarray | csr_matrix, dict]
Returns the weighted prediction for each instance (row) in a provided matrix of conditional probability estimates of labels \(\boldsymbol{H}\) (y_proba), where each element \(\eta_{ij} = P(y_j|x_i)\) is the probability of label \(j\) for instance \(i\). The gains vector \(\boldsymbol{g}\) is calculated for each instance \(i\) as follows:
\[\boldsymbol{g} = \boldsymbol{a} \odot \boldsymbol{\eta}_i + \boldsymbol{b}\]
where \(\boldsymbol{\eta}_i\) is the \(i\)-th row of \(\boldsymbol{H}\) and \(\odot\) denotes element-wise multiplication. If k is greater than 0, the k labels with the highest gains are selected for the instance. If k is 0, the labels with gains higher than th are selected for the instance.
- Parameters:
y_proba – A 2D matrix of conditional probabilities for each label of shape (n, m).
k – The number of labels to predict for each instance.
th – A single threshold or a vector of thresholds for the gains. Only used if k is 0. If a vector, it must have length m (the number of columns of y_proba).
a – The vector of slopes (coefficients) \(\boldsymbol{a}\) used for calculating gains. It must have length m (the number of columns of y_proba). If equal to None, then \(\boldsymbol{a} = \boldsymbol{1}\).
b – The vector of intercepts (constants) \(\boldsymbol{b}\) used for calculating gains. It must have length m (the number of columns of y_proba). If equal to None, then \(\boldsymbol{b} = \boldsymbol{0}\).
dtype – The data type for the output matrix. If equal to None, the data type of y_proba will be used.
return_meta – Whether to return metadata.
- Returns:
The binary prediction matrix – it has the same shape and type as y_proba. If return_meta is True, a dictionary containing the time taken to calculate the prediction is returned as well.
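To make the selection rule concrete, here is a minimal dense NumPy sketch of computing \(\boldsymbol{g} = \boldsymbol{a} \odot \boldsymbol{\eta}_i + \boldsymbol{b}\) row by row and then applying either top-k or thresholding. weighted_prediction_dense is a hypothetical helper for illustration, not the xcolumns implementation. It assumes a dense ndarray input with k no larger than m, and it ignores csr_matrix input, dtype, and return_meta.

```python
import numpy as np


def weighted_prediction_dense(y_proba, k, th=0.0, a=None, b=None):
    """Hypothetical sketch of the weighted prediction rule g = a * eta_i + b."""
    n, m = y_proba.shape
    a = np.ones(m) if a is None else np.asarray(a, dtype=float)
    b = np.zeros(m) if b is None else np.asarray(b, dtype=float)
    # Broadcasting applies the same a and b to every row (instance).
    gains = y_proba * a + b
    y_pred = np.zeros_like(y_proba)
    if k > 0:
        # Column indices of the k largest gains in each row (unordered among themselves).
        top = np.argpartition(-gains, k - 1, axis=1)[:, :k]
        np.put_along_axis(y_pred, top, 1, axis=1)
    else:
        # th may be a scalar or a length-m vector; both broadcast against gains.
        y_pred[gains > th] = 1
    return y_pred
```

For example, with rows [0.9, 0.2, 0.6] and [0.1, 0.8, 0.3], k=1 selects labels 0 and 1 respectively. Passing a=[1.0, 5.0, 1.0] instead makes label 1 win in both rows, which shows how a larger slope upweights a label.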
Prediction strategies based on weighted predictions
Building on the predict_weighted_per_instance() function, the module provides a few additional functions that calculate predictions that are optimal for specific metrics, or that arbitrarily upweight labels with smaller prior probabilities.
- xcolumns.weighted_prediction.predict_log_weighted_per_instance(y_proba: ndarray | csr_matrix, k: int, priors: ndarray, epsilon: float = 1e-06, dtype: dtype | None = None, return_meta: bool = False) -> ndarray | csr_matrix | Tuple[ndarray | csr_matrix, dict]
Predicts the top k labels for each instance (row) in a provided matrix of conditional probability estimates of labels \(\eta\) (y_proba) according to the log law weighting scheme:
\[a = -\log(\pi + \epsilon)\]
where \(\pi\) (priors) is the prior probability of each label and \(\epsilon\) (epsilon) is a small value that avoids a domain error when a prior is 0.
It is equivalent to calling predict_weighted_per_instance(y_proba, k=k, a=-log(priors + epsilon), return_meta=return_meta).
- Parameters:
y_proba – A 2D matrix of conditional probabilities for each label.
k – The number of labels to predict for each instance.
priors – The prior probabilities for each label.
epsilon – A small value to avoid domain error.
dtype – The data type for the output matrix. If equal to None, the data type of y_proba will be used.
return_meta – Whether to return metadata.
- Returns:
The binary prediction matrix – it has exactly k labels in each row and the same shape and type as y_proba. If return_meta is True, a dictionary containing the time taken to calculate the prediction is returned as well.
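The log law weights can be illustrated with a short dense sketch. log_weights and top_k_by_gain are hypothetical helpers, not xcolumns functions, and the priors and probabilities are made-up values. Because \(-\log(\pi + \epsilon)\) grows as \(\pi\) shrinks, rare labels receive larger slopes, and epsilon keeps the weight finite when a prior is exactly 0.

```python
import numpy as np


def log_weights(priors, epsilon=1e-6):
    # a = -log(pi + epsilon); epsilon keeps the logarithm finite when a prior is 0.
    return -np.log(np.asarray(priors, dtype=float) + epsilon)


def top_k_by_gain(y_proba, a, k):
    # Hypothetical helper: mark the k labels with the largest a * eta in each row.
    gains = y_proba * a
    top = np.argsort(-gains, axis=1)[:, :k]
    y_pred = np.zeros_like(y_proba)
    np.put_along_axis(y_pred, top, 1, axis=1)
    return y_pred
```

With priors [0.5, 0.01, 0.2] and a single row [0.6, 0.3, 0.5], plain top-1 picks label 0. The log-weighted top-1 picks the rare label 1 instead, since its weight (about 4.6) outweighs its lower probability.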
- xcolumns.weighted_prediction.predict_optimizing_instance_precision(y_proba: ndarray | csr_matrix, k: int, return_meta: bool = False) -> ndarray | csr_matrix | Tuple[ndarray | csr_matrix, dict]
Predicts the top k labels for each instance (row) in a provided matrix of conditional probability estimates of labels \(\eta\) (y_proba). This is the optimal inference strategy for precision at k and nDCG at k.
- Parameters:
y_proba – A 2D matrix of conditional probabilities for each label.
k – The number of labels to predict for each instance.
return_meta – Whether to return metadata.
- Returns:
The binary prediction matrix – it has exactly k labels in each row and the same shape and type as y_proba. If return_meta is True, a dictionary containing the time taken to calculate the prediction is returned as well.
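Why top-k is optimal for precision at k: since \(E[y_{ij}] = \eta_{ij}\), the expected precision at k of a selected label set is the mean of \(\eta_{ij}\) over that set, which is maximized by choosing the k largest values. The sketch below checks this by brute force for one instance with made-up probabilities. expected_precision_at_k is a hypothetical helper, and nDCG is not covered here.

```python
import itertools

import numpy as np


def expected_precision_at_k(eta_i, selected):
    # E[precision@k] = (1/k) * sum of eta_ij over the selected labels j,
    # because E[y_ij] = eta_ij.
    return float(np.mean(eta_i[list(selected)]))


eta_i = np.array([0.9, 0.2, 0.6, 0.4])  # made-up probabilities for one instance
k = 2

# Brute force over every k-subset of labels.
best = max(itertools.combinations(range(len(eta_i)), k),
           key=lambda s: expected_precision_at_k(eta_i, s))
# The k labels with the highest eta_ij.
top_k = tuple(sorted(int(j) for j in np.argsort(-eta_i)[:k]))
```

Here both best and top_k are (0, 2), with an expected precision at 2 of 0.75.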
- xcolumns.weighted_prediction.predict_optimizing_instance_propensity_scored_precision(y_proba: ndarray | csr_matrix, k: int, inverse_propensities: ndarray | None = None, propensities: ndarray | None = None, return_meta: bool = False) -> ndarray | csr_matrix | Tuple[ndarray | csr_matrix, dict]
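The entry above carries no description. As a hedged illustration only: under the unnormalized definition of propensity-scored precision at k, \(\frac{1}{k}\sum_{j \in \text{top } k} y_{ij} / p_j\), the expected value of a selection is \(\frac{1}{k}\sum_j \eta_{ij} / p_j\). Ranking labels by \(\eta_{ij} / p_j\) therefore maximizes it. This sketch assumes the function does the same, taking either the inverse propensities directly or the propensities. psp_top_k is a hypothetical helper, not the library's implementation.

```python
import numpy as np


def psp_top_k(y_proba, k, inverse_propensities=None, propensities=None):
    """Hypothetical dense sketch: rank labels by eta_ij / p_j and keep the top k."""
    if inverse_propensities is None:
        if propensities is None:
            raise ValueError("pass inverse_propensities or propensities")
        inverse_propensities = 1.0 / np.asarray(propensities, dtype=float)
    # Expected PSP@k of a selection is (1/k) * sum over it of eta_ij / p_j.
    gains = y_proba * np.asarray(inverse_propensities, dtype=float)
    top = np.argsort(-gains, axis=1)[:, :k]
    y_pred = np.zeros_like(y_proba)
    np.put_along_axis(y_pred, top, 1, axis=1)
    return y_pred
```

With a row [0.8, 0.5, 0.1] and propensities [0.9, 0.3, 0.5], the top-1 choice moves from label 0 to label 1. The low propensity of label 1 makes a correct hit on it worth more.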
- xcolumns.weighted_prediction.predict_optimizing_macro_balanced_accuracy(y_proba: ndarray | csr_matrix, k: int, priors: ndarray, epsilon: float = 1e-06, dtype: dtype | None = None, return_meta: bool = False) -> ndarray | csr_matrix | Tuple[ndarray | csr_matrix, dict]
Predicts the k labels for each instance (row) in a provided matrix of conditional probability estimates of labels \(\eta\) (y_proba), such that the prediction optimizes macro-averaged balanced accuracy for the population with the given prior probabilities of labels (priors).
- Parameters:
y_proba – A 2D matrix of conditional probabilities for each label.
k – The number of labels to predict for each instance.
priors – The prior probabilities for each label.
epsilon – A small value to avoid division by zero when calculating the inverse of priors.
dtype – The data type for the output matrix. If equal to None, the data type of y_proba will be used.
return_meta – Whether to return metadata.
- Returns:
The binary prediction matrix – it has the same shape and type as y_proba. If return_meta is True, a dictionary containing the time taken to calculate the prediction is returned as well.
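The entry above does not state which gains it uses. As a hedged illustration, here is one derivation that fits the \(\boldsymbol{g} = \boldsymbol{a} \odot \boldsymbol{\eta}_i + \boldsymbol{b}\) form. Treat the priors \(\pi_j\) as fixed and write balanced accuracy of label \(j\) as \((\text{TPR}_j + \text{TNR}_j)/2\). Predicting label \(j\) for instance \(i\) adds \(\eta_{ij}/\pi_j\) to the expected TPR term and removes \((1-\eta_{ij})/(1-\pi_j)\) from the expected TNR term. The gain is therefore \(a_j \eta_{ij} + b_j\) with \(a_j = 1/\pi_j + 1/(1-\pi_j)\) and \(b_j = -1/(1-\pi_j)\). balanced_accuracy_gains is hypothetical, and xcolumns may scale these gains or place epsilon differently.

```python
import numpy as np


def balanced_accuracy_gains(y_proba, priors, epsilon=1e-6):
    """Hypothetical derivation of per-label gains for macro balanced accuracy."""
    pi = np.asarray(priors, dtype=float)
    # Predicting label j for instance i adds eta_ij / pi_j to the expected TPR term
    # and removes (1 - eta_ij) / (1 - pi_j) from the expected TNR term.
    a = 1.0 / (pi + epsilon) + 1.0 / (1.0 - pi + epsilon)
    b = -1.0 / (1.0 - pi + epsilon)
    return y_proba * a + b  # same form as g = a * eta_i + b
```

Selecting the k largest of these gains per row gives the prediction. Note that the intercept b matters here: it differs across labels, so it can change the ranking.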
- xcolumns.weighted_prediction.predict_optimizing_macro_recall(y_proba: ndarray | csr_matrix, k: int, priors: ndarray, epsilon: float = 1e-06, dtype: dtype | None = None, return_meta: bool = False) -> ndarray | csr_matrix | Tuple[ndarray | csr_matrix, dict]
Predicts the k labels for each instance (row) in a provided matrix of conditional probability estimates of labels \(\eta\) (y_proba), such that the prediction optimizes macro-averaged recall for the population with the given prior probabilities of labels (priors). It is equivalent to calling predict_weighted_per_instance(y_proba, k=k, a=1.0 / (priors + epsilon), return_meta=return_meta).
- Parameters:
y_proba – A 2D matrix of conditional probabilities for each label.
k – The number of labels to predict for each instance.
priors – The prior probabilities for each label.
epsilon – A small value to avoid division by zero when calculating the inverse of priors.
dtype – The data type for the output matrix. If equal to None, the data type of y_proba will be used.
return_meta – Whether to return metadata.
- Returns:
The binary prediction matrix – it has exactly k labels in each row and the same shape and type as y_proba. If return_meta is True, a dictionary containing the time taken to calculate the prediction is returned as well.
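The equivalence above means that macro recall weighting divides each \(\eta_{ij}\) by the label prior. This matches the observation that the recall of label \(j\) normalizes its true positives by roughly \(n\pi_j\). A minimal dense sketch follows. macro_recall_top_k is a hypothetical helper, and the values are made up.

```python
import numpy as np


def macro_recall_top_k(y_proba, k, priors, epsilon=1e-6):
    # Hypothetical dense sketch: gains are eta_ij / (pi_j + epsilon).
    a = 1.0 / (np.asarray(priors, dtype=float) + epsilon)
    gains = y_proba * a
    top = np.argsort(-gains, axis=1)[:, :k]
    y_pred = np.zeros_like(y_proba)
    np.put_along_axis(y_pred, top, 1, axis=1)
    return y_pred
```

With a row [0.7, 0.2, 0.1] and priors [0.6, 0.05, 0.35], the gains are about 1.17, 4.0, and 0.29. So k=1 selects the rare label 1, and k=2 adds label 0.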
- xcolumns.weighted_prediction.predict_power_law_weighted_per_instance(y_proba: ndarray | csr_matrix, k: int, priors: ndarray, beta: float, epsilon: float = 1e-06, return_meta: bool = False) -> ndarray | csr_matrix | Tuple[ndarray | csr_matrix, dict]
Predicts the top k labels for each instance (row) in a provided matrix of conditional probability estimates of labels \(\eta\) (y_proba) according to the power law weighting scheme:
\[a = (\pi + \epsilon)^{-\beta}\]
where \(\pi\) (priors) is the prior probability of each label, \(\beta\) (beta) is the power parameter, and \(\epsilon\) (epsilon) is a small value to avoid division by zero.
It is equivalent to calling predict_weighted_per_instance(y_proba, k=k, a=(priors + epsilon) ** -beta, return_meta=return_meta).
- Parameters:
y_proba – A 2D matrix of conditional probabilities for each label.
k – The number of labels to predict for each instance.
priors – The prior probabilities for each label.
beta – The power parameter.
epsilon – A small value to avoid division by zero when calculating the inverse powers of priors.
return_meta – Whether to return metadata.
- Returns:
The binary prediction matrix – it has exactly k labels in each row and the same shape and type as y_proba. If return_meta is True, a dictionary containing the time taken to calculate the prediction is returned as well.
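The power parameter interpolates between schemes documented on this page. With \(\beta = 0\), every weight is 1 and the prediction reduces to plain top-k. With \(\beta = 1\), the weights equal the macro recall weights \(1/(\pi + \epsilon)\). The sketch below computes the weights. power_law_weights is a hypothetical helper, not an xcolumns function.

```python
import numpy as np


def power_law_weights(priors, beta, epsilon=1e-6):
    # a = (pi + epsilon) ** -beta; larger beta upweights rare labels more strongly.
    return (np.asarray(priors, dtype=float) + epsilon) ** -beta
```

For a row [0.6, 0.3] with priors [0.5, 0.05], the top-1 label is 0 at \(\beta = 0\) and label 1 at \(\beta = 0.5\).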
- xcolumns.weighted_prediction.predict_top_k(y_proba: ndarray | csr_matrix, k: int, return_meta: bool = False) -> ndarray | csr_matrix | Tuple[ndarray | csr_matrix, dict]
Predicts the top k labels for each instance (row) in a provided matrix of conditional probability estimates of labels \(\eta\) (y_proba). This is the optimal inference strategy for precision at k and nDCG at k. It is equivalent to calling predict_weighted_per_instance(y_proba, k=k, a=None, b=None, return_meta=return_meta).
- Parameters:
y_proba – A 2D matrix of conditional probabilities for each label.
k – The number of labels to predict for each instance.
return_meta – Whether to return metadata. Defaults to False.
- Returns:
The binary prediction matrix – it has exactly k labels in each row and the same shape and type as y_proba. If return_meta is True, a dictionary containing the time taken to calculate the prediction is returned as well.
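Since y_proba may be a csr_matrix and the result keeps the input's type, the sketch below shows one way to compute a sparse top-k prediction row by row from the CSR indptr, indices, and data arrays. top_k_csr is a hypothetical helper, not the xcolumns implementation. It assumes only stored (non-zero) entries are candidates. Unlike the documented behavior, it therefore selects fewer than k labels in rows with fewer than k stored entries.

```python
import numpy as np
from scipy.sparse import csr_matrix


def top_k_csr(y_proba: csr_matrix, k: int) -> csr_matrix:
    """Hypothetical sketch: binary top-k prediction for a CSR probability matrix."""
    indptr = [0]
    indices = []
    for i in range(y_proba.shape[0]):
        start, end = y_proba.indptr[i], y_proba.indptr[i + 1]
        row_labels = y_proba.indices[start:end]
        row_probs = y_proba.data[start:end]
        # Only stored (non-zero) entries are candidates; take the k largest.
        order = np.argsort(-row_probs)[:k]
        indices.extend(row_labels[order].tolist())
        indptr.append(len(indices))
    data = np.ones(len(indices), dtype=y_proba.dtype)
    y_pred = csr_matrix(
        (data, np.asarray(indices, dtype=y_proba.indices.dtype),
         np.asarray(indptr, dtype=y_proba.indptr.dtype)),
        shape=y_proba.shape,
    )
    y_pred.sort_indices()  # canonical CSR: column indices sorted within each row
    return y_pred
```

Working directly on the CSR arrays avoids densifying an n × m matrix. This is the usual reason to keep extreme multi-label predictions sparse.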