Block Coordinate-based prediction methods (xcolumns.block_coordinate
)
xcolumns.block_coordinate
module implements the methods for finding the optimal prediction for given test set using the Block Coordinate Ascend/Desend algorithm with 0-th order approximation of expected utility.
The method was first introduced and described in the paper:
Note: BCA/BCD with 0-approximationuses tp, fp, fn, tn matrices parametrization of the confussion matrix,
as opposed to algorithms presented in the paper, which use :math:t, q, p
parametrization. However both algorithms are equivalent.
The main function of the module is predict_using_bc_with_0approx()
:
- xcolumns.block_coordinate.predict_using_bc_with_0approx(y_proba: ndarray | csr_matrix, binary_metric_func: Callable | List[Callable], k: int, metric_aggregation: str = 'mean', maximize: bool = True, tolerance: float = 1e-06, init_y_pred: str | ndarray | csr_matrix = 'random', max_iters: int = 100, shuffle_order: bool = True, skip_tn: bool = False, return_meta: bool = False, seed: int | None = None, verbose: bool = False) ndarray | csr_matrix | Tuple[ndarray | csr_matrix, Dict[str, Any]] [source]
Predicts for each instance (row) in a provided matrix of conditional probabilities estimates of labels \(\boldsymbol{H}\), (y_proba), where each element \(\eta_{ij} = P(y_j|x_i)\) is the probability of the label \(j\) for the instance \(i\). It uses block coordinate ascent/descent with 0-th order approximation of Expected Test Utlity (ETU) objective optimizing a given metric, that decomposes into a sum/mean of binary metrics for each label:
\[\text{metric}(\boldsymbol{C}) = \sum_{j=1}^{m} \text{or} \prod_{j=1}^{m} \text{binary_metric}(\text{TP}_j, \text{FP}_j, \text{FN}_j, \text{TN}_j)\]The algorithm iterate over instances. It changes the predictions for the one that optimizes the metric the most for one instance at a time. The algorithm stops when the improvement of the metric is smaller than the given tolerance or the maximum number of iterations max_iters is reached.
- Parameters:
y_proba – The matrix of predicted probabilities of shape (n, m).
binary_metric_func – The binary metric function or a list of binary metric functions. It needs to take four arguments that are vectors of: True Positives, False Positives, False Negatives, True Negatives for each label, and return a vector of metric values for each label. (
binary_metric_func(tp, fp, fn, tn)
). If a list of functions is provided, the metric is calculated as a sum of the metrics calculated by each function.k – The budget of positive labels per instance. If k is 0, the algorithm optimizes for the metric without any budget constraint.
metric_aggregation – The aggregation function for the binary metric(s), that forms the final metrics (objectite). Either “mean” or “sum”.
maximize – Whether to maximize the metric.
tolerance – Defines the stopping condition, if the expected improvement of the metric is smaller than tolerance the algorithm stops.
init_y_pred – The initial prediction matrix. It can be either “random”, “top”, “greedy” or a matrix of shape (n, m).
max_iters – The maximum number of iterations.
shuffle_order – Whether to shuffle the order of instances in each iteration.
skip_tn – Whether to skip the calculation of True Negatives in the confusion matrix, if the metric does not use the True Negatives, this can speed up the calculation, especially when using sparse matrices.
return_meta – Whether to return the meta information.
seed – The seed for the random number generator.
verbose – Whether to print the progress.
- Returns:
The predicted labels matrix of shape (n, m) – the shape and type of the matrix is the same as y_proba. If return_meta is True, additionally, a dictionary is returned, that contains the time taken to calculate the prediction, the number of iterations, and expected performance at each iteration.
Example
Example of maximizng macro-averaged F1 score using block coordinate ascent algorithm:
from xcolumns.block_coordinate import predict_using_bc_with_0approx y_proba = some_model.predict_proba(X_test) # Marginal probabilities of labels, matrix of shape (n, m) # Define the binary metric function to optimize (eg. F1-score) def my_binary_f1_score_on_conf_matrix(tp, fp, fn, tn): return 2 * tp / (2 * tp + fp + fn + 1e-9) y_pred = predict_using_bc_with_0approx( y_proba, my_binary_f1_score_on_conf_matrix, k=3, # The udget of positive labels per instance metric_aggregation='mean', # The aggregation function for the binary metric, here mean results in macro-averaged F1-score maximize=True # Maximize the metric ) # Returns the predicted labels matrix of shape (n, m)
Wrapper functions for specific metrics
The module provides the wrapper functions for specific metrics that can be used as arguments for the predict_using_bc_with_0approx()
function as well as factory function for creating such wrapper functions.
- xcolumns.block_coordinate.make_bc_wrapper(binary_metric_func: Callable, metric_name: str, maximize: bool = True, metric_aggregation: str = 'mean', skip_tn: bool = False, warn_k_eq_0: bool = False)[source]
Factory function that creates a wrapper function for predicting for a given test set optimizing a given metric using using block coordinate ascent/descent method (
predict_using_bc_with_0approx()
).- Parameters:
metric_func – The metric function to optimize.
metric_name – The name of the metric that will be used in docstring.
maximize – Whether to maximize the metric.
metric_aggregation – The aggregation function for the metric. Either “mean” or “sum”.
skip_tn – Whether to skip the calculation of True Negatives in the confusion matrix.
warn_k_eq_0 – Whether to warn if the budget k equal to 0 leads to degenerated solution.
- Returns:
The wrapper function.
- xcolumns.block_coordinate.predict_optimizing_instance_precision_using_bc(y_proba: ndarray | csr_matrix, k: int, tolerance: float = 1e-06, init_y_pred: str | ndarray | csr_matrix = 'random', max_iters: int = 100, shuffle_order: bool = True, verbose: bool = False, return_meta: bool = False)[source]
This function is a wrapper for using block coordinate ascent with instance precision as the target metric. See
predict_using_bc_with_0approx()
for more details and a description of parameters.
- xcolumns.block_coordinate.predict_optimizing_macro_balanced_accuracy_using_fw(y_proba: ndarray | csr_matrix, k: int, tolerance: float = 1e-06, init_y_pred: str | ndarray | csr_matrix = 'random', max_iters: int = 100, shuffle_order: bool = True, return_meta: bool = False, seed: int | None = None, verbose: bool = False)
Find a randomized classifier that maximizes macro-averaged balanced accuracy metric using Frank-Wolfe algorithm. It is equivalent to calling
find_classifier_using_fw(y_true, y_proba, metric_on_y_true_and_y_pred, k, ..., metric_aggregation=mean, maximize=True, skip_tn=False)
function. Seepredict_using_bc_with_0approx()
for more details and a description of arguments.
- xcolumns.block_coordinate.predict_optimizing_macro_f1_score_using_bc(y_proba: ndarray | csr_matrix, k: int, tolerance: float = 1e-06, init_y_pred: str | ndarray | csr_matrix = 'random', max_iters: int = 100, shuffle_order: bool = True, return_meta: bool = False, seed: int | None = None, verbose: bool = False)
Find a randomized classifier that maximizes macro-averaged F1 score metric using Frank-Wolfe algorithm. It is equivalent to calling
find_classifier_using_fw(y_true, y_proba, binary_f1_score_on_conf_matrix, k, ..., metric_aggregation=mean, maximize=True, skip_tn=True)
function. Seepredict_using_bc_with_0approx()
for more details and a description of arguments.
- xcolumns.block_coordinate.predict_optimizing_macro_gmean_using_fw(y_proba: ndarray | csr_matrix, k: int, tolerance: float = 1e-06, init_y_pred: str | ndarray | csr_matrix = 'random', max_iters: int = 100, shuffle_order: bool = True, return_meta: bool = False, seed: int | None = None, verbose: bool = False)
Find a randomized classifier that maximizes macro-averaged G-mean metric using Frank-Wolfe algorithm. It is equivalent to calling
find_classifier_using_fw(y_true, y_proba, metric_on_y_true_and_y_pred, k, ..., metric_aggregation=mean, maximize=True, skip_tn=False)
function. Seepredict_using_bc_with_0approx()
for more details and a description of arguments.
- xcolumns.block_coordinate.predict_optimizing_macro_hmean_using_fw(y_proba: ndarray | csr_matrix, k: int, tolerance: float = 1e-06, init_y_pred: str | ndarray | csr_matrix = 'random', max_iters: int = 100, shuffle_order: bool = True, return_meta: bool = False, seed: int | None = None, verbose: bool = False)
Find a randomized classifier that maximizes macro-averaged H-mean metric using Frank-Wolfe algorithm. It is equivalent to calling
find_classifier_using_fw(y_true, y_proba, metric_on_y_true_and_y_pred, k, ..., metric_aggregation=mean, maximize=True, skip_tn=False)
function. Seepredict_using_bc_with_0approx()
for more details and a description of arguments.
- xcolumns.block_coordinate.predict_optimizing_macro_jaccard_score_using_fw(y_proba: ndarray | csr_matrix, k: int, tolerance: float = 1e-06, init_y_pred: str | ndarray | csr_matrix = 'random', max_iters: int = 100, shuffle_order: bool = True, return_meta: bool = False, seed: int | None = None, verbose: bool = False)
Find a randomized classifier that maximizes macro-averaged Jaccard score metric using Frank-Wolfe algorithm. It is equivalent to calling
find_classifier_using_fw(y_true, y_proba, metric_on_y_true_and_y_pred, k, ..., metric_aggregation=mean, maximize=True, skip_tn=True)
function. Seepredict_using_bc_with_0approx()
for more details and a description of arguments.
- xcolumns.block_coordinate.predict_optimizing_macro_precision_using_bc(y_proba: ndarray | csr_matrix, k: int, tolerance: float = 1e-06, init_y_pred: str | ndarray | csr_matrix = 'random', max_iters: int = 100, shuffle_order: bool = True, return_meta: bool = False, seed: int | None = None, verbose: bool = False)
Find a randomized classifier that maximizes macro-averaged precision metric using Frank-Wolfe algorithm. It is equivalent to calling
find_classifier_using_fw(y_true, y_proba, binary_precision_on_conf_matrix, k, ..., metric_aggregation=mean, maximize=True, skip_tn=True)
function. Seepredict_using_bc_with_0approx()
for more details and a description of arguments.
- xcolumns.block_coordinate.predict_optimizing_macro_recall_using_bc(y_proba: ndarray | csr_matrix, k: int, tolerance: float = 1e-06, init_y_pred: str | ndarray | csr_matrix = 'random', max_iters: int = 100, shuffle_order: bool = True, return_meta: bool = False, seed: int | None = None, verbose: bool = False)
Find a randomized classifier that maximizes macro-averaged recall metric using Frank-Wolfe algorithm. It is equivalent to calling
find_classifier_using_fw(y_true, y_proba, binary_recall_on_conf_matrix, k, ..., metric_aggregation=mean, maximize=True, skip_tn=True)
function. Seepredict_using_bc_with_0approx()
for more details and a description of arguments.
- xcolumns.block_coordinate.predict_optimizing_mixed_instance_precision_and_macro_f1_score_using_bc(y_proba: ndarray | csr_matrix, k: int, alpha: float = 1, tolerance: float = 1e-06, init_y_pred: str | ndarray | csr_matrix = 'random', max_iters: int = 100, shuffle_order: bool = True, verbose: bool = False, return_meta: bool = False)[source]
This function is a wrapper for using block coordinate ascent with metric being a weighted average of instance precision and macro f1 as the target metric. See
predict_using_bc_with_0approx()
for more details and a description of parameters.
- xcolumns.block_coordinate.predict_optimizing_mixed_instance_precision_and_macro_precision_using_bc(y_proba: ndarray | csr_matrix, k: int, alpha: float = 1, tolerance: float = 1e-06, init_y_pred: str | ndarray | csr_matrix = 'random', max_iters: int = 100, shuffle_order: bool = True, verbose: bool = False, return_meta: bool = False)[source]
This function is a wrapper for using block coordinate ascent with metric being a weighted average of instance precision and macro precision as the target metric. See
predict_using_bc_with_0approx()
for more details and a description of parameters.
- xcolumns.block_coordinate.predict_optimizing_mixed_instance_precision_and_macro_recall_using_bc(y_proba: ndarray | csr_matrix, k: int, alpha: float = 1, tolerance: float = 1e-06, init_y_pred: str | ndarray | csr_matrix = 'random', max_iters: int = 100, shuffle_order: bool = True, verbose: bool = False, return_meta: bool = False)[source]
This function is a wrapper for using block coordinate ascent with metric being a weighted average of instance precision and macro f1 as the target metric. See
predict_using_bc_with_0approx()
for more details and a description of parameters.
Special function for optimization of coverage
The module provides the special function for optimization of coverage metric that use other way of estimating the expected value of the metric than predict_using_bc_with_0approx()
function.
- xcolumns.block_coordinate.predict_optimizing_coverage_using_bc(y_proba: ndarray | csr_matrix, k: int, alpha: float = 1, tolerance: float = 1e-06, init_y_pred: str | ndarray | csr_matrix = 'random', max_iters: int = 100, shuffle_order: bool = True, return_meta: bool = False, seed: int | None = None, verbose: bool = False) ndarray | csr_matrix | Tuple[ndarray | csr_matrix, Dict[str, Any]] [source]
An efficient implementation of the block coordinate-descent for coverage.
TODO: Add more details