Finding population classifiers using Frank Wolfe-based method (`xcolumns.frank_wolfe`)

The xcolumns.frank_wolfe module implements methods for finding the optimal population classifier using the Frank-Wolfe algorithm. The method was first introduced and described in the paper:

Erik Schultheis, Wojciech Kotłowski, Marek Wydmuch, Rohit Babbar, Strom Borman, Krzysztof Dembczyński. Consistent algorithms for multi-label classification with macro-at-k metrics. ICLR 2024.

The main function of the module is find_classifier_using_fw():

xcolumns.frank_wolfe.find_classifier_using_fw(y_true: ndarray | csr_matrix, y_proba: ndarray | csr_matrix, metric_func: Callable, k: int, max_iters: int = 100, init_classifier: str | Tuple[ndarray, ndarray] = 'top', maximize: bool = True, normalize_conf_matrix: bool = True, metric_kwargs: Dict[str, Any] | None = None, tolerance: float = 1e-06, search_for_best_alpha: bool = True, alpha_search_algo: str = 'uniform', alpha_tolerance: float = 0.001, alpha_uniform_search_step: float = 0.0001, skip_tn: bool = False, seed: int | None = None, verbose: bool = False, return_meta: bool = False, **kwargs) → RandomizedWeightedClassifier | Tuple[RandomizedWeightedClassifier, Dict[str, Any]]

Finds a randomized classifier that optimizes the given metric under the Population Utility (PU) objective using the Frank-Wolfe algorithm, given a training dataset of true labels y_true and corresponding conditional probabilities y_proba.

The algorithm iteratively calculates the gradient of the metric with respect to the confusion matrix and updates the randomized classifier accordingly. The step size is either the standard Frank-Wolfe step size \(2/(i + 1)\) or the best step size found by the selected search algorithm. The algorithm stops when the step size is smaller than alpha_tolerance or the maximum number of iterations max_iters is reached.
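The loop described above can be illustrated on a toy problem. The following is a minimal sketch, not the library's implementation: it maximizes a concave function over the probability simplex instead of working with confusion matrices, and the function name frank_wolfe_on_simplex and the toy objective are assumptions made for illustration. Only the step-size rule and the stopping condition follow the description above.

```python
import numpy as np


def frank_wolfe_on_simplex(f, grad, dim, max_iters=100, search_for_best_alpha=True,
                           alpha_uniform_search_step=0.01, alpha_tolerance=1e-3):
    """Maximize a concave function f over the probability simplex (toy setting)."""
    x = np.full(dim, 1.0 / dim)  # start from the uniform mixture
    for i in range(1, max_iters + 1):
        # Linear maximization step: pick the vertex best aligned with the gradient.
        s = np.zeros(dim)
        s[np.argmax(grad(x))] = 1.0
        if search_for_best_alpha:
            # Uniform search: evaluate f on a grid of step sizes in [0, 1].
            alphas = np.arange(0.0, 1.0 + 1e-12, alpha_uniform_search_step)
            values = [f((1 - a) * x + a * s) for a in alphas]
            alpha = alphas[int(np.argmax(values))]
        else:
            # Standard step size 2 / (i + 1), with i counted from 1.
            alpha = 2.0 / (i + 1)
        if alpha < alpha_tolerance:
            break  # step too small: stop, mirroring the alpha_tolerance rule
        x = (1 - alpha) * x + alpha * s
    return x


# Hypothetical toy objective: negative squared distance to a target point.
target = np.array([0.2, 0.3, 0.5])
f = lambda x: -np.sum((x - target) ** 2)
grad = lambda x: -2.0 * (x - target)

x_search = frank_wolfe_on_simplex(f, grad, dim=3)
x_fixed = frank_wolfe_on_simplex(f, grad, dim=3, search_for_best_alpha=False)
```

With the step-size search enabled, each iteration costs one metric evaluation per grid point, which is why a small alpha_uniform_search_step makes the search slower.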

Parameters:

y_true – A 2D matrix of true labels of the dataset that will be used to find the optimal classifier.
y_proba – A 2D matrix of conditional probabilities that will be used to find the optimal classifier.
metric_func – The metric function defined on confusion matrix to optimize. It needs to take four arguments that are vectors of: True Positives, False Positives, False Negatives, True Negatives for each label and return a scalar value (metric_func(tp, fp, fn, tn)).
k – The budget of labels to predict for each instance. If equal to 0, this means that there is no budget constraint.
max_iters – The maximum number of iterations.
init_classifier – The initial classifier, can be either “random”, “top”, or an initial weighted classifier with provided vectors of coefficients \(\boldsymbol{a}\) and constants \(\boldsymbol{b}\).
maximize – Whether to maximize or minimize the metric.
metric_kwargs – Additional keyword arguments for the metric function.
search_for_best_alpha – Whether to search for the best alpha (step size) in each iteration or to use the standard Frank-Wolfe step size \(2/(i + 1)\), where \(i\) is the iteration number. Setting this to True slows down the algorithm, but may help to find a better solution if the metric is not convex.
alpha_search_algo – The algorithm for searching for the best alpha, can be either “uniform” or “ternary”. “Ternary” should only be used if the metric is unimodal.
alpha_tolerance – The stopping condition: if the new alpha is smaller than alpha_tolerance, the algorithm stops.
alpha_uniform_search_step – The step size for uniform search of alpha.
skip_tn – Whether to skip the calculation of True Negatives in the confusion matrix. If the metric does not use True Negatives, this can speed up the calculation, especially when using sparse matrices.
seed – The seed for the random selection of classifiers.
verbose – Whether to print additional information.
return_meta – Whether to return metadata.
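As an illustration of the metric_func signature described above, here is a minimal sketch of a macro-averaged F1 score defined on confusion-matrix vectors. The function name and the epsilon guard are assumptions for this example, and the entries are assumed to be normalized by the number of instances; the library may require metric functions to meet further conditions not covered here.

```python
import numpy as np


def macro_f1_on_conf_matrix(tp, fp, fn, tn, epsilon=1e-9):
    """Macro-averaged F1 from per-label confusion-matrix vectors.

    tn is accepted to match the metric_func(tp, fp, fn, tn) signature,
    but F1 does not use it.
    """
    f1_per_label = 2 * tp / (2 * tp + fp + fn + epsilon)  # epsilon avoids 0/0
    return float(np.mean(f1_per_label))


# Hypothetical normalized confusion-matrix entries for three labels.
tp = np.array([0.10, 0.05, 0.00])
fp = np.array([0.05, 0.05, 0.10])
fn = np.array([0.05, 0.15, 0.20])
tn = 1.0 - tp - fp - fn

score = macro_f1_on_conf_matrix(tp, fp, fn, tn)
```

Because this metric ignores True Negatives, it is the kind of metric for which skip_tn could be set to True.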

Returns:

The randomized classifier – returned as RandomizedWeightedClassifier. If return_meta is True, a dictionary is additionally returned that contains the time taken to calculate the prediction, the number of iterations, the step sizes for each iteration, and the calculated metric values for each weighted classifier.

Example

TODO

The function returns a RandomizedWeightedClassifier object that can be used for prediction. The RandomizedWeightedClassifier is a set of weighted classifiers with parameters a (slopes) and b (intercepts) for each label, similar to the form used in xcolumns.weighted_prediction.predict_weighted_per_instance(). The module also provides the function predict_using_randomized_weighted_classifier() for predicting the labels using the RandomizedWeightedClassifier object.

class xcolumns.frank_wolfe.RandomizedWeightedClassifier(k: int, a: ndarray, b: ndarray, p: ndarray)

Bases: object

The class represents a randomized classifier, that is, a set of weighted classifiers, one of which is randomly selected for each instance according to the provided probabilities.

predict(y_proba: ndarray | csr_matrix, dtype: dtype | None = None, seed: int | None = None) → ndarray | csr_matrix

Returns the weighted prediction for each instance (row) in a provided matrix of conditional probability estimates of labels \(\boldsymbol{\eta}\) (y_proba) using a randomized classifier.

Parameters:

y_proba – A 2D matrix of conditional probabilities for each label.
dtype – The data type for the output matrix, if equal to None, the data type of y_proba will be used.
seed – The seed for the random selection of classifiers.

Returns:

The binary prediction matrix – the shape and type of the matrix is the same as y_proba.

xcolumns.frank_wolfe.predict_using_randomized_weighted_classifier(y_proba: ndarray | csr_matrix, k: int, classifiers_a: ndarray, classifiers_b: ndarray, classifiers_proba: ndarray, dtype: dtype | None = None, seed: int | None = None) → ndarray | csr_matrix

Returns the prediction of a randomized weighted classifier for each instance (row) in a provided matrix of conditional probability estimates of labels \(\boldsymbol{H}\) (y_proba), where each element \(\eta_{ij} = P(y_j|x_i)\) is the probability of the label \(j\) for the instance \(i\). A randomized weighted classifier is a set of weighted classifiers (classifiers_a and classifiers_b); for every instance, one classifier is randomly selected according to the provided probabilities (classifiers_proba) and used for prediction.

The gains vector \(\boldsymbol{g}\) is calculated for each instance \(i\) as follows:

\[\begin{split}c &= \text{choose random classifier index} \\ \boldsymbol{g} &= \boldsymbol{a}_c \odot \boldsymbol{\eta}_i + \boldsymbol{b}_c\end{split}\]

If k is larger than 0, the top k labels with the highest gains are selected for the instance. If k is 0, then the labels with gains higher than 0 are selected for the instance.
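The prediction rule above can be sketched in plain NumPy. This is an illustrative re-implementation for dense arrays only, not the library's code; the function name predict_randomized_weighted is hypothetical, and sparse inputs and the dtype argument are not handled.

```python
import numpy as np


def predict_randomized_weighted(y_proba, k, classifiers_a, classifiers_b,
                                classifiers_proba, seed=None):
    """Dense-only sketch of prediction with a randomized weighted classifier."""
    rng = np.random.default_rng(seed)
    n, m = y_proba.shape
    y_pred = np.zeros((n, m), dtype=y_proba.dtype)
    # Draw one classifier index c per instance according to classifiers_proba.
    chosen = rng.choice(len(classifiers_proba), size=n, p=classifiers_proba)
    for i in range(n):
        c = chosen[i]
        # Gains: g = a_c * eta_i + b_c (element-wise product).
        gains = classifiers_a[c] * y_proba[i] + classifiers_b[c]
        if k > 0:
            # Select the k labels with the highest gains.
            top_k = np.argpartition(-gains, k - 1)[:k]
            y_pred[i, top_k] = 1
        else:
            # No budget: select every label with a positive gain.
            y_pred[i, gains > 0] = 1
    return y_pred


y_proba = np.array([[0.1, 0.9, 0.4, 0.3],
                    [0.8, 0.2, 0.7, 0.1]])
# Two hypothetical classifiers; the second one is never selected (probability 0).
classifiers_a = np.array([[1.0, 1.0, 1.0, 1.0],
                          [2.0, 0.5, 1.0, 1.0]])
classifiers_b = np.array([[-0.5, -0.5, -0.5, -0.5],
                          [0.0, 0.0, 0.0, 0.0]])
classifiers_proba = np.array([1.0, 0.0])

pred_top2 = predict_randomized_weighted(y_proba, 2, classifiers_a, classifiers_b,
                                        classifiers_proba, seed=0)
pred_no_budget = predict_randomized_weighted(y_proba, 0, classifiers_a, classifiers_b,
                                             classifiers_proba, seed=0)
```

In this example the first classifier is always chosen, so with k = 2 each row receives its two most probable labels, and with k = 0 only labels with probability above 0.5 are selected.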

Parameters:

y_proba – A 2D matrix of conditional probabilities for each label of shape (n, m).
k – The number of labels to predict for each instance.
classifiers_a – The matrix of slopes (coefficients) \(\boldsymbol{A}\) used for calculating gains. Each row represents slopes \(\boldsymbol{a}_c\) of a single classifier. The number of rows needs to be equal to the number of rows of classifiers_b and size of classifiers_proba. The number of columns needs to be equal to the number of columns of y_proba (m).
classifiers_b – The matrix of intercepts (constants) \(\boldsymbol{B}\) used for calculating gains. Each row represents intercepts \(\boldsymbol{b}_c\) of a single classifier. The number of rows needs to be equal to the number of rows of classifiers_a and size of classifiers_proba. The number of columns needs to be equal to the number of columns of y_proba (m).
classifiers_proba – The vector of probabilities of selection for each classifier.
dtype – The data type for the output matrix, if equal to None, the data type of y_proba will be used.
seed – The seed for the random selection of classifiers.

Returns:

The binary prediction matrix – the shape and type of the matrix is the same as y_proba.

Wrapper functions for specific metrics

The module provides wrapper functions around find_classifier_using_fw() for specific metrics, as well as a factory function for creating such wrapper functions.

xcolumns.frank_wolfe.find_classifier_optimizing_macro_balanced_accuracy_using_fw(y_true: ndarray | csr_matrix, y_proba: ndarray | csr_matrix, k: int, max_iters: int = 100, init_classifier: str | Tuple[ndarray, ndarray] = 'top', normalize_conf_matrix: bool = True, metric_kwargs: Dict[str, Any] | None = None, tolerance: float = 1e-06, search_for_best_alpha: bool = True, alpha_search_algo: str = 'uniform', alpha_tolerance: float = 0.001, alpha_uniform_search_step: float = 0.0001, seed: int | None = None, verbose: bool = False, return_meta: bool = False): Find a randomized classifier that maximizes macro-averaged balanced accuracy metric using Frank-Wolfe algorithm. It is equivalent to calling find_classifier_using_fw(y_true, y_proba, macro_metric_on_conf_matrix, k, ..., maximize=True, skip_tn=False) function. See find_classifier_using_fw() for more details and a description of arguments.

xcolumns.frank_wolfe.find_classifier_optimizing_macro_f1_score_using_fw(y_true: ndarray | csr_matrix, y_proba: ndarray | csr_matrix, k: int, max_iters: int = 100, init_classifier: str | Tuple[ndarray, ndarray] = 'top', normalize_conf_matrix: bool = True, metric_kwargs: Dict[str, Any] | None = None, tolerance: float = 1e-06, search_for_best_alpha: bool = True, alpha_search_algo: str = 'uniform', alpha_tolerance: float = 0.001, alpha_uniform_search_step: float = 0.0001, seed: int | None = None, verbose: bool = False, return_meta: bool = False): Find a randomized classifier that maximizes macro-averaged F1 score metric using Frank-Wolfe algorithm. It is equivalent to calling find_classifier_using_fw(y_true, y_proba, macro_metric_on_conf_matrix, k, ..., maximize=True, skip_tn=True) function. See find_classifier_using_fw() for more details and a description of arguments.

xcolumns.frank_wolfe.find_classifier_optimizing_macro_gmean_using_fw(y_true: ndarray | csr_matrix, y_proba: ndarray | csr_matrix, k: int, max_iters: int = 100, init_classifier: str | Tuple[ndarray, ndarray] = 'top', normalize_conf_matrix: bool = True, metric_kwargs: Dict[str, Any] | None = None, tolerance: float = 1e-06, search_for_best_alpha: bool = True, alpha_search_algo: str = 'uniform', alpha_tolerance: float = 0.001, alpha_uniform_search_step: float = 0.0001, seed: int | None = None, verbose: bool = False, return_meta: bool = False): Find a randomized classifier that maximizes macro-averaged G-mean metric using Frank-Wolfe algorithm. It is equivalent to calling find_classifier_using_fw(y_true, y_proba, macro_metric_on_conf_matrix, k, ..., maximize=True, skip_tn=False) function. See find_classifier_using_fw() for more details and a description of arguments.

xcolumns.frank_wolfe.find_classifier_optimizing_macro_hmean_using_fw(y_true: ndarray | csr_matrix, y_proba: ndarray | csr_matrix, k: int, max_iters: int = 100, init_classifier: str | Tuple[ndarray, ndarray] = 'top', normalize_conf_matrix: bool = True, metric_kwargs: Dict[str, Any] | None = None, tolerance: float = 1e-06, search_for_best_alpha: bool = True, alpha_search_algo: str = 'uniform', alpha_tolerance: float = 0.001, alpha_uniform_search_step: float = 0.0001, seed: int | None = None, verbose: bool = False, return_meta: bool = False): Find a randomized classifier that maximizes macro-averaged H-mean metric using Frank-Wolfe algorithm. It is equivalent to calling find_classifier_using_fw(y_true, y_proba, macro_metric_on_conf_matrix, k, ..., maximize=True, skip_tn=False) function. See find_classifier_using_fw() for more details and a description of arguments.

xcolumns.frank_wolfe.find_classifier_optimizing_macro_jaccard_score_using_fw(y_true: ndarray | csr_matrix, y_proba: ndarray | csr_matrix, k: int, max_iters: int = 100, init_classifier: str | Tuple[ndarray, ndarray] = 'top', normalize_conf_matrix: bool = True, metric_kwargs: Dict[str, Any] | None = None, tolerance: float = 1e-06, search_for_best_alpha: bool = True, alpha_search_algo: str = 'uniform', alpha_tolerance: float = 0.001, alpha_uniform_search_step: float = 0.0001, seed: int | None = None, verbose: bool = False, return_meta: bool = False): Find a randomized classifier that maximizes macro-averaged Jaccard score metric using Frank-Wolfe algorithm. It is equivalent to calling find_classifier_using_fw(y_true, y_proba, macro_metric_on_conf_matrix, k, ..., maximize=True, skip_tn=True) function. See find_classifier_using_fw() for more details and a description of arguments.

xcolumns.frank_wolfe.find_classifier_optimizing_macro_precision_using_fw(y_true: ndarray | csr_matrix, y_proba: ndarray | csr_matrix, k: int, max_iters: int = 100, init_classifier: str | Tuple[ndarray, ndarray] = 'top', normalize_conf_matrix: bool = True, metric_kwargs: Dict[str, Any] | None = None, tolerance: float = 1e-06, search_for_best_alpha: bool = True, alpha_search_algo: str = 'uniform', alpha_tolerance: float = 0.001, alpha_uniform_search_step: float = 0.0001, seed: int | None = None, verbose: bool = False, return_meta: bool = False): Find a randomized classifier that maximizes macro-averaged precision metric using Frank-Wolfe algorithm. It is equivalent to calling find_classifier_using_fw(y_true, y_proba, macro_metric_on_conf_matrix, k, ..., maximize=True, skip_tn=True) function. See find_classifier_using_fw() for more details and a description of arguments.

xcolumns.frank_wolfe.find_classifier_optimizing_macro_recall_using_fw(y_true: ndarray | csr_matrix, y_proba: ndarray | csr_matrix, k: int, max_iters: int = 100, init_classifier: str | Tuple[ndarray, ndarray] = 'top', normalize_conf_matrix: bool = True, metric_kwargs: Dict[str, Any] | None = None, tolerance: float = 1e-06, search_for_best_alpha: bool = True, alpha_search_algo: str = 'uniform', alpha_tolerance: float = 0.001, alpha_uniform_search_step: float = 0.0001, seed: int | None = None, verbose: bool = False, return_meta: bool = False): Find a randomized classifier that maximizes macro-averaged recall metric using Frank-Wolfe algorithm. It is equivalent to calling find_classifier_using_fw(y_true, y_proba, macro_metric_on_conf_matrix, k, ..., maximize=True, skip_tn=True) function. See find_classifier_using_fw() for more details and a description of arguments.

xcolumns.frank_wolfe.find_classifier_optimizing_micro_balanced_accuracy_using_fw(y_true: ndarray | csr_matrix, y_proba: ndarray | csr_matrix, k: int, max_iters: int = 100, init_classifier: str | Tuple[ndarray, ndarray] = 'top', normalize_conf_matrix: bool = True, metric_kwargs: Dict[str, Any] | None = None, tolerance: float = 1e-06, search_for_best_alpha: bool = True, alpha_search_algo: str = 'uniform', alpha_tolerance: float = 0.001, alpha_uniform_search_step: float = 0.0001, seed: int | None = None, verbose: bool = False, return_meta: bool = False): Find a randomized classifier that maximizes micro-averaged balanced accuracy metric using Frank-Wolfe algorithm. It is equivalent to calling find_classifier_using_fw(y_true, y_proba, micro_metric_on_conf_matrix, k, ..., maximize=True, skip_tn=False) function. See find_classifier_using_fw() for more details and a description of arguments.

xcolumns.frank_wolfe.find_classifier_optimizing_micro_f1_score_using_fw(y_true: ndarray | csr_matrix, y_proba: ndarray | csr_matrix, k: int, max_iters: int = 100, init_classifier: str | Tuple[ndarray, ndarray] = 'top', normalize_conf_matrix: bool = True, metric_kwargs: Dict[str, Any] | None = None, tolerance: float = 1e-06, search_for_best_alpha: bool = True, alpha_search_algo: str = 'uniform', alpha_tolerance: float = 0.001, alpha_uniform_search_step: float = 0.0001, seed: int | None = None, verbose: bool = False, return_meta: bool = False): Find a randomized classifier that maximizes micro-averaged F1 score metric using Frank-Wolfe algorithm. It is equivalent to calling find_classifier_using_fw(y_true, y_proba, micro_metric_on_conf_matrix, k, ..., maximize=True, skip_tn=True) function. See find_classifier_using_fw() for more details and a description of arguments.

xcolumns.frank_wolfe.find_classifier_optimizing_micro_gmean_using_fw(y_true: ndarray | csr_matrix, y_proba: ndarray | csr_matrix, k: int, max_iters: int = 100, init_classifier: str | Tuple[ndarray, ndarray] = 'top', normalize_conf_matrix: bool = True, metric_kwargs: Dict[str, Any] | None = None, tolerance: float = 1e-06, search_for_best_alpha: bool = True, alpha_search_algo: str = 'uniform', alpha_tolerance: float = 0.001, alpha_uniform_search_step: float = 0.0001, seed: int | None = None, verbose: bool = False, return_meta: bool = False): Find a randomized classifier that maximizes micro-averaged G-mean metric using Frank-Wolfe algorithm. It is equivalent to calling find_classifier_using_fw(y_true, y_proba, micro_metric_on_conf_matrix, k, ..., maximize=True, skip_tn=False) function. See find_classifier_using_fw() for more details and a description of arguments.

xcolumns.frank_wolfe.find_classifier_optimizing_micro_hmean_using_fw(y_true: ndarray | csr_matrix, y_proba: ndarray | csr_matrix, k: int, max_iters: int = 100, init_classifier: str | Tuple[ndarray, ndarray] = 'top', normalize_conf_matrix: bool = True, metric_kwargs: Dict[str, Any] | None = None, tolerance: float = 1e-06, search_for_best_alpha: bool = True, alpha_search_algo: str = 'uniform', alpha_tolerance: float = 0.001, alpha_uniform_search_step: float = 0.0001, seed: int | None = None, verbose: bool = False, return_meta: bool = False): Find a randomized classifier that maximizes micro-averaged H-mean metric using Frank-Wolfe algorithm. It is equivalent to calling find_classifier_using_fw(y_true, y_proba, micro_metric_on_conf_matrix, k, ..., maximize=True, skip_tn=False) function. See find_classifier_using_fw() for more details and a description of arguments.

xcolumns.frank_wolfe.find_classifier_optimizing_micro_jaccard_score_using_fw(y_true: ndarray | csr_matrix, y_proba: ndarray | csr_matrix, k: int, max_iters: int = 100, init_classifier: str | Tuple[ndarray, ndarray] = 'top', normalize_conf_matrix: bool = True, metric_kwargs: Dict[str, Any] | None = None, tolerance: float = 1e-06, search_for_best_alpha: bool = True, alpha_search_algo: str = 'uniform', alpha_tolerance: float = 0.001, alpha_uniform_search_step: float = 0.0001, seed: int | None = None, verbose: bool = False, return_meta: bool = False): Find a randomized classifier that maximizes micro-averaged Jaccard score metric using Frank-Wolfe algorithm. It is equivalent to calling find_classifier_using_fw(y_true, y_proba, micro_metric_on_conf_matrix, k, ..., maximize=True, skip_tn=True) function. See find_classifier_using_fw() for more details and a description of arguments.

xcolumns.frank_wolfe.find_classifier_optimizing_micro_precision_using_fw(y_true: ndarray | csr_matrix, y_proba: ndarray | csr_matrix, k: int, max_iters: int = 100, init_classifier: str | Tuple[ndarray, ndarray] = 'top', normalize_conf_matrix: bool = True, metric_kwargs: Dict[str, Any] | None = None, tolerance: float = 1e-06, search_for_best_alpha: bool = True, alpha_search_algo: str = 'uniform', alpha_tolerance: float = 0.001, alpha_uniform_search_step: float = 0.0001, seed: int | None = None, verbose: bool = False, return_meta: bool = False): Find a randomized classifier that maximizes micro-averaged precision metric using Frank-Wolfe algorithm. It is equivalent to calling find_classifier_using_fw(y_true, y_proba, micro_metric_on_conf_matrix, k, ..., maximize=True, skip_tn=True) function. See find_classifier_using_fw() for more details and a description of arguments.

xcolumns.frank_wolfe.find_classifier_optimizing_micro_recall_using_fw(y_true: ndarray | csr_matrix, y_proba: ndarray | csr_matrix, k: int, max_iters: int = 100, init_classifier: str | Tuple[ndarray, ndarray] = 'top', normalize_conf_matrix: bool = True, metric_kwargs: Dict[str, Any] | None = None, tolerance: float = 1e-06, search_for_best_alpha: bool = True, alpha_search_algo: str = 'uniform', alpha_tolerance: float = 0.001, alpha_uniform_search_step: float = 0.0001, seed: int | None = None, verbose: bool = False, return_meta: bool = False): Find a randomized classifier that maximizes micro-averaged recall metric using Frank-Wolfe algorithm. It is equivalent to calling find_classifier_using_fw(y_true, y_proba, micro_metric_on_conf_matrix, k, ..., maximize=True, skip_tn=True) function. See find_classifier_using_fw() for more details and a description of arguments.

xcolumns.frank_wolfe.find_classifier_optimizing_mixed_instance_precision_and_macro_f1_score_using_fw(y_true: ndarray | csr_matrix, y_proba: ndarray | csr_matrix, k: int, alpha: float = 1, **kwargs): Find a randomized classifier that maximizes a metric using Frank-Wolfe algorithm with metric being a weighted average of instance precision and macro-averaged f1 score as the target metric. See find_classifier_using_fw() for more details and a description of arguments.

xcolumns.frank_wolfe.find_classifier_optimizing_mixed_instance_precision_and_macro_precision_using_fw(y_true: ndarray | csr_matrix, y_proba: ndarray | csr_matrix, k: int, alpha: float = 1, **kwargs): Find a randomized classifier that maximizes a metric using Frank-Wolfe algorithm with metric being a weighted average of instance precision and macro-averaged precision as the target metric. See find_classifier_using_fw() for more details and a description of arguments.

xcolumns.frank_wolfe.find_classifier_optimizing_mixed_instance_precision_and_macro_recall_using_fw(y_true: ndarray | csr_matrix, y_proba: ndarray | csr_matrix, k: int, alpha: float = 1, **kwargs): Find a randomized classifier that maximizes a metric using Frank-Wolfe algorithm with metric being a weighted average of instance precision and macro-averaged recall as the target metric. See find_classifier_using_fw() for more details and a description of arguments.

xcolumns.frank_wolfe.find_classifier_optimizing_mixed_macro_recall_and_macro_precision_using_fw(y_true: ndarray | csr_matrix, y_proba: ndarray | csr_matrix, k: int, alpha: float = 1, **kwargs): Find a randomized classifier that maximizes a metric using Frank-Wolfe algorithm with metric being a weighted average of macro-averaged recall and macro-averaged precision as the target metric. See find_classifier_using_fw() for more details and a description of arguments.
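To make the idea of a mixed metric concrete, the sketch below combines instance precision and macro-averaged F1 score on confusion-matrix vectors. It rests on several assumptions that the documentation above does not confirm: that alpha weights the first metric and (1 - alpha) the second, that the confusion matrix is normalized by the number of instances, and that exactly k labels are predicted per instance. The function name is hypothetical.

```python
import numpy as np


def mixed_instance_precision_and_macro_f1(tp, fp, fn, tn, k, alpha=1.0, epsilon=1e-9):
    """Weighted average of instance precision@k and macro F1 (illustrative only)."""
    # With a normalized confusion matrix and exactly k predictions per instance,
    # the sum of per-label true positives divided by k equals precision@k.
    instance_precision = float(np.sum(tp)) / k
    macro_f1 = float(np.mean(2 * tp / (2 * tp + fp + fn + epsilon)))
    # Assumption: alpha weights instance precision, (1 - alpha) weights macro F1.
    return alpha * instance_precision + (1.0 - alpha) * macro_f1


# Hypothetical normalized confusion matrix for two labels with k = 1,
# so the predicted mass tp + fp sums to 1 across labels.
tp = np.array([0.3, 0.1])
fp = np.array([0.2, 0.4])
fn = np.array([0.1, 0.1])
tn = 1.0 - tp - fp - fn

only_precision = mixed_instance_precision_and_macro_f1(tp, fp, fn, tn, k=1, alpha=1.0)
only_macro_f1 = mixed_instance_precision_and_macro_f1(tp, fp, fn, tn, k=1, alpha=0.0)
balanced = mixed_instance_precision_and_macro_f1(tp, fp, fn, tn, k=1, alpha=0.5)
```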

xcolumns.frank_wolfe.make_frank_wolfe_wrapper(metric_func: Callable, metric_name: str, maximize: bool = True, skip_tn: bool = False, warn_k_eq_0: bool = False)

Factory function that creates a wrapper function for finding a randomized classifier that optimizes a given metric using the Frank-Wolfe algorithm (find_classifier_using_fw()).

Parameters:

metric_func – The metric function to optimize.
metric_name – The name of the metric that will be used in docstring.
maximize – Whether to maximize the metric.
skip_tn – Whether to skip the calculation of True Negatives in the confusion matrix.
warn_k_eq_0 – Whether to warn if the budget k equal to 0 leads to a degenerate solution.

Returns:

The wrapper function.
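The factory pattern can be sketched as a closure that fixes the metric and the related flags. This is a hypothetical illustration of the idea, not the library's implementation: make_wrapper, the solver argument, and recording_solver are names introduced for this example, and the warn_k_eq_0 handling is omitted.

```python
def make_wrapper(solver, metric_func, metric_name, maximize=True, skip_tn=False):
    """Build a function that calls solver with a fixed metric and fixed flags."""

    def wrapper(y_true, y_proba, k, **kwargs):
        # Forward the data and budget, filling in the metric-specific arguments.
        return solver(y_true, y_proba, metric_func, k,
                      maximize=maximize, skip_tn=skip_tn, **kwargs)

    goal = "maximizes" if maximize else "minimizes"
    wrapper.__doc__ = f"Find a randomized classifier that {goal} the {metric_name} metric."
    return wrapper


# Stand-in solver that only records how it was called (for illustration).
def recording_solver(y_true, y_proba, metric_func, k, **kwargs):
    return {"metric": metric_func.__name__, "k": k, **kwargs}


def toy_metric(tp, fp, fn, tn):
    return float(sum(tp))


find_toy = make_wrapper(recording_solver, toy_metric, "toy", maximize=True, skip_tn=True)
call = find_toy(None, None, 3, max_iters=10)
```

Keeping the metric-specific settings in one place means that every wrapper shares the signature and behavior of the underlying solver.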
