Finding population classifiers using the Frank-Wolfe-based method (xcolumns.frank_wolfe)
The xcolumns.frank_wolfe module implements methods for finding the optimal population classifier using the Frank-Wolfe algorithm.
The method was first introduced and described in the paper:
The main function of the module is find_classifier_using_fw():
- xcolumns.frank_wolfe.find_classifier_using_fw(y_true: ndarray | csr_matrix, y_proba: ndarray | csr_matrix, metric_func: Callable, k: int, max_iters: int = 100, init_classifier: str | Tuple[ndarray, ndarray] = 'random', maximize: bool = True, search_for_best_alpha: bool = True, alpha_search_algo: str = 'uniform', alpha_tolerance: float = 0.001, alpha_uniform_search_step: float = 0.001, skip_tn: bool = False, seed: int | None = None, verbose: bool = False, return_meta: bool = False) → RandomizedWeightedClassifier | Tuple[RandomizedWeightedClassifier, Dict[str, Any]]
Finds a randomized classifier that optimizes the given metric under the Population Utility (PU) objective using the Frank-Wolfe algorithm on the provided training dataset of true labels y_true and corresponding conditional probabilities y_proba.
The algorithm iteratively calculates the gradient of the metric with respect to the confusion matrix and updates the randomized classifier accordingly. The step size is either the standard Frank-Wolfe step size \(2/(i + 1)\) or the best step size found by the selected search algorithm. The algorithm stops when the step size is smaller than the tolerance alpha_tolerance or when the maximum number of iterations max_iters is reached.
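The update rule described above can be illustrated on a toy problem. The following is a minimal sketch of a generic Frank-Wolfe loop with the standard step size, not the xcolumns implementation: it minimizes a squared distance over the probability simplex instead of optimizing a metric over confusion matrices. The names frank_wolfe_simplex and target are hypothetical, and the sketch assumes that the iteration counter \(i\) starts at 1.

```python
import numpy as np


def frank_wolfe_simplex(target, max_iters=2000, alpha_tolerance=1e-3):
    """Toy Frank-Wolfe: minimize ||x - target||^2 over the probability simplex."""
    n = target.shape[0]
    x = np.full(n, 1.0 / n)  # start from the uniform distribution
    for i in range(1, max_iters + 1):
        grad = 2.0 * (x - target)  # gradient of the objective at x
        # Linear minimization step: the best vertex of the simplex is the
        # one-hot vector at the smallest gradient coordinate.
        s = np.zeros(n)
        s[np.argmin(grad)] = 1.0
        alpha = 2.0 / (i + 1)  # standard step size, assuming i starts at 1
        if alpha < alpha_tolerance:
            break  # stop once the step size falls below the tolerance
        x = (1.0 - alpha) * x + alpha * s  # convex combination keeps x feasible
    return x


target = np.array([0.2, 0.5, 0.3])
x = frank_wolfe_simplex(target)
```

Because every iterate is a convex combination of simplex vertices, the solution is itself a mixture of simple solutions, which is the same reason the method described here returns a randomized classifier.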
- Parameters:
y_true – A 2D matrix of true labels of the dataset that will be used to find the optimal classifier.
y_proba – A 2D matrix of conditional probabilities that will be used to find the optimal classifier.
metric_func – The metric function defined on the confusion matrix to optimize. It needs to take four arguments that are vectors of True Positives, False Positives, False Negatives, and True Negatives for each label, and return a scalar value (metric_func(tp, fp, fn, tn)).
k – The budget of labels to predict for each instance. If equal to 0, there is no budget constraint.
max_iters – The maximum number of iterations.
init_classifier – The initial classifier, can be either “random”, “top”, or an initial weighted classifier with provided vectors of coefficients \(\boldsymbol{a}\) and constants \(\boldsymbol{b}\).
maximize – Whether to maximize or minimize the metric.
search_for_best_alpha – Whether to search for the best alpha (step size) in each iteration or to use the standard Frank-Wolfe step size \(2/(i + 1)\), where \(i\) is the iteration number. Enabling the search slows down the algorithm, but may help to find a better solution if the metric is not convex.
alpha_search_algo – The algorithm for searching for the best alpha, can be either “uniform” or “ternary”. “Ternary” should only be used if the metric is unimodal.
alpha_tolerance – The stopping condition: if the new alpha is smaller than alpha_tolerance, the algorithm stops.
alpha_uniform_search_step – The step size for uniform search of alpha.
skip_tn – Whether to skip the calculation of True Negatives in the confusion matrix. If the metric does not use True Negatives, this can speed up the calculation, especially when using sparse matrices.
seed – The seed for the random selection of classifiers.
verbose – Whether to print additional information.
return_meta – Whether to return metadata.
- Returns:
The randomized classifier – returned as RandomizedWeightedClassifier.
If return_meta is True, a dictionary is additionally returned that contains the time taken to calculate the prediction, the number of iterations, the step sizes for each iteration, and the calculated metric values for each weighted classifier.
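To make the expected shape of metric_func concrete, here is a minimal sketch of two functions with the metric_func(tp, fp, fn, tn) signature described in the parameters above. The function names, the epsilon smoothing term, and the numbers are illustrative assumptions and are not taken from the library. Whether the library passes raw counts or normalized rates is also not specified here; F1 does not depend on that scale, apart from the small smoothing term.

```python
import numpy as np


def macro_f1(tp, fp, fn, tn, epsilon=1e-9):
    """Macro-averaged F1: compute F1 per label, then average over labels."""
    f1_per_label = 2.0 * tp / (2.0 * tp + fp + fn + epsilon)
    return float(np.mean(f1_per_label))


def micro_f1(tp, fp, fn, tn, epsilon=1e-9):
    """Micro-averaged F1: pool the entries over labels, then compute one F1."""
    tp_sum, fp_sum, fn_sum = tp.sum(), fp.sum(), fn.sum()
    return float(2.0 * tp_sum / (2.0 * tp_sum + fp_sum + fn_sum + epsilon))


# Hypothetical per-label confusion-matrix entries for three labels.
tp = np.array([8.0, 1.0, 0.0])
fp = np.array([2.0, 1.0, 0.0])
fn = np.array([2.0, 3.0, 5.0])
tn = np.array([88.0, 95.0, 95.0])  # accepted but unused by F1

macro_value = macro_f1(tp, fp, fn, tn)
micro_value = micro_f1(tp, fp, fn, tn)
```

Neither function reads tn, which is the situation in which skip_tn=True can save work.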
Example
TODO
The function returns the RandomizedWeightedClassifier object that can be used for prediction.
The RandomizedWeightedClassifier is a set of weighted classifiers as defined in
The module also provides the function predict_using_randomized_weighted_classifier() for predicting the labels using the RandomizedWeightedClassifier object.
- class xcolumns.frank_wolfe.RandomizedWeightedClassifier(k: int, a: ndarray, b: ndarray, p: ndarray)
Bases: object
The class represents a randomized classifier: a set of weighted classifiers, one of which is randomly selected for each instance according to the provided probabilities.
- predict(y_proba: ndarray | csr_matrix, dtype: dtype | None = None, seed: int | None = None) → ndarray | csr_matrix
Returns the weighted prediction for each instance (row) in the provided matrix of conditional probability estimates of labels \(\boldsymbol{\eta}\) (y_proba) using a randomized classifier.
- Parameters:
y_proba – A 2D matrix of conditional probabilities for each label.
dtype – The data type for the output matrix, if equal to None, the data type of y_proba will be used.
seed – The seed for the random selection of classifiers.
- Returns:
The binary prediction matrix – the shape and type of the matrix is the same as y_proba.
- xcolumns.frank_wolfe.predict_using_randomized_weighted_classifier(y_proba: ndarray | csr_matrix, k: int, classifiers_a: ndarray, classifiers_b: ndarray, classifiers_proba: ndarray, dtype: dtype | None = None, seed: int | None = None) → ndarray | csr_matrix
Returns the prediction of a randomized weighted classifier for each instance (row) in the provided matrix of conditional probability estimates of labels \(\boldsymbol{H}\) (y_proba), where each element \(\eta_{ij} = P(y_j|x_i)\) is the probability of label \(j\) for instance \(i\). A randomized weighted classifier is a set of weighted classifiers (classifiers_a and classifiers_b); for every instance, one classifier is randomly selected according to the provided probabilities (classifiers_proba) and used for prediction.
The gains vector \(\boldsymbol{g}\) is calculated for each instance \(i\) as follows:
\[\begin{split}c &= \text{choose random classifier index} \\ \boldsymbol{g} &= \boldsymbol{a}_c \odot \boldsymbol{\eta}_i + \boldsymbol{b}_c\end{split}\]
If k is larger than 0, the top k labels with the highest gains are selected for the instance. If k is 0, then the labels with gains higher than 0 are selected for the instance.
- Parameters:
y_proba – A 2D matrix of conditional probabilities for each label of shape (n, m).
k – The number of labels to predict for each instance.
classifiers_a – The matrix of slopes (coefficients) \(\boldsymbol{A}\) used for calculating gains. Each row represents slopes \(\boldsymbol{a}_c\) of a single classifier. The number of rows needs to be equal to the number of rows of classifiers_b and size of classifiers_proba. The number of columns needs to be equal to the number of columns of y_proba (m).
classifiers_b – The matrix of intercepts (constants) \(\boldsymbol{B}\) used for calculating gains. Each row represents intercepts \(\boldsymbol{b}_c\) of a single classifier. The number of rows needs to be equal to the number of rows of classifiers_a and size of classifiers_proba. The number of columns needs to be equal to the number of columns of y_proba (m).
classifiers_proba – The vector of probabilities of selection for each classifier.
dtype – The data type for the output matrix, if equal to None, the data type of y_proba will be used.
seed – The seed for the random selection of classifiers.
- Returns:
The binary prediction matrix – the shape and type of the matrix is the same as y_proba.
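The prediction rule described for predict_using_randomized_weighted_classifier() can be sketched in a few lines of NumPy. This is an illustrative dense-only version written from the description above, not the library's implementation; the name predict_randomized_sketch and the example data are hypothetical, and sparse inputs and the dtype argument are left out.

```python
import numpy as np


def predict_randomized_sketch(y_proba, k, classifiers_a, classifiers_b,
                              classifiers_proba, seed=None):
    """Dense-only sketch of prediction with a randomized weighted classifier."""
    rng = np.random.default_rng(seed)
    n, m = y_proba.shape
    # c: one randomly chosen classifier index per instance
    c = rng.choice(classifiers_proba.shape[0], size=n, p=classifiers_proba)
    # g = a_c * eta_i + b_c, computed for all instances at once
    gains = classifiers_a[c] * y_proba + classifiers_b[c]
    y_pred = np.zeros((n, m), dtype=y_proba.dtype)
    if k > 0:
        # select the k labels with the highest gains in each row
        top_k = np.argpartition(-gains, k - 1, axis=1)[:, :k]
        np.put_along_axis(y_pred, top_k, 1, axis=1)
    else:
        # no budget: select every label with a positive gain
        y_pred[gains > 0] = 1
    return y_pred


y_proba = np.array([[0.9, 0.2, 0.6], [0.1, 0.7, 0.4]])
a = np.ones((1, 3))           # a single classifier with unit slopes
p = np.array([1.0])           # which is therefore always selected
pred_top2 = predict_randomized_sketch(y_proba, 2, a, np.zeros((1, 3)), p, seed=0)
pred_thresholded = predict_randomized_sketch(y_proba, 0, a, np.full((1, 3), -0.5), p, seed=0)
```

With a single classifier the result is deterministic: for k = 2 the two most probable labels are selected in each row, and for k = 0 with intercepts of -0.5 the rule reduces to thresholding the probabilities at 0.5.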
Wrapper functions for specific metrics
The module provides wrapper functions around find_classifier_using_fw() for specific metrics, as well as a factory function for creating such wrapper functions.
- xcolumns.frank_wolfe.find_classifier_optimizing_macro_balanced_accuracy_using_fw(y_true: ndarray | csr_matrix, y_proba: ndarray | csr_matrix, k: int, max_iters: int = 100, init_classifier: str | Tuple[ndarray, ndarray] = 'random', search_for_best_alpha: bool = True, alpha_search_algo: str = 'uniform', alpha_tolerance: float = 0.001, alpha_uniform_search_step: float = 0.001, seed: int | None = None, verbose: bool = False, return_meta: bool = False)
Finds a randomized classifier that maximizes the macro-averaged balanced accuracy metric using the Frank-Wolfe algorithm. It is equivalent to calling find_classifier_using_fw(y_true, y_proba, metric_on_y_true_and_y_pred, k, ..., maximize=True, skip_tn=False). See find_classifier_using_fw() for more details and a description of the arguments.
- xcolumns.frank_wolfe.find_classifier_optimizing_macro_f1_score_using_fw(y_true: ndarray | csr_matrix, y_proba: ndarray | csr_matrix, k: int, max_iters: int = 100, init_classifier: str | Tuple[ndarray, ndarray] = 'random', search_for_best_alpha: bool = True, alpha_search_algo: str = 'uniform', alpha_tolerance: float = 0.001, alpha_uniform_search_step: float = 0.001, seed: int | None = None, verbose: bool = False, return_meta: bool = False)
Finds a randomized classifier that maximizes the macro-averaged F1 score metric using the Frank-Wolfe algorithm. It is equivalent to calling find_classifier_using_fw(y_true, y_proba, metric_on_y_true_and_y_pred, k, ..., maximize=True, skip_tn=True). See find_classifier_using_fw() for more details and a description of the arguments.
- xcolumns.frank_wolfe.find_classifier_optimizing_macro_gmean_using_fw(y_true: ndarray | csr_matrix, y_proba: ndarray | csr_matrix, k: int, max_iters: int = 100, init_classifier: str | Tuple[ndarray, ndarray] = 'random', search_for_best_alpha: bool = True, alpha_search_algo: str = 'uniform', alpha_tolerance: float = 0.001, alpha_uniform_search_step: float = 0.001, seed: int | None = None, verbose: bool = False, return_meta: bool = False)
Finds a randomized classifier that maximizes the macro-averaged G-mean metric using the Frank-Wolfe algorithm. It is equivalent to calling find_classifier_using_fw(y_true, y_proba, metric_on_y_true_and_y_pred, k, ..., maximize=True, skip_tn=False). See find_classifier_using_fw() for more details and a description of the arguments.
- xcolumns.frank_wolfe.find_classifier_optimizing_macro_hmean_using_fw(y_true: ndarray | csr_matrix, y_proba: ndarray | csr_matrix, k: int, max_iters: int = 100, init_classifier: str | Tuple[ndarray, ndarray] = 'random', search_for_best_alpha: bool = True, alpha_search_algo: str = 'uniform', alpha_tolerance: float = 0.001, alpha_uniform_search_step: float = 0.001, seed: int | None = None, verbose: bool = False, return_meta: bool = False)
Finds a randomized classifier that maximizes the macro-averaged H-mean metric using the Frank-Wolfe algorithm. It is equivalent to calling find_classifier_using_fw(y_true, y_proba, metric_on_y_true_and_y_pred, k, ..., maximize=True, skip_tn=False). See find_classifier_using_fw() for more details and a description of the arguments.
- xcolumns.frank_wolfe.find_classifier_optimizing_macro_jaccard_score_using_fw(y_true: ndarray | csr_matrix, y_proba: ndarray | csr_matrix, k: int, max_iters: int = 100, init_classifier: str | Tuple[ndarray, ndarray] = 'random', search_for_best_alpha: bool = True, alpha_search_algo: str = 'uniform', alpha_tolerance: float = 0.001, alpha_uniform_search_step: float = 0.001, seed: int | None = None, verbose: bool = False, return_meta: bool = False)
Finds a randomized classifier that maximizes the macro-averaged Jaccard score metric using the Frank-Wolfe algorithm. It is equivalent to calling find_classifier_using_fw(y_true, y_proba, metric_on_y_true_and_y_pred, k, ..., maximize=True, skip_tn=True). See find_classifier_using_fw() for more details and a description of the arguments.
- xcolumns.frank_wolfe.find_classifier_optimizing_macro_precision_using_fw(y_true: ndarray | csr_matrix, y_proba: ndarray | csr_matrix, k: int, max_iters: int = 100, init_classifier: str | Tuple[ndarray, ndarray] = 'random', search_for_best_alpha: bool = True, alpha_search_algo: str = 'uniform', alpha_tolerance: float = 0.001, alpha_uniform_search_step: float = 0.001, seed: int | None = None, verbose: bool = False, return_meta: bool = False)
Finds a randomized classifier that maximizes the macro-averaged precision metric using the Frank-Wolfe algorithm. It is equivalent to calling find_classifier_using_fw(y_true, y_proba, metric_on_y_true_and_y_pred, k, ..., maximize=True, skip_tn=True). See find_classifier_using_fw() for more details and a description of the arguments.
- xcolumns.frank_wolfe.find_classifier_optimizing_macro_recall_using_fw(y_true: ndarray | csr_matrix, y_proba: ndarray | csr_matrix, k: int, max_iters: int = 100, init_classifier: str | Tuple[ndarray, ndarray] = 'random', search_for_best_alpha: bool = True, alpha_search_algo: str = 'uniform', alpha_tolerance: float = 0.001, alpha_uniform_search_step: float = 0.001, seed: int | None = None, verbose: bool = False, return_meta: bool = False)
Finds a randomized classifier that maximizes the macro-averaged recall metric using the Frank-Wolfe algorithm. It is equivalent to calling find_classifier_using_fw(y_true, y_proba, metric_on_y_true_and_y_pred, k, ..., maximize=True, skip_tn=True). See find_classifier_using_fw() for more details and a description of the arguments.
- xcolumns.frank_wolfe.find_classifier_optimizing_micro_balanced_accuracy_using_fw(y_true: ndarray | csr_matrix, y_proba: ndarray | csr_matrix, k: int, max_iters: int = 100, init_classifier: str | Tuple[ndarray, ndarray] = 'random', search_for_best_alpha: bool = True, alpha_search_algo: str = 'uniform', alpha_tolerance: float = 0.001, alpha_uniform_search_step: float = 0.001, seed: int | None = None, verbose: bool = False, return_meta: bool = False)
Finds a randomized classifier that maximizes the micro-averaged balanced accuracy metric using the Frank-Wolfe algorithm. It is equivalent to calling find_classifier_using_fw(y_true, y_proba, metric_on_y_true_and_y_pred, k, ..., maximize=True, skip_tn=False). See find_classifier_using_fw() for more details and a description of the arguments.
- xcolumns.frank_wolfe.find_classifier_optimizing_micro_f1_score_using_fw(y_true: ndarray | csr_matrix, y_proba: ndarray | csr_matrix, k: int, max_iters: int = 100, init_classifier: str | Tuple[ndarray, ndarray] = 'random', search_for_best_alpha: bool = True, alpha_search_algo: str = 'uniform', alpha_tolerance: float = 0.001, alpha_uniform_search_step: float = 0.001, seed: int | None = None, verbose: bool = False, return_meta: bool = False)
Finds a randomized classifier that maximizes the micro-averaged F1 score metric using the Frank-Wolfe algorithm. It is equivalent to calling find_classifier_using_fw(y_true, y_proba, metric_on_y_true_and_y_pred, k, ..., maximize=True, skip_tn=True). See find_classifier_using_fw() for more details and a description of the arguments.
- xcolumns.frank_wolfe.find_classifier_optimizing_micro_gmean_using_fw(y_true: ndarray | csr_matrix, y_proba: ndarray | csr_matrix, k: int, max_iters: int = 100, init_classifier: str | Tuple[ndarray, ndarray] = 'random', search_for_best_alpha: bool = True, alpha_search_algo: str = 'uniform', alpha_tolerance: float = 0.001, alpha_uniform_search_step: float = 0.001, seed: int | None = None, verbose: bool = False, return_meta: bool = False)
Finds a randomized classifier that maximizes the micro-averaged G-mean metric using the Frank-Wolfe algorithm. It is equivalent to calling find_classifier_using_fw(y_true, y_proba, metric_on_y_true_and_y_pred, k, ..., maximize=True, skip_tn=False). See find_classifier_using_fw() for more details and a description of the arguments.
- xcolumns.frank_wolfe.find_classifier_optimizing_micro_hmean_using_fw(y_true: ndarray | csr_matrix, y_proba: ndarray | csr_matrix, k: int, max_iters: int = 100, init_classifier: str | Tuple[ndarray, ndarray] = 'random', search_for_best_alpha: bool = True, alpha_search_algo: str = 'uniform', alpha_tolerance: float = 0.001, alpha_uniform_search_step: float = 0.001, seed: int | None = None, verbose: bool = False, return_meta: bool = False)
Finds a randomized classifier that maximizes the micro-averaged H-mean metric using the Frank-Wolfe algorithm. It is equivalent to calling find_classifier_using_fw(y_true, y_proba, metric_on_y_true_and_y_pred, k, ..., maximize=True, skip_tn=False). See find_classifier_using_fw() for more details and a description of the arguments.
- xcolumns.frank_wolfe.find_classifier_optimizing_micro_jaccard_score_using_fw(y_true: ndarray | csr_matrix, y_proba: ndarray | csr_matrix, k: int, max_iters: int = 100, init_classifier: str | Tuple[ndarray, ndarray] = 'random', search_for_best_alpha: bool = True, alpha_search_algo: str = 'uniform', alpha_tolerance: float = 0.001, alpha_uniform_search_step: float = 0.001, seed: int | None = None, verbose: bool = False, return_meta: bool = False)
Finds a randomized classifier that maximizes the micro-averaged Jaccard score metric using the Frank-Wolfe algorithm. It is equivalent to calling find_classifier_using_fw(y_true, y_proba, metric_on_y_true_and_y_pred, k, ..., maximize=True, skip_tn=True). See find_classifier_using_fw() for more details and a description of the arguments.
- xcolumns.frank_wolfe.make_frank_wolfe_wrapper(metric_func: Callable, metric_name: str, maximize: bool = True, skip_tn: bool = False, warn_k_eq_0: bool = False)
Factory function that creates a wrapper function for finding a randomized classifier that optimizes a given metric using the Frank-Wolfe algorithm (find_classifier_using_fw()).
- Parameters:
metric_func – The metric function to optimize.
metric_name – The name of the metric that will be used in the docstring.
maximize – Whether to maximize the metric.
skip_tn – Whether to skip the calculation of True Negatives in the confusion matrix.
warn_k_eq_0 – Whether to warn when the budget k is equal to 0 and this leads to a degenerate solution.
- Returns:
The wrapper function.
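The factory pattern described above can be sketched with a closure. This is a hypothetical illustration, not the code of make_frank_wolfe_wrapper(): the names make_wrapper_sketch, dummy_solver, and macro_recall are assumptions, the solver is passed in explicitly so that the example runs without the library, and the warn_k_eq_0 behaviour is omitted.

```python
def make_wrapper_sketch(solver, metric_func, metric_name, maximize=True, skip_tn=False):
    """Hypothetical factory: binds a metric and its options to a generic solver."""

    def wrapper(y_true, y_proba, k, **kwargs):
        # Forward the call with the metric-specific arguments filled in.
        return solver(y_true, y_proba, metric_func, k,
                      maximize=maximize, skip_tn=skip_tn, **kwargs)

    # The metric name is used only to build the docstring of the wrapper.
    wrapper.__doc__ = f"Find a classifier that optimizes {metric_name}."
    return wrapper


def dummy_solver(y_true, y_proba, metric_func, k, maximize=True, skip_tn=False, **kwargs):
    """Stand-in for the real solver; it only reports what it received."""
    return {"metric": metric_func.__name__, "k": k,
            "maximize": maximize, "skip_tn": skip_tn}


def macro_recall(tp, fp, fn, tn):
    """Macro-averaged recall; tn is accepted but not used."""
    return sum(t / (t + f + 1e-9) for t, f in zip(tp, fn)) / len(tp)


find_macro_recall = make_wrapper_sketch(
    dummy_solver, macro_recall, "macro-averaged recall", skip_tn=True
)
result = find_macro_recall(None, None, k=3)
```

Binding maximize and skip_tn once in the factory keeps each metric-specific wrapper down to the arguments that actually vary between calls.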