imsegm.classification module
Supporting file to create and set parameters for scikit-learn classifiers, plus some preprocessing functions that support classification.
Copyright (C) 2014-2018 Jiri Borovec <jiri.borovec@fel.cvut.cz>
-
class imsegm.classification.CrossValidate(nb_samples, nb_hold_out, rand_seed=None, ignore_overflow=0.01)
Bases: object
Cross-validator generator. The samples are split repeatedly into a train set and a test set: in each fold, nb_hold_out samples are held out for testing and the remaining samples are used for training.
- Parameters
nb_samples (int) – total number of samples
nb_hold_out (int|float) – number of samples held out, or their fraction of nb_samples
rand_seed – seed for the random number generator
ignore_overflow (float) – tolerance while dividing the dataset into folds
Examples
>>> # balanced split
>>> cv = CrossValidate(6, 3, rand_seed=False)
>>> cv.indexes
[0, 1, 2, 3, 4, 5]
>>> len(cv)
2
>>> list(cv)
[([3, 4, 5], [0, 1, 2]), ([0, 1, 2], [3, 4, 5])]
>>> [(len(tr), len(ts)) for tr, ts in CrossValidate(340, 0.41)]
[(201, 139), (201, 139), (201, 139)]

>>> # not rounded split
>>> cv = CrossValidate(7, 3, rand_seed=0)
>>> list(cv)
[([3, 0, 5, 4], [6, 2, 1]), ([6, 2, 1, 4], [3, 0, 5]), ([1, 3, 0, 5], [4, 6, 2])]
>>> len(cv)
3
>>> cv.indexes
[6, 2, 1, 3, 0, 5, 4]

>>> # larger test than train
>>> cv = CrossValidate(7, 5, rand_seed=0)
>>> list(cv)
[([6, 2], [1, 3, 0, 5, 4]), ([1, 3], [6, 2, 0, 5, 4]), ([0, 5], [6, 2, 1, 3, 4]), ([4, 6], [2, 1, 3, 0, 5])]
>>> [(len(tr), len(ts)) for tr, ts in CrossValidate(340, 0.55)]
[(153, 187), (153, 187), (153, 187)]

>>> # impact of tolerance
>>> len(CrossValidate(340, 0.33, ignore_overflow=0.0))
4
>>> len(CrossValidate(340, 0.33, ignore_overflow=0.05))
3

>>> [(len(tr), len(ts)) for tr, ts in CrossValidate(4651, 0.25, ignore_overflow=0.)]
[(3488, 1163), (3488, 1163), (3488, 1163), (3488, 1163)]
>>> [(len(tr), len(ts)) for tr, ts in CrossValidate(4651, 0.25, ignore_overflow=1e-2)]
[(3488, 1163), (3488, 1163), (3488, 1163), (3489, 1162)]
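The fold layout of the seeded example above can be reproduced with a short pure-Python sketch. It assumes an already shuffled list of indexes and an integer nb_hold_out no larger than half of the samples, and it ignores both the ignore_overflow tolerance and fractional hold-outs; `hold_out_folds` is a hypothetical helper, not the library's implementation.

```python
import math


def hold_out_folds(indexes, nb_hold_out):
    """Yield (train, test) pairs over consecutive chunks of `indexes`.

    Hypothetical sketch: each test chunk has `nb_hold_out` items and the
    last chunk wraps around to the start of `indexes` when it runs short.
    """
    nb_samples = len(indexes)
    nb_folds = math.ceil(nb_samples / nb_hold_out)
    for i in range(nb_folds):
        start = i * nb_hold_out
        # take nb_hold_out positions, wrapping past the end of the list
        test = [indexes[(start + j) % nb_samples] for j in range(nb_hold_out)]
        held_out = set(test)
        # the train set keeps the original order of the remaining indexes
        train = [idx for idx in indexes if idx not in held_out]
        yield train, test


# shuffled order reported by cv.indexes for CrossValidate(7, 3, rand_seed=0)
print(list(hold_out_folds([6, 2, 1, 3, 0, 5, 4], 3)))
```

With the order `[6, 2, 1, 3, 0, 5, 4]`, the sketch yields the same three folds as `list(CrossValidate(7, 3, rand_seed=0))`; the wrap-around is what lets the third fold reuse samples 6 and 2.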
-
class imsegm.classification.CrossValidateGroups(set_sizes, nb_hold_out, rand_seed=None, ignore_overflow=0.01)
Bases: imsegm.classification.CrossValidate
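Cross-validator generator over sets (groups) of samples. The samples of one set are always kept together: in each fold, nb_hold_out whole sets are held out for testing and the remaining sets are used for training.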
- Parameters
set_sizes (list(int)) – number of samples in each set
nb_hold_out (int|float) – number of sets held out, or their fraction of all sets
rand_seed – seed for the random number generator
ignore_overflow (float) – tolerance while dividing the dataset into folds
Examples
>>> # balanced split
>>> cv = CrossValidateGroups([2, 3, 2, 3], 2, rand_seed=False)
>>> cv.set_indexes
[[0, 1], [2, 3, 4], [5, 6], [7, 8, 9]]
>>> len(cv)
2
>>> list(cv)
[([5, 6, 7, 8, 9], [0, 1, 2, 3, 4]), ([0, 1, 2, 3, 4], [5, 6, 7, 8, 9])]
>>> [(len(tr), len(ts)) for tr, ts in CrossValidateGroups([7] * 340, 0.41)]
[(1407, 973), (1407, 973), (1407, 973)]

>>> # unbalanced split
>>> cv = CrossValidateGroups([2, 2, 1, 2, 1], 2, rand_seed=0)
>>> cv.set_indexes
[[0, 1], [2, 3], [4], [5, 6], [7]]
>>> list(cv)
[([2, 3, 5, 6, 7], [4, 0, 1]), ([4, 0, 1, 7], [2, 3, 5, 6]), ([0, 1, 2, 3, 5, 6], [7, 4])]
>>> len(cv)
3
>>> cv.indexes
[2, 0, 1, 3, 4]

>>> # larger test than train
>>> cv = CrossValidateGroups([2, 2, 1, 2, 1, 1], 4, rand_seed=0)
>>> list(cv)
[([8, 4], [2, 3, 5, 6, 0, 1, 7]), ([2, 3, 5, 6], [8, 4, 0, 1, 7]), ([0, 1, 7], [8, 4, 2, 3, 5, 6])]
>>> [(len(tr), len(ts)) for tr, ts in CrossValidateGroups([7] * 340, 0.55)]
[(1071, 1309), (1071, 1309), (1071, 1309)]
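The mapping from set sizes to sample indexes can be sketched in a few lines. Both helpers are hypothetical; they only illustrate how the `set_indexes` of the examples arise and how holding out whole sets expands to sample indexes.

```python
def group_sample_indexes(set_sizes):
    """Assign consecutive sample indexes to each set (hypothetical helper)."""
    set_indexes, start = [], 0
    for size in set_sizes:
        set_indexes.append(list(range(start, start + size)))
        start += size
    return set_indexes


def expand_sets(set_indexes, set_ids):
    """Flatten the sample indexes of the chosen sets, keeping the set order."""
    return [idx for sid in set_ids for idx in set_indexes[sid]]


set_indexes = group_sample_indexes([2, 2, 1, 2, 1])
print(set_indexes)                       # [[0, 1], [2, 3], [4], [5, 6], [7]]
print(expand_sets(set_indexes, [2, 0]))  # [4, 0, 1]
```

In the unbalanced example, the shuffled set order is `[2, 0, 1, 3, 4]`; holding out its first two sets, 2 and 0, gives the test samples `[4, 0, 1]` of the first fold, and the remaining sets 1, 3 and 4 give the train samples `[2, 3, 5, 6, 7]`.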
-
class imsegm.classification.HoldOut(nb_samples, hold_out, rand_seed=0)
Bases: object
Hold-out cross-validator generator. In the hold-out, the data is split only once into a train set and a test set. Unlike in other cross-validation schemes, the hold-out consists of only one iteration.
- Parameters
Example
>>> ho = HoldOut(10, 7, rand_seed=None)
>>> len(ho)
1
>>> list(ho)
[([0, 1, 2, 3, 4, 5, 6], [7, 8, 9])]
>>> ho = HoldOut(10, 7, rand_seed=0)
>>> list(ho)
[([2, 8, 4, 9, 1, 6, 7], [3, 0, 5])]
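A single hold-out split amounts to an optional seeded shuffle followed by one cut. The sketch below is hypothetical and assumes NumPy's legacy `RandomState` for the shuffle; as in the examples above, the first `hold_out` indexes form the train set and the rest the test set.

```python
import numpy as np


def hold_out_split(nb_samples, hold_out, rand_seed=None):
    """Single train/test split (hypothetical sketch of the hold-out idea).

    The first `hold_out` (optionally shuffled) indexes form the train set,
    the rest form the test set, mirroring the HoldOut(10, 7) examples.
    """
    indexes = list(range(nb_samples))
    if rand_seed is not None:
        # a seeded permutation keeps the split reproducible
        indexes = np.random.RandomState(rand_seed).permutation(nb_samples).tolist()
    return indexes[:hold_out], indexes[hold_out:]


print(hold_out_split(10, 7))               # ([0, 1, 2, 3, 4, 5, 6], [7, 8, 9])
print(hold_out_split(10, 7, rand_seed=0))
```

Seeded with 0, this permutation is consistent with the `HoldOut(10, 7, rand_seed=0)` output above, although the library's own shuffling routine is not guaranteed to match.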
-
imsegm.classification.balance_dataset_by_(features, labels, balance_type='random', min_samples=None)
balance the number of training examples per class by one of several methods
- Parameters
- Return tuple(ndarray,ndarray)
>>> np.random.seed(0)
>>> fts, lbs = balance_dataset_by_(np.random.random((25, 3)),
...                                np.random.randint(0, 2, 25))
>>> fts.shape
(24, 3)
>>> lbs
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
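One common balancing method, random undersampling of every class to the size of the smallest one, is sketched below. The helper is hypothetical, and whether `balance_type='random'` does exactly this is an assumption; the example above (24 of 25 samples kept, 12 per class) is consistent with it.

```python
import numpy as np


def balance_by_random_undersampling(features, labels, seed=None):
    """Keep the same number of samples per class (hypothetical sketch).

    Every class is randomly reduced to the size of the smallest class;
    the output is grouped by label in ascending order.
    """
    features, labels = np.asarray(features), np.asarray(labels)
    rng = np.random.RandomState(seed)
    uq_labels, counts = np.unique(labels, return_counts=True)
    nb_min = counts.min()
    selected = []
    for lb in uq_labels:
        idx_class = np.flatnonzero(labels == lb)
        # sample without replacement so that no example is duplicated
        selected.append(rng.choice(idx_class, nb_min, replace=False))
    selected = np.concatenate(selected)
    return features[selected], labels[selected]


fts, lbs = balance_by_random_undersampling(np.arange(16).reshape(8, 2),
                                           [0, 0, 0, 0, 0, 1, 1, 1], seed=0)
print(fts.shape, lbs.tolist())  # (6, 2) [0, 0, 0, 1, 1, 1]
```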
-
imsegm.classification.compose_dict_label_features(features, labels)
convert a vector of features and related labels to a dictionary of features where the key is the label
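Grouping feature vectors by their label can be sketched with a `defaultdict`; the helper name is hypothetical and the exact container types returned by `compose_dict_label_features` are not specified here.

```python
from collections import defaultdict


def group_features_by_label(features, labels):
    """Return {label: [feature vectors]} (hypothetical sketch)."""
    grouped = defaultdict(list)
    for feature, label in zip(features, labels):
        grouped[label].append(feature)
    return dict(grouped)


print(group_features_by_label([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], [1, 0, 1]))
# {1: [[0.1, 0.2], [0.5, 0.6]], 0: [[0.3, 0.4]]}
```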
-
imsegm.classification.compute_classif_metrics(y_true, y_pred, metric_averages=('macro', 'weighted'))
compute standard metrics for multi-class classification
- Parameters
- Return dict(float)
>>> np.random.seed(0)
>>> y_true = np.random.randint(0, 3, 25) * 2
>>> y_pred = np.random.randint(0, 2, 25) * 2
>>> d = compute_classif_metrics(y_true, y_true)
>>> d['accuracy']
1.0
>>> d['confusion']
[[10, 0, 0], [0, 10, 0], [0, 0, 5]]
>>> d = compute_classif_metrics(y_true, y_pred)
>>> d['accuracy']
0.32...
>>> d['confusion']
[[3, 7, 0], [5, 5, 0], [1, 4, 0]]
>>> d = compute_classif_metrics(y_pred, y_pred)
>>> d['accuracy']
1.0
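The confusion matrix in the example has rows for true labels and columns for predicted labels, which is also the scikit-learn convention. The hypothetical sketch below builds such a matrix and the accuracy with plain NumPy; the label order (the sorted union of both inputs) is an assumption.

```python
import numpy as np


def confusion_and_accuracy(y_true, y_pred):
    """Confusion matrix (rows = true, columns = predicted) and accuracy.

    Hypothetical sketch; the label order is the sorted union of both inputs.
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    labels = np.unique(np.concatenate([y_true, y_pred]))
    position = {lb: i for i, lb in enumerate(labels.tolist())}
    confusion = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true.tolist(), y_pred.tolist()):
        confusion[position[t], position[p]] += 1
    # correct predictions lie on the diagonal
    accuracy = float(np.trace(confusion)) / len(y_true)
    return confusion.tolist(), accuracy


print(confusion_and_accuracy([0, 2, 4, 2], [0, 2, 2, 0]))
# ([[1, 0, 0], [1, 1, 0], [0, 1, 0]], 0.5)
```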
-
imsegm.classification.compute_classif_stat_segm_annot(annot_segm_name, drop_labels=None, relabel=False)
compute classification statistics between annotation and segmentation
- Parameters
- Returns
>>> np.random.seed(0)
>>> annot = np.random.randint(0, 2, (5, 10))
>>> segm = np.random.randint(0, 2, (5, 10))
>>> d = compute_classif_stat_segm_annot((annot, annot, 'ttt'), relabel=True,
...                                     drop_labels=[5])
>>> d['(FP+FN)/(TP+FN)']
0.0
>>> d['(TP+FP)/(TP+FN)']
1.0
>>> d = compute_classif_stat_segm_annot((annot, segm, 'ttt'), relabel=True,
...                                     drop_labels=[5])
>>> d['(FP+FN)/(TP+FN)']
0.846...
>>> d['(TP+FP)/(TP+FN)']
1.153...
>>> d = compute_classif_stat_segm_annot((annot, segm + 1, 'ttt'),
...                                     relabel=False, drop_labels=[0])
>>> d['confusion']
[[13, 17], [0, 0]]
-
imsegm.classification.compute_metric_fpfn_tpfn(annot, segm, label_positive=None)
compute measure (FP + FN) / (TP + FN)
- Parameters
annot (ndarray) – annotation
segm (ndarray) – segmentation
label_positive (int) – indexes of positive labels
- Return float
>>> np.random.seed(0)
>>> annot = np.random.randint(0, 2, (50, 75)) * 3
>>> segm = np.random.randint(0, 2, (50, 75)) * 3
>>> compute_metric_fpfn_tpfn(annot, segm)
1.02...
>>> compute_metric_fpfn_tpfn(annot, annot)
0.0
>>> compute_metric_fpfn_tpfn(annot, np.ones((50, 75)))
nan
-
imsegm.classification.compute_metric_tpfp_tpfn(annot, segm, label_positive=None)
compute measure (TP + FP) / (TP + FN)
- Parameters
annot (ndarray) – annotation
segm (ndarray) – segmentation
label_positive (int) – indexes of positive labels
- Return float
>>> np.random.seed(0)
>>> annot = np.random.randint(0, 2, (50, 75)) * 3
>>> segm = np.random.randint(0, 2, (50, 75)) * 3
>>> compute_metric_tpfp_tpfn(annot, segm)
1.03...
>>> compute_metric_tpfp_tpfn(annot, annot)
1.0
>>> compute_metric_tpfp_tpfn(annot, np.ones((50, 75)))
nan
>>> compute_metric_tpfp_tpfn(annot, np.zeros((50, 75)))
0.0
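Both ratios, (FP + FN) / (TP + FN) and (TP + FP) / (TP + FN), are simple arithmetic on pixel counts. The hypothetical helper below takes the counts directly; returning NaN when the annotation has no positive pixel is an assumption that agrees with the `nan` results above.

```python
def fpfn_tpfn_and_tpfp_tpfn(tp, fp, fn):
    """Return ((FP + FN) / (TP + FN), (TP + FP) / (TP + FN)).

    Hypothetical sketch working on pixel counts; both ratios are NaN when
    the annotation contains no positive pixel (TP + FN == 0).
    """
    nb_annot_pos = tp + fn
    if nb_annot_pos == 0:
        return float('nan'), float('nan')
    return (fp + fn) / float(nb_annot_pos), (tp + fp) / float(nb_annot_pos)


print(fpfn_tpfn_and_tpfp_tpfn(tp=8, fp=2, fn=2))   # (0.4, 1.0)
print(fpfn_tpfn_and_tpfp_tpfn(tp=10, fp=0, fn=0))  # (0.0, 1.0)
```

A perfect segmentation gives 0.0 and 1.0, as in the annotation-versus-annotation examples, while an empty segmentation (TP = FP = 0) drives the second ratio to 0.0.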
-
imsegm.classification.compute_stat_per_image(segms, annots, names=None, nb_workers=2, drop_labels=None, relabel=False)
compute statistics over multiple segmentations with their annotations
- Parameters
- Return DF
>>> np.random.seed(0)
>>> img_true = np.random.randint(0, 3, (50, 100))
>>> img_pred = np.random.randint(0, 2, (50, 100))
>>> df = compute_stat_per_image([img_true], [img_true], nb_workers=2,
...                             relabel=True)
>>> pd.Series(df.iloc[0]).sort_index()
ARS                1
accuracy           1
confusion          [[1672, 0, 0], [0, 1682, 0], [0, 0, 1646]]
f1_macro           1
precision_macro    1
recall_macro       1
support_macro      None
Name: 0, dtype: object
>>> df = compute_stat_per_image([img_true], [img_pred], drop_labels=[-1])
>>> pd.Series(df.iloc[0]).sort_index()
ARS                0.0...
accuracy           0.3384
confusion          [[836, 826, 770], [836, 856, 876], [0, 0, 0]]
f1_macro           0.270077
precision_macro    0.336306
recall_macro       0.225694
support_macro      None
Name: 0, dtype: object
-
imsegm.classification.compute_tp_tn_fp_fn(annot, segm, label_positive=None)
count TruePositive, TrueNegative, FalsePositive and FalseNegative
- Parameters
annot (ndarray) – annotation
segm (ndarray) – segmentation
label_positive (int) – indexes of positive labels
- Return tuple(float,float,float,float)
>>> np.random.seed(0)
>>> annot = np.random.randint(0, 2, (5, 7)) * 9
>>> segm = np.random.randint(0, 2, (5, 7)) * 9
>>> annot - segm
array([[-9,  9,  0, -9,  9,  9,  0],
       [ 9,  0,  0,  0, -9, -9,  9],
       [-9,  0, -9, -9, -9,  0,  0],
       [ 0,  9,  0, -9,  0,  9,  0],
       [ 9, -9,  9,  0,  9,  0,  9]])
>>> compute_tp_tn_fp_fn(annot, annot)
(20, 15, 0, 0)
>>> compute_tp_tn_fp_fn(annot, segm)
(9, 5, 11, 10)
>>> compute_tp_tn_fp_fn(annot, np.ones((5, 7)))
(nan, nan, nan, nan)
>>> compute_tp_tn_fp_fn(np.zeros((5, 7)), np.zeros((5, 7)))
(35, 0, 0, 0)
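For a single, explicitly given positive label, the four pixel counts can be obtained from boolean masks. This hypothetical sketch uses the textbook definitions (FP: segmented as positive but annotated as negative); it does not reproduce how the library chooses the positive label when `label_positive=None`, nor its NaN handling.

```python
import numpy as np


def count_tp_tn_fp_fn(annot, segm, label_positive):
    """Pixel counts (TP, TN, FP, FN) for one positive label (hypothetical sketch)."""
    annot_pos = np.asarray(annot) == label_positive
    segm_pos = np.asarray(segm) == label_positive
    tp = int(np.sum(annot_pos & segm_pos))
    tn = int(np.sum(~annot_pos & ~segm_pos))
    fp = int(np.sum(~annot_pos & segm_pos))
    fn = int(np.sum(annot_pos & ~segm_pos))
    return tp, tn, fp, fn


annot = np.array([[9, 0, 9], [0, 9, 0]])
segm = np.array([[9, 9, 0], [0, 9, 0]])
print(count_tp_tn_fp_fn(annot, segm, 9))  # (2, 2, 1, 1)
```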
-
imsegm.classification.convert_dict_label_features_2_vectors(dict_features)
convert a dictionary of features, where the key is the label, to a vector of all features and related labels
- Parameters
dict_features (dict) – {int: [list(float) * nb_features] * nb_samples}
- Return tuple(ndarray,list(int))
np.array<nb_samples, nb_features>, list(int)
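The inverse direction, flattening a label-keyed dictionary back into a feature matrix plus a label list, could look like the hypothetical sketch below; visiting the labels in sorted order is an assumption.

```python
import numpy as np


def dict_features_to_vectors(dict_features):
    """Stack {label: [feature vectors]} into (features, labels) (hypothetical sketch)."""
    features, labels = [], []
    for label in sorted(dict_features):
        for feature in dict_features[label]:
            features.append(feature)
            labels.append(label)
    return np.array(features), labels


fts, lbs = dict_features_to_vectors({1: [[0.5, 0.6]], 0: [[0.1, 0.2], [0.3, 0.4]]})
print(fts.shape, lbs)  # (3, 2) [0, 0, 1]
```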
-
imsegm.classification.convert_set_features_labels_2_dataset(imgs_features, imgs_labels, drop_labels=None, balance_type=None)
given one dictionary of features and one of labels, both keyed by image, concatenate all features and labels over the images into a simple form
- Parameters
- Return tuple(ndarray,ndarray,ndarray)
>>> np.random.seed(0)
>>> d_fts = {'a': np.random.random((25, 3)),
...          'b': np.random.random((30, 3)), }
>>> d_lbs = {'a': np.random.randint(0, 2, 25),
...          'b': np.random.randint(0, 2, 30)}
>>> fts, lbs, sizes = convert_set_features_labels_2_dataset(d_fts, d_lbs)
>>> fts.shape
(55, 3)
>>> lbs.shape
(55,)
>>> sizes
[25, 30]
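The core concatenation step can be sketched with `np.concatenate`; the helper is hypothetical, visits images in sorted key order (an assumption) and skips label dropping and balancing. The returned per-image sizes are the kind of list that could be passed as `set_sizes` to CrossValidateGroups.

```python
import numpy as np


def stack_image_features(imgs_features, imgs_labels):
    """Concatenate per-image features and labels (hypothetical sketch).

    Images are visited in sorted key order; `sizes` records how many
    samples each image contributed.
    """
    names = sorted(imgs_features)
    features = np.concatenate([np.asarray(imgs_features[n]) for n in names])
    labels = np.concatenate([np.asarray(imgs_labels[n]) for n in names])
    sizes = [len(imgs_labels[n]) for n in names]
    return features, labels, sizes


fts, lbs, sizes = stack_image_features(
    {'a': np.zeros((25, 3)), 'b': np.ones((30, 3))},
    {'a': np.zeros(25, dtype=int), 'b': np.ones(30, dtype=int)})
print(fts.shape, lbs.shape, sizes)  # (55, 3) (55,) [25, 30]
```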
-
imsegm.classification.create_classif_search(name_clf, clf_pipeline, nb_labels, search_type='random', cross_val=10, eval_metric='f1', nb_iter=250, nb_workers=5)
create a sklearn hyper-parameter search, either random or grid, depending on search_type
- Parameters
nb_labels (int) – number of labels
search_type (str) – hyper-params search type
eval_metric (str) – evaluation metric
nb_iter (int) – number of tries for the random search
name_clf (str) – name of the classifier
clf_pipeline (obj) – classifier pipeline
cross_val (obj) – specific cross-validation object, e.g. for a fixed train-test split
nb_workers (int) – number of jobs running in parallel
- Returns
-
imsegm.classification.create_classif_search_train_export(clf_name, features, labels, cross_val=10, nb_search_iter=100, search_type='random', eval_metric='f1', nb_workers=1, path_out=None, params=None, pca_coef=0.98, feature_names=None, label_names=None)
create a classifier and either train it once or search for its best parameters; if path_out is given, export the classifier for later use
- Parameters
clf_name (str) – name of selected classifier
features (ndarray) – features in dimension nb_samples x nb_features
cross_val (int|obj) – cross-validation
search_type (str) – search type
eval_metric (str) – evaluation metric
params (dict) – extra parameters
pca_coef (int|float|None) – sklearn PCA
nb_search_iter (int) – number of iterations of the hyper-parameter search
path_out (str) – path to directory for exporting classifier
nb_workers (int) – parallel processes
feature_names (list(str)) – list of extracted features - names
- Returns
(obj, str): classifier, path to the exported classifier
>>> np.random.seed(0)
>>> lbs = np.random.randint(0, 3, 150)
>>> fts = np.random.random((150, 5)) + np.tile(lbs, (5, 1)).T
>>> _, _ = create_classif_search_train_export('LogistRegr', fts, lbs, nb_search_iter=0)
>>> clf, p_clf = create_classif_search_train_export('AdaBoost', fts, lbs,
...     nb_search_iter=2, path_out='', search_type='grid')
Fitting ...
>>> clf
Pipeline(...)
>>> clf, p_clf = create_classif_search_train_export('RandForest', fts, lbs,
...     nb_search_iter=2, path_out='.', search_type='random')
Fitting ...
>>> clf
Pipeline(...)
>>> p_clf
'./classifier_RandForest.pkl'
>>> os.remove(p_clf)
>>> import glob
>>> files = glob.glob(os.path.join('.', 'classif_*.txt'))
>>> sorted(files)
['./classif_RandForest_search_params_best.txt', './classif_RandForest_search_params_scores.txt']
>>> for p in files: os.remove(p)
-
imsegm.classification.create_classifiers(nb_workers=-1)
create all classifiers with default parameters
- Parameters
nb_workers (int) – number of parallel jobs, if the classifier supports it
- Return dict
{str: clf}
>>> classifs = create_classifiers()
>>> classifs
{...}
>>> sum([isinstance(create_clf_param_search_grid(k), dict)
...      for k in classifs.keys()])
7
>>> sum([isinstance(create_clf_param_search_distrib(k), dict)
...      for k in classifs.keys()])
7
-
imsegm.classification.create_clf_param_search_distrib(name_classif='RandForest')
create parameter distribution for random search
- Parameters
name_classif (str) – key name of classifier
- Returns
dict
>>> create_clf_param_search_distrib()
{...}
>>> dict_classif = create_classifiers()
>>> all(len(create_clf_param_search_distrib(k)) > 0 for k in dict_classif)
True
>>> create_clf_param_search_distrib('none')
{}
-
imsegm.classification.create_clf_param_search_grid(name_classif='RandForest')
create parameter grid for grid search
- Parameters
name_classif (str) – key name of selected classifier
- Returns
dict
>>> create_clf_param_search_grid('RandForest')
{'classif__...': ...}
>>> dict_classif = create_classifiers()
>>> all(len(create_clf_param_search_grid(k)) > 0 for k in dict_classif)
True
>>> create_clf_param_search_grid('none')
{}
-
imsegm.classification.create_clf_pipeline(name_classif='RandForest', pca_coef=0.95)
create a complete pipeline with all required steps
- Parameters
pca_coef (int|float|None) – sklearn PCA
name_classif (str) – key name of the classifier
- Returns
object
>>> create_clf_pipeline()
Pipeline(...)
-
imsegm.classification.create_pipeline_neuron_net()
create classifier for a simple neural network
- Returns
clf
>>> create_pipeline_neuron_net()
Pipeline(...)
-
imsegm.classification.down_sample_dict_features_kmean(dict_features, nb_samples)
cluster the features with k-means into nb_samples clusters and return the features closest to each cluster center
- Parameters
- Return dict
{int: [list(float) * nb_features] * nb_samples}
>>> np.random.seed(0)
>>> d_fts = {'a': np.random.random((100, 3))}
>>> d_fts = down_sample_dict_features_kmean(d_fts, 5)
>>> d_fts['a'].shape
(5, 3)
-
imsegm.classification.down_sample_dict_features_random(dict_features, nb_samples)
browse all label features and take a random subset of features so that each class has the given nb_samples
- Parameters
- Return dict
{int: [list(float) * nb_features] * nb_samples}
>>> np.random.seed(0)
>>> d_fts = {'a': np.random.random((100, 3))}
>>> d_fts = down_sample_dict_features_random(d_fts, 5)
>>> d_fts['a'].shape
(5, 3)
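Random per-class down-sampling can be sketched with a seeded `RandomState.choice` without replacement. The helper is hypothetical, and keeping classes that are already smaller than `nb_samples` unchanged is an assumption.

```python
import numpy as np


def down_sample_random(dict_features, nb_samples, seed=None):
    """Randomly keep at most `nb_samples` feature vectors per label (hypothetical sketch)."""
    rng = np.random.RandomState(seed)
    sampled = {}
    for label, features in dict_features.items():
        features = np.asarray(features)
        if len(features) > nb_samples:
            # choose distinct rows so that no sample is duplicated
            rows = rng.choice(len(features), nb_samples, replace=False)
            features = features[rows]
        sampled[label] = features
    return sampled


d_fts = down_sample_random({'a': np.random.random((100, 3)),
                            'b': np.random.random((3, 3))}, 5, seed=0)
print(d_fts['a'].shape, d_fts['b'].shape)  # (5, 3) (3, 3)
```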
-
imsegm.classification.down_sample_dict_features_unique(dict_features)
browse all label features and take the unique features
- Parameters
dict_features (dict) – {int: [list(float) * nb_features] * nb_samples}
- Return dict
{int: [list(float) * nb_features] * nb}
>>> np.random.seed(0)
>>> d_fts = {'a': np.random.random((100, 3))}
>>> d_fts = down_sample_dict_features_unique(d_fts)
>>> d_fts['a'].shape
(100, 3)
-
imsegm.classification.eval_classif_cross_val_roc(clf_name, classif, features, labels, cross_val, path_out=None, nb_steps=100)
compute the mean ROC curve over a cross-validation scheme
http://scikit-learn.org/0.15/auto_examples/plot_roc_crossval.html
- Parameters
- Returns
>>> np.random.seed(0)
>>> labels = np.array([0] * 150 + [1] * 100 + [3] * 50)
>>> data = np.tile(labels, (6, 1)).T.astype(float)
>>> data += np.random.random(data.shape)
>>> data.shape
(300, 6)
>>> from sklearn.model_selection import StratifiedKFold
>>> cv = StratifiedKFold(n_splits=5, random_state=0)
>>> classif = create_classifiers()[DEFAULT_CLASSIF_NAME]
>>> fp_tp, auc = eval_classif_cross_val_roc(DEFAULT_CLASSIF_NAME, classif,
...                                         data, labels, cv, nb_steps=10)
>>> fp_tp
         FP   TP
0  0.000000  0.0
1  0.111111  1.0
2  0.222222  1.0
3  0.333333  1.0
4  0.444444  1.0
5  0.555556  1.0
6  0.666667  1.0
7  0.777778  1.0
8  0.888889  1.0
9  1.000000  1.0
>>> auc
0.94444444444444442
>>> labels[-50:] -= 1
>>> data[-50:, :] -= 1
>>> path_out = 'temp_eval-cv-roc'
>>> os.mkdir(path_out)
>>> fp_tp, auc = eval_classif_cross_val_roc(DEFAULT_CLASSIF_NAME, classif,
...                                         data, labels, cv, nb_steps=5, path_out=path_out)
>>> fp_tp
     FP   TP
0  0.00  0.0
1  0.25  1.0
2  0.50  1.0
3  0.75  1.0
4  1.00  1.0
>>> auc
0.875
>>> import shutil
>>> shutil.rmtree(path_out, ignore_errors=True)
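The linked scikit-learn example averages ROC curves by interpolating each fold's true-positive rate onto a common, evenly spaced false-positive grid. A hypothetical NumPy sketch of that averaging step is below; the per-fold `(fpr, tpr)` pairs would typically come from `sklearn.metrics.roc_curve`. Whether `eval_classif_cross_val_roc` follows it in every detail is an assumption, though the evenly spaced FP column and the AUC of 0.875 for `nb_steps=5` above are consistent with it.

```python
import numpy as np


def mean_roc_curve(fold_curves, nb_steps=100):
    """Average per-fold ROC curves on a common FP-rate grid (hypothetical sketch).

    `fold_curves` is a list of (fpr, tpr) pairs with increasing fpr; the
    AUC of the mean curve is computed with the trapezoidal rule.
    """
    fpr_grid = np.linspace(0, 1, nb_steps)
    tprs = []
    for fpr, tpr in fold_curves:
        tpr_interp = np.interp(fpr_grid, fpr, tpr)
        tpr_interp[0] = 0.0  # every ROC curve starts at (0, 0)
        tprs.append(tpr_interp)
    mean_tpr = np.mean(tprs, axis=0)
    mean_tpr[-1] = 1.0  # ... and ends at (1, 1)
    auc = float(np.sum(np.diff(fpr_grid) * (mean_tpr[1:] + mean_tpr[:-1]) / 2.0))
    return fpr_grid, mean_tpr, auc


fpr, tpr, auc = mean_roc_curve([([0.0, 0.1, 1.0], [0.0, 1.0, 1.0])], nb_steps=5)
print(fpr.tolist(), tpr.tolist(), auc)
# [0.0, 0.25, 0.5, 0.75, 1.0] [0.0, 1.0, 1.0, 1.0, 1.0] 0.875
```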
-
imsegm.classification.eval_classif_cross_val_scores(clf_name, classif, features, labels, cross_val=10, path_out=None, scorings=('f1_macro', 'accuracy', 'precision_macro', 'recall_macro'))
compute statistics over a cross-validation scheme
http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html
- Parameters
- Return DF
>>> labels = np.array([0] * 150 + [1] * 100 + [2] * 50)
>>> data = np.tile(labels, (6, 1)).T.astype(float)
>>> data += 0.5 - np.random.random(data.shape)
>>> data.shape
(300, 6)
>>> from sklearn.model_selection import StratifiedKFold
>>> cv = StratifiedKFold(n_splits=5, random_state=0)
>>> classif = create_classifiers()[DEFAULT_CLASSIF_NAME]
>>> eval_classif_cross_val_scores(DEFAULT_CLASSIF_NAME, classif,
...                               data, labels, cv)
   f1_macro  accuracy  precision_macro  recall_macro
0       1.0       1.0              1.0           1.0
1       1.0       1.0              1.0           1.0
2       1.0       1.0              1.0           1.0
3       1.0       1.0              1.0           1.0
4       1.0       1.0              1.0           1.0
>>> labels[labels == 1] = 2
>>> cv = StratifiedKFold(n_splits=3, random_state=0)
>>> eval_classif_cross_val_scores(DEFAULT_CLASSIF_NAME, classif,
...                               data, labels, cv, path_out='.')
   f1_macro  accuracy  precision_macro  recall_macro
0       1.0       1.0              1.0           1.0
1       1.0       1.0              1.0           1.0
2       1.0       1.0              1.0           1.0
>>> import glob
>>> p_files = glob.glob(NAME_CSV_CLASSIF_CV_SCORES.replace('{}', '*'))
>>> sorted(p_files)
['classif_RandForest_cross-val_scores-all-folds.csv', 'classif_RandForest_cross-val_scores-statistic.csv']
>>> [os.remove(p) for p in p_files]
[...]
-
imsegm.classification.export_results_clf_search(path_out, clf_name, clf_search)
do the final testing and save all results
-
imsegm.classification.feature_scoring_selection(features, labels, names=None, path_out='')
find the best features and return their indexes
http://scikit-learn.org/stable/auto_examples/linear_model/plot_sparse_recovery.html
http://scikit-learn.org/stable/auto_examples/feature_selection/plot_feature_selection.html
- Parameters
- Return tuple(list(int),DF)
indices, Dataframe with scoring
>>> from sklearn.datasets import make_classification
>>> features, labels = make_classification(n_samples=250, n_features=5,
...                                        n_informative=3, n_redundant=0,
...                                        n_repeated=0, n_classes=2,
...                                        random_state=0, shuffle=False)
>>> indices, df_scoring = feature_scoring_selection(features, labels)
>>> indices
array([1, 0, 2, 3, 4])
>>> df_scoring
          ExtTree     F-test     k-Best  variance
feature
1        0.248465   0.755881   0.755881  2.495970
2        0.330818  58.944450  58.944450  1.851036
3        0.221636   2.242583   2.242583  1.541042
4        0.106441   4.022076   4.022076  0.965971
5        0.092639   0.022651   0.022651  1.016170
>>> features[:, 2] = 1
>>> path_out = 'test_fts-select'
>>> os.mkdir(path_out)
>>> indices, df_scoring = feature_scoring_selection(features.tolist(), labels.tolist(),
...                                                 path_out=path_out)
>>> indices
array([1, 0, 3, 4, 2])
>>> import shutil
>>> shutil.rmtree(path_out, ignore_errors=True)
-
imsegm.classification.load_classifier(path_classif)
load a previously exported classifier
- Parameters
path_classif (str) – path to the exported classifier
- Return dict
>>> load_classifier('none.abc')
-
imsegm.classification.relabel_sequential(labels, uq_labels=None)
relabel a vector sequentially, starting from 0
>>> relabel_sequential([0, 0, 0, 5, 5, 5, 0, 5])
[0, 0, 0, 1, 1, 1, 0, 1]
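Sequential relabelling is a lookup from each unique label to its rank. The hypothetical sketch below uses the sorted unique labels unless `uq_labels` fixes the order; how the library orders labels when `uq_labels` is omitted is an assumption.

```python
def relabel_sequential_sketch(labels, uq_labels=None):
    """Map labels onto 0, 1, 2, ... (hypothetical sketch).

    Unless `uq_labels` gives the order explicitly, the sorted unique
    labels are used, so the smallest label becomes 0.
    """
    if uq_labels is None:
        uq_labels = sorted(set(labels))
    lookup = {lb: i for i, lb in enumerate(uq_labels)}
    return [lookup[lb] for lb in labels]


print(relabel_sequential_sketch([0, 0, 0, 5, 5, 5, 0, 5]))          # [0, 0, 0, 1, 1, 1, 0, 1]
print(relabel_sequential_sketch([0, 0, 0, 5, 5, 5, 0, 5], [5, 0]))  # [1, 1, 1, 0, 0, 0, 1, 0]
```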
-
imsegm.classification.save_classifier(path_out, classif, clf_name, params, feature_names=None, label_names=None)
export the classifier together with its name, parameters, feature names and label names
- Parameters
- Return str
>>> clf = create_classifiers()['RandForest']
>>> p_clf = save_classifier('.', clf, 'TESTINNG', {})
>>> p_clf
'./classifier_TESTINNG.pkl'
>>> d_clf = load_classifier(p_clf)
>>> sorted(d_clf.keys())
['clf_pipeline', 'features', 'label_names', 'name', 'params']
>>> d_clf['clf_pipeline']
RandomForestClassifier(...)
>>> d_clf['name']
'TESTINNG'
>>> os.remove(p_clf)
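A save/load round trip with the same dictionary keys and file-name pattern as the example can be sketched with the standard `pickle` module. Both helpers are hypothetical; storing a plain pickle is an assumption suggested by the `.pkl` extension, and returning `None` for a missing file mirrors the silent `load_classifier('none.abc')` example.

```python
import os
import pickle
import tempfile


def save_classifier_sketch(path_out, classif, clf_name, params,
                           feature_names=None, label_names=None):
    """Pickle the classifier with its metadata (hypothetical sketch)."""
    path_clf = os.path.join(path_out, 'classifier_{}.pkl'.format(clf_name))
    bundle = {
        'name': clf_name,
        'clf_pipeline': classif,
        'params': params,
        'features': feature_names,
        'label_names': label_names,
    }
    with open(path_clf, 'wb') as fp:
        pickle.dump(bundle, fp)
    return path_clf


def load_classifier_sketch(path_classif):
    """Load the bundle back; return None when the file does not exist."""
    if not os.path.isfile(path_classif):
        return None
    with open(path_classif, 'rb') as fp:
        return pickle.load(fp)


with tempfile.TemporaryDirectory() as tmp_dir:
    p_clf = save_classifier_sketch(tmp_dir, classif='any picklable model',
                                   clf_name='TEST', params={})
    d_clf = load_classifier_sketch(p_clf)
    print(os.path.basename(p_clf), sorted(d_clf.keys()))
```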
-
imsegm.classification.search_params_cut_down_max_nb_iter(clf_parameters, nb_iter)
count the number of possible parameter combinations and, if it is limited, cut nb_iter down to it
- Parameters
- Return int
>>> clf_params = create_clf_param_search_grid(DEFAULT_CLASSIF_NAME)
>>> search_params_cut_down_max_nb_iter(clf_params, 100)
100
>>> search_params_cut_down_max_nb_iter(clf_params, 1e6)
1450
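For a grid of finite candidate lists, the number of combinations is the product of the list lengths, and the iteration budget is the smaller of that product and nb_iter. The sketch and its parameter names are hypothetical (only the `classif__` prefix comes from the grid example above), and it assumes every grid value is a list.

```python
def cap_nb_iter(param_grid, nb_iter):
    """Limit the number of search iterations to the grid size (hypothetical sketch).

    Assumes every value in `param_grid` is a finite list of candidates.
    """
    nb_combinations = 1
    for candidates in param_grid.values():
        nb_combinations *= len(candidates)
    return int(min(nb_iter, nb_combinations))


# illustrative grid; the key names are made up for this example
grid = {'classif__n_estimators': [10, 50, 100], 'classif__max_depth': [None, 5]}
print(cap_nb_iter(grid, 100), cap_nb_iter(grid, 4))  # 6 4
```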
-
imsegm.classification.shuffle_features_labels(features, labels)
take the set of features and labels and shuffle them together while keeping the link between each feature and its label
- Parameters
- Returns
np.array<nb_samples, nb_features>, np.array<nb_samples>
>>> np.random.seed(0)
>>> fts = np.random.random((5, 2))
>>> lbs = np.random.randint(0, 2, 5)
>>> fts_new, lbs_new = shuffle_features_labels(fts, lbs)
>>> np.array_equal(fts, fts_new)
False
>>> np.array_equal(lbs, lbs_new)
False
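Shuffling features and labels together only requires indexing both arrays with one shared permutation. The helper below is a hypothetical sketch; the synthetic data make it easy to check that each row still carries its label.

```python
import numpy as np


def shuffle_together(features, labels, seed=None):
    """Apply one random permutation to both arrays (hypothetical sketch)."""
    features, labels = np.asarray(features), np.asarray(labels)
    order = np.random.RandomState(seed).permutation(len(labels))
    return features[order], labels[order]


fts = np.arange(10).reshape(5, 2)   # row i is [2*i, 2*i + 1]
lbs = np.arange(5)                  # label i belongs to row i
fts_new, lbs_new = shuffle_together(fts, lbs, seed=0)
print(np.array_equal(fts_new[:, 0] // 2, lbs_new))  # True: pairs stay linked
```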
-
imsegm.classification.unique_rows(data)
detect the unique rows of a matrix and return only those
- Parameters
data (ndarray) – np.array
- Return ndarray
np.array
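With NumPy 1.13 or newer, distinct rows can be obtained with `np.unique(..., axis=0)`. This hypothetical sketch sorts the rows lexicographically as a side effect, which the library function may not do.

```python
import numpy as np


def unique_rows_sketch(data):
    """Return the distinct rows of a 2D array (hypothetical sketch).

    np.unique with axis=0 also sorts the rows lexicographically.
    """
    return np.unique(np.asarray(data), axis=0)


print(unique_rows_sketch([[3, 4], [1, 2], [3, 4]]).tolist())  # [[1, 2], [3, 4]]
```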
-
imsegm.classification.DEFAULT_CLASSIF_NAME = 'RandForest'
default (recommended) classifier for supervised segmentation
-
imsegm.classification.DEFAULT_CLUSTERING = 'kMeans'
default (recommended) clustering for unsupervised segmentation
-
imsegm.classification.DICT_SCORING = {'accuracy': sklearn.metrics.accuracy_score, 'f1': sklearn.metrics.f1_score, 'precision': sklearn.metrics.precision_score, 'recall': sklearn.metrics.recall_score}
mapping of metric names to the functions used to compute them
-
imsegm.classification.METRIC_AVERAGES = ('macro', 'weighted')
default types of computed metrics
-
imsegm.classification.METRIC_SCORING = ('f1_macro', 'accuracy', 'precision_macro', 'recall_macro')
default computed metrics
-
imsegm.classification.NAME_CSV_CLASSIF_CV_ROC = 'classif_{}_cross-val_ROC-{}.csv'
file name for exporting partial results about the trained classifier: Receiver Operating Characteristic
-
imsegm.classification.NAME_CSV_CLASSIF_CV_SCORES = 'classif_{}_cross-val_scores-{}.csv'
file name for exporting partial results about the trained classifier: cross-validation scores
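The file-name templates contain two `{}` placeholders, the classifier name and the kind of exported table. Filling them with `str.format`, or turning them into a glob pattern as the eval_classif_cross_val_scores example does, looks like this; the template value is copied from above.

```python
NAME_CSV_CLASSIF_CV_SCORES = 'classif_{}_cross-val_scores-{}.csv'

# fill both placeholders: classifier name and kind of exported table
path_csv = NAME_CSV_CLASSIF_CV_SCORES.format('RandForest', 'all-folds')
# replace the placeholders by wildcards to find every exported file
pattern = NAME_CSV_CLASSIF_CV_SCORES.replace('{}', '*')
print(path_csv)  # classif_RandForest_cross-val_scores-all-folds.csv
print(pattern)   # classif_*_cross-val_scores-*.csv
```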
-
imsegm.classification.NAME_CSV_FEATURES_SELECT = 'feature_selection.csv'
file name of the exported evaluation of feature quality
-
imsegm.classification.NAME_TXT_CLASSIF_CV_AUC = 'classif_{}_cross-val_AUC-{}.txt'
file name for exporting partial results about the trained classifier: Area Under Curve