- New function `bootstrap_performance()` allows you to calculate confidence intervals for the model performance from a single train/test split by bootstrapping the test set (#329, @kelly-sovacool).
- New function `calc_balanced_precision()` allows you to calculate balanced precision and balanced area under the precision-recall curve (#333, @kelly-sovacool).
- `find_feature_importance()` (#326, @kelly-sovacool):
  - Renamed the `names` column to `feat` to represent each feature or group of correlated features.
  - New columns `lower` and `upper` to report the bounds of the empirical 95% confidence interval from the permutation test. See `vignette('parallel')` for an example of plotting feature importance with confidence intervals.
- Updates to the `parallel` vignette (#310, @kelly-sovacool).
- New model method `parRF`, a parallel implementation of the `rf` method, with the same default hyperparameters as `rf` set automatically (#306, @kelly-sovacool).
- New functions:
  - `calc_model_sensspec()`: calculate sensitivity, specificity, and precision for a model.
  - `calc_mean_roc()` & `plot_mean_roc()`: calculate & plot specificity and mean sensitivity for multiple models.
  - `calc_mean_prc()` & `plot_mean_prc()`: calculate & plot recall and mean precision for multiple models.
- Additional arguments to `run_ml()` are now forwarded to `caret::train()` (#304, @kelly-sovacool).
  - Users can now pass additional arguments (e.g. `weights`) to `caret::train()`, allowing greater flexibility.
- New function `compare_models()` compares the performance of two models with a permutation test (#295, @courtneyarmour).
- Fixed a bug where `cv_times` did not affect the reported repeats for cross-validation (#291, @kelly-sovacool).

This minor patch fixes a test failure on platforms with no long doubles. The actual package code remains unchanged.
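Since extra arguments to `run_ml()` are forwarded to `caret::train()`, options such as case weights can be supplied directly. A minimal sketch, assuming the `otu_mini_bin` example dataset with outcome column `dx` (both illustrative assumptions, not confirmed by the notes above):

```r
library(mikropml)

# Upweight one outcome class (illustrative weighting scheme);
# the `weights` vector is forwarded through run_ml() to caret::train().
w <- ifelse(otu_mini_bin$dx == "cancer", 2, 1)

results <- run_ml(
  otu_mini_bin, "glmnet",
  outcome_colname = "dx",
  weights = w,   # passed through to caret::train()
  seed = 2019
)
```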
- `run_ml()` now works when `kfold >= length(groups)` (#285, @kelly-sovacool).
  - Previously, an error was thrown unless `kfold` <= the number of groups in the training set. Now, if there are not enough groups in the training set for groups to be kept together during CV, groups are allowed to be split up across CV partitions.
- New parameter `cross_val` in `run_ml()` allows users to define their own custom cross-validation scheme (#278, @kelly-sovacool).
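A minimal sketch of a custom scheme. It assumes `cross_val` accepts a `caret::trainControl()` object and uses the `otu_mini_bin` example dataset with outcome column `dx`; these are assumptions for illustration, not confirmed by the note above:

```r
library(mikropml)

# Define a custom cross-validation scheme with caret (assumed interface):
# 3-fold CV repeated twice instead of the package defaults.
custom_cv <- caret::trainControl(method = "repeatedcv", number = 3, repeats = 2)

results <- run_ml(
  otu_mini_bin, "glmnet",
  outcome_colname = "dx",
  cross_val = custom_cv,
  seed = 2019
)
```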
- New parameter `calculate_performance`, which controls whether performance metrics are calculated (default: `TRUE`). Users may wish to skip performance calculations when training models with no cross-validation.
- New parameter `group_partitions` in `run_ml()` allows users to control which groups should go to which partition of the train/test split (#281, @kelly-sovacool).
- Modified the `training_frac` parameter in `run_ml()` (#281, @kelly-sovacool):
  - By default, `training_frac` is a fraction between 0 and 1 that specifies how much of the dataset should be used in the training fraction of the train/test split.
  - Alternatively, users can give `training_frac` a vector of indices that correspond to which rows of the dataset should go in the training fraction of the train/test split. This gives users direct control over exactly which observations are in the training fraction if desired.
- `group_correlated_features()` is now a user-facing function.
- Users can now specify the correlation method passed to `stats::cor` with the `corr_method` parameter: `get_feature_importance(corr_method = "pearson")`
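A hedged sketch of `corr_method` in context. All arguments besides `corr_method` are assumptions for illustration; check `?get_feature_importance` for the actual signature:

```r
library(mikropml)

# Assumed workflow: train a model first, then compute feature importance.
results <- run_ml(otu_mini_bin, "glmnet", outcome_colname = "dx", seed = 2019)

feat_imp <- get_feature_importance(
  results$trained_model,
  results$test_data,          # assumed argument; see the function docs
  outcome_colname = "dx",
  corr_method = "pearson"     # passed through to stats::cor
)
```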
- Fixed a bug where `preprocess_data()` converted the outcome column to a character vector (#273, @kelly-sovacool, @ecmaggioncalda).
- New parameter in `preprocess_data()`: `prefilter_threshold` (#240, @kelly-sovacool, @courtneyarmour).
  - Features that appear in `prefilter_threshold` or fewer rows in the data are removed.
  - New function `remove_singleton_columns()` called by `preprocess_data()` to carry this out.
- New parameter in `get_feature_importance()`: `groups` (#246, @kelly-sovacool).
  - `groups` is `NULL` by default; in this case, correlated features above `corr_thresh` are grouped together.
- `preprocess_data()` now replaces spaces in the outcome column with underscores (#247, @kelly-sovacool, @JonnyTran).
- Progress bars in `preprocess_data()` and `get_feature_importance()` using the progressr package (#257, @kelly-sovacool, @JonnyTran, @FedericoComoglio).
- Compatible with the new default `stringsAsFactors` behavior in R 4.0.
- Moved `rpart` from Suggests to Imports for consistency with other packages used during model training.

This is the first release version of mikropml! 🎉
- Added a `NEWS.md` file to track changes to the package.
- Major functions:
  - `run_ml()`
  - `preprocess_data()`
  - `plot_model_performance()`
  - `plot_hp_performance()`
- Model methods supported in `run_ml()`:
  - `glmnet`: logistic and linear regression
  - `rf`: random forest
  - `rpart2`: decision trees
  - `svmRadial`: support vector machines
  - `xgbTree`: gradient-boosted trees
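The method name is passed as the second argument to `run_ml()`. A minimal sketch, assuming the `otu_mini_bin` example dataset with outcome column `dx` (illustrative assumptions):

```r
library(mikropml)

# Train a random forest; swap "rf" for any supported method listed above.
results <- run_ml(otu_mini_bin, "rf", outcome_colname = "dx", seed = 2019)

results$performance   # data frame of performance metrics for the held-out test set
```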