Package: mikropml 1.6.1.9000

Kelly Sovacool

mikropml: User-Friendly R Package for Supervised Machine Learning Pipelines

An interface to build machine learning models for classification and regression problems. 'mikropml' implements the ML pipeline described by Topçuoğlu et al. (2020) <doi:10.1128/mBio.00434-20> with reasonable default options for data preprocessing, hyperparameter tuning, cross-validation, testing, model evaluation, and interpretation steps. See the website <https://www.schlosslab.org/mikropml/> for more information, documentation, and examples.

Authors: Begüm Topçuoğlu [aut], Zena Lapp [aut], Kelly Sovacool [aut, cre], Evan Snitkin [aut], Jenna Wiens [aut], Patrick Schloss [aut], Nick Lesniak [ctb], Courtney Armour [ctb], Sarah Lucas [ctb]

mikropml_1.6.1.9000.tar.gz
mikropml_1.6.1.9000.zip (r-4.5), mikropml_1.6.1.9000.zip (r-4.4), mikropml_1.6.1.9000.zip (r-4.3)
mikropml_1.6.1.9000.tgz (r-4.5-any), mikropml_1.6.1.9000.tgz (r-4.4-any), mikropml_1.6.1.9000.tgz (r-4.3-any)
mikropml_1.6.1.9000.tar.gz (r-4.5-noble), mikropml_1.6.1.9000.tar.gz (r-4.4-noble)
mikropml_1.6.1.9000.tgz (r-4.4-emscripten), mikropml_1.6.1.9000.tgz (r-4.3-emscripten)
mikropml.pdf | mikropml.html
mikropml/json (API)
NEWS

# Install 'mikropml' in R:

install.packages('mikropml', repos = c('https://schlosslab.r-universe.dev', 'https://cloud.r-project.org'))
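
Once the package is installed, a first run of the pipeline might look like the sketch below. It assumes the bundled 'otu_mini_bin' dataset with 'dx' as its outcome column, and that 'glmnet' is available as a dependency; the seed value is arbitrary and only there for reproducibility.

```r
# Minimal sketch, assuming mikropml and its glmnet dependency are installed.
library(mikropml)

# otu_mini_bin ships with the package; 'dx' is assumed to be its outcome column.
results <- run_ml(
  otu_mini_bin,
  "glmnet",
  outcome_colname = "dx",
  seed = 2019 # arbitrary seed so the train/test split is reproducible
)

# run_ml() returns a list; inspect its elements and the performance metrics.
names(results)
results$performance
```

The same call pattern should carry over to the other methods named in the dataset list (random forest, rpart2, svmRadial, xgbTree) by changing the method string, though run time and default hyperparameters differ by method.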

Bug tracker: https://github.com/schlosslab/mikropml/issues

Pkgdown site: https://www.schlosslab.org

Datasets:

otu_data_preproc - Mini OTU abundance dataset - preprocessed
otu_mini_bin - Mini OTU abundance dataset
otu_mini_bin_results_glmnet - Results from running the pipeline with L2 logistic regression on 'otu_mini_bin' with feature importance and grouping
otu_mini_bin_results_rf - Results from running the pipeline with random forest on 'otu_mini_bin'
otu_mini_bin_results_rpart2 - Results from running the pipeline with rpart2 on 'otu_mini_bin'
otu_mini_bin_results_svmRadial - Results from running the pipeline with svmRadial on 'otu_mini_bin'
otu_mini_bin_results_xgbTree - Results from running the pipeline with xgbTree on 'otu_mini_bin'
otu_mini_cont_results_glmnet - Results from running the pipeline with glmnet on 'otu_mini_bin' with 'Otu00001' as the outcome
otu_mini_cont_results_nocv - Results from running the pipeline with glmnet on 'otu_mini_bin' with 'Otu00001' as the outcome column, using a custom train control scheme that does not perform cross-validation
otu_mini_cv - Cross validation on 'train_data_mini' with grouped features.
otu_mini_multi - Mini OTU abundance dataset with 3 categorical variables
otu_mini_multi_group - Groups for otu_mini_multi
otu_mini_multi_results_glmnet - Results from running the pipeline with glmnet on 'otu_mini_multi' for multiclass outcomes
otu_small - Small OTU abundance dataset

On CRAN:

machine-learning

7.83 score, 56 stars, 86 scripts, 550 downloads, 1 mention, 39 exports, 88 dependencies

Last updated 2 years ago from: 77669ee3fb. Checks: 3 OK, 6 NOTE. Indexed: yes.

Target	Result	Latest binary
Doc / Vignettes	OK	Mar 15 2025
R-4.5-win	NOTE	Mar 15 2025
R-4.5-mac	NOTE	Mar 15 2025
R-4.5-linux	NOTE	Mar 15 2025
R-4.4-win	NOTE	Mar 15 2025
R-4.4-mac	NOTE	Mar 15 2025
R-4.4-linux	NOTE	Mar 15 2025
R-4.3-win	OK	Mar 15 2025
R-4.3-mac	OK	Mar 15 2025

Exports: := !! .data %>% bootstrap_performance calc_balanced_precision calc_baseline_precision calc_mean_perf calc_mean_prc calc_mean_roc calc_model_sensspec calc_perf_metrics combine_hp_performance compare_models contr.ltfr define_cv get_caret_processed_df get_feature_importance get_hp_performance get_hyperparams_list get_outcome_type get_partition_indices get_perf_metric_fn get_perf_metric_name get_performance_tbl get_tuning_grid group_correlated_features permute_p_value plot_hp_performance plot_mean_prc plot_mean_roc plot_model_performance preprocess_data randomize_feature_order remove_singleton_columns replace_spaces run_ml tidy_perf_data train_model

Dependencies: bitops caret caTools class cli clock codetools colorspace cpp11 data.table diagram digest dplyr e1071 fansi farver foreach future future.apply generics ggplot2 glmnet globals glue gower gplots gtable gtools hardhat ipred isoband iterators jsonlite kernlab KernSmooth labeling lattice lava lifecycle listenv lubridate magrittr MASS Matrix mgcv MLmetrics ModelMetrics munsell nlme nnet numDeriv parallelly pillar pkgconfig plyr pROC prodlim progressr proxy purrr R6 randomForest RColorBrewer Rcpp RcppEigen recipes reshape2 rlang ROCR rpart scales shape sparsevctrs SQUAREM stringi stringr survival tibble tidyr tidyselect timechange timeDate tzdb utf8 vctrs viridisLite withr xgboost

Introduction to mikropml

Zena Lapp

Rendered from introduction.Rmd using knitr::rmarkdown on Mar 15 2025.

Last update: 2023-02-15
Started: 2020-07-01

mikropml: User-Friendly R Package for Supervised Machine Learning Pipelines

Begüm D. Topçuoğlu, Zena Lapp, Kelly L. Sovacool, Evan Snitkin, Jenna Wiens, Patrick D. Schloss

Rendered from paper.Rmd using knitr::rmarkdown on Mar 15 2025.

Last update: 2022-11-01
Started: 2020-10-15

Help page	Topics
Calculate a bootstrap confidence interval for the performance on a single train/test split	bootstrap_performance
Calculate balanced precision given actual and baseline precision	calc_balanced_precision
Calculate the fraction of positives, i.e. baseline precision for a PRC curve	calc_baseline_precision
Generic function to calculate mean performance curves for multiple models	calc_mean_perf
Calculate and summarize performance for ROC and PRC plots	calc_mean_prc calc_mean_roc calc_model_sensspec sensspec
Get performance metrics for test data	calc_perf_metrics
Combine hyperparameter performance metrics for multiple train/test splits	combine_hp_performance
Perform permutation tests to compare the performance metric across all pairs of a group variable.	compare_models
Define cross-validation scheme and training parameters	define_cv
Get preprocessed dataframe for continuous variables	get_caret_processed_df
Get feature importance using the permutation method	get_feature_importance
Get hyperparameter performance metrics	get_hp_performance
Set hyperparameters based on ML method and dataset characteristics	get_hyperparams_list
Get outcome type.	get_outcome_type
Select indices to partition the data into training & testing sets.	get_partition_indices
Get default performance metric function	get_perf_metric_fn
Get default performance metric name	get_perf_metric_name
Get model performance metrics as a one-row tibble	get_performance_tbl
Generate the tuning grid for tuning hyperparameters	get_tuning_grid
Group correlated features	group_correlated_features
Mini OTU abundance dataset - preprocessed	otu_data_preproc
Mini OTU abundance dataset	otu_mini_bin
Results from running the pipeline with L2 logistic regression on 'otu_mini_bin' with feature importance and grouping	otu_mini_bin_results_glmnet
Results from running the pipeline with random forest on 'otu_mini_bin'	otu_mini_bin_results_rf
Results from running the pipeline with rpart2 on 'otu_mini_bin'	otu_mini_bin_results_rpart2
Results from running the pipeline with svmRadial on 'otu_mini_bin'	otu_mini_bin_results_svmRadial
Results from running the pipeline with xgbTree on 'otu_mini_bin'	otu_mini_bin_results_xgbTree
Results from running the pipeline with glmnet on 'otu_mini_bin' with 'Otu00001' as the outcome	otu_mini_cont_results_glmnet
Results from running the pipeline with glmnet on 'otu_mini_bin' with 'Otu00001' as the outcome column, using a custom train control scheme that does not perform cross-validation	otu_mini_cont_results_nocv
Cross validation on 'train_data_mini' with grouped features.	otu_mini_cv
Mini OTU abundance dataset with 3 categorical variables	otu_mini_multi
Groups for otu_mini_multi	otu_mini_multi_group
Results from running the pipeline with glmnet on 'otu_mini_multi' for multiclass outcomes	otu_mini_multi_results_glmnet
Small OTU abundance dataset	otu_small
Calculate a permuted p-value comparing two models	permute_p_value
Plot hyperparameter performance metrics	plot_hp_performance
Plot ROC and PRC curves	plot_curves plot_mean_prc plot_mean_roc
Plot performance metrics for multiple ML runs with different parameters	plot_model_performance
Preprocess data prior to running machine learning	preprocess_data
Randomize feature order to eliminate any position-dependent effects	randomize_feature_order
Remove columns appearing in only 'threshold' row(s) or fewer.	remove_singleton_columns
Replace spaces in all elements of a character vector with underscores	replace_spaces
Run the machine learning pipeline	run_ml
Tidy the performance dataframe	tidy_perf_data
Train model using 'caret::train()'.	train_model
