---
title: "Introduction to schtools"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to schtools}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
schtools::set_knitr_opts()
knitr::opts_chunk$set(
  collapse = TRUE,
  echo = TRUE,
  comment = "#>"
)
```

```{r deps}
library(schtools)
library(dplyr)
library(ggplot2)
library(readr)
library(tidyr)
```

## Handling mothur data

### Calculate relative abundances

You can read a shared file and calculate relative abundances with `calc_relabun()`:

```{r calc_relabun}
shared_dat <- read_tsv(system.file("extdata", "test.shared",
  package = "schtools"
))
relabun_dat <- shared_dat %>% calc_relabun()
head(relabun_dat)
```

`calc_relabun()` returns the data frame in long format.
You can use `tidyr::pivot_wider()` to convert it to wide format:

```{r pivot_wider}
wide_dat <- relabun_dat %>%
  pivot_wider(names_from = "otu", values_from = "rel_abun")
head(wide_dat)
```

You can see that the relative abundances for each sample sum to 1:

```{r sum1}
wide_dat %>%
  select(starts_with("Otu")) %>%
  rowSums()
```

### Taxonomy files

mothur formats taxonomy files as tab-separated values (tsv).
You can use `read_tax()` to parse the taxonomy data and create separate columns for each taxonomic level.

```{r read_tax}
tax_dat <- read_tax(system.file("extdata", "test.taxonomy",
  package = "schtools"
))
head(tax_dat)
```

The column `label_html` provides html that correctly italicizes the genus name without italicizing the OTU label.
This can be used with `ggtext::element_markdown()` to make nice plots:

```{r italic-genus}
library(ggtext)
set.seed(20220427)
relabun_dat %>%
  mutate(
    sample_num = stringr::str_remove(sample, "p") %>% as.integer(),
    treatment = case_when(
      sample_num %% 2 == 1 ~ "A",
      TRUE ~ "B"
    )
  ) %>%
  inner_join(tax_dat, by = "otu") %>%
  ggplot(aes(x = rel_abun, y = label_html, color = treatment)) +
  geom_jitter(alpha = 0.7, height = 0.2) +
  labs(x = "Relative abundance", y = "") +
  theme_minimal() +
  theme(axis.text.y = element_markdown())
```

#### Pooling OTU counts at different taxonomic levels

A common task is to repeat OTU-level analyses at different taxonomic levels to determine [which resolution is optimal for answering your questions](https://doi.org/10.1128/mbio.03161-21).
You'll need a shared file, generated from clustering sequences into OTUs with mothur, and a corresponding taxonomy file.
Take a look at the [mothur documentation](https://mothur.org/wiki/) for info on generating these files and performing microbiome analyses.

In this example, `pool_taxon_counts()` pools the OTU counts in the shared file at the genus level and returns new shared and taxonomy data frames.

```{r pool_genus}
tax_dat <- read_tax(system.file("extdata", "test.taxonomy",
  package = "schtools"
))
shared_dat <- readr::read_tsv(system.file("extdata", "test.shared",
  package = "schtools"
))
pool_taxon_counts(shared_dat, tax_dat, "genus")
```

You can do this for any taxonomic level in your taxonomy data frame.

```{r pool_phylum}
pool_taxon_counts(shared_dat, tax_dat, "phylum")
```

### Distance files

If you have a distance file saved as a phylip-formatted lower triangle matrix from mothur's [`dist.seqs`](https://mothur.org/wiki/dist.seqs/) command, you can read it into R with `read_dist()`:

```{r read_dist}
dist_filepath <- system.file("extdata",
  "sample.final.thetayc.0.03.lt.ave.dist",
  package = "schtools"
)
dist_tbl <- read_dist(dist_filepath)
head(dist_tbl)
```

## R Markdown helpers for scientific writing

When writing scientific papers with R Markdown, we often find ourselves using the same knitr chunk options and miscellaneous helper functions.
To use our favorite options like `eval=TRUE`, `echo=FALSE`, and others, run `set_knitr_opts()` in the first chunk of your R Markdown document:

````markdown
`r ''````{r, include = FALSE}
set_knitr_opts()
```
````

This also sets the inline hook to our custom `inline_hook()` function, which automatically formats numbers in a human-readable way and inserts an Oxford comma into lists when needed.

### Who doesn't love an Oxford comma?

When writing with R Markdown, you may wish to insert a list or vector inline and correctly format it with an Oxford comma.
`inline_hook()` uses `paste_oxford_list()` to help you do just that!

```{r oxford}
animals <- c("cats", "dogs", "fish")
```

Insert the string as inline code with `` `r ` ``:

> `` `r "\u0060r animals\u0060"` `` are the most common pets.

Rendered output:

> `r paste_oxford_list(animals)` are the most common pets.

### Human-readable numbers

`inline_hook()` uses `format_numbers()` under the hood to automatically format numbers in a human-readable way, rather than displaying them in scientific notation.

> The numbers `` `r "\u0060r c(1e-04, 1e-05, 1e-06)\u0060"` `` are very precise,
> while `` `r "\u0060r c(1e04, 1e05, 1e06)\u0060"` `` are very large.

Rendered output:

> The numbers `r inline_hook(c(1e-04, 1e-05, 1e-06))` are very precise,
> while `r inline_hook(c(1e04, 1e05, 1e06))` are very large.
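
If you are curious what this kind of inline formatting involves, here is a minimal base R sketch of the two ideas from the sections above: joining a vector with an Oxford comma and printing numbers without scientific notation. The helpers `oxford_join()` and `plain_number()` are hypothetical names made up for this illustration. They are not part of schtools, and the exact output of `paste_oxford_list()` and `format_numbers()` may differ.

```r
# Hypothetical helpers for illustration only; these are NOT schtools functions.

# Join a character vector into a phrase that uses an Oxford comma.
oxford_join <- function(x) {
  x <- as.character(x)
  n <- length(x)
  if (n <= 1) {
    # Zero or one item: nothing to join
    return(paste(x, collapse = ""))
  }
  if (n == 2) {
    # Two items: no comma needed
    return(paste(x, collapse = " and "))
  }
  # Three or more items: comma-separate, with ", and " before the last item
  paste0(paste(x[-n], collapse = ", "), ", and ", x[n])
}

# Format each number on its own so that scientific notation is avoided
# and large values get a thousands separator.
plain_number <- function(x) {
  vapply(
    x,
    function(v) format(v, scientific = FALSE, big.mark = ","),
    character(1)
  )
}

animals <- c("cats", "dogs", "fish")
oxford_join(animals)
oxford_join(plain_number(c(1e04, 1e05, 1e06)))
```

Formatting each number separately inside `vapply()` is a deliberate choice in this sketch: calling `format()` on the whole vector at once would pad every element to a common width and number of decimal places.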