class: center, middle, inverse, title-slide

# Regression for Microbiome Data: Moving From Diversity to Inference
## 📚EPID 674📚
### Brendan J. Kelly, MD, MS
### Updated: 18 June 2020

---
background-image: url(svg/geompoint.svg)
background-size: 500px
background-position: 85% 50%
class: middle, inverse

.pad-left[

### Review α-diversity

### Review β-diversity

### α/β-diversity ⇾ linear regression

### β-diversity ⇾ PERMANOVA

### Compositional data?

]

---
background-image: url(svg/geompoint.svg)
background-size: 500px
background-position: 85% 50%
class: center, middle, inverse

# α-diversity

---

# High Dimensional Microbiome Data

.center[

```
##              700013549 700014386 700014403 700014409 700014412 700014415
## OTU_97.1             0         0         0         0         0         0
## OTU_97.10            0         0         6         4         1         5
## OTU_97.100           0         0       133         7         1         4
## OTU_97.1000          0         0         0         0         0         0
## OTU_97.10000         0         0         0         0         0         0
## OTU_97.10001         0         0         0         0         0         1
## OTU_97.10002         0         0         0         0         0         0
## OTU_97.10003         0         0         0         0         0         0
## OTU_97.10004         0         0         0         0         0         0
## OTU_97.10005         0         0         0         0         0         0
## OTU_97.10006         0         0         0         0         0         0
## OTU_97.10007         0         0         0         0         0         0
## OTU_97.10008         0         1         0         0         0         0
## OTU_97.10009         0         0         1         0         0         0
## OTU_97.1001          0         0         0         0         0         0
## OTU_97.10010         0         0         0         0         0         0
```

]

---

# High Dimensional Microbiome Data

.pad-left[

- How to deal with high-dimensional microbiome data?
  - __Descriptive (e.g., heatmaps and stacked barplots)__
  - Test a priori hypotheses regarding specific OTUs/taxa
  - Reduce dimensions:
    - single summary statistic (alpha diversity)
    - pairwise distances (beta diversity) with PCoA or PERMANOVA
    - community types (mixture modeling)

]

---
background-image: url(img/hmp_heatmap.png)
background-size: contain

---

# High Dimensional Microbiome Data

.pad-left[

- How to deal with high-dimensional microbiome data?
  - Descriptive (e.g., heatmaps and stacked barplots)
  - Test a priori hypotheses regarding specific OTUs/taxa
  - __Reduce dimensions:__
    - __single summary statistic (alpha diversity)__
    - pairwise distances (beta diversity) with PCoA or PERMANOVA
    - community types (mixture modeling)

]

---

# Shannon Diversity

.pad-left[

- __Richness__ & __evenness__

- Shannon diversity: `$$H' = - \sum_{i} p_{i} \log_{b}(p_{i})$$` where `\(p_{i}\)` is the relative abundance of OTU `\(i\)` and `\(b\)` is the logarithm base

- "The uncertainty contained in a probability distribution is the average log-probability of an event." (McElreath, _Statistical Rethinking_, 2nd ed., 2020)

]

---
background-image: url(img/hmp_shannon.png)
background-size: contain

---
background-image: url(svg/geompoint.svg)
background-size: 500px
background-position: 85% 50%
class: center, middle, inverse

# β-diversity

---

# High Dimensional Microbiome Data

.pad-left[

- How to deal with high-dimensional microbiome data?
  - Descriptive (e.g., heatmaps and stacked barplots)
  - Test a priori hypotheses regarding specific OTUs/taxa
  - __Reduce dimensions:__
    - single summary statistic (alpha diversity)
    - __pairwise distances (beta diversity) with PCoA or PERMANOVA__
    - community types (mixture modeling)

]

---

# Beta Diversity as Dimension Reduction

.pad-left[

- Summarize each sample’s relationship to other samples:
  - pairwise distances
  - OTU table → square (samples × samples) distance matrix
- Many beta diversity metrics:
  - just counts versus counts + phylogeny
  - weighted versus unweighted

]

---
background-image: url(img/legendre_otu_to_dm.png)
background-size: contain

---

# Distance Metrics for Beta Diversity

.pad-left[

- Just counts versus counts + phylogeny:
  - Jaccard: `\(J(A,B) = \frac{|A \cap B|}{|A \cup B|}\)` & `\(d_{J}(A,B) = 1 - J(A,B)\)`
  - UniFrac: fraction of the tree's branch length that is unique to one of the two communities
- Weighted versus unweighted:
  - weighted: abundances (counts) matter
  - unweighted: binary (presence-absence)

]

---

# Pairwise Distances ⇾ PCoA

.pad-left[

- PCoA: principal coordinate analysis
  - accepts any metric distance, even a non-Euclidean one
  - like PCA, an eigenvalue decomposition (maximizing variance), but computed from the distance matrix rather than the original descriptors
  - unlike PCA, does not allow projection of the original descriptors onto the reduced-dimension space

]

---
background-image: url(img/weighted_unifrac.png)
background-size: contain

---
background-image: url(img/within_vs_between_group.png)
background-size: contain

---
background-image: url(img/anderson_adonis.png)
background-size: contain

---
background-image: url(img/kelly_bioinformatics_highlight.png)
background-size: contain

---
background-image: url(svg/geompoint.svg)
background-size: 500px
background-position: 85% 50%
class: center, middle, inverse

# Linear regression with α/β-diversity

---

# Linear Regression with `lm()`

.pull-left[

```r
# install.packages("tidyverse")
library(tidyverse)
# install.packages("vegan")
library(vegan)
# install.packages("ape")
library(ape)

*set.seed(16)

otu_tab <- read_rds(
  "./data/HMP_OTU_table_matrix_stool_nares.rds"
  )

otu_tab %>% str(vec.len = 3)
```

]

.pull-right[

```
##  num [1:43140, 1:10] 0 0 0 0 0 0 0 0 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:43140] "OTU_97.1" "OTU_97.10" "OTU_97.100" ...
##   ..$ : chr [1:10] "700014718" "700014767" "700014923" ...
```

]

---

# Linear Regression with `lm()`

.pull-left[

```r
otu_tab %>%
  as_tibble(rownames = "otu_id") %>%
  gather(key = "specimen_id", value = "read_count", -otu_id) %>%
  distinct() -> otu_long

otu_long
```

]

.pull-right[

```
## # A tibble: 431,400 x 3
##    otu_id       specimen_id read_count
##    <chr>        <chr>            <dbl>
##  1 OTU_97.1     700014718            0
##  2 OTU_97.10    700014718            0
##  3 OTU_97.100   700014718            0
##  4 OTU_97.1000  700014718            0
##  5 OTU_97.10000 700014718            0
##  6 OTU_97.10001 700014718            0
##  7 OTU_97.10002 700014718            0
##  8 OTU_97.10003 700014718            0
##  9 OTU_97.10004 700014718            0
## 10 OTU_97.10005 700014718            0
## # … with 431,390 more rows
```

]

---

# Linear Regression with `lm()`

.pull-left[

```r
read_tsv(file = "./data/v13_map_uniquebyPSN.txt.bz2") %>%
  rename_all(.funs = ~ gsub("#","",tolower(.x))) %>%
  rename(specimen_id = sampleid) %>%
  distinct() -> specimen_data

specimen_data %>%
  group_by(hmpbodysubsite) %>%
* mutate(dummy_variable_site =
*          rnorm(n = length(hmpbodysubsite),
*                mean = nchar(unique(hmpbodysubsite)),
*                sd = 0.5)) %>%
  ungroup() %>%
* filter(hmpbodysubsite %in%
*          c("Anterior_nares","Stool")) %>%
  select(specimen_id, hmpbodysubsite, dummy_variable_site) %>%
  mutate(specimen_id = as.character(specimen_id)) %>%
  distinct() -> specimen_data

specimen_data
```

]

.pull-right[

```
## # A tibble: 361 x 3
##    specimen_id hmpbodysubsite dummy_variable_site
##    <chr>       <chr>                        <dbl>
##  1 700013549   Stool                         4.39
##  2 700014386   Stool                         4.10
##  3 700014445   Anterior_nares               14.6
##  4 700014488   Stool                         5.27
##  5 700014497   Stool                         4.49
##  6 700014527   Anterior_nares               13.8
##  7 700014555   Stool                         5.44
##  8 700014718   Stool                         5.00
##  9 700014767   Anterior_nares               13.5
## 10 700014797   Anterior_nares               14.0
## # … with 351 more rows
```

]

---

# Linear Regression with `lm()`

.pull-left[

```r
specimen_data %>%
  qplot(data = .,
*       x = dummy_variable_site,
*       fill = hmpbodysubsite,
        alpha = 0.8,
        geom = "histogram",
        position = "identity") +
  scale_alpha(guide = "none") +
  theme_bw() +
  theme(legend.position = "bottom")
```

]

.pull-right[
![](lecture_microbiome-regression_files/figure-html/otu4-out-1.png)<!-- -->

]

---

# Linear Regression with `lm()`

.pull-left[

```r
otu_long %>%
  group_by(specimen_id) %>%
* summarise(shannon = diversity(x = read_count,
*                               index = "shannon")) %>%
  ungroup() %>%
  left_join(specimen_data, by = "specimen_id") %>%
* mutate(dummy_variable_shannon =
*          rnorm(n = length(shannon),
*                mean = 0,
*                sd = 0.2) +
*          shannon) %>%
  distinct() -> shannon_summary

shannon_summary
```

]

.pull-right[

```
## # A tibble: 10 x 5
##    specimen_id shannon hmpbodysubsite dummy_variable_site dummy_variable_shannon
##    <chr>         <dbl> <chr>                        <dbl>                  <dbl>
##  1 700014718      5.58 Stool                         5.00                   5.75
##  2 700014767      2.14 Anterior_nares               13.5                    1.91
##  3 700014923      2.32 Anterior_nares               14.5                    2.64
##  4 700016920      4.96 Anterior_nares               14.7                    5.20
##  5 700023706      4.98 Anterior_nares               14.0                    4.58
##  6 700038343      5.56 Anterior_nares               13.7                    5.53
##  7 700095956      4.86 Stool                         4.87                   4.35
##  8 700105834      4.58 Stool                         4.67                   4.54
##  9 700107189      4.27 Stool                         4.47                   4.24
## 10 700109383      4.44 Stool                         5.14                   4.47
```

]

---

# Linear Regression with `lm()`

.pull-left[

```r
shannon_summary %>%
  qplot(data = .,
*       x = shannon,
*       y = dummy_variable_site,
*       color = hmpbodysubsite,
        geom = c("point","smooth"),
        method = "lm") +
  theme_bw() +
  theme(legend.position = "bottom")
```

]

.pull-right[

![](lecture_microbiome-regression_files/figure-html/shannon2-out-1.png)<!-- -->

]

---

# Linear Regression with `lm()`

.pull-left[

```r
shannon_summary %>%
* lm(formula = dummy_variable_site ~ shannon, data = .) %>%
  summary()
```

]

.pull-right[

```
## 
## Call:
## lm(formula = dummy_variable_site ~ shannon, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.1130 -4.1482 -0.8615  4.5794  6.0493 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   15.136      6.140   2.465    0.039 *
## shannon       -1.301      1.359  -0.957    0.367  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.912 on 8 degrees of freedom
## Multiple R-squared:  0.1027,  Adjusted R-squared:  -0.00942 
## F-statistic: 0.916 on 1 and 8 DF,  p-value: 0.3665
```

]

---

# Linear Regression with `lm()`

.pull-left[

```r
shannon_summary %>%
  qplot(data = .,
*       x = shannon,
*       y = dummy_variable_shannon,
*       color = hmpbodysubsite,
        geom = c("point","smooth"),
        method = "lm") +
  theme_bw() +
  theme(legend.position = "bottom")
```

]

.pull-right[

![](lecture_microbiome-regression_files/figure-html/shannon3-out-1.png)<!-- -->

]

---

# Linear Regression with `lm()`

.pull-left[

```r
shannon_summary %>%
* lm(formula = dummy_variable_shannon ~ shannon, data = .) %>%
  summary()
```

]

.pull-right[

```
## 
## Call:
## lm(formula = dummy_variable_shannon ~ shannon, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.44584 -0.17415  0.03089  0.20184  0.31593 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.06029    0.35097   0.172    0.868    
## shannon      0.97515    0.07770  12.550 1.52e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2808 on 8 degrees of freedom
## Multiple R-squared:  0.9517,  Adjusted R-squared:  0.9456 
## F-statistic: 157.5 on 1 and 8 DF,  p-value: 1.523e-06
```

]

---

# Linear Regression with `lm()`

.pull-left[

```r
shannon_summary %>%
* lm(formula = dummy_variable_shannon ~ shannon +
*      hmpbodysubsite, data = .) %>%
  summary()
```

]

.pull-right[

```
## 
## Call:
## lm(formula = dummy_variable_shannon ~ shannon + hmpbodysubsite, 
##     data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.42645 -0.17805  0.03369  0.21833  0.30672 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          0.05434    0.37478   0.145    0.889    
## shannon              0.98167    0.08769  11.194 1.01e-05 ***
## hmpbodysubsiteStool -0.04511    0.20042  -0.225    0.828    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2991 on 7 degrees of freedom
## Multiple R-squared:  0.952,  Adjusted R-squared:  0.9383 
## F-statistic: 69.43 on 2 and 7 DF,  p-value: 2.422e-05
```

]

---

# Linear Regression with `lm()`

.pull-left[

```r
otu_tab %>%
* t() %>% # TRANSPOSE
* vegdist(x = ., method = "jaccard") %>%
* pcoa(D = .) -> pc

*pc$vectors %>%
* as_tibble(rownames = "specimen_id") %>%
* select(specimen_id, Axis.1, Axis.2) %>%
  left_join(shannon_summary, by = "specimen_id") %>%
  mutate(dummy_variable_pc =
*          rnorm(n = length(shannon),
*                mean = 0,
*                sd = 0.2) +
*          Axis.1) %>%
  distinct() -> pc_summary

pc_summary
```

]

.pull-right[

```
## # A tibble: 10 x 8
##    specimen_id    Axis.1  Axis.2 shannon hmpbodysubsite dummy_variable_site
##    <chr>           <dbl>   <dbl>   <dbl> <chr>                        <dbl>
##  1 700014718   -0.264     0.222     5.58 Stool                         5.00
##  2 700014767    0.516     0.174     2.14 Anterior_nares               13.5
##  3 700014923    0.505     0.183     2.32 Anterior_nares               14.5
##  4 700016920    0.00350  -0.434     4.96 Anterior_nares               14.7
##  5 700023706   -0.000828 -0.290     4.98 Anterior_nares               14.0
##  6 700038343   -0.00154  -0.435     5.56 Anterior_nares               13.7
##  7 700095956   -0.222     0.189     4.86 Stool                         4.87
##  8 700105834   -0.225     0.174     4.58 Stool                         4.67
##  9 700107189   -0.0978    0.0363    4.27 Stool                         4.47
## 10 700109383   -0.213     0.180     4.44 Stool                         5.14
## # … with 2 more variables: dummy_variable_shannon <dbl>,
## #   dummy_variable_pc <dbl>
```

]

---

# Linear Regression with `lm()`

.pull-left[

```r
pc_summary %>%
  qplot(data = .,
*       x = Axis.1,
*       y = dummy_variable_pc,
*       color = hmpbodysubsite,
        geom = c("point","smooth"),
        method = "lm") +
  theme_bw() +
  theme(legend.position = "bottom")
```

]

.pull-right[

![](lecture_microbiome-regression_files/figure-html/pc2-out-1.png)<!-- -->

]

---

# Linear Regression with `lm()`

.pull-left[

```r
pc_summary %>%
* lm(formula = dummy_variable_pc ~ Axis.1, data = .) %>%
  summary()
```

]

.pull-right[

```
## 
## Call:
## lm(formula = dummy_variable_pc ~ Axis.1, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.20028 -0.14828 -0.04132  0.16145  0.26277 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -0.00859    0.06096  -0.141  0.89143   
## Axis.1       1.02072    0.22325   4.572  0.00182 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1928 on 8 degrees of freedom
## Multiple R-squared:  0.7232,  Adjusted R-squared:  0.6886 
## F-statistic:  20.9 on 1 and 8 DF,  p-value: 0.001821
```

]

---

# Linear Regression with `lm()`

.pull-left[

```r
pc_summary %>%
* lm(formula = dummy_variable_pc ~ Axis.2, data = .) %>%
  summary()
```

]

.pull-right[

```
## 
## Call:
## lm(formula = dummy_variable_pc ~ Axis.2, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.42213 -0.21484  0.01566  0.06357  0.71955 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.00859    0.11587  -0.074    0.943
## Axis.2       0.01052    0.44610   0.024    0.982
## 
## Residual standard error: 0.3664 on 8 degrees of freedom
## Multiple R-squared:  6.955e-05,  Adjusted R-squared:  -0.1249 
## F-statistic: 0.0005564 on 1 and 8 DF,  p-value: 0.9818
```

]

---
background-image: url(svg/geompoint.svg)
background-size: 500px
background-position: 85% 50%
class: center, middle, inverse

# PERMANOVA with β-diversity

---

# PERMANOVA with `adonis()`

.pull-left[

```r
otu_tab %>%
* t() %>% # TRANSPOSE
* vegdist(x = ., method = "jaccard") -> otu_dist

otu_dist %>% str(vec.len = 2)
```

]

.pull-right[

```
##  'dist' num [1:45] 1 0.996 ...
##  - attr(*, "Size")= int 10
##  - attr(*, "Labels")= chr [1:10] "700014718" "700014767" ...
##  - attr(*, "Diag")= logi FALSE
##  - attr(*, "Upper")= logi FALSE
##  - attr(*, "method")= chr "jaccard"
##  - attr(*, "call")= language vegdist(x = ., method = "jaccard")
```

]

---

# PERMANOVA with `adonis()`

.pull-left[

```r
*labels(otu_dist) %>% # match order from dist
  enframe(value = "specimen_id") %>%
  select(specimen_id) %>%
  left_join(pc_summary, by = "specimen_id") %>%
* mutate(dummy_category = Axis.1 > mean(Axis.1)) %>%
  distinct() -> sorted_summary

sorted_summary
```

]

.pull-right[

```
## # A tibble: 10 x 9
##    specimen_id    Axis.1  Axis.2 shannon hmpbodysubsite dummy_variable_site
##    <chr>           <dbl>   <dbl>   <dbl> <chr>                        <dbl>
##  1 700014718   -0.264     0.222     5.58 Stool                         5.00
##  2 700014767    0.516     0.174     2.14 Anterior_nares               13.5
##  3 700014923    0.505     0.183     2.32 Anterior_nares               14.5
##  4 700016920    0.00350  -0.434     4.96 Anterior_nares               14.7
##  5 700023706   -0.000828 -0.290     4.98 Anterior_nares               14.0
##  6 700038343   -0.00154  -0.435     5.56 Anterior_nares               13.7
##  7 700095956   -0.222     0.189     4.86 Stool                         4.87
##  8 700105834   -0.225     0.174     4.58 Stool                         4.67
##  9 700107189   -0.0978    0.0363    4.27 Stool                         4.47
## 10 700109383   -0.213     0.180     4.44 Stool                         5.14
## # … with 3 more variables: dummy_variable_shannon <dbl>,
## #   dummy_variable_pc <dbl>, dummy_category <lgl>
```

]

---

# PERMANOVA with `adonis()`

.pad-left[

```r
*# distance matrix is response variable
*adonis(otu_dist ~ hmpbodysubsite,
*       data = sorted_summary)
```

```
## 
## Call:
## adonis(formula = otu_dist ~ hmpbodysubsite, data = sorted_summary) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##                Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)  
## hmpbodysubsite  1    0.7036 0.70364  1.5719 0.16422  0.014 *
## Residuals       8    3.5811 0.44764         0.83578         
## Total           9    4.2847                 1.00000         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

]

---

# PERMANOVA with `adonis()`

.pad-left[

```r
*# multivariable possible...
*adonis(otu_dist ~ hmpbodysubsite + dummy_category,
*       data = sorted_summary)
```

```
## 
## Call:
## adonis(formula = otu_dist ~ hmpbodysubsite + dummy_category, data = sorted_summary) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##                Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)   
## hmpbodysubsite  1    0.7036 0.70364  1.6100 0.16422  0.002 **
## dummy_category  1    0.5218 0.52180  1.1939 0.12178  0.125   
## Residuals       7    3.0593 0.43704         0.71400          
## Total           9    4.2847                 1.00000          
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

]

---

# PERMANOVA with `adonis()`

.pad-left[

```r
*# ... but order matters!!!
*adonis(otu_dist ~ dummy_category + hmpbodysubsite,
*       data = sorted_summary)
```

```
## 
## Call:
## adonis(formula = otu_dist ~ dummy_category + hmpbodysubsite, data = sorted_summary) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##                Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)  
## dummy_category  1    0.6329 0.63293  1.4482 0.14772  0.018 *
## hmpbodysubsite  1    0.5925 0.59251  1.3557 0.13828  0.036 *
## Residuals       7    3.0593 0.43704         0.71400         
## Total           9    4.2847                 1.00000         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

]

---

# PERMANOVA with `adonis()`

.pad-left[

```r
*# ... do you mean strata?
*adonis(otu_dist ~ dummy_category,
*       strata = sorted_summary$hmpbodysubsite,
*       data = sorted_summary)
```

```
## 
## Call:
## adonis(formula = otu_dist ~ dummy_category, data = sorted_summary, strata = sorted_summary$hmpbodysubsite) 
## 
## Blocks:  strata 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##                Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)
## dummy_category  1    0.6329 0.63293  1.3865 0.14772  0.292
## Residuals       8    3.6518 0.45648         0.85228       
## Total           9    4.2847                 1.00000       
```

]

---

# PERMANOVA with `adonis()`

.pad-left[

```r
*# ... or do you mean nestedness?
*adonis(otu_dist ~ dummy_category / hmpbodysubsite,
*       data = sorted_summary)
```

```
## 
## Call:
## adonis(formula = otu_dist ~ dummy_category/hmpbodysubsite, data = sorted_summary) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##                               Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)   
## dummy_category                 1    0.6329 0.63293  1.4482 0.14772  0.009 **
## dummy_category:hmpbodysubsite  1    0.5925 0.59251  1.3557 0.13828  0.023 * 
## Residuals                      7    3.0593 0.43704         0.71400          
## Total                          9    4.2847                 1.00000          
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

]

---
background-image: url(svg/geompoint.svg)
background-size: 500px
background-position: 85% 50%
class: center, middle, inverse

# Regression & Compositional Data?

---

# Regression & Compositional Data?

.pad-left[

- Compositional data approaches address the dependency among OTUs (relative abundances within a specimen sum to a constant):
  - e.g., `compositions::clr()` or `philr::philr()`
  - p ≫ n challenges (far more OTUs than specimens) persist
- Must pair the compositional transform with regularization:
  - `glmnet::glmnet()` for LASSO/ridge/elastic net
  - Bayesian methods

]

---
class: center, middle, inverse
background-image: url(svg/conjugation.svg)
background-size: 500px
background-position: 85% 50%

# Questions?
### Post to the discussion board!

---
background-image: url(svg/bacteria.svg)
background-size: 100px
background-position: 98% 90%
class: center, middle

# Thank you!

#### Slides available: [github.com/bjklab](https://github.com/bjklab/EPID674_008_microbiome-regression.git)

#### [brendank@pennmedicine.upenn.edu](mailto:brendank@pennmedicine.upenn.edu)
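---

# Appendix: Diversity by Hand

.pad-left[

A minimal base-R sketch of the quantities on the α-diversity, distance-metric, and compositional-data slides. It assumes a small hypothetical count table (`toy_otu`) rather than the HMP data; `shannon_by_hand()`, `jaccard_binary()`, and `clr_by_hand()` are illustrative helpers, not package functions. Shannon diversity uses the natural log (b = e), matching the default `base = exp(1)` of `vegan::diversity()`. Base R's `dist(method = "binary")` gives the presence/absence Jaccard distance, which should agree with `vegan::vegdist(method = "jaccard", binary = TRUE)`; without `binary = TRUE`, `vegdist()` computes a quantitative Jaccard from abundances, as in the `vegdist()` calls above. The clr step adds an assumed pseudocount of 0.5 because `log(0)` is undefined; `compositions::clr()` may treat zeros differently.

```r
# Hypothetical toy OTU table: rows = OTUs, columns = specimens
# (same orientation as otu_tab); counts are made up for illustration.
toy_otu <- matrix(
  c(10, 0, 5, 5,   # specimen s1
    0,  8, 2, 0,   # specimen s2
    30, 2, 0, 1),  # specimen s3
  nrow = 4,
  dimnames = list(paste0("OTU_", 1:4), c("s1", "s2", "s3"))
)

# Shannon H' = -sum(p_i * log(p_i)) with the natural log; zero counts are
# dropped because 0 * log(0) is taken to be 0.
shannon_by_hand <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}
h <- apply(toy_otu, 2, shannon_by_hand)
# compare: vegan::diversity(t(toy_otu), index = "shannon")

# Binary Jaccard distance: 1 - (OTUs shared) / (OTUs present in either),
# where "present" means count > 0.
jaccard_binary <- function(x, y) {
  a <- x > 0
  b <- y > 0
  1 - sum(a & b) / sum(a | b)
}
# Base R's "binary" distance is the same quantity; t() puts specimens in rows.
jac_dist <- dist(t(toy_otu), method = "binary")
# compare: vegan::vegdist(t(toy_otu), method = "jaccard", binary = TRUE)

# Centered log-ratio: log(x_i) - mean(log(x)) within each specimen;
# the pseudocount is an assumption needed because log(0) is undefined.
clr_by_hand <- function(counts, pseudocount = 0.5) {
  log_x <- log(counts + pseudocount)
  log_x - mean(log_x)
}
clr_tab <- apply(toy_otu, 2, clr_by_hand)

round(h, 3)
as.matrix(jac_dist)
colSums(clr_tab)  # each specimen's clr values sum to (numerically) zero
```

Applied to the HMP table, `apply(otu_tab, 2, shannon_by_hand)` should reproduce the `shannon` column computed with `vegan::diversity()`, while these binary Jaccard distances will generally differ from the quantitative `vegdist(method = "jaccard")` distances used for the PCoA and PERMANOVA slides.

]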