Regression for Microbiome Data: Moving From Diversity to Inference

class: center, middle, inverse, title-slide

# Regression for Microbiome Data:<br>Moving From Diversity to Inference
## 📚EPID 674📚
### Brendan J. Kelly, MD, MS
### Updated: 18 June 2020

---

background-image: url(data:image/png;base64,#svg/geompoint.svg)
background-size: 500px
background-position: 85% 50%
class: middle, inverse

.pad-left[

### Review α-diversity

### Review β-diversity

### α/β-diversity ⇾ linear regression

### β-diversity ⇾ PERMANOVA

### Compositional data?

]

---
background-image: url(data:image/png;base64,#svg/geompoint.svg)
background-size: 500px
background-position: 85% 50%
class: center, middle, inverse

# α-diversity

---

# High Dimensional Microbiome Data

.center[

```
##              700013549 700014386 700014403 700014409 700014412 700014415
## OTU_97.1             0         0         0         0         0         0
## OTU_97.10            0         0         6         4         1         5
## OTU_97.100           0         0       133         7         1         4
## OTU_97.1000          0         0         0         0         0         0
## OTU_97.10000         0         0         0         0         0         0
## OTU_97.10001         0         0         0         0         0         1
## OTU_97.10002         0         0         0         0         0         0
## OTU_97.10003         0         0         0         0         0         0
## OTU_97.10004         0         0         0         0         0         0
## OTU_97.10005         0         0         0         0         0         0
## OTU_97.10006         0         0         0         0         0         0
## OTU_97.10007         0         0         0         0         0         0
## OTU_97.10008         0         1         0         0         0         0
## OTU_97.10009         0         0         1         0         0         0
## OTU_97.1001          0         0         0         0         0         0
## OTU_97.10010         0         0         0         0         0         0
```

]

---

# High Dimensional Microbiome Data

.pad-left[

- How to deal with high-dimensional microbiome data?

- __Descriptive (e.g., heatmaps and stacked barplots)__
    
- Test a priori hypotheses regarding specific OTUs/taxa

- Reduce dimensions:

    - single summary statistic (alpha diversity)
    
    - pairwise distances (beta diversity) with PCoA or PERMANOVA
    
    - community types (mixture modeling)

]

---
background-image: url(data:image/png;base64,#img/hmp_heatmap.png)
background-size: contain

---

# High Dimensional Microbiome Data

.pad-left[

- How to deal with high-dimensional microbiome data?

- Descriptive (e.g., heatmaps and stacked barplots)
    
- Test a priori hypotheses regarding specific OTUs/taxa

- __Reduce dimensions:__

    - __single summary statistic (alpha diversity)__
    
    - pairwise distances (beta diversity) with PCoA or PERMANOVA
    
    - community types (mixture modeling)

]

---

# Shannon Diversity

.pad-left[

- __Richness__ & __evenness__

- Shannon diversity:

`$$H' = - \sum_{i} p_{i} \log_{b}(p_{i})$$`
    
- "The uncertainty contained in a probability distribution is the average log-probability of an event." (McElreath, _Statistical Rethinking_, 2nd ed., 2020)

]
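
---

# Shannon Diversity

A minimal base R sketch of the formula above, assuming the natural log as the default base (which matches the default of `vegan::diversity()`). The helper `shannon_index()` and the toy count vectors are hypothetical and not part of the lecture data.

```r
# Hypothetical helper: Shannon diversity from a vector of read counts
shannon_index <- function(counts, base = exp(1)) {
  p <- counts / sum(counts)   # convert counts to proportions
  p <- p[p > 0]               # drop zeros, since 0 * log(0) is taken as 0
  -sum(p * log(p, base = base))
}

# Toy communities with equal richness (4 taxa) but different evenness
even_community   <- c(10, 10, 10, 10)
skewed_community <- c(37, 1, 1, 1)

shannon_index(even_community)    # equals log(4), the maximum for 4 taxa
shannon_index(skewed_community)  # lower: one taxon dominates
```

Both communities have the same richness, so the difference in `\(H'\)` here reflects evenness alone.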

---
background-image: url(data:image/png;base64,#img/hmp_shannon.png)
background-size: contain

---
background-image: url(data:image/png;base64,#svg/geompoint.svg)
background-size: 500px
background-position: 85% 50%
class: center, middle, inverse

# β-diversity

---

# High Dimensional Microbiome Data

.pad-left[

- How to deal with high-dimensional microbiome data?

- Descriptive (e.g., heatmaps and stacked barplots)
    
- Test a priori hypotheses regarding specific OTUs/taxa

- __Reduce dimensions:__

    - single summary statistic (alpha diversity)
    
    - __pairwise distances (beta diversity) with PCoA or PERMANOVA__
    
    - community types (mixture modeling)

]

---

# Beta Diversity as Dimension Reduction

.pad-left[

- Summarize each sample’s relationship to other samples:

    - pairwise distances
    
    - OTU table → square matrix
    
- Many beta diversity metrics:

    - just counts versus counts + phylogeny
    
    - weighted versus unweighted
    
]
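
---

# Beta Diversity as Dimension Reduction

A small sketch of the "OTU table → square matrix" step using only base R. The toy table and its names are hypothetical. `dist(method = "binary")` is used here as a stand-in for a presence-absence Jaccard distance; the lecture code itself uses `vegan::vegdist()`.

```r
# Hypothetical toy OTU table: rows are OTUs, columns are specimens
toy_otu <- matrix(
  c(5, 0, 3,
    1, 2, 0,
    0, 7, 1,
    4, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("OTU_", 1:4), paste0("specimen_", 1:3))
)

# dist() compares rows, so transpose to put specimens in rows
toy_dist <- dist(t(toy_otu), method = "binary")

# Expand the compact 'dist' object into a square specimen-by-specimen matrix
toy_square <- as.matrix(toy_dist)
toy_square
```

However many OTUs the table holds, the result is always specimens × specimens, symmetric, with zeros on the diagonal.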

---
background-image: url(data:image/png;base64,#img/legendre_otu_to_dm.png)
background-size: contain

---

# Distance Metrics for Beta Diversity

.pad-left[

- Just counts versus counts + phylogeny:

    - Jaccard: `$J(A,B) = \frac{|A \cap B|}{|A \cup B|}$`  &  `$d_{J}(A,B) = 1 - J(A,B)$`

    - UniFrac: fraction of branch length in the phylogenetic tree that is unique to one sample

- Weighted versus unweighted:

    - weighted: counts matter

    - unweighted: binary (presence-absence)

]
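
---

# Distance Metrics for Beta Diversity

A hedged sketch contrasting unweighted and weighted Jaccard distances in base R. Both helper functions and the toy samples are hypothetical. Per the `vegan` documentation, `vegdist(method = "jaccard")` uses abundances unless `binary = TRUE` is set, so the weighted version below is the closer analogue of the lecture code.

```r
# Unweighted (presence-absence) Jaccard distance: 1 - |A ∩ B| / |A ∪ B|
jaccard_unweighted <- function(x, y) {
  shared <- sum(x > 0 & y > 0)   # taxa present in both samples
  total  <- sum(x > 0 | y > 0)   # taxa present in at least one sample
  1 - shared / total
}

# Weighted (quantitative) Jaccard distance: 1 - sum(min) / sum(max)
jaccard_weighted <- function(x, y) {
  1 - sum(pmin(x, y)) / sum(pmax(x, y))
}

# Hypothetical read counts for two samples across four OTUs
sample_a <- c(otu1 = 5, otu2 = 0, otu3 = 3, otu4 = 0)
sample_b <- c(otu1 = 1, otu2 = 2, otu3 = 0, otu4 = 0)

jaccard_unweighted(sample_a, sample_b)  # 1 shared of 3 observed taxa: 2/3
jaccard_weighted(sample_a, sample_b)    # counts matter: 1 - 1/10 = 0.9
```

The two samples share `otu1`, but in very different amounts, so the weighted distance is larger than the unweighted one.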

---

# Pairwise Distances ⇾ PCoA

.pad-left[

- PCoA: principal coordinate analysis

    - any metric distance, even if non-Euclidean
    
    - like PCA, eigenvalue decomposition (maximum variance) but mediated by distance function (no original descriptors)

    - unlike PCA, does not allow projection of original descriptors in reduced-dimension space

]
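
---

# Pairwise Distances ⇾ PCoA

A minimal sketch of PCoA with base R's `cmdscale()` (classical multidimensional scaling, the same procedure as PCoA). The toy coordinates are hypothetical; the lecture code uses `ape::pcoa()` on a Jaccard distance matrix instead.

```r
# Hypothetical toy data: four samples at the corners of a 3 x 4 rectangle
toy_coords <- matrix(
  c(0, 0,
    3, 0,
    0, 4,
    3, 4),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("s", 1:4), c("x", "y"))
)

# Only the pairwise distances are passed on, not the original descriptors
toy_d <- dist(toy_coords)

# Eigen-decomposition of the double-centered squared distance matrix
toy_pcoa <- cmdscale(toy_d, k = 2, eig = TRUE)

toy_pcoa$points   # principal coordinates (signs and rotation are arbitrary)
toy_pcoa$eig      # eigenvalues, largest first

# With Euclidean input, the ordination reproduces the original distances
max(abs(as.vector(dist(toy_pcoa$points)) - as.vector(toy_d)))
```

The first axis captures the longer side of the rectangle (maximum variance), even though `cmdscale()` never saw the `x` and `y` columns.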

---
background-image: url(data:image/png;base64,#img/weighted_unifrac.png)
background-size: contain

---
background-image: url(data:image/png;base64,#img/within_vs_between_group.png)
background-size: contain

---
background-image: url(data:image/png;base64,#img/anderson_adonis.png)
background-size: contain

---
background-image: url(data:image/png;base64,#img/kelly_bioinformatics_highlight.png)
background-size: contain

---
background-image: url(data:image/png;base64,#svg/geompoint.svg)
background-size: 500px
background-position: 85% 50%
class: center, middle, inverse

# Linear regression with α/β-diversity

---

# Linear Regression with `lm()`

.pull-left[

```r
# install.packages("tidyverse") 
library(tidyverse)

# install.packages("vegan")
library(vegan)

# install.packages("ape")
library(ape)

*set.seed(16)

otu_tab <- read_rds(
  "./data/HMP_OTU_table_matrix_stool_nares.rds"
)

otu_tab %>%
  str(vec.len = 3)
```

]

.pull-right[

```
##  num [1:43140, 1:10] 0 0 0 0 0 0 0 0 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:43140] "OTU_97.1" "OTU_97.10" "OTU_97.100" ...
##   ..$ : chr [1:10] "700014718" "700014767" "700014923" ...
```

]

---

# Linear Regression with `lm()`

.pull-left[

```r
otu_tab %>%
  as_tibble(rownames = "otu_id") %>%
  gather(key = "specimen_id",
         value = "read_count",
         -otu_id) %>%
  distinct() -> otu_long

otu_long
```

]

.pull-right[

```
## # A tibble: 431,400 x 3
##    otu_id       specimen_id read_count
##    <chr>        <chr>            <dbl>
##  1 OTU_97.1     700014718            0
##  2 OTU_97.10    700014718            0
##  3 OTU_97.100   700014718            0
##  4 OTU_97.1000  700014718            0
##  5 OTU_97.10000 700014718            0
##  6 OTU_97.10001 700014718            0
##  7 OTU_97.10002 700014718            0
##  8 OTU_97.10003 700014718            0
##  9 OTU_97.10004 700014718            0
## 10 OTU_97.10005 700014718            0
## # … with 431,390 more rows
```

]

---

# Linear Regression with `lm()`

.pull-left[

```r
read_tsv(file = "./data/v13_map_uniquebyPSN.txt.bz2") %>%
  rename_all(.funs = ~ gsub("#","",tolower(.x))) %>%
  rename(specimen_id = sampleid) %>%
  distinct() -> specimen_data

specimen_data %>%
  group_by(hmpbodysubsite) %>%
* mutate(dummy_variable_site =
*          rnorm(n = length(hmpbodysubsite),
*                mean = nchar(unique(hmpbodysubsite)),
*                sd = 0.5)) %>%
  ungroup() %>%
* filter(hmpbodysubsite %in%
*          c("Anterior_nares","Stool")) %>%
  select(specimen_id,
         hmpbodysubsite,
         dummy_variable_site) %>%
  mutate(specimen_id = as.character(specimen_id)) %>%
  distinct() -> specimen_data

specimen_data
```

]

.pull-right[

```
## # A tibble: 361 x 3
##    specimen_id hmpbodysubsite dummy_variable_site
##    <chr>       <chr>                        <dbl>
##  1 700013549   Stool                         4.39
##  2 700014386   Stool                         4.10
##  3 700014445   Anterior_nares               14.6 
##  4 700014488   Stool                         5.27
##  5 700014497   Stool                         4.49
##  6 700014527   Anterior_nares               13.8 
##  7 700014555   Stool                         5.44
##  8 700014718   Stool                         5.00
##  9 700014767   Anterior_nares               13.5 
## 10 700014797   Anterior_nares               14.0 
## # … with 351 more rows
```

]

---

# Linear Regression with `lm()`

.pull-left[

```r
specimen_data %>%
  qplot(data = .,
*       x = dummy_variable_site,
*       fill = hmpbodysubsite,
        alpha = 0.8,
        geom = "histogram",
        position = "identity") +
  scale_alpha(guide = "none") +
  theme_bw() +
  theme(legend.position = "bottom")
```

]

.pull-right[

![](data:image/png;base64,#lecture_microbiome-regression_files/figure-html/otu4-out-1.png)

]

---

# Linear Regression with `lm()`

.pull-left[

```r
otu_long %>%
  group_by(specimen_id) %>%
* summarise(shannon = diversity(x = read_count,
*                            index = "shannon")) %>%
  ungroup() %>%
  left_join(specimen_data, by = "specimen_id") %>%
* mutate(dummy_variable_shannon =
*          rnorm(n = length(shannon),
*                mean = 0,
*                sd = 0.2) +
*          shannon) %>%
  distinct() -> shannon_summary

shannon_summary
```

]

.pull-right[

```
## # A tibble: 10 x 5
##    specimen_id shannon hmpbodysubsite dummy_variable_site dummy_variable_shannon
##    <chr>         <dbl> <chr>                        <dbl>                  <dbl>
##  1 700014718      5.58 Stool                         5.00                   5.75
##  2 700014767      2.14 Anterior_nares               13.5                    1.91
##  3 700014923      2.32 Anterior_nares               14.5                    2.64
##  4 700016920      4.96 Anterior_nares               14.7                    5.20
##  5 700023706      4.98 Anterior_nares               14.0                    4.58
##  6 700038343      5.56 Anterior_nares               13.7                    5.53
##  7 700095956      4.86 Stool                         4.87                   4.35
##  8 700105834      4.58 Stool                         4.67                   4.54
##  9 700107189      4.27 Stool                         4.47                   4.24
## 10 700109383      4.44 Stool                         5.14                   4.47
```

]

---

# Linear Regression with `lm()`

.pull-left[

```r
shannon_summary %>%
  qplot(data = .,
*       x = shannon,
*       y = dummy_variable_site,
*       color = hmpbodysubsite,
        geom = c("point","smooth"),
        method = "lm") +
  theme_bw() +
  theme(legend.position = "bottom")
```

]

.pull-right[

![](data:image/png;base64,#lecture_microbiome-regression_files/figure-html/shannon2-out-1.png)

]

---

# Linear Regression with `lm()`

.pull-left[

```r
shannon_summary %>%
* lm(formula = dummy_variable_site ~ shannon,
     data = .) %>%
  summary()
```

]

.pull-right[

```
## 
## Call:
## lm(formula = dummy_variable_site ~ shannon, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.1130 -4.1482 -0.8615  4.5794  6.0493 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   15.136      6.140   2.465    0.039 *
## shannon       -1.301      1.359  -0.957    0.367  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.912 on 8 degrees of freedom
## Multiple R-squared:  0.1027,	Adjusted R-squared:  -0.00942 
## F-statistic: 0.916 on 1 and 8 DF,  p-value: 0.3665
```

]

---

# Linear Regression with `lm()`

.pull-left[

```r
shannon_summary %>%
  qplot(data = .,
*       x = shannon,
*       y = dummy_variable_shannon,
*       color = hmpbodysubsite,
        geom = c("point","smooth"),
        method = "lm") +
  theme_bw() +
  theme(legend.position = "bottom")
```

]

.pull-right[

![](data:image/png;base64,#lecture_microbiome-regression_files/figure-html/shannon3-out-1.png)

]

---

# Linear Regression with `lm()`

.pull-left[

```r
shannon_summary %>%
* lm(formula = dummy_variable_shannon ~ shannon,
     data = .) %>%
  summary()
```

]

.pull-right[

```
## 
## Call:
## lm(formula = dummy_variable_shannon ~ shannon, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.44584 -0.17415  0.03089  0.20184  0.31593 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.06029    0.35097   0.172    0.868    
## shannon      0.97515    0.07770  12.550 1.52e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2808 on 8 degrees of freedom
## Multiple R-squared:  0.9517,	Adjusted R-squared:  0.9456 
## F-statistic: 157.5 on 1 and 8 DF,  p-value: 1.523e-06
```

]

---

# Linear Regression with `lm()`

.pull-left[

```r
shannon_summary %>%
* lm(formula = dummy_variable_shannon ~ shannon +
*      hmpbodysubsite,
     data = .) %>%
  summary()
```

]

.pull-right[

```
## 
## Call:
## lm(formula = dummy_variable_shannon ~ shannon + hmpbodysubsite, 
##     data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.42645 -0.17805  0.03369  0.21833  0.30672 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          0.05434    0.37478   0.145    0.889    
## shannon              0.98167    0.08769  11.194 1.01e-05 ***
## hmpbodysubsiteStool -0.04511    0.20042  -0.225    0.828    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2991 on 7 degrees of freedom
## Multiple R-squared:  0.952,	Adjusted R-squared:  0.9383 
## F-statistic: 69.43 on 2 and 7 DF,  p-value: 2.422e-05
```

]

---

# Linear Regression with `lm()`

.pull-left[

```r
otu_tab %>%
* t() %>%  # TRANSPOSE
* vegdist(x = ., method = "jaccard") %>%
* pcoa(D = .) -> pc

*pc$vectors %>%
* as_tibble(rownames = "specimen_id") %>%
* select(specimen_id, Axis.1, Axis.2) %>%
  left_join(shannon_summary, by = "specimen_id") %>%
  mutate(dummy_variable_pc = 
*          rnorm(n = length(shannon),
*                mean = 0,
*                sd = 0.2) +
*          Axis.1) %>%
  distinct() -> pc_summary

pc_summary
```

]

.pull-right[

```
## # A tibble: 10 x 8
##    specimen_id    Axis.1  Axis.2 shannon hmpbodysubsite dummy_variable_site
##    <chr>           <dbl>   <dbl>   <dbl> <chr>                        <dbl>
##  1 700014718   -0.264     0.222     5.58 Stool                         5.00
##  2 700014767    0.516     0.174     2.14 Anterior_nares               13.5 
##  3 700014923    0.505     0.183     2.32 Anterior_nares               14.5 
##  4 700016920    0.00350  -0.434     4.96 Anterior_nares               14.7 
##  5 700023706   -0.000828 -0.290     4.98 Anterior_nares               14.0 
##  6 700038343   -0.00154  -0.435     5.56 Anterior_nares               13.7 
##  7 700095956   -0.222     0.189     4.86 Stool                         4.87
##  8 700105834   -0.225     0.174     4.58 Stool                         4.67
##  9 700107189   -0.0978    0.0363    4.27 Stool                         4.47
## 10 700109383   -0.213     0.180     4.44 Stool                         5.14
## # … with 2 more variables: dummy_variable_shannon <dbl>,
## #   dummy_variable_pc <dbl>
```

]

---

# Linear Regression with `lm()`

.pull-left[

```r
pc_summary %>%
  qplot(data = .,
*       x = Axis.1,
*       y = dummy_variable_pc,
*       color = hmpbodysubsite,
        geom = c("point","smooth"),
        method = "lm") +
  theme_bw() +
  theme(legend.position = "bottom")
```

]

.pull-right[

![](data:image/png;base64,#lecture_microbiome-regression_files/figure-html/pc2-out-1.png)

]

---

# Linear Regression with `lm()`

.pull-left[

```r
pc_summary %>%
* lm(formula = dummy_variable_pc ~ Axis.1,
     data = .) %>%
  summary()
```

]

.pull-right[

```
## 
## Call:
## lm(formula = dummy_variable_pc ~ Axis.1, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.20028 -0.14828 -0.04132  0.16145  0.26277 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -0.00859    0.06096  -0.141  0.89143   
## Axis.1       1.02072    0.22325   4.572  0.00182 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1928 on 8 degrees of freedom
## Multiple R-squared:  0.7232,	Adjusted R-squared:  0.6886 
## F-statistic:  20.9 on 1 and 8 DF,  p-value: 0.001821
```

]

---

# Linear Regression with `lm()`

.pull-left[

```r
pc_summary %>%
* lm(formula = dummy_variable_pc ~ Axis.2,
     data = .) %>%
  summary()
```

]

.pull-right[

```
## 
## Call:
## lm(formula = dummy_variable_pc ~ Axis.2, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.42213 -0.21484  0.01566  0.06357  0.71955 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.00859    0.11587  -0.074    0.943
## Axis.2       0.01052    0.44610   0.024    0.982
## 
## Residual standard error: 0.3664 on 8 degrees of freedom
## Multiple R-squared:  6.955e-05,	Adjusted R-squared:  -0.1249 
## F-statistic: 0.0005564 on 1 and 8 DF,  p-value: 0.9818
```

]

---
background-image: url(data:image/png;base64,#svg/geompoint.svg)
background-size: 500px
background-position: 85% 50%
class: center, middle, inverse

# PERMANOVA with β-diversity

---

# PERMANOVA with `adonis()`

.pull-left[

```r
otu_tab %>%
* t() %>%  # TRANSPOSE
* vegdist(x = ., method = "jaccard") -> otu_dist

otu_dist %>%
  str(vec.len = 2)
```

]

.pull-right[

```
##  'dist' num [1:45] 1 0.996 ...
##  - attr(*, "Size")= int 10
##  - attr(*, "Labels")= chr [1:10] "700014718" "700014767" ...
##  - attr(*, "Diag")= logi FALSE
##  - attr(*, "Upper")= logi FALSE
##  - attr(*, "method")= chr "jaccard"
##  - attr(*, "call")= language vegdist(x = ., method = "jaccard")
```

]

---

# PERMANOVA with `adonis()`

.pull-left[

```r
*labels(otu_dist) %>% # match order from dist
  enframe(value = "specimen_id") %>%
  select(specimen_id) %>%
  left_join(pc_summary, by = "specimen_id") %>%
* mutate(dummy_category = Axis.1 > mean(Axis.1)) %>%
  distinct() -> sorted_summary

sorted_summary
```

]

.pull-right[

```
## # A tibble: 10 x 9
##    specimen_id    Axis.1  Axis.2 shannon hmpbodysubsite dummy_variable_site
##    <chr>           <dbl>   <dbl>   <dbl> <chr>                        <dbl>
##  1 700014718   -0.264     0.222     5.58 Stool                         5.00
##  2 700014767    0.516     0.174     2.14 Anterior_nares               13.5 
##  3 700014923    0.505     0.183     2.32 Anterior_nares               14.5 
##  4 700016920    0.00350  -0.434     4.96 Anterior_nares               14.7 
##  5 700023706   -0.000828 -0.290     4.98 Anterior_nares               14.0 
##  6 700038343   -0.00154  -0.435     5.56 Anterior_nares               13.7 
##  7 700095956   -0.222     0.189     4.86 Stool                         4.87
##  8 700105834   -0.225     0.174     4.58 Stool                         4.67
##  9 700107189   -0.0978    0.0363    4.27 Stool                         4.47
## 10 700109383   -0.213     0.180     4.44 Stool                         5.14
## # … with 3 more variables: dummy_variable_shannon <dbl>,
## #   dummy_variable_pc <dbl>, dummy_category <lgl>
```

]

---

# PERMANOVA with `adonis()`

.pad-left[

```r
*# distance matrix is response variable
*adonis(otu_dist ~ hmpbodysubsite,
*      data = sorted_summary)
```

```
## 
## Call:
## adonis(formula = otu_dist ~ hmpbodysubsite, data = sorted_summary) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##                Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)  
## hmpbodysubsite  1    0.7036 0.70364  1.5719 0.16422  0.014 *
## Residuals       8    3.5811 0.44764         0.83578         
## Total           9    4.2847                 1.00000         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

]
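
---

# PERMANOVA with `adonis()`

To make "distance matrix is response variable" concrete, here is a rough base R sketch of a one-way PERMANOVA: partition the squared distances into within-group and among-group sums of squares, form a pseudo-F, then permute group labels. `pseudo_f()`, `permanova_sketch()`, and the toy data are hypothetical; this is not the `vegan` implementation.

```r
# Hypothetical helper: pseudo-F from a distance matrix and one grouping factor
pseudo_f <- function(dist_matrix, groups) {
  d2 <- as.matrix(dist_matrix)^2
  groups <- as.factor(groups)
  n_total <- nrow(d2)
  n_groups <- nlevels(groups)
  # total SS: sum of squared pairwise distances divided by N
  ss_total <- sum(d2[upper.tri(d2)]) / n_total
  # within-group SS: same calculation inside each group, then summed
  ss_within <- sum(vapply(levels(groups), function(g) {
    idx <- which(groups == g)
    block <- d2[idx, idx, drop = FALSE]
    sum(block[upper.tri(block)]) / length(idx)
  }, numeric(1)))
  ss_among <- ss_total - ss_within
  (ss_among / (n_groups - 1)) / (ss_within / (n_total - n_groups))
}

# Hypothetical helper: permutation p-value for the pseudo-F
permanova_sketch <- function(dist_matrix, groups, n_perm = 999) {
  observed <- pseudo_f(dist_matrix, groups)
  permuted <- replicate(n_perm, pseudo_f(dist_matrix, sample(groups)))
  list(f = observed,
       p = (sum(permuted >= observed) + 1) / (n_perm + 1))
}

# Toy data: two well-separated groups of 10 points in two dimensions
set.seed(16)
toy_points <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
                    matrix(rnorm(20, mean = 3), ncol = 2))
toy_groups <- rep(c("a", "b"), each = 10)
toy_result <- permanova_sketch(dist(toy_points), toy_groups)
toy_result
```

Recent `vegan` releases deprecate `adonis()` in favor of `adonis2()`; check the installed version before re-running the lecture code.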

---

# PERMANOVA with `adonis()`

.pad-left[

```r
*# multivariable possible...
*adonis(otu_dist ~ hmpbodysubsite + dummy_category,
*      data = sorted_summary)
```

```
## 
## Call:
## adonis(formula = otu_dist ~ hmpbodysubsite + dummy_category,      data = sorted_summary) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##                Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)   
## hmpbodysubsite  1    0.7036 0.70364  1.6100 0.16422  0.002 **
## dummy_category  1    0.5218 0.52180  1.1939 0.12178  0.125   
## Residuals       7    3.0593 0.43704         0.71400          
## Total           9    4.2847                 1.00000          
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

]

---

# PERMANOVA with `adonis()`

.pad-left[

```r
*# ... but order matters!!!
*adonis(otu_dist ~ dummy_category + hmpbodysubsite,
*      data = sorted_summary)
```

```
## 
## Call:
## adonis(formula = otu_dist ~ dummy_category + hmpbodysubsite,      data = sorted_summary) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##                Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)  
## dummy_category  1    0.6329 0.63293  1.4482 0.14772  0.018 *
## hmpbodysubsite  1    0.5925 0.59251  1.3557 0.13828  0.036 *
## Residuals       7    3.0593 0.43704         0.71400         
## Total           9    4.2847                 1.00000         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

]
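
---

# PERMANOVA with `adonis()`

Why does order matter? "Terms added sequentially" means each term is credited only with the variation left over after earlier terms. The same behavior can be sketched with ordinary `lm()` and `anova()` in base R; the simulated, deliberately correlated covariates below are hypothetical.

```r
set.seed(16)
n <- 40
covariate_a <- rnorm(n)
covariate_b <- covariate_a + rnorm(n, sd = 0.5)  # correlated with covariate_a
response    <- covariate_a + rnorm(n)

# Same two terms, entered in opposite order
fit_ab <- lm(response ~ covariate_a + covariate_b)
fit_ba <- lm(response ~ covariate_b + covariate_a)

# Sequential (Type I) sums of squares for covariate_a
ss_a_first  <- anova(fit_ab)["covariate_a", "Sum Sq"]
ss_a_second <- anova(fit_ba)["covariate_a", "Sum Sq"]
c(first = ss_a_first, second = ss_a_second)

# The residual sum of squares does not depend on order
c(anova(fit_ab)["Residuals", "Sum Sq"],
  anova(fit_ba)["Residuals", "Sum Sq"])
```

The residual row is identical in both fits, as in the two `adonis()` tables; only the split of explained variation between correlated terms changes.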

---

# PERMANOVA with `adonis()`

.pad-left[

```r
*# ... do you mean strata?
*adonis(otu_dist ~ dummy_category,
*      strata = sorted_summary$hmpbodysubsite,
*      data = sorted_summary)
```

```
## 
## Call:
## adonis(formula = otu_dist ~ dummy_category, data = sorted_summary,      strata = sorted_summary$hmpbodysubsite) 
## 
## Blocks:  strata 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##                Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)
## dummy_category  1    0.6329 0.63293  1.3865 0.14772  0.292
## Residuals       8    3.6518 0.45648         0.85228       
## Total           9    4.2847                 1.00000
```

]
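
---

# PERMANOVA with `adonis()`

With `strata`, permutations are constrained to occur within each stratum, so labels never move between body sites. A small base R sketch of that restricted shuffle follows; `permute_within_strata()` and the toy vectors are hypothetical, not the `vegan` permutation machinery.

```r
# Hypothetical helper: shuffle values only among observations in the same stratum
permute_within_strata <- function(values, strata) {
  # ave() applies the shuffle separately to the row indices of each stratum;
  # sample.int(length(i)) avoids the sample() surprise for length-1 groups
  shuffled_index <- ave(
    seq_along(values),
    strata,
    FUN = function(i) i[sample.int(length(i))]
  )
  values[shuffled_index]
}

set.seed(16)
toy_category <- c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)
toy_site <- c("Anterior_nares", "Anterior_nares", "Anterior_nares",
              "Stool", "Stool", "Stool")

permuted_category <- permute_within_strata(toy_category, toy_site)

# Counts of TRUE/FALSE within each site are unchanged by the shuffle
table(toy_site, permuted_category)
```

Because the per-site counts are preserved, any difference attributable to site alone cannot contribute to the permutation p-value.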

---

# PERMANOVA with `adonis()`

.pad-left[

```r
*# ... or do you mean nestedness?
*adonis(otu_dist ~ dummy_category / hmpbodysubsite,
*      data = sorted_summary)
```

```
## 
## Call:
## adonis(formula = otu_dist ~ dummy_category/hmpbodysubsite, data = sorted_summary) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##                               Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)   
## dummy_category                 1    0.6329 0.63293  1.4482 0.14772  0.009 **
## dummy_category:hmpbodysubsite  1    0.5925 0.59251  1.3557 0.13828  0.023 * 
## Residuals                      7    3.0593 0.43704         0.71400          
## Total                          9    4.2847                 1.00000          
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

]

---
background-image: url(data:image/png;base64,#svg/geompoint.svg)
background-size: 500px
background-position: 85% 50%
class: center, middle, inverse

# Regression & Compositional Data?

---

# Regression & Compositional Data?

.pad-left[

- Compositional data approaches account for dependency among OTU abundances:

    - e.g., `compositions::clr()` or `philr::philr()`

- p >> n challenges persist

- Must pair compositional transform with regularization:

    - `glmnet::glmnet()` for LASSO/ridge/elastic net

    - Bayesian methods

]
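
---

# Regression & Compositional Data?

A minimal base R sketch of the centered log-ratio (CLR) transform: take logs, then subtract the mean log within each sample. The helper `clr_transform()`, the pseudocount of 0.5 for zero counts, and the toy counts are assumptions for illustration; `compositions::clr()` may handle zeros differently.

```r
# Hypothetical helper: CLR transform of one sample's read counts
clr_transform <- function(counts, pseudocount = 0.5) {
  log_counts <- log(counts + pseudocount)  # pseudocount avoids log(0)
  log_counts - mean(log_counts)            # center on the mean log (log geometric mean)
}

# Hypothetical sample with one unobserved OTU
toy_counts <- c(OTU_1 = 120, OTU_2 = 30, OTU_3 = 0, OTU_4 = 850)
toy_clr <- clr_transform(toy_counts)

toy_clr
sum(toy_clr)  # zero up to floating-point error

# Without a pseudocount, CLR ignores sequencing depth
clr_transform(c(1, 2, 3), pseudocount = 0)
clr_transform(c(10, 20, 30), pseudocount = 0)
```

The transformed values sum to zero within a sample, so one dimension is still redundant, and with thousands of OTUs the p >> n problem remains; hence the pairing with regularization.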

---
class: center, middle, inverse
background-image: url(data:image/png;base64,#svg/conjugation.svg)
background-size: 500px
background-position: 85% 50%

# Questions?
### Post to the discussion board!

---
background-image: url(data:image/png;base64,#svg/bacteria.svg)
background-size: 100px
background-position: 98% 90%
class: center, middle

# Thank you!
#### Slides available: [github.com/bjklab](https://github.com/bjklab/EPID674_008_microbiome-regression.git)
#### [brendank@pennmedicine.upenn.edu](mailto:brendank@pennmedicine.upenn.edu)