class: center, middle, inverse, title-slide # Beta Diversity: Inter-Community Difference ## 📚EPID 674📚 ### Brendan J. Kelly, MD, MS ### Updated: 11 June 2020 --- background-image: url(data:image/png;base64,#svg/distance.svg) background-size: 500px background-position: 85% 50% class: middle, inverse .pad-left[ ### Beta diversity ### Pairwise distances ### Principal coordinates ### PERMANOVA ### R's vegan package ] --- background-image: url(data:image/png;base64,#svg/distance.svg) background-size: 500px background-position: 85% 50% class: center, middle, inverse # Beta diversity --- # High Dimensional Microbiome Data .center[ ``` ## 700013549 700014386 700014403 700014409 700014412 700014415 ## OTU_97.1 0 0 0 0 0 0 ## OTU_97.10 0 0 6 4 1 5 ## OTU_97.100 0 0 133 7 1 4 ## OTU_97.1000 0 0 0 0 0 0 ## OTU_97.10000 0 0 0 0 0 0 ## OTU_97.10001 0 0 0 0 0 1 ## OTU_97.10002 0 0 0 0 0 0 ## OTU_97.10003 0 0 0 0 0 0 ## OTU_97.10004 0 0 0 0 0 0 ## OTU_97.10005 0 0 0 0 0 0 ## OTU_97.10006 0 0 0 0 0 0 ## OTU_97.10007 0 0 0 0 0 0 ## OTU_97.10008 0 1 0 0 0 0 ## OTU_97.10009 0 0 1 0 0 0 ## OTU_97.1001 0 0 0 0 0 0 ## OTU_97.10010 0 0 0 0 0 0 ``` ] --- # High Dimensional Microbiome Data .pad-left[ - How to deal with high-dimensional microbiome data? - __Descriptive (e.g., heatmaps and stacked barplots)__ - Test a priori hypotheses regarding specific OTUs/taxa - Reduce dimensions: - single summary statistic (alpha diversity) - pairwise distances (beta diversity) with PCoA or PERMANOVA - community types (mixture modeling) ] --- background-image: url(data:image/png;base64,#img/hmp_heatmap.png) background-size: contain --- # High Dimensional Microbiome Data .pad-left[ - How to deal with high-dimensional microbiome data? 
- Descriptive (e.g., heatmaps and stacked barplots) - Test a priori hypotheses regarding specific OTUs/taxa - __Reduce dimensions:__ - __single summary statistic (alpha diversity)__ - pairwise distances (beta diversity) with PCoA or PERMANOVA - community types (mixture modeling) ] --- background-image: url(data:image/png;base64,#img/hmp_shannon.png) background-size: contain --- # High Dimensional Microbiome Data .pad-left[ - How to deal with high-dimensional microbiome data? - Descriptive (e.g., heatmaps and stacked barplots) - Test a priori hypotheses regarding specific OTUs/taxa - __Reduce dimensions:__ - single summary statistic (alpha diversity) - __pairwise distances (beta diversity) with PCoA or PERMANOVA__ - community types (mixture modeling) ] --- # Beta Diversity as Dimension Reduction .pad-left[ - Summarize each sample’s relationship to other samples: - pairwise distances - OTU table → square matrix - Many beta diversity metrics: - just counts versus counts + phylogeny - weighted versus unweighted ] --- background-image: url(data:image/png;base64,#svg/distance.svg) background-size: 500px background-position: 85% 50% class: center, middle, inverse # Pairwise distances --- background-image: url(data:image/png;base64,#img/legendre_cover.png) background-size: contain --- # What's in a distance? .pad-left[ - “The most usual approach to assess the resemblance among objects or descriptors is to first condense all (or the relevant part of) the information available in the ecological data matrix (Section 2.1) into __a square matrix of association__ among the objects or descriptors (Section 2.2). In most instances, the association matrix is symmetric.” - Compare variable-variable: “R-mode” (like Pearson’s r coefficient) - Compare object-object: “Q-mode” - Six modes of analysis if time series are incorporated (Cattell 1966) ] --- background-image: url(data:image/png;base64,#img/legendre_3d.png) background-size: contain --- # What's in a distance? .pad-left[ - “... 
association will be used as a general term to describe any measure or coefficient used to quantify the __resemblance or difference__ between objects or descriptors, as proposed by Orlóci (1975).” - Q-mode studies: - similarity coefficients (identical = 1) - distance (or dissimilarity) coefficients (identical = 0) ] --- background-image: url(data:image/png;base64,#img/legendre_otu_to_dm.png) background-size: contain --- # OTU Table: OTUs x Specimens .center[ ``` ## 700013549 700014386 700014403 700014409 700014412 700014415 ## OTU_97.1 0 0 0 0 0 0 ## OTU_97.10 0 0 6 4 1 5 ## OTU_97.100 0 0 133 7 1 4 ## OTU_97.1000 0 0 0 0 0 0 ## OTU_97.10000 0 0 0 0 0 0 ## OTU_97.10001 0 0 0 0 0 1 ## OTU_97.10002 0 0 0 0 0 0 ## OTU_97.10003 0 0 0 0 0 0 ## OTU_97.10004 0 0 0 0 0 0 ## OTU_97.10005 0 0 0 0 0 0 ## OTU_97.10006 0 0 0 0 0 0 ## OTU_97.10007 0 0 0 0 0 0 ## OTU_97.10008 0 1 0 0 0 0 ## OTU_97.10009 0 0 1 0 0 0 ## OTU_97.1001 0 0 0 0 0 0 ## OTU_97.10010 0 0 0 0 0 0 ``` ] --- # OTU Table: Specimens x OTUs .center[ ``` ## OTU_97.1 OTU_97.10 OTU_97.100 OTU_97.1000 OTU_97.10000 OTU_97.10001 ## 700013549 0 0 0 0 0 0 ## 700014386 0 0 0 0 0 0 ## 700014403 0 6 133 0 0 0 ## 700014409 0 4 7 0 0 0 ## 700014412 0 1 1 0 0 0 ## 700014415 0 5 4 0 0 1 ## 700014418 0 2 0 0 0 0 ## 700014421 0 3 25 0 0 0 ## 700014424 0 1 5 0 0 0 ## 700014427 0 1 0 0 0 0 ## 700014430 0 6 0 0 0 0 ## 700014445 0 0 0 0 0 0 ## 700014501 0 2 1 0 0 0 ## 700014515 0 0 0 0 0 0 ## 700014516 0 0 0 0 0 0 ## 700014517 0 0 0 0 0 0 ``` ] --- background-image: url(data:image/png;base64,#img/legendre_otu_to_dm.png) background-size: contain --- # Distance Metrics for Beta Diversity .pad-left[ - Just counts versus counts + phylogeny: - Jaccard: `\(J(A,B) = \frac{|A \cap B|}{|A \cup B|}\)` & `\(d_{J}(A,B) = 1 - J(A,B)\)` - UniFrac: fraction of unique branch length in tree - Weighted versus unweighted: - weighted: counts matter - unweighted: binary (presence-absence) ] --- # The "Double Zero" Problem .pad-left[ - “The proportion of zeros in 
community composition data generally increases with the variability in environmental conditions among the sampling sites. If sampling has been conducted along one or several environmental axes, the species present are likely to differ at least partly from site to site. __Including double zeros in the comparison between sites would result in high values of similarity for the many pairs of sites holding only a few species__, these pairs presenting many double zeros; this would not provide a correct ecological assessment of the situation.” ] --- # The "Double Zero" Problem .pad-left[ - “Because double zeros are not informative, their interpretation generates __the double zero problem: is the value of an association coefficient affected by inclusion of double zeros in its calculation?__ When choosing an association coefficient, ecologists must pay attention to the interpretation of double zeros: except in very limited cases (e.g. controlled experiments involving very few species and with small uncontrolled ecological variation), it is preferable to draw no ecological conclusion from the simultaneous absence of a species at two sites.... In numerical terms, this means to __skip double zeros when computing similarity or distance coefficients__ using species presence-absence or abundance data.” ] --- # UniFrac .pad-left[ - UniFrac measures the distance between communities based on the lineages they contain. - Satisfies the technical requirements of a distance metric: - non-negative - symmetric - satisfies the triangle inequality - Can thus be used with standard multivariate statistics (e.g., UPGMA clustering and __PCoA__). 
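
The metric requirements listed above can be checked numerically on any square distance matrix. The sketch below uses base R only; `is_metric`, `toy_ok`, and `toy_bad` are hypothetical names and values introduced for illustration, not part of vegan or the HMP data.

```r
# Check the usual metric axioms on a square distance matrix (base R only).
# `is_metric` is a hypothetical helper for illustration.
is_metric <- function(d, tol = 1e-12) {
  d <- as.matrix(d)
  n <- nrow(d)
  non_negative <- all(d >= -tol)
  zero_diagonal <- all(abs(diag(d)) <= tol)
  symmetric <- isTRUE(all.equal(d, t(d), check.attributes = FALSE))
  triangle <- TRUE
  for (k in seq_len(n)) {
    # via_k[i, j] = d[i, k] + d[k, j]; the direct path must never be longer
    via_k <- outer(d[, k], d[k, ], `+`)
    if (any(d > via_k + tol)) triangle <- FALSE
  }
  c(non_negative = non_negative, zero_diagonal = zero_diagonal,
    symmetric = symmetric, triangle = triangle)
}

# Toy matrix that satisfies all four checks
toy_ok <- matrix(c(0,   0.5, 1,
                   0.5, 0,   1,
                   1,   1,   0), nrow = 3, byrow = TRUE)

# Toy matrix that breaks the triangle inequality (3 > 1 + 1)
toy_bad <- matrix(c(0, 3, 1,
                    3, 0, 1,
                    1, 1, 0), nrow = 3, byrow = TRUE)

print(is_metric(toy_ok))
print(is_metric(toy_bad))
```

Passing this check on one matrix does not prove that a coefficient is a metric in general; it only flags violations in the data at hand.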
] --- # UniFrac .pad-left[ - UniFrac “exploits the different degrees of similarity between sequences”: - “the unique fraction metric, or UniFrac, measures the phylogenetic distance between sets of taxa in a phylogenetic tree as the fraction of the branch length of the tree that leads to descendants from either one environment or the other, but not both” - “captures the total amount of evolution that is unique to each state, presumably reflecting adaptation to one environment that would be deleterious in the other” (designed to be based on rRNA) ] --- background-image: url(data:image/png;base64,#img/unifrac_from_tree.png) background-size: contain --- background-image: url(data:image/png;base64,#img/legendre_distances.png) background-size: contain --- # Beta Diversity: Which Distance Metric? .pad-left[ - Why use Jaccard? UniFrac? - Why use weighted? Unweighted? ] --- background-image: url(data:image/png;base64,#svg/distance.svg) background-size: 500px background-position: 85% 50% class: center, middle, inverse # Principal Coordinates --- # Original Descriptors ⇾ PCA .pad-left[ - PCA: principal component analysis - rigid rotation for successive directions of maximum variance - lots of restrictions (e.g., Euclidean distance only) - but allows projection of original descriptors in PCA space ] --- # Pairwise Distances ⇾ PCoA .pad-left[ - PCoA: principal coordinate analysis - any metric distance, even if non-Euclidean - like PCA, eigenvalue decomposition (maximum variance) but mediated by distance function (no original descriptors) - unlike PCA, does not allow projection of original descriptors in reduced-dimension space ] --- background-image: url(data:image/png;base64,#img/pca_vs_pcoa.png) background-size: contain --- background-image: url(data:image/png;base64,#img/weighted_unifrac.png) background-size: contain --- background-image: url(data:image/png;base64,#img/unweighted_unifrac.png) background-size: contain --- background-image: url(data:image/png;base64,#img/weighted_jaccard.png) 
background-size: contain --- background-image: url(data:image/png;base64,#img/unweighted_jaccard.png) background-size: contain --- background-image: url(data:image/png;base64,#svg/distance.svg) background-size: 500px background-position: 85% 50% class: center, middle, inverse # PERMANOVA --- # Pairwise Distances ⇾ PERMANOVA .pad-left[ - Pairwise distance matrix can be partitioned by group assignment and ANOVA-like analysis can be applied to detect difference between groups - __PERMANOVA__: permutational ANOVA (aka, adonis) - pseudo F-ratio: conceptually similar but not F-distributed - testing by label permutation - quantification of effect size by R-squared or omega-squared - (the latter a less biased estimator of true effect) ] --- background-image: url(data:image/png;base64,#img/legendre_otu_to_dm.png) background-size: contain --- background-image: url(data:image/png;base64,#img/within_vs_between_group.png) background-size: contain --- background-image: url(data:image/png;base64,#img/within_between_weighted_unifrac.png) background-size: contain --- background-image: url(data:image/png;base64,#img/within_between_unweighted_unifrac.png) background-size: contain --- background-image: url(data:image/png;base64,#img/within_between_weighted_jaccard.png) background-size: contain --- background-image: url(data:image/png;base64,#img/within_between_unweighted_jaccard.png) background-size: contain --- background-image: url(data:image/png;base64,#img/kelly_bioinformatics.png) background-size: contain --- background-image: url(data:image/png;base64,#img/kelly_bioinformatics_highlight.png) background-size: contain --- background-image: url(data:image/png;base64,#img/kelly_bioinformatics_power.png) background-size: contain --- background-image: url(data:image/png;base64,#svg/distance.svg) background-size: 500px background-position: 85% 50% class: center, middle, inverse # R's vegan package --- # `vegan::vegdist()` .pull-left[ ```r # install.packages("tidyverse") 
library(tidyverse) # install.packages("vegan") library(vegan) otu_long <- read_csv( "./data/HMP_OTU_table_longformat_stool_nares.csv.gz" ) otu_long ``` ] .pull-right[ ``` ## # A tibble: 431,400 x 4 ## otu_id specimen_id read_count HMPbodysubsite ## <chr> <dbl> <dbl> <chr> ## 1 OTU_97.1 700014718 0 Stool ## 2 OTU_97.10 700014718 0 Stool ## 3 OTU_97.100 700014718 0 Stool ## 4 OTU_97.1000 700014718 0 Stool ## 5 OTU_97.10000 700014718 0 Stool ## 6 OTU_97.10001 700014718 0 Stool ## 7 OTU_97.10002 700014718 0 Stool ## 8 OTU_97.10003 700014718 0 Stool ## 9 OTU_97.10004 700014718 0 Stool ## 10 OTU_97.10005 700014718 0 Stool ## # … with 431,390 more rows ``` ] --- # `vegan::vegdist()` .pull-left[ ```r otu_matrix <- read_rds( "./data/HMP_OTU_table_matrix_stool_nares.rds" ) otu_matrix %>% str(vec.len = 2) ``` ] .pull-right[ ``` ## num [1:43140, 1:10] 0 0 0 0 0 ... ## - attr(*, "dimnames")=List of 2 ## ..$ : chr [1:43140] "OTU_97.1" "OTU_97.10" ... ## ..$ : chr [1:10] "700014718" "700014767" ... ``` ] --- # `vegan::vegdist()` .pull-left[ ```r otu_matrix <- read_rds( "./data/HMP_OTU_table_matrix_stool_nares.rds" ) otu_matrix %>% * t() %>% # TRANSPOSE str(vec.len = 2) ``` ] .pull-right[ ``` ## num [1:10, 1:43140] 0 0 0 0 0 ... ## - attr(*, "dimnames")=List of 2 ## ..$ : chr [1:10] "700014718" "700014767" ... ## ..$ : chr [1:43140] "OTU_97.1" "OTU_97.10" ... ``` ] --- # `vegan::vegdist()` .pull-left[ ```r otu_matrix <- read_rds( "./data/HMP_OTU_table_matrix_stool_nares.rds" ) otu_matrix %>% * t() %>% #TRANSPOSE * vegdist(x = ., * method = "jaccard", * binary = TRUE) %>% str(vec.len=2) ``` ] .pull-right[ ``` ## 'dist' num [1:45] 1 0.982 ... ## - attr(*, "Size")= int 10 ## - attr(*, "Labels")= chr [1:10] "700014718" "700014767" ... 
## - attr(*, "Diag")= logi FALSE ## - attr(*, "Upper")= logi FALSE ## - attr(*, "method")= chr "binary jaccard" ## - attr(*, "call")= language vegdist(x = ., method = "jaccard", binary = TRUE) ``` ] --- # `vegan::vegdist()` .pull-left[ ```r otu_matrix <- read_rds( "./data/HMP_OTU_table_matrix_stool_nares.rds" ) otu_matrix %>% * t() %>% #TRANSPOSE * vegdist(x = ., * method = "jaccard", * binary = TRUE) %>% * as.matrix() %>% str(vec.len=2) ``` ] .pull-right[ ``` ## num [1:10, 1:10] 0 1 ... ## - attr(*, "dimnames")=List of 2 ## ..$ : chr [1:10] "700014718" "700014767" ... ## ..$ : chr [1:10] "700014718" "700014767" ... ``` ] --- background-image: url(data:image/png;base64,#img/legendre_otu_to_dm.png) background-size: contain --- class: center, middle, inverse background-image: url(data:image/png;base64,#svg/conjugation.svg) background-size: 500px background-position: 85% 50% # Questions? ### Post to the discussion board! --- background-image: url(data:image/png;base64,#svg/bacteria.svg) background-size: 100px background-position: 98% 90% class: center, middle # Thank you! #### Slides available: [github.com/bjklab](https://github.com/bjklab/EPID674_002_sequences-to-counts.git) #### [brendank@pennmedicine.upenn.edu](mailto:brendank@pennmedicine.upenn.edu)
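---

# Appendix: Jaccard and PCoA on a Toy Table

The specimens x OTUs → distance matrix → principal coordinates path from the earlier slides can be traced by hand on a very small table. This is a minimal sketch, not the course pipeline: `toy_counts` holds made-up values (not HMP data), `jaccard_binary` is a hypothetical helper, and PCoA is done with base R's `cmdscale()`. The comparison with `vegan::vegdist()` runs only if vegan is installed.

```r
# Toy specimens x OTUs count table (hypothetical values, not HMP data).
toy_counts <- matrix(
  c(0, 6, 133, 0,
    0, 4,   0, 0,
    3, 0,   0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("spec_A", "spec_B", "spec_C"),
    c("OTU_1", "OTU_2", "OTU_3", "OTU_4")
  )
)

# Unweighted Jaccard distance: 1 - (shared OTUs / OTUs present in either).
# OTUs absent from both specimens (double zeros) enter neither count.
# Returns NaN if both specimens are empty, which does not occur here.
jaccard_binary <- function(x, y) {
  shared <- sum(x > 0 & y > 0)
  either <- sum(x > 0 | y > 0)
  1 - shared / either
}

# Fill the square, symmetric distance matrix pair by pair.
specimens <- rownames(toy_counts)
d_manual <- matrix(0, nrow = length(specimens), ncol = length(specimens),
                   dimnames = list(specimens, specimens))
for (i in specimens) {
  for (j in specimens) {
    d_manual[i, j] <- jaccard_binary(toy_counts[i, ], toy_counts[j, ])
  }
}
print(d_manual)

# Optional cross-check against vegan, only if it is installed.
if (requireNamespace("vegan", quietly = TRUE)) {
  d_vegan <- as.matrix(
    vegan::vegdist(toy_counts, method = "jaccard", binary = TRUE)
  )
  print(all.equal(d_manual, d_vegan, check.attributes = FALSE))
}

# PCoA via classical scaling of the distance matrix (base R).
pcoa <- cmdscale(as.dist(d_manual), k = 2, eig = TRUE)
print(pcoa$points)
```

`OTU_4` is absent from every specimen, so it never affects a distance: this is the "skip double zeros" behavior described on the double zero slides.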