class: inverse, left, nonum, clear
background-image: url("figs/cover.jpg")
background-size: cover

<link rel="stylesheet" type="text/css" href="https://fonts.googleapis.com/css?family=Oswald" />

.titlestyle[]
<br>
.titlestyle[Describing]
<br>
.titlestyle[composition and configuration of patterns]
<br>
.titlestyle[in]
<br>
.titlestyle[categorical]
<br>
.titlestyle[rasters]

.captionstyle[Jakub Nowosad]
.pull-right2[.captionstyle[2021-01-15]]

<!-- idea: background - entropy map with 3d rayshader -->

---
# Spatial patterns

- **Discovering and describing patterns is a vital part of many spatial analyses**
- However, spatial data is gathered in many ways and forms, which requires different approaches to understanding spatial patterns

.pull-left[
<img src="figs/covid.png" width="1445" style="display: block; margin: auto;" />

.font60[*https://www.ft.com/content/a2901ce8-5eb7-4633-b89c-cbdf5b386938*]
]

.pull-right[
<img src="figs/temperature.png" width="1703" style="display: block; margin: auto;" />

.font60[*https://climate.copernicus.eu/copernicus-2020-warmest-year-record-europe-globally-2020-ties-2016-warmest-year-recorded*]
]

<!-- Other methods are applied when we work with numerical or categorical variables, also other methods are used to find patterns in point datasets, lines datasets, or raster datasets.
-->

---
# Spatial patterns in categorical rasters

- **Categorical rasters express spatial patterns through two interrelated properties**: composition and configuration
- **Composition** shows how many different categories we have, and how much area they occupy
- **Configuration** focuses on the spatial arrangement of the categories

<br>
<img src="figs/lc_map3.png" width="100%" style="display: block; margin: auto;" />
<br>

--

- There is a relationship between an area's pattern composition and configuration and ecosystem characteristics, such as vegetation diversity, animal distributions, and water quality within this area (*Hunsaker and Levine, 1995; Fahrig and Nuttle, 2005; Klingbeil and Willig, 2009; Holzschuh et al., 2010; Fahrig et al., 2011; Carrara et al., 2015; Arroyo-Rodríguez et al., 2016; Duflot et al., 2017, and many others*)
- **Understanding and quantifying spatial patterns is also useful in many** other **fields**, including demography and medicine

---
# Importance of spatial patterns

.pull-left[
**Assessing the ecological vulnerability of forest landscape to agricultural frontier expansion:**

<img src="figs/wietnam-paper.png" width="50%" style="display: block; margin: auto;" />

*Bourgoin et al., 2020, https://doi.org/10.1016/j.jag.2019.101958*
]

--

.pull-right[
**Reinterpreting classified histological images as categorical rasters and using them for disease classification (e.g., liver cancer):**

<img src="figs/histo-paper.png" width="65%" style="display: block; margin: auto;" />

*Kendall et al., 2020, https://doi.org/10.1038/s41598-020-74691-9*
]

---
# Importance of spatial patterns

.lc[
**Quantifying racial diversity and segregation:**

<img src="figs/raceland-fig.png" width="85%" style="display: block; margin: auto;" />

*Dmowska et al., 2020, https://doi.org/10.1016/j.apgeog.2020.102239*
]

.rc[
<img src="figs/raceland-paper.png" width="85%" style="display: block; margin: auto;" />
]

---
class: inverse, mline, center, middle, clear

# Problem

<h2><center>How
can we universally quantify the composition and configuration of patterns?</center></h2>

---
# Example data

.lc[
- [Land cover data for the year 2016 from the CCI-LC project](https://cds.climate.copernicus.eu/cdsapp#!/dataset/satellite-land-cover?tab=overview)
- Simplified into nine main categories
]

.rc[
<img src="figs/lc_map1.png" width="75%" style="display: block; margin: auto;" />
]

---
# Example data

.lc[
- [Land cover data for the year 2016 from the CCI-LC project](https://cds.climate.copernicus.eu/cdsapp#!/dataset/satellite-land-cover?tab=overview)
- Simplified into nine main categories
- Partitioned into **30 x 30 kilometer square blocks**
- 13,909 categorical rasters (100x100 cells)
]

.rc[
<img src="figs/lc_map2.png" width="75%" style="display: block; margin: auto;" />
]

---
# Example data

**I randomly selected 16 rasters** with different proportions of forest (green) areas:

<br><br>
<img src="figs/rid_map.png" width="3200" style="display: block; margin: auto;" />

---
# Boltzmann entropy

*Cushman, 2018, https://doi.org/10.3390/e20040298*:

- Entropy of a categorical raster is **related to the number of ways a raster** with a given dimensionality and number of classes **can be arranged** to produce the same total amount of edge between cells of different classes

--

- **Problem no. 1:** intractably large numbers of possible arrangements of raster cells in large landscapes

--

- **Partial solution to problem no. 1:** linear model as a function of the size, patch richness, and diversity of a landscape

<img src="figs/cushman_map.png" width="3200" style="display: block; margin: auto;" />

--

- **Problem no. 2:** the above model is not universal

--

- **Problem no. 3:** is one metric enough to describe rasters' composition and configuration?

---
# Landscape metrics

<!-- simple, several metrics - e.g., SHDI, something related to aggregation -->
<!-- - problem - which one to use? how we can do that?
-->

.lc[
**Spatial patterns in categorical raster data** are often described using landscape metrics (landscape indices)

- In the last 40 or so years, several hundred different spatial metrics have been developed
- **SHDI** - [Shannon's diversity index](https://r-spatialecology.github.io/landscapemetrics/reference/lsm_l_shdi.html) - takes both the number of classes and the abundance of each class into account
- **AI** - [Aggregation index](https://r-spatialecology.github.io/landscapemetrics/reference/lsm_l_ai.html) - from 0 for maximally disaggregated to 100 for maximally aggregated classes
]

--

.rc[
**SHDI:**

<img src="figs/shdi_map.png" width="3200" style="display: block; margin: auto;" />

<br>

**AI:**

<img src="figs/ai_map.png" width="3200" style="display: block; margin: auto;" />
]

---
# Landscape metrics

.pull-left[
- **Problem no. 1:** which of the hundreds of spatial metrics should we choose?
- **Problem no. 2:** many landscape metrics are highly correlated...
]

.pull-right[
<img src="figs/shdi_ai_map.png" width="2740" style="display: block; margin: auto;" />
]

---
# PCA of landscape metrics

<!-- Empirical descriptors -->

.lc[
- I performed a **principal component analysis (PCA) using 17 landscape-level metrics**:

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Type </th>
   <th style="text-align:left;"> Landscape-level metrics </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Shape </td>
   <td style="text-align:left;"> PAFRAC; CONTIG AM; CONTIG RA </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Aggregation </td>
   <td style="text-align:left;"> AI; CONTAG; IJI; PLADJ; PD; DIVISION; LPI </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Connectivity </td>
   <td style="text-align:left;"> COHESION </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Diversity </td>
   <td style="text-align:left;"> SHDI; SIDI; MSIDI; SHEI; SIEI; MSIEI </td>
  </tr>
</tbody>
</table>

- The first two principal components explained **~71% of variability**
]

--

.rc[
**PC1:**

<img src="figs/PC1_map.png"
width="3200" style="display: block; margin: auto;" />

<br>

**PC2:**

<img src="figs/PC2_map.png" width="3200" style="display: block; margin: auto;" />
]

---
# PCA of landscape metrics

.pull-left[
The result makes it possible to distinguish between:

- **simple** and **complex** rasters (left<->right)
- **fragmented** and **consolidated** rasters (bottom<->top)

However, there are still some problems here...
]

.pull-right[
<img src="figs/PC1_PC2_map.png" width="2736" style="display: block; margin: auto;" />
]

---
# PCA of landscape metrics

.lc[
- I performed **a second PCA using data from the United Kingdom only**
- Next, I used it to predict the principal components for the whole of Europe
]

.rc[
**PC1:**

<img src="figs/PC1UK_map.png" width="3200" style="display: block; margin: auto;" />

<br>

**PC2:**

<img src="figs/PC2UK_map.png" width="3200" style="display: block; margin: auto;" />
]

<!-- https://nowosad.github.io/iale_19/presentation/#7 -->
<!-- - problem - -->

---
# PCA of landscape metrics

.pull-left[
**Issues with the PCA approach:**

- Each new dataset requires recalculating both the landscape metrics and the principal component analysis (PCA)
- Highly correlated landscape metrics are used
- Interpreting PCA results is not straightforward
]

.pull-right[
<img src="figs/PC1UK_PC2UK_map.png" width="2736" style="display: block; margin: auto;" />
]

---
# IT metrics

<!-- - Different patterns generate different co-occurrence matrices -->
<!-- - **Marginal entropy [H(x)]** and **relative mutual information [U]** were used in this study -->
<!-- Important note: when entropy is zero, we set relative mutual information to 1 -->

.lc[
- Five information theory metrics based on a co-occurrence matrix exist (*Nowosad and Stepinski, 2019, https://doi.org/10.1007/s10980-019-00830-x*)
- **Marginal entropy [H(x)]** - diversity (*composition*) of spatial categories - from monothematic patterns to multithematic patterns
- **Relative mutual information [U]** - clumpiness (*configuration*) of spatial
categories - from fragmented patterns to consolidated patterns
- **H(x) and U** are only weakly correlated
]

.rc[
**Entropy:**

<img src="figs/ent_map.png" width="3200" style="display: block; margin: auto;" />

<br>

**Relative mutual information:**

<img src="figs/relmutinf_map.png" width="3200" style="display: block; margin: auto;" />
]

---
# IT metrics

.pull-left[
**2D parametrization** of categorical rasters' configurations based on two weakly correlated IT metrics **groups similar patterns into distinct regions** of the parameter space
]

.pull-right[
<img src="figs/ent_relmutinf_map.png" width="2741" style="display: block; margin: auto;" />
]

---
# IT metrics - final results

.pull-left[
**Land cover data:**

<img src="figs/lc_map1.png" width="2600" style="display: block; margin: auto;" />
]

--

.pull-right[
**Parametrization using two IT metrics:**

<img src="figs/2dmap_ent_relmutinf.png" width="2581" style="display: block; margin: auto;" />
]

---
# IT metrics

.pull-left[
These metrics still leave some questions open...

- Relative mutual information is the result of dividing mutual information by entropy. **What to do when the entropy is zero?**
- **How to incorporate the meaning of categories into the analysis?**
]

.pull-right[
**Parametrization using two IT metrics:**

<img src="figs/2dmap_ent_relmutinf.png" width="2581" style="display: block; margin: auto;" />
]

---
# Related questions and problems

.pull-left[
<img src="figs/Composition_A_by_Piet_Mondrian_Galleria_Nazionale_d'Arte_Moderna_e_Contemporanea.jpg" width="90%" style="display: block; margin: auto;" />

*Composition A by Piet Mondrian*
]

.pull-right[
**Depending on the problem:**

- Is the categorical raster the type of data we should use?
- Do raster cells represent objects, or are they the result of some classification/aggregation?
- How to incorporate the meaning of categories into the analysis?
- How to decide on the extent of the study area?
- What is the optimal data resolution?
- What is the scale of the process we want to study? How to decide which scale is valid?
- ...
]

<!-- - Data type -->
<!-- - Resolution -->
<!-- - Extent -->
<!-- - Moving window?? -->
<!-- - Performance?? -->
<!-- - Using of non-categorical rasters instead or even point clouds -->
<!-- - How to incorporate categories? -->
<!-- Next, patterns and their relevance depend on a studied scale, with different patterns found on small or large scales, or data of different spatial resolutions. Finally, the way we describe patterns should depend on our main goal. -->

---
class: left, top, clear

.pull-left[
## Summary

- **Marginal entropy** and **relative mutual information** are universal indicators of categorical raster composition and configuration
- These metrics **do not depend on any specific category in the raster** (but this information can be incorporated into analyses)
- They can be applied in studies in many fields, including ecology, demography, and medicine
- There are **still many questions and problems to solve**, including scale-dependence of the patterns, selection of input data, or extent of the study area

## Contact:

Twitter: <svg role="img" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;"> <title></title> <path d="M23.954 4.569c-.885.389-1.83.654-2.825.775 1.014-.611 1.794-1.574 2.163-2.723-.951.555-2.005.959-3.127 1.184-.896-.959-2.173-1.559-3.591-1.559-2.717 0-4.92 2.203-4.92 4.917 0 .39.045.765.127 1.124C7.691 8.094 4.066 6.13 1.64 3.161c-.427.722-.666 1.561-.666 2.475 0 1.71.87 3.213 2.188 4.096-.807-.026-1.566-.248-2.228-.616v.061c0 2.385 1.693 4.374 3.946 4.827-.413.111-.849.171-1.296.171-.314 0-.615-.03-.916-.086.631 1.953 2.445 3.377 4.604 3.417-1.68 1.319-3.809 2.105-6.102 2.105-.39 0-.779-.023-1.17-.067 2.189 1.394 4.768 2.209 7.557 2.209 9.054 0 13.999-7.496 13.999-13.986 0-.209 0-.42-.015-.63.961-.689 1.8-1.56 2.46-2.548l-.047-.02z"></path></svg>
[jakub_nowosad](https://twitter.com/jakub_nowosad)

Website: https://nowosad.github.io
]

.pull-right[
<img src="figs/2dmap_ent_relmutinf.png" width="80%" style="display: block; margin: auto;" />

## Resources:

- **Slides:** [nowosad.github.io/rasters-revealed](https://nowosad.github.io/rasters-revealed)
<!-- - **Code:** -->
- **Software:** R packages [motif](https://nowosad.github.io/motif/) and [landscapemetrics](https://r-spatialecology.github.io/landscapemetrics/index.html)
- **Blog post:** [Information theory provides a consistent framework for the analysis of spatial patterns](https://nowosad.github.io/post/ent-bp1/)
]
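
---
# Appendix: sketch of the two IT metrics

The slides describe marginal entropy H(x) and relative mutual information U (mutual information divided by entropy) as derived from a co-occurrence matrix. The block below is a minimal sketch of that idea in plain Python. It is not the implementation used in the motif or landscapemetrics packages. The function names (`cooccurrence`, `entropy`, `it_metrics`), the 4-neighbour adjacency, the use of ordered cell pairs, and base-2 logarithms are assumptions made for illustration. The zero-entropy case is left to the caller through the hypothetical `zero_entropy_value` parameter, because the slides treat it as an open question.

```python
import math
from collections import Counter


def cooccurrence(raster):
    """Count ordered pairs (focus, neighbour) of 4-adjacent cells (assumed neighbourhood)."""
    counts = Counter()
    n_rows = len(raster)
    for r, row in enumerate(raster):
        for c, focus in enumerate(row):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < len(raster[rr]):
                    counts[(focus, raster[rr][cc])] += 1
    return counts


def entropy(probabilities):
    """Shannon entropy in bits; zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def it_metrics(raster, zero_entropy_value=float("nan")):
    """Return (marginal entropy H, relative mutual information U = I / H)."""
    counts = cooccurrence(raster)
    total = sum(counts.values())
    joint = {pair: n / total for pair, n in counts.items()}

    # Marginal distribution of the focus category, taken from the co-occurrence counts.
    marginal = Counter()
    for (focus, _), p in joint.items():
        marginal[focus] += p

    h = entropy(marginal.values())
    h_joint = entropy(joint.values())
    # Ordered pairs make the matrix symmetric, so both marginals are equal
    # and I = H(x) + H(y) - H(x, y) reduces to 2 * H - H(x, y).
    mutual_information = 2 * h - h_joint

    if h == 0:
        # Assumption: a single-category raster has no defined ratio here;
        # the caller chooses what to report (NaN by default).
        return 0.0, zero_entropy_value
    return h, mutual_information / h


# Two equally sized, fully consolidated halves: H is 1 bit, U is about 0.35.
halves = [[0, 0, 1, 1] for _ in range(4)]
print(it_metrics(halves))
```

Passing `zero_entropy_value=1.0` is one possible convention for single-category rasters. Note that the marginal distribution here is weighted by adjacencies, so it can differ slightly from plain cell proportions near raster edges.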