Describing composition and configuration of patterns in categorical rasters

class: inverse, left, nonum, clear
background-image: url("figs/cover.jpg")
background-size: cover

.titlestyle[] <br>
.titlestyle[Describing] <br>
.titlestyle[composition and configuration of patterns] <br>
.titlestyle[in] <br>
.titlestyle[categorical] <br>
.titlestyle[rasters]

.captionstyle[Jakub Nowosad]
.pull-right2[.captionstyle[2021-01-15]]

---
# Spatial patterns

- **Discovering and describing patterns is a vital part of many spatial analyses**
- However, spatial data is gathered in many ways and forms, which requires different approaches to understanding spatial patterns

.pull-left[
<img src="figs/covid.png" width="1445" style="display: block; margin: auto;" />
.font60[*https://www.ft.com/content/a2901ce8-5eb7-4633-b89c-cbdf5b386938*]
]

.pull-right[
<img src="figs/temperature.png" width="1703" style="display: block; margin: auto;" />
.font60[*https://climate.copernicus.eu/copernicus-2020-warmest-year-record-europe-globally-2020-ties-2016-warmest-year-recorded*]
]

---
# Spatial patterns in categorical rasters

- **Categorical rasters express spatial patterns by two inter-related properties**: composition and configuration
- **Composition** shows how many different categories we have, and how much area they occupy
- **Configuration** focuses on the spatial arrangement of the categories

- The composition and configuration of an area's pattern are related to its ecosystem characteristics, such as vegetation diversity, animal distributions, and water quality
(*Hunsaker and Levine, 1995; Fahrig and Nuttle, 2005; Klingbeil and Willig, 2009; Holzschuh et al., 2010; Fahrig et al., 2011; Carrara et al., 2015; Arroyo-Rodríguez et al., 2016; Duflot et al., 2017; and many others*)
- **Understanding and quantifying spatial patterns is also useful in many** other **fields**, including demography and medicine
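
A minimal sketch of the two properties, assuming a raster stored as a list of rows and a 4-neighbour definition of adjacency. The function names and the toy rasters are illustrative only. Composition is read from category shares, configuration from which categories sit next to each other:

```python
from collections import Counter

def composition(raster):
    """Share of cells occupied by each category."""
    cells = [value for row in raster for value in row]
    return {cat: n / len(cells) for cat, n in Counter(cells).items()}

def adjacencies(raster):
    """Count unordered pairs of 4-neighbour cells, keyed by category pair."""
    pairs = Counter()
    n_rows, n_cols = len(raster), len(raster[0])
    for r in range(n_rows):
        for c in range(n_cols):
            if c + 1 < n_cols:  # neighbour to the right
                pairs[tuple(sorted((raster[r][c], raster[r][c + 1])))] += 1
            if r + 1 < n_rows:  # neighbour below
                pairs[tuple(sorted((raster[r][c], raster[r + 1][c])))] += 1
    return pairs

# Two toy rasters with identical composition but different configuration
blocky = [[1, 1, 2, 2]] * 4
checker = [[1, 2, 1, 2], [2, 1, 2, 1], [1, 2, 1, 2], [2, 1, 2, 1]]

print(composition(blocky), composition(checker))
print(adjacencies(blocky)[(1, 2)], adjacencies(checker)[(1, 2)])
```

Both rasters are half category 1 and half category 2, yet the number of unlike neighbours differs sharply, which is exactly what composition alone cannot show.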

---
# Importance of spatial patterns

.pull-left[
**Assessing the ecological vulnerability of forest landscape to agricultural frontier expansion:**

*Bourgoin et al., 2020, https://doi.org/10.1016/j.jag.2019.101958*
]

.pull-right[
**Reinterpreting classified histological images as categorical rasters and using them for disease classification (e.g., liver cancer):**

*Kendall et al., 2020, https://doi.org/10.1038/s41598-020-74691-9*
]

---
# Importance of spatial patterns

.lc[
**Quantifying racial diversity and segregation:**

*Dmowska et al., 2020, https://doi.org/10.1016/j.apgeog.2020.102239*
]

.rc[
<img src="figs/raceland-paper.png" width="85%" style="display: block; margin: auto;" />
]

---
class: inverse, mline, center, middle, clear

# Problem

<h2><center>How can we universally quantify composition and configuration of patterns?</center></h2>

---
# Example data

.lc[
- [Land cover data for the year 2016 from the CCI-LC project](https://cds.climate.copernicus.eu/cdsapp#!/dataset/satellite-land-cover?tab=overview)
- Simplified into nine main categories
]
.rc[
<img src="figs/lc_map1.png" width="75%" style="display: block; margin: auto;" />
]
---
# Example data

.lc[
- [Land cover data for the year 2016 from the CCI-LC project](https://cds.climate.copernicus.eu/cdsapp#!/dataset/satellite-land-cover?tab=overview)
- Simplified into nine main categories
- Partitioned into **30 x 30 kilometer square blocks**
- 13,909 categorical rasters (100x100 cells)
]
.rc[
<img src="figs/lc_map2.png" width="75%" style="display: block; margin: auto;" />
]
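
The partitioning step can be sketched as follows, assuming the land cover map is held as a list of rows. The helper name `split_into_blocks` is hypothetical, and dropping incomplete blocks at the map edge is an assumption, not necessarily what was done for this dataset:

```python
def split_into_blocks(raster, size):
    """Yield (block_row, block_col, block) for every complete size x size block.

    Incomplete blocks at the right and bottom edges are skipped (an assumption).
    """
    n_rows, n_cols = len(raster), len(raster[0])
    for r0 in range(0, n_rows - size + 1, size):
        for c0 in range(0, n_cols - size + 1, size):
            block = [row[c0:c0 + size] for row in raster[r0:r0 + size]]
            yield r0 // size, c0 // size, block

# Toy 5 x 6 raster; the last row does not fill a complete 2 x 2 block
toy = [[(r // 2) * 10 + (c // 2) for c in range(6)] for r in range(5)]
blocks = list(split_into_blocks(toy, 2))
print(len(blocks))
```

With 100 x 100 cells per 30 x 30 km block, `size` would be 100 on the real data.
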
---
# Example data

**I randomly selected 16 rasters** with different proportions of forest (green) areas:

---
# Boltzmann entropy

*Cushman, 2018, https://doi.org/10.3390/e20040298*:

- Entropy of a categorical raster is **related to the number of ways a raster** with a given dimensionality and number of classes **can be arranged** to produce the same total amount of edge between cells of different classes
--

- **Problem no. 1:** intractably large numbers of possible arrangements of raster cells in large landscapes
--

- **Partial solution to problem no. 1:** linear model as a function of the size, patch richness, and diversity of a landscape

- **Problem no. 2:** the above model is not universal
--

- **Problem no. 3:** is one metric enough to describe rasters' composition and configuration?
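
To make the idea and problem no. 1 concrete, here is a brute-force sketch for a tiny two-class raster. It is not Cushman's implementation: the function names, the 4-neighbour edge definition, and the natural logarithm are assumptions made for illustration.

```python
from itertools import combinations
from math import log

def total_edge(raster):
    """Number of 4-neighbour cell pairs that belong to different classes."""
    n_rows, n_cols = len(raster), len(raster[0])
    edge = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if c + 1 < n_cols and raster[r][c] != raster[r][c + 1]:
                edge += 1
            if r + 1 < n_rows and raster[r][c] != raster[r + 1][c]:
                edge += 1
    return edge

def same_edge_arrangements(raster):
    """Count rearrangements (same class counts) with the same total edge.

    Works for exactly two classes and enumerates every arrangement,
    so it is only feasible for very small rasters.
    """
    n_rows, n_cols = len(raster), len(raster[0])
    cells = [value for row in raster for value in row]
    first, second = sorted(set(cells))  # fails unless there are two classes
    target = total_edge(raster)
    matches = 0
    for positions in combinations(range(len(cells)), cells.count(first)):
        chosen = set(positions)
        flat = [first if i in chosen else second for i in range(len(cells))]
        candidate = [flat[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]
        if total_edge(candidate) == target:
            matches += 1
    return matches

tiny = [[1, 1], [2, 2]]
w = same_edge_arrangements(tiny)
print(total_edge(tiny), w, log(w))
```

The loop visits every way of placing the first class, so its cost grows combinatorially with raster size; this is why a 100 x 100 raster cannot be handled this way.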

---
# Landscape metrics

.lc[
**Spatial patterns in categorical raster data** are often described using landscape metrics (landscape indices)
- In the last 40 or so years, several hundred different spatial metrics have been developed
- **SHDI** - [Shannon's diversity index](https://r-spatialecology.github.io/landscapemetrics/reference/lsm_l_shdi.html) - takes both the number of classes and the abundance of each class into account
- **AI** - [Aggregation index](https://r-spatialecology.github.io/landscapemetrics/reference/lsm_l_ai.html) - from 0 for maximally disaggregated to 100 for maximally aggregated classes
]

.rc[
**SHDI:**
<img src="figs/shdi_map.png" width="3200" style="display: block; margin: auto;" />
<br>
**AI:**
<img src="figs/ai_map.png" width="3200" style="display: block; margin: auto;" />
]
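
As an illustration of the first of these metrics, SHDI can be sketched directly from its definition, SHDI = -Σ p_i ln(p_i), where p_i is the share of class i. The function below is a standalone sketch with made-up toy rasters, not the `landscapemetrics` code. AI is left out because it needs the maximum possible number of like adjacencies for each class.

```python
from collections import Counter
from math import log

def shdi(raster):
    """Shannon's diversity index computed from class proportions."""
    cells = [value for row in raster for value in row]
    n = len(cells)
    return -sum((count / n) * log(count / n) for count in Counter(cells).values())

even = [[1, 1, 2, 2], [1, 1, 2, 2]]  # two classes, equal shares
skewed = [[1, 1, 1, 2]]              # two classes, unequal shares
single = [[3, 3], [3, 3]]            # one class only

print(shdi(even), shdi(skewed), shdi(single))
```

The index is highest when classes are equally abundant and drops to zero for a single-class raster.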

---
# Landscape metrics

.pull-left[
- **Problem no. 1:** which of the hundreds of spatial metrics should we choose?
- **Problem no. 2:** many landscape metrics are highly correlated...

]
.pull-right[
<img src="figs/shdi_ai_map.png" width="2740" style="display: block; margin: auto;" />
]

---
# PCA of landscape metrics

.lc[
- I performed a **principal component analysis (PCA) using 17 landscape-level metrics**:

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Type </th>
   <th style="text-align:left;"> Landscape-level metrics </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Shape </td>
   <td style="text-align:left;"> PAFRAC;  CONTIG AM;  CONTIG RA </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Aggregation </td>
   <td style="text-align:left;"> AI;  CONTAG;  IJI;  PLADJ;  PD;  DIVISION;   LPI </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Connectivity </td>
   <td style="text-align:left;"> COHESION </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Diversity </td>
   <td style="text-align:left;"> SHDI;   SIDI;   MSIDI;  SHEI;  SIEI;  MSIEI </td>
  </tr>
</tbody>
</table>
- The first two principal components explained **~71% of variability**

]

.rc[
**PC1:**
<img src="figs/PC1_map.png" width="3200" style="display: block; margin: auto;" />
<br>
**PC2:**
<img src="figs/PC2_map.png" width="3200" style="display: block; margin: auto;" />
]
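
A rough sketch of the PCA idea, using only the standard library. It standardizes the metrics, builds their correlation matrix, and finds the first component by power iteration; the three "metrics" are made-up numbers, and a real analysis would use an established PCA routine:

```python
from math import sqrt

def standardize(rows):
    """Center each column and scale it to unit standard deviation."""
    n, k = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(k)]
    sds = [sqrt(sum((row[j] - means[j]) ** 2 for row in rows) / (n - 1))
           for j in range(k)]
    return [[(row[j] - means[j]) / sds[j] for j in range(k)] for row in rows]

def correlation_matrix(z):
    """Correlation matrix of already standardized columns."""
    n, k = len(z), len(z[0])
    return [[sum(row[a] * row[b] for row in z) / (n - 1) for b in range(k)]
            for a in range(k)]

def first_component(matrix, n_iter=500):
    """Leading eigenvector and eigenvalue via power iteration."""
    k = len(matrix)
    v = [1.0 + 0.1 * i for i in range(k)]  # arbitrary non-zero start vector
    for _ in range(n_iter):
        w = [sum(matrix[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(matrix[a][b] * v[b] for b in range(k))
                     for a in range(k))
    return v, eigenvalue

# Hypothetical values of three metrics for five rasters (columns 1 and 2 correlate)
metrics = [[1, 2, 5], [2, 4, 3], [3, 6, 4], [4, 8, 1], [5, 10, 2]]
corr = correlation_matrix(standardize(metrics))
loadings, eigenvalue = first_component(corr)
explained = eigenvalue / len(corr)  # trace of a correlation matrix = number of metrics
print(loadings, explained)
```

Because the inputs are correlated, a single component already captures most of the variance in this toy example.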

---
# PCA of landscape metrics

.pull-left[
The result makes it possible to distinguish between:

- **simple** and **complex** rasters (left<->right)
- **fragmented** and **consolidated** rasters (bottom<->top)

However, there are still some problems here...
]
.pull-right[
<img src="figs/PC1_PC2_map.png" width="2736" style="display: block; margin: auto;" />
]

---
# PCA of landscape metrics

.lc[
- I performed **a second PCA using data from the United Kingdom only**
- Next, I applied this PCA to predict component scores for the whole of Europe
]
.rc[
**PC1:**
<img src="figs/PC1UK_map.png" width="3200" style="display: block; margin: auto;" />
<br>
**PC2:**
<img src="figs/PC2UK_map.png" width="3200" style="display: block; margin: auto;" />
]
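
Predicting with an existing PCA amounts to standardizing new data with the *training* means and standard deviations and multiplying by the fitted loadings. A small sketch, where every number is invented for illustration rather than taken from the UK model:

```python
def project(rows, means, sds, loadings):
    """Component scores for new rows, using parameters from a fitted PCA."""
    scores = []
    for row in rows:
        z = [(x - m) / s for x, m, s in zip(row, means, sds)]
        scores.append([sum(zj * lj for zj, lj in zip(z, component))
                       for component in loadings])
    return scores

# Hypothetical parameters of a PCA fitted on a regional subset (two metrics)
train_means = [1.0, 50.0]
train_sds = [0.5, 10.0]
train_loadings = [[0.6, 0.8], [0.8, -0.6]]  # one row per component

new_rasters = [[1.5, 60.0], [1.0, 50.0]]
scores = project(new_rasters, train_means, train_sds, train_loadings)
print(scores)
```

The scores depend entirely on the training subset, so the same raster receives different coordinates under a PCA fitted elsewhere.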

---
# PCA of landscape metrics

.pull-left[
**Issues with the PCA approach:**

- Each new dataset requires recalculating both the landscape metrics and the principal component analysis (PCA)
- Highly correlated landscape metrics are used
- Interpreting PCA results is not straightforward
]
.pull-right[
<img src="figs/PC1UK_PC2UK_map.png" width="2736" style="display: block; margin: auto;" />
]

---
# IT metrics

.lc[
- Five information theory metrics based on a co-occurrence matrix exist (*Nowosad and Stepinski, 2019, https://doi.org/10.1007/s10980-019-00830-x*)
- **Marginal entropy [H(x)]** - diversity (*composition*) of spatial categories - from monothematic patterns to multithematic patterns
- **Relative mutual information [U]** - clumpiness (*configuration*) of spatial categories - from fragmented patterns to consolidated patterns
- **H(x) and U** are only weakly correlated
]
.rc[
**Entropy:**
<img src="figs/ent_map.png" width="3200" style="display: block; margin: auto;" />
<br>
**Relative mutual information:**
<img src="figs/relmutinf_map.png" width="3200" style="display: block; margin: auto;" />
]
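
A compact sketch of both metrics, assuming a co-occurrence matrix of ordered 4-neighbour cell pairs and base-2 logarithms. The function names are illustrative, and returning `None` for zero entropy is only a placeholder choice, not the method of the `motif` or `landscapemetrics` packages:

```python
from collections import Counter
from math import log2

def cooccurrence(raster):
    """Counts of ordered (focal, neighbour) category pairs, 4-neighbourhood."""
    counts = Counter()
    n_rows, n_cols = len(raster), len(raster[0])
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    counts[(raster[r][c], raster[rr][cc])] += 1
    return counts

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

def it_metrics(raster):
    """Return (marginal entropy, relative mutual information)."""
    counts = cooccurrence(raster)
    total = sum(counts.values())
    joint = {pair: n / total for pair, n in counts.items()}
    marginal = Counter()
    for (focal, _), p in joint.items():
        marginal[focal] += p
    h_x = entropy(marginal.values())
    h_xy = entropy(joint.values())
    # I(x, y) = H(x) + H(y) - H(x, y); H(x) = H(y) because pairs are symmetric
    mutual = 2 * h_x - h_xy
    rel = mutual / h_x if h_x > 0 else None  # undefined for one-category rasters
    return h_x, rel

two_halves = [[1, 1, 2, 2]] * 4
four_patches = [[1, 1, 2, 2], [1, 1, 2, 2], [2, 2, 1, 1], [2, 2, 1, 1]]

print(it_metrics(two_halves))
print(it_metrics(four_patches))
```

Both toy rasters have the same marginal entropy (two equally abundant categories), while the relative mutual information is higher for the raster split into two halves than for the one broken into four patches.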

---
# IT metrics

.pull-left[
**2D parametrization** of categorical rasters' configurations based on two weakly correlated IT metrics **groups similar patterns into distinct regions** of the parameter space
]
.pull-right[
<img src="figs/ent_relmutinf_map.png" width="2741" style="display: block; margin: auto;" />
]

---
# IT metrics - final results

.pull-left[
**Land cover data:**
<img src="figs/lc_map1.png" width="2600" style="display: block; margin: auto;" />
]

.pull-right[
**Parametrization using two IT metrics:**
<img src="figs/2dmap_ent_relmutinf.png" width="2581" style="display: block; margin: auto;" />
]

---
# IT metrics

.pull-left[
These metrics still leave some questions open...

- Relative mutual information is a result of dividing mutual information by entropy.
**What to do when the entropy is zero?**
- **How to incorporate the meaning of categories into the analysis?**
]
.pull-right[
**Parametrization using two IT metrics:**
<img src="figs/2dmap_ent_relmutinf.png" width="2581" style="display: block; margin: auto;" />
]

---
# Related questions and problems

.pull-left[
<img src="figs/Composition_A_by_Piet_Mondrian_Galleria_Nazionale_d'Arte_Moderna_e_Contemporanea.jpg" width="90%" style="display: block; margin: auto;" />
*Composition A by Piet Mondrian*
]

.pull-right[
**Depending on the problem:**
- Is the categorical raster the type of data we should use?
- Do raster cells represent objects, or are they the result of some classification/aggregation?
- How to incorporate the meaning of categories into the analysis?
- How to decide on the extent of the study area?
- What is the optimal data resolution?
- What is the scale of the process we want to study? How to decide which scale is valid?
- ...
]

---
class: left, top, clear

.pull-left[
## Summary

- **Marginal entropy** and **relative mutual information** are universal indicators of categorical raster composition and configuration
- These metrics **do not depend on any specific category in the raster** (but this information can be incorporated into analyses)
- They can be applied in studies in many fields, including ecology, demography, and medicine
- There are **still many questions and problems to solve**, including scale-dependence of the patterns, selection of input data, or extent of the study area

## Contact:

Twitter: <svg role="img" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;">  <title></title>  <path d="M23.954 4.569c-.885.389-1.83.654-2.825.775 1.014-.611 1.794-1.574 2.163-2.723-.951.555-2.005.959-3.127 1.184-.896-.959-2.173-1.559-3.591-1.559-2.717 0-4.92 2.203-4.92 4.917 0 .39.045.765.127 1.124C7.691 8.094 4.066 6.13 1.64 3.161c-.427.722-.666 1.561-.666 2.475 0 1.71.87 3.213 2.188 4.096-.807-.026-1.566-.248-2.228-.616v.061c0 2.385 1.693 4.374 3.946 4.827-.413.111-.849.171-1.296.171-.314 0-.615-.03-.916-.086.631 1.953 2.445 3.377 4.604 3.417-1.68 1.319-3.809 2.105-6.102 2.105-.39 0-.779-.023-1.17-.067 2.189 1.394 4.768 2.209 7.557 2.209 9.054 0 13.999-7.496 13.999-13.986 0-.209 0-.42-.015-.63.961-.689 1.8-1.56 2.46-2.548l-.047-.02z"></path></svg> [jakub_nowosad](https://twitter.com/jakub_nowosad)

Website: https://nowosad.github.io

]

.pull-right[
<img src="figs/2dmap_ent_relmutinf.png" width="80%" style="display: block; margin: auto;" />

## Resources:

- **Slides:** [nowosad.github.io/rasters-revealed](https://nowosad.github.io/rasters-revealed)

- **Software:** R packages [motif](https://nowosad.github.io/motif/) and [landscapemetrics](https://r-spatialecology.github.io/landscapemetrics/index.html)
- **Blog post:** [Information theory provides a consistent framework for the analysis of spatial patterns](https://nowosad.github.io/post/ent-bp1/)

]