`R/clonalDiversity.R`

`clonalDiversity.Rd`

This function calculates traditional measures of species diversity, evenness,
and richness by sample or group: **Shannon**, **inverse Simpson**, **normalized entropy**,
**Gini-Simpson**, **Chao1 index**, and the **abundance-based coverage estimator (ACE)**.
By default, the function down samples the data and computes the diversity metrics over
100 bootstraps (**n.boots = 100**), then outputs the mean of the values.
The **group.by** parameter can be used to condense the individual
samples. If a matrix output for the data is preferred, set **exportTable** = TRUE.
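
The down-sampling and bootstrapping described above can be pictured with a minimal base R sketch. It is an illustration only, not the package's internal code: the helper names `shannon_index` and `bootstrap_diversity`, the toy clone labels, and the choice to sample without replacement down to the size of the smallest sample are all assumptions made for the example.

```r
# Illustrative sketch only; helper names and the sampling scheme are
# assumptions, not scRepertoire internals.

# Shannon index from a vector of clone labels (one entry per cell)
shannon_index <- function(clones) {
  p <- as.numeric(table(clones)) / length(clones)
  -sum(p * log(p))
}

# Down sample each group to `depth` cells `n.boots` times, then average
bootstrap_diversity <- function(clone_list, depth, n.boots = 100) {
  vapply(clone_list, function(clones) {
    boots <- replicate(n.boots, shannon_index(sample(clones, depth)))
    mean(boots)
  }, numeric(1))
}

set.seed(42)
# Toy data: two hypothetical samples of unequal size
toy <- list(
  S1 = sample(paste0("clone", 1:50), 500, replace = TRUE),
  S2 = sample(paste0("clone", 1:5), 200, replace = TRUE)
)
min_depth <- min(lengths(toy))  # smallest sample sets the common depth
result <- bootstrap_diversity(toy, depth = min_depth, n.boots = 100)
```

Sampling every group to a common depth keeps a larger sample from appearing more diverse simply because it contains more cells; averaging over repeated draws reduces the noise that any single draw would introduce.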

```
clonalDiversity(
  input.data,
  cloneCall = "strict",
  chain = "both",
  group.by = NULL,
  order.by = NULL,
  x.axis = NULL,
  metrics = c("shannon", "inv.simpson", "norm.entropy", "gini.simpson", "chao1", "ACE"),
  exportTable = FALSE,
  palette = "inferno",
  n.boots = 100,
  return.boots = FALSE,
  skip.boots = FALSE
)
```

- input.data
The product of `combineTCR`, `combineBCR`, or `combineExpression`.

- cloneCall
How to call the clone - VDJC gene (**gene**), CDR3 nucleotide (**nt**), CDR3 amino acid (**aa**), VDJC gene + CDR3 nucleotide (**strict**) or a custom variable in the data

- chain
Indicate if both or a specific chain should be used - e.g. "both", "TRA", "TRG", "IGH", "IGL"

- group.by
Variable by which to combine samples for the diversity calculation

- order.by
A vector of specific plotting order or "alphanumeric" to plot groups in order

- x.axis
Additional variable grouping that will space the sample along the x-axis

- metrics
The indices to use in diversity calculations - "shannon", "inv.simpson", "norm.entropy", "gini.simpson", "chao1", "ACE"

- exportTable
Exports a table of the data into the global environment in addition to the visualization

- palette
Colors to use in visualization - input any palette name listed by `hcl.pals()`

- n.boots
Number of bootstraps to down sample in order to get mean diversity

- return.boots
Export the bootstrapped values calculated - automatically sets exportTable = TRUE.

- skip.boots
Remove down sampling and bootstrapping from the calculation.

ggplot of the diversity of clones by group

The formulas for the indices and estimators are as follows:

**Shannon Index:**
$$Index = -\sum_{i=1}^{S} p_i \ln(p_i)$$

**Inverse Simpson Index:**
$$Index = \frac{1}{(\sum_{i=1}^{S} p_i^2)}$$

**Normalized Entropy:**
$$Index = -\frac{\sum_{i=1}^{S} p_i \ln(p_i)}{\ln(S)}$$

**Gini-Simpson Index:**
$$Index = 1 - \sum_{i=1}^{S} p_i^2$$

**Chao1 Index:**
$$Index = S_{obs} + \frac{n_1(n_1-1)}{2(n_2+1)}$$

**Abundance-based Coverage Estimator (ACE):**
$$Index = S_{abund} + \frac{S_{rare}}{C_{ace}} + \frac{F_1}{C_{ace}}$$

Where:

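\(p_i\) is the proportion of species \(i\) in the dataset (here, each clone is treated as a species).

\(S\) is the total number of species.

\(S_{obs}\) is the number of species observed in the sample.

\(n_1\) and \(n_2\) are the number of singletons and doubletons, respectively.

\(S_{abund}\), \(S_{rare}\), \(C_{ace}\), and \(F_1\) are parameters derived from the data. In the conventional ACE formulation, \(S_{abund}\) and \(S_{rare}\) are the numbers of abundant and rare species, \(F_1\) is the number of singletons, and \(C_{ace}\) is the estimated sample coverage.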
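
As a worked illustration of the formulas above, the following base R sketch computes five of the indices from a vector of clone counts. It is not the package's implementation: the function name `diversity_indices` and the toy counts are hypothetical, Chao1 is assumed to be in its bias-corrected form with denominator \(2(n_2+1)\), and ACE is left out because it needs additional choices (such as the cutoff between rare and abundant species) that the formulas here do not fix.

```r
# Illustrative sketch only; `diversity_indices` is a hypothetical helper,
# not part of scRepertoire.
diversity_indices <- function(counts) {
  counts <- counts[counts > 0]   # ignore unobserved clones
  p  <- counts / sum(counts)     # p_i: proportion of each clone
  S  <- length(counts)           # S (and S_obs): number of observed clones
  n1 <- sum(counts == 1)         # singletons
  n2 <- sum(counts == 2)         # doubletons
  shannon <- -sum(p * log(p))
  c(
    shannon      = shannon,
    inv.simpson  = 1 / sum(p^2),
    norm.entropy = shannon / log(S),  # NaN when only one clone is present
    gini.simpson = 1 - sum(p^2),
    chao1        = S + n1 * (n1 - 1) / (2 * (n2 + 1))  # assumed bias-corrected form
  )
}

# Toy example: five clones observed 4, 2, 2, 1 and 1 times
idx <- diversity_indices(c(4, 2, 2, 1, 1))
print(round(idx, 3))
```

For these counts the squared proportions sum to 0.26, so the Gini-Simpson index is 0.74 and the inverse Simpson index is 1 / 0.26; the two always carry the same information on different scales.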

```
# Making combined contig data
combined <- combineTCR(contig_list,
                       samples = c("P17B", "P17L", "P18B", "P18L",
                                   "P19B", "P19L", "P20B", "P20L"))
clonalDiversity(combined, cloneCall = "gene")
```