This function calculates traditional measures of species diversity and evenness by sample or group: Shannon, inverse Simpson, normalized entropy, Gini-Simpson, Chao1, and the abundance-based coverage estimator (ACE). By default, the function down-samples the data, computes each metric over 100 bootstraps (n.boots = 100), and outputs the mean of the values. The group.by parameter can be used to condense the individual samples. If a table output for the data is preferred, set exportTable = TRUE.

clonalDiversity(
  input.data,
  cloneCall = "strict",
  chain = "both",
  group.by = NULL,
  order.by = NULL,
  x.axis = NULL,
  metrics = c("shannon", "inv.simpson", "norm.entropy", "gini.simpson", "chao1", "ACE"),
  exportTable = FALSE,
  palette = "inferno",
  n.boots = 100,
  return.boots = FALSE,
  skip.boots = FALSE
)

Arguments

input.data

The product of combineTCR(), combineBCR(), or combineExpression().

cloneCall

How to call the clone: VDJC gene (gene), CDR3 nucleotide (nt), CDR3 amino acid (aa), VDJC gene + CDR3 nucleotide (strict), or a custom variable in the data.

chain

Indicate whether both chains or a specific chain should be used, e.g. "both", "TRA", "TRG", "IGH", "IGL".

group.by

Variable by which to combine samples for the diversity calculation.

order.by

A vector specifying the plotting order, or "alphanumeric" to plot groups in alphanumeric order.

x.axis

Additional grouping variable used to space the samples along the x-axis.

metrics

The indices to use in diversity calculations - "shannon", "inv.simpson", "norm.entropy", "gini.simpson", "chao1", "ACE"

exportTable

Exports a table of the data into the global environment in addition to the visualization

palette

Colors to use in the visualization; accepts any palette name listed by hcl.pals().

n.boots

Number of bootstraps to run on the down-sampled data in order to get the mean diversity.

return.boots

Export the bootstrapped values that were calculated; this automatically sets exportTable = TRUE.

skip.boots

Remove down-sampling and bootstrapping from the calculation.
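The down-sampling and bootstrapping controlled by n.boots, return.boots, and skip.boots can be pictured with a minimal base R sketch. This is not the package's internal code: it assumes each sample is down-sampled without replacement to the size of the smallest sample, which this page does not spell out, and the `samples` list and `shannon()` helper are hypothetical.

```r
set.seed(42)  # make the resampling reproducible

# Hypothetical data: one clone label per cell, for two samples of unequal size.
samples <- list(
  A = sample(paste0("clone", 1:50), 500, replace = TRUE),
  B = sample(paste0("clone", 1:20), 200, replace = TRUE)
)

# Shannon index computed from a vector of clone labels.
shannon <- function(clones) {
  p <- table(clones) / length(clones)
  -sum(p * log(p))
}

n.boots  <- 100
min_size <- min(lengths(samples))  # ASSUMPTION: down-sample to the smallest sample

# One column per sample, one row per bootstrap replicate.
boot_values <- sapply(samples, function(clones) {
  replicate(n.boots, shannon(sample(clones, min_size)))
})

# Mean diversity per sample across the bootstraps.
mean_diversity <- colMeans(boot_values)
mean_diversity
```

In this picture, `boot_values` loosely corresponds to what return.boots = TRUE would expose, while skip.boots = TRUE would correspond to calling `shannon()` once on each full sample.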

Value

ggplot of the diversity of clones by group

Details

The formulas for the indices and estimators are as follows:

Shannon Index: $$Index = -\sum_{i=1}^{S} p_i \ln(p_i)$$

Inverse Simpson Index: $$Index = \frac{1}{(\sum_{i=1}^{S} p_i^2)}$$

Normalized Entropy: $$Index = -\frac{\sum_{i=1}^{S} p_i \ln(p_i)}{\ln(S)}$$

Gini-Simpson Index: $$Index = 1 - \sum_{i=1}^{S} p_i^2$$

Chao1 Index: $$Index = S_{obs} + \frac{n_1(n_1-1)}{2(n_2+1)}$$

Abundance-based Coverage Estimator (ACE): $$Index = S_{abund} + \frac{S_{rare}}{C_{ace}} + \frac{F_1}{C_{ace}}$$

Where:

  • \(p_i\) is the proportion of species \(i\) in the dataset.

  • \(S\) is the total number of species.

  • \(n_1\) and \(n_2\) are the number of singletons and doubletons, respectively.

  • \(S_{abund}\), \(S_{rare}\), \(C_{ace}\), and \(F_1\) are parameters derived from the data.
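As a worked illustration of the formulas above, the following base R sketch computes five of the indices from a vector of clone counts. The `diversity_sketch()` helper and the example counts are hypothetical and are not part of the package. Chao1 is written in its bias-corrected form with denominator 2(n_2 + 1), and ACE is left out because its parameters are not defined here in enough detail to reproduce.

```r
# Hypothetical helper: diversity indices from a vector of per-clone counts.
diversity_sketch <- function(counts) {
  p <- counts / sum(counts)  # p_i: proportion of each clone
  S <- length(counts)        # S: number of distinct clones observed

  shannon <- -sum(p * log(p))

  n1 <- sum(counts == 1)     # singletons
  n2 <- sum(counts == 2)     # doubletons

  c(
    shannon      = shannon,
    inv.simpson  = 1 / sum(p^2),
    norm.entropy = shannon / log(S),
    gini.simpson = 1 - sum(p^2),
    # ASSUMPTION: bias-corrected Chao1, with S taken as S_obs
    chao1        = S + n1 * (n1 - 1) / (2 * (n2 + 1))
  )
}

# Seven clones: two expanded, two doubletons, three singletons.
counts <- c(10, 5, 2, 2, 1, 1, 1)
res <- diversity_sketch(counts)
round(res, 3)
```

With these counts there are three singletons and two doubletons, so the Chao1 correction term is 3 * 2 / (2 * 3) = 1 and the estimate is one clone above the seven observed.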

Author

Andrew Malone, Nick Borcherding

Examples

#Making combined contig data
combined <- combineTCR(contig_list, 
                        samples = c("P17B", "P17L", "P18B", "P18L", 
                                    "P19B","P19L", "P20B", "P20L"))
clonalDiversity(combined, cloneCall = "gene")