Overview

The scRepertoire package provides robust tools for the initial processing, filtering, and combining of single-cell immune receptor sequencing data. In addition to scRepertoire, users have used the immunarch package, which offers a powerful and comprehensive suite of immune profiling functions.

This vignette demonstrates the seamless integration between scRepertoire and immunarch. We will use the exportClones() function with the format = "immunarch" option to generate a compatible data object, which can then be directly used for downstream analysis and visualization with immunarch.

Citation

If using immunarch, please cite the package

@Manual{,
  title = {immunarch: Bioinformatics Analysis of T-Cell and B-Cell Immune Repertoires},
  author = {Vadim I. Nazarov and Vasily O. Tsvetkov and Siarhei Fiadziushchanka and Eugene Rumynskiy and Aleksandr A. Popov and Ivan Balashov and Maria Samokhina},
  year = {2023},
  note = {https://immunarch.com/, https://github.com/immunomind/immunarch},
}

Setup

First, we need to load the necessary libraries and prepare the initial data using scRepertoire. We will use the built-in contig_list example data.

suppressMessages(library(scRepertoire))
suppressMessages(library(immunarch))
suppressMessages(library(ggplot2))

# Load example data from scRepertoire
data("contig_list") 

# Combine contigs into a single list
combined <- combineTCR(contig_list, 
                       samples = c("P17B", "P17L", "P18B", "P18L", 
                                   "P19B", "P19L", "P20B", "P20L"))

Exporting for immunarch

The exportClones() function can now format the scRepertoire object into a list that immunarch can immediately use. This list contains two key components:

  • $data: A list of data frames, where each data frame is a repertoire from a single sample.
  • $meta: A metadata data frame describing the samples.
# Export the data with write.file = FALSE to get the R object
immunarch_data <- exportClones(combined, 
                               format = "immunarch", 
                               write.file = FALSE)

# We can inspect the structure of the output
str(immunarch_data, max.level = 2)
## List of 2
##  $ data:List of 8
##   ..$ P17B:'data.frame': 745 obs. of  9 variables:
##   ..$ P17L:'data.frame': 2117 obs. of  9 variables:
##   ..$ P18B:'data.frame': 1254 obs. of  9 variables:
##   ..$ P18L:'data.frame': 1202 obs. of  9 variables:
##   ..$ P19B:'data.frame': 5544 obs. of  9 variables:
##   ..$ P19L:'data.frame': 1619 obs. of  9 variables:
##   ..$ P20B:'data.frame': 6087 obs. of  9 variables:
##   ..$ P20L:'data.frame': 192 obs. of  9 variables:
##  $ meta:'data.frame':    8 obs. of  1 variable:
##   ..$ Sample: chr [1:8] "P17B" "P17L" "P18B" "P18L" ...

As you can see, the output is already in the format required by immunarch. Now we can proceed with downstream analysis.

Basic Analysis with immunarch

Let’s perform a few common analyses to demonstrate the workflow. For visualization purposes, we will add a “SampleType” column to the metadata to group the samples by their origin (B for blood, L for lymph node).

# Add a grouping variable to the metadata
immunarch_data$meta$Patient <- substr(immunarch_data$meta$Sample, 1, 3)
immunarch_data$meta$SampleType <- substr(immunarch_data$meta$Sample, 4, 4)
head(immunarch_data$meta)
##   Sample Patient SampleType
## 1   P17B     P17          B
## 2   P17L     P17          L
## 3   P18B     P18          B
## 4   P18L     P18          L
## 5   P19B     P19          B
## 6   P19L     P19          L

Repertoire Overlap

We can measure the similarity between repertoires by calculating the number of shared clonotypes (public clonotypes) using repOverlap(). The vis() function can then generate a heatmap to visualize the pairwise overlap.

# Calculate overlap using the number of public clonotypes
imm_ov <- repOverlap(immunarch_data$data, .method = "public", .verbose = FALSE)

# Visualize the overlap as a heatmap
vis(imm_ov)

Clonal Homeostasis

Next, let’s assess the clonal space homeostasis, which is the proportion of the repertoire occupied by clones of different sizes (e.g., Small, Medium, Large). The repClonality() function calculates this, and vis() creates a bar plot.

# Calculate clonal space homeostasis
imm_hom <- repClonality(immunarch_data$data,
                        .method = "homeo",
                        .clone.types = c(Small = 0.0001, Medium = 0.001, 
                                         Large = 0.01, Hyperexpanded = 1))

# Visualize homeostasis, grouped by sample type
vis(imm_hom, .by = "SampleType", .meta = immunarch_data$meta)

Gene Usage

immunarch provides powerful tools to analyze and visualize the usage of V, D, and J genes. Here, we’ll compute the usage of TRAV/TRBV genes and visualize their distribution across samples. It is important to note, in paired mode, immunarch calculates usage for both V genes.

# Compute TRAV/TRBV gene usage
imm_gu <- geneUsage(immunarch_data$data[1], "hs.trbv", .norm = TRUE)

# Visualize gene usage as a heatmap
vis(imm_gu[30:60,], .grid = TRUE, .title = "TRAV/TRBV Gene Usage")

We can also visualize the gene usage grouped by our metadata variable. Here, we analyze the imm_gu object using the Jensen-Shannon (js) divergence and correlation analysis. JS divergence is a measure of similarity between two probability distributions (lower = more similar). Conversely, correlation (cor) measures the linear relationship between the gene usage profiles of different samples.

imm_gu <- geneUsage(immunarch_data$data, "hs.trbv", 
                    .norm = T)
imm_gu_js <- geneUsageAnalysis(imm_gu, 
                               .method = "js", 
                               .verbose = F)
imm_gu_cor <- geneUsageAnalysis(imm_gu, 
                                .method = "cor", 
                                .verbose = F)

p1 <- vis(imm_gu_js, .title = "Gene usage JS-divergence", 
          .leg.title = "JS", 
          .text.size = 1.5)
p2 <- vis(imm_gu_cor, .title = "Gene usage correlation", 
          .leg.title = "Cor", 
          .text.size = 1.5)

p1 + p2

Calculating Diversity

immunarch and scRepertoire have many overlapping features, such as diversity. One key difference is immunarch separation of calculation and visualization. This offers immunarch users the ability to group calculations in a post-hoc fashion. Here we can use the nonparametric chao1 estimation to look at diversity across samples or by sampleType (B = Blood and L = Lung).

div_chao <- repDiversity(immunarch_data$data, "chao1")

p1 <- vis(div_chao)
p2 <- vis(div_chao, .by = c("SampleType"), 
          .meta = immunarch_data$meta)

p1 + p2 

This concludes the vignette on integrating scRepertoire with immunarch. By using the exportClones() function, you can easily leverage the extensive analytical and visualization capabilities of immunarch for your single-cell TCR sequencing data. Immunarch has extensive documentation and vignettes available at immunarch.com.