1 Introduction

scRepertoire is designed to take filter contig outputs from the 10x Genomics Cell Ranger pipeline, processes that data to assign clonotype based on two TCR or Ig chains, and analyze the clonotype dynamics. The latter can be separated into 1) clonotype-only analysis functions, such as unique clonotypes or clonal space quantification and 2) interaction with mRNA expression data using Seurat, SingleCellExperiment or Monocle 3 packages.

1.1 Loading Libraries

suppressMessages(library(scRepertoire))
suppressMessages(library(Seurat))

1.2 Loading and Processing Contig Data

1.2.1 What data to load into scRepertoire?

scRepertoire functions using the filtered_contig_annotations.csv output from the 10x Genomics Cell Ranger. This file is located in the ./outs/ directory of the VDJ alignment folder. To generate a list of contigs to use for scRepertoire, just 1) load the filtered_contig_annotations.csv for each of your samples and then 2) make a list. For example:

S1 <- read.csv(".../Sample1/outs/filtered_contig_annotations.csv")
S2 <- read.csv(".../Sample2/outs/filtered_contig_annotations.csv")
S3 <- read.csv(".../Sample3/outs/filtered_contig_annotations.csv")
S4 <- read.csv(".../Sample4/outs/filtered_contig_annotations.csv")

contig_list <- list(S1, S2, S3, S4)

1.2.2 Multiplexed Experiment?

It is now easy to create the contig list from a multiplexed experiment by first generating a single-cell RNA object (either Seurat or Single Cell Experiment), loading the filtered contig file and then using createHTOContigList(). This function will return a list, separated by the group.by variable(s).

This function depends on the match of barcodes between the single-cell object and contigs. If there is a prefix or different suffix added to the barcode, this will result in no contigs recovered. As of right now, it is recommended you do this step before the integrated, as integration workflows commonly alter the barcodes. There is a multi.run variable that can be used on the integrated object, but it assumes you have modified the barcodes with the Seurat pipeline (automatic addition of _# to end) and your contig list is in the same order.

contigs <- read.csv(".../outs/filtered_contig_annotations.csv")

contig.list <- createHTOContigList(contigs, Seurat.Obj, group.by = "HTO_maxID")

1.3 Built-in Data

scRepertoire comes with a data set derived from T cells from three patients with renal clear cell carcinoma to demonstrate the functionality of the R package. More information on the data set can be found at the corresponding manuscript. The samples consist of paired peripheral-blood and tumor-infiltrating runs, effectively creating 6 distinct runs for T cell receptor (TCR) enrichment. We can preview the elements in the list by using the head function and looking at the first contig annotation.

data("contig_list") #the data built into scRepertoire

head(contig_list[[1]])
##            barcode is_cell                   contig_id high_confidence length
## 1 AAACCTGAGAGCTGGT    TRUE AAACCTGAGAGCTGGT-1_contig_1            TRUE    705
## 2 AAACCTGAGAGCTGGT    TRUE AAACCTGAGAGCTGGT-1_contig_2            TRUE    502
## 3 AAACCTGAGCATCATC    TRUE AAACCTGAGCATCATC-1_contig_1            TRUE    693
## 4 AAACCTGAGCATCATC    TRUE AAACCTGAGCATCATC-1_contig_2            TRUE    567
## 5 AAACCTGAGCATCATC    TRUE AAACCTGAGCATCATC-1_contig_5            TRUE    361
## 6 AAACCTGAGTGGTCCC    TRUE AAACCTGAGTGGTCCC-1_contig_1            TRUE    593
##   chain   v_gene d_gene  j_gene c_gene full_length productive             cdr3
## 1   TRB TRBV20-1  TRBD1 TRBJ1-5  TRBC1        TRUE       TRUE CSASMGPVVSNQPQHF
## 2   TRB     None   None TRBJ1-5  TRBC1       FALSE       None             None
## 3   TRB  TRBV5-1  TRBD2 TRBJ2-2  TRBC2        TRUE       TRUE  CASSWSGAGDGELFF
## 4   TRA TRAV12-1   None  TRAJ37   TRAC        TRUE       TRUE CVVNDEGSSNTGKLIF
## 5   TRB     None   None TRBJ1-5  TRBC1       FALSE       None             None
## 6   TRB  TRBV7-9  TRBD1 TRBJ2-5  TRBC2        TRUE       TRUE CASSPSEGGRQETQYF
##                                            cdr3_nt reads umis raw_clonotype_id
## 1 TGCAGTGCTAGCATGGGACCGGTAGTGAGCAATCAGCCCCAGCATTTT 16718    6      clonotype96
## 2                                             None  6706    3      clonotype96
## 3    TGCGCCAGCAGCTGGTCAGGAGCGGGAGACGGGGAGCTGTTTTTT 26719   11      clonotype97
## 4 TGTGTGGTGAACGATGAAGGCTCTAGCAACACAGGCAAACTAATCTTT 18297    6      clonotype97
## 5                                             None   882    4      clonotype97
## 6 TGTGCCAGCAGCCCCTCCGAAGGGGGGAGACAAGAGACCCAGTACTTC 11218    6      clonotype98
##          raw_consensus_id
## 1 clonotype96_consensus_1
## 2                    None
## 3 clonotype97_consensus_2
## 4 clonotype97_consensus_1
## 5                    None
## 6 clonotype98_consensus_1

Some workflows will have the additional labeling of the standard barcode. Before we proceeding, we can use the function stripBarcode() to avoid any labeling issues down the line. Importantly, stripBarcode() is for removing prefixes on barcodes that have resulted from other pipelines.

No need for stripBarcode function, if the barcodes look like:

  • AAACGGGAGATGGCGT-1
  • AAACGGGAGATGGCGT

Please think about the following parameters for using stripBarcode():

column
The column in which the barcodes are present

connector
The character that is connecting the barcode with the prefix

num_connects
The levels of barcode prefix, where X_X_AAACGGGAGATGGCGT-1 == 3, X_AAACGGGAGATGGCGT-1 = 2.

for (i in seq_along(contig_list)) {
    contig_list[[i]] <- stripBarcode(contig_list[[i]], 
                            column = 1, connector = "_", num_connects = 3)
}

1.4 Loading from other pipelines

New to this version of the package is support for BD Rhapsody, AIRR, WAT3R, and TRUST4 single-cell outputs. By indicating the directory of the outputs from these pipelines are located, loadContigs() will generate a contig list that is compatible with the rest of scRepertoire. Alternatively, loadContigs() will also accept a list of data frames that you have loaded individualy. In addition, although not necessary, loadContigs() can be used for 10X filtered_contig_annotation.csv files.

contig.output <- c("~/Documents/MyExperiment")
contig.list <- loadContigs(dir = contig.output, 
                           format = "TRUST4")

S1 <- read.csv("~/Documents/MyExperiment/Sample1/outs/barcode_results.csv")
S2 <- read.csv("~/Documents/MyExperiment/Sample2/outs/barcode_results.csv")
S3 <- read.csv("~/Documents/MyExperiment/Sample3/outs/barcode_results.csv")
S4 <- read.csv("~/Documents/MyExperiment/Sample4/outs/barcode_results.csv")

contig_list <- list(S1, S2, S3, S4)
contig.list <- loadContigs(dir = contig.output, 
                           format = "WAT3R")

2 Combining the Contigs

As the output of CellRanger are quantification of both the TCRA and TCRB chains, the next step is to create a single list object with the TCR genes (comprised of the VDJC genes) and CDR3 sequences by cell barcode. This is performed using the combineTCR(), where the input is the stripped contig_list. There is also the relabeling of the barcodes by sample and ID information to prevent duplicates.

removeNA

  • TRUE - this is a stringent filter to remove any cell barcode with an NA value in at least one of the chains
  • FALSE - the default setting to include and incorporate cells with 1 NA value

removeMulti

  • TRUE - this is a stringent filter to remove any cell barcode with more than 2 immune receptor chains
  • FALSE - the default setting to include and incorporate cells with > 2 chains

filterMulti

  • TRUE - Isolated the top 2 expressed chains in cell barcodes with multiple chains
  • FALSE - the default setting to include and incorporate cells with > 2 chains
combined <- combineTCR(contig_list, 
                samples = c("PY", "PY", "PX", "PX", "PZ","PZ"), 
                ID = c("P", "T", "P", "T", "P", "T"))

The output of combineTCR() will be a list of contig data frames that will be reduced to the reads associated with a single-cell barcode. It will also combine the multiple reads into clonotype calls by either the nucleotide sequence (CTnt), amino acid sequence (CTaa), the VDJC gene sequence (CTgene) or the combination of the nucleotide and gene sequence (CTstrict). The analogous function for B cells, combineBCR() functions similarly with 2 major caveats: 1) Each barcode can only have a maximum of 2 sequences, if greater exists, the 2 with the highest reads are selected. 2) The strict definition of clonotype (CTstrict) is based on the V gene and >85% normalized Levenshtein distance of the nucleotide sequence. The Levenshtein distance is calculated across all BCR sequences recovered, regardless of the run.

2.1 Loading from other pipelines

New to this version of the package is support for BD Rhapsody, AIRR, WAT3R, and TRUST4 single-cell outputs. By indicating the directory of the outputs from these pipelines are located, loadContigs() will generate a contig list that is compatible with the rest of scRepertoire. Alternatively, loadContigs() will also accept a list of data frames that you have loaded individualy. In addition, although not necessary, loadContigs() can be used for 10X filtered_contig_annotation.csv files.

contig.output <- c("~/Documents/MyExperiment")
contig.list <- loadContigs(dir = contig.output, 
                           format = "TRUST4")

S1 <- read.csv("~/Documents/MyExperiment/Sample1/outs/barcode_results.csv")
S2 <- read.csv("~/Documents/MyExperiment/Sample2/outs/barcode_results.csv")
S3 <- read.csv("~/Documents/MyExperiment/Sample3/outs/barcode_results.csv")
S4 <- read.csv("~/Documents/MyExperiment/Sample4/outs/barcode_results.csv")

contig_list <- list(S1, S2, S3, S4)
contig.list <- loadContigs(dir = contig.output, 
                           format = "WAT3R")

3 Other Processing Functions

3.1 Adding Additional Variables

What if there are more variables to add than just sample and ID? We can add them by using the addVariable() function. All we need is the name of the variable you’d like to add and the specific character or numeric values (variables). As an example, here we add the batches in which the samples were processed and sequenced.

example <- addVariable(combined, name = "batch", 
                variables = c("b1", "b1", "b2", "b2", "b2", "b2"))

example[[1]][1:5,ncol(example[[1]])] # This is showing the first 5 values of the new column added
## [1] "b1" "b1" "b1" "b1" "b1"

3.2 Subsetting Contigs

Likewise we can remove specific list elements after combineTCR() using the subsetContig() function. In order to subset, we need to identify the vector we would like to use for subsetting (name) and also the variable values to subset (variables). Below you can see us isolate just the 4 sequencing results from PX and PY.

subset <- subsetContig(combined, name = "sample", 
                       variables = c("PX", "PY"))

4 Visualizing Contigs

cloneCall

  • “gene” - use the VDJC genes comprising the TCR/Ig
  • “nt” - use the nucleotide sequence of the CDR3 region
  • “aa” - use the amino acid sequence of the CDR3 region
  • “strict” - use the VDJC genes comprising the TCR/Ig + the nucleotide sequence of the CDR3 region. This is the proper definition of clonotype

Important to note that the clonotype is called using the combination of genes or nt/aa CDR3 sequences for both loci. As of this implementation of scRepertoire, clonotype calling is not incorporating small variations within the CDR3 sequences. The gene approach will be the most sensitive, while the use of nt or aa moderately so, and the most specific for clonotypes being strict. Additionally, the clonotype call is trying to incorporate both loci, i.e, both TCRA and TCRB chains and if a single-cell barcode has multiple sequences identified (i.e., 2 TCRA chains expressed in one cell). Using the 10x approach, there is a subset of barcodes that only return one of the immune receptor chains, the unreturned chain is assigned an NA value.

4.1 Quantify Clonotypes

The first function to explore the clonotypes is quantContig() to return the total or relative numbers of unique clonotypes.

scale

  • TRUE - relative percent of unique clonotypes scaled by the total size of the clonotype repertoire
  • FALSE - Report the total number of unique clonotypes

chain

  • “both” for combined chain visualization
  • “TRA”, “TRB”, “TRD”, “TRG”, “IGH” or “IGL” to select single chain
quantContig(combined, cloneCall="gene+nt", scale = T)

quantContig(combined, cloneCall="gene+nt", chain = "TRA")

quantContig(combined, cloneCall="gene+nt", chain = "TRB")