scRepertoire 1.11.0
scRepertoire is designed to take filter contig outputs from the 10x Genomics Cell Ranger pipeline, processes that data to assign clonotype based on two TCR or Ig chains, and analyze the clonotype dynamics. The latter can be separated into 1) clonotype-only analysis functions, such as unique clonotypes or clonal space quantification and 2) interaction with mRNA expression data using Seurat, SingleCellExperiment or Monocle 3 packages.
suppressMessages(library(scRepertoire))
suppressMessages(library(Seurat))
scRepertoire functions using the filtered_contig_annotations.csv output from the 10x Genomics Cell Ranger. This file is located in the ./outs/ directory of the VDJ alignment folder. To generate a list of contigs to use for scRepertoire, just 1) load the filtered_contig_annotations.csv for each of your samples and then 2) make a list. For example:
S1 <- read.csv(".../Sample1/outs/filtered_contig_annotations.csv")
S2 <- read.csv(".../Sample2/outs/filtered_contig_annotations.csv")
S3 <- read.csv(".../Sample3/outs/filtered_contig_annotations.csv")
S4 <- read.csv(".../Sample4/outs/filtered_contig_annotations.csv")
contig_list <- list(S1, S2, S3, S4)
It is now easy to create the contig list from a multiplexed experiment by first generating a single-cell RNA object (either Seurat or Single Cell Experiment), loading the filtered contig file and then using createHTOContigList()
. This function will return a list, separated by the group.by variable(s).
This function depends on the match of barcodes between the single-cell object and contigs. If there is a prefix or different suffix added to the barcode, this will result in no contigs recovered. As of right now, it is recommended you do this step before the integrated, as integration workflows commonly alter the barcodes. There is a multi.run variable that can be used on the integrated object, but it assumes you have modified the barcodes with the Seurat pipeline (automatic addition of _# to end) and your contig list is in the same order.
contigs <- read.csv(".../outs/filtered_contig_annotations.csv")
contig.list <- createHTOContigList(contigs, Seurat.Obj, group.by = "HTO_maxID")
scRepertoire comes with a data set derived from T cells from three patients with renal clear cell carcinoma to demonstrate the functionality of the R package. More information on the data set can be found at the corresponding manuscript. The samples consist of paired peripheral-blood and tumor-infiltrating runs, effectively creating 6 distinct runs for T cell receptor (TCR) enrichment. We can preview the elements in the list by using the head function and looking at the first contig annotation.
data("contig_list") #the data built into scRepertoire
head(contig_list[[1]])
## barcode is_cell contig_id high_confidence length
## 1 AAACCTGAGAGCTGGT TRUE AAACCTGAGAGCTGGT-1_contig_1 TRUE 705
## 2 AAACCTGAGAGCTGGT TRUE AAACCTGAGAGCTGGT-1_contig_2 TRUE 502
## 3 AAACCTGAGCATCATC TRUE AAACCTGAGCATCATC-1_contig_1 TRUE 693
## 4 AAACCTGAGCATCATC TRUE AAACCTGAGCATCATC-1_contig_2 TRUE 567
## 5 AAACCTGAGCATCATC TRUE AAACCTGAGCATCATC-1_contig_5 TRUE 361
## 6 AAACCTGAGTGGTCCC TRUE AAACCTGAGTGGTCCC-1_contig_1 TRUE 593
## chain v_gene d_gene j_gene c_gene full_length productive cdr3
## 1 TRB TRBV20-1 TRBD1 TRBJ1-5 TRBC1 TRUE TRUE CSASMGPVVSNQPQHF
## 2 TRB None None TRBJ1-5 TRBC1 FALSE None None
## 3 TRB TRBV5-1 TRBD2 TRBJ2-2 TRBC2 TRUE TRUE CASSWSGAGDGELFF
## 4 TRA TRAV12-1 None TRAJ37 TRAC TRUE TRUE CVVNDEGSSNTGKLIF
## 5 TRB None None TRBJ1-5 TRBC1 FALSE None None
## 6 TRB TRBV7-9 TRBD1 TRBJ2-5 TRBC2 TRUE TRUE CASSPSEGGRQETQYF
## cdr3_nt reads umis raw_clonotype_id
## 1 TGCAGTGCTAGCATGGGACCGGTAGTGAGCAATCAGCCCCAGCATTTT 16718 6 clonotype96
## 2 None 6706 3 clonotype96
## 3 TGCGCCAGCAGCTGGTCAGGAGCGGGAGACGGGGAGCTGTTTTTT 26719 11 clonotype97
## 4 TGTGTGGTGAACGATGAAGGCTCTAGCAACACAGGCAAACTAATCTTT 18297 6 clonotype97
## 5 None 882 4 clonotype97
## 6 TGTGCCAGCAGCCCCTCCGAAGGGGGGAGACAAGAGACCCAGTACTTC 11218 6 clonotype98
## raw_consensus_id
## 1 clonotype96_consensus_1
## 2 None
## 3 clonotype97_consensus_2
## 4 clonotype97_consensus_1
## 5 None
## 6 clonotype98_consensus_1
Some workflows will have the additional labeling of the standard barcode. Before we proceeding, we can use the function stripBarcode()
to avoid any labeling issues down the line. Importantly, stripBarcode()
is for removing prefixes on barcodes that have resulted from other pipelines.
No need for stripBarcode function, if the barcodes look like:
Please think about the following parameters for using stripBarcode()
:
column
The column in which the barcodes are present
connector
The character that is connecting the barcode with the prefix
num_connects
The levels of barcode prefix, where X_X_AAACGGGAGATGGCGT-1 == 3, X_AAACGGGAGATGGCGT-1 = 2.
for (i in seq_along(contig_list)) {
contig_list[[i]] <- stripBarcode(contig_list[[i]],
column = 1, connector = "_", num_connects = 3)
}
New to this version of the package is support for BD Rhapsody, AIRR, WAT3R, and TRUST4 single-cell outputs. By indicating the directory of the outputs from these pipelines are located, loadContigs()
will generate a contig list that is compatible with the rest of scRepertoire. Alternatively, loadContigs()
will also accept a list of data frames that you have loaded individualy. In addition, although not necessary, loadContigs()
can be used for 10X filtered_contig_annotation.csv files.
contig.output <- c("~/Documents/MyExperiment")
contig.list <- loadContigs(dir = contig.output,
format = "TRUST4")
S1 <- read.csv("~/Documents/MyExperiment/Sample1/outs/barcode_results.csv")
S2 <- read.csv("~/Documents/MyExperiment/Sample2/outs/barcode_results.csv")
S3 <- read.csv("~/Documents/MyExperiment/Sample3/outs/barcode_results.csv")
S4 <- read.csv("~/Documents/MyExperiment/Sample4/outs/barcode_results.csv")
contig_list <- list(S1, S2, S3, S4)
contig.list <- loadContigs(dir = contig.output,
format = "WAT3R")
As the output of CellRanger are quantification of both the TCRA and TCRB chains, the next step is to create a single list object with the TCR genes (comprised of the VDJC genes) and CDR3 sequences by cell barcode. This is performed using the combineTCR()
, where the input is the stripped contig_list. There is also the relabeling of the barcodes by sample and ID information to prevent duplicates.
removeNA
removeMulti
filterMulti
combined <- combineTCR(contig_list,
samples = c("PY", "PY", "PX", "PX", "PZ","PZ"),
ID = c("P", "T", "P", "T", "P", "T"))
The output of combineTCR()
will be a list of contig data frames that will be reduced to the reads associated with a single-cell barcode. It will also combine the multiple reads into clonotype calls by either the nucleotide sequence (CTnt), amino acid sequence (CTaa), the VDJC gene sequence (CTgene) or the combination of the nucleotide and gene sequence (CTstrict). The analogous function for B cells, combineBCR()
functions similarly with 2 major caveats: 1) Each barcode can only have a maximum of 2 sequences, if greater exists, the 2 with the highest reads are selected. 2) The strict definition of clonotype (CTstrict) is based on the V gene and >85% normalized Levenshtein distance of the nucleotide sequence. The Levenshtein distance is calculated across all BCR sequences recovered, regardless of the run.
New to this version of the package is support for BD Rhapsody, AIRR, WAT3R, and TRUST4 single-cell outputs. By indicating the directory of the outputs from these pipelines are located, loadContigs()
will generate a contig list that is compatible with the rest of scRepertoire. Alternatively, loadContigs()
will also accept a list of data frames that you have loaded individualy. In addition, although not necessary, loadContigs()
can be used for 10X filtered_contig_annotation.csv files.
contig.output <- c("~/Documents/MyExperiment")
contig.list <- loadContigs(dir = contig.output,
format = "TRUST4")
S1 <- read.csv("~/Documents/MyExperiment/Sample1/outs/barcode_results.csv")
S2 <- read.csv("~/Documents/MyExperiment/Sample2/outs/barcode_results.csv")
S3 <- read.csv("~/Documents/MyExperiment/Sample3/outs/barcode_results.csv")
S4 <- read.csv("~/Documents/MyExperiment/Sample4/outs/barcode_results.csv")
contig_list <- list(S1, S2, S3, S4)
contig.list <- loadContigs(dir = contig.output,
format = "WAT3R")
What if there are more variables to add than just sample and ID? We can add them by using the addVariable()
function. All we need is the name of the variable you’d like to add and the specific character or numeric values (variables). As an example, here we add the batches in which the samples were processed and sequenced.
example <- addVariable(combined, name = "batch",
variables = c("b1", "b1", "b2", "b2", "b2", "b2"))
example[[1]][1:5,ncol(example[[1]])] # This is showing the first 5 values of the new column added
## [1] "b1" "b1" "b1" "b1" "b1"
Likewise we can remove specific list elements after combineTCR()
using the subsetContig()
function. In order to subset, we need to identify the vector we would like to use for subsetting (name) and also the variable values to subset (variables). Below you can see us isolate just the 4 sequencing results from PX and PY.
subset <- subsetContig(combined, name = "sample",
variables = c("PX", "PY"))
cloneCall
Important to note that the clonotype is called using the combination of genes or nt/aa CDR3 sequences for both loci. As of this implementation of scRepertoire, clonotype calling is not incorporating small variations within the CDR3 sequences. The gene approach will be the most sensitive, while the use of nt or aa moderately so, and the most specific for clonotypes being strict. Additionally, the clonotype call is trying to incorporate both loci, i.e, both TCRA and TCRB chains and if a single-cell barcode has multiple sequences identified (i.e., 2 TCRA chains expressed in one cell). Using the 10x approach, there is a subset of barcodes that only return one of the immune receptor chains, the unreturned chain is assigned an NA value.
The first function to explore the clonotypes is quantContig()
to return the total or relative numbers of unique clonotypes.
scale
chain
quantContig(combined, cloneCall="gene+nt", scale = T)
quantContig(combined, cloneCall="gene+nt", chain = "TRA")
quantContig(combined, cloneCall="gene+nt", chain = "TRB")
quantContig(combined, cloneCall="gene+nt", chain = "TRA", scale = TRUE)
quantContig(combined, cloneCall="gene+nt", scale = T, chain = "both")
Within each of the general analysis functions, there is the ability to export the data frame used to create the visualization. To get the exported values, use exportTable == T.
quantContig_output <- quantContig(combined, cloneCall="gene+nt",
scale = T, exportTable = T)
quantContig_output
## contigs values total scaled
## 1 2712 PY_P 3208 84.53865
## 2 1585 PY_T 3119 50.81757
## 3 823 PX_P 1068 77.05993
## 4 918 PX_T 1678 54.70799
## 5 1143 PZ_P 1434 79.70711
## 6 768 PZ_T 2768 27.74566
We can also examine the relative distribution of clonotypes by abundance. Here abundanceContig()
will produce a line graph with a total number of clonotypes by the number of instances within the sample or run. Like above, we can also group this by vectors within the contig object using the group variable in the function.
abundanceContig(combined, cloneCall = "gene", scale = F)
We can look at the length distribution of the CDR3 sequences by calling the lengtheContig()
function. Importantly, unlike the other basic visualizations, the cloneCall can only be “nt” or “aa”. Due to the method of calling clonotypes as outlined above, the length should reveal a multimodal curve, this is a product of using the NA for the unreturned chain sequence and multiple chains within a single barcode.
chain
lengthContig(combined, cloneCall="aa", chain = "both")
lengthContig(combined, cloneCall="nt", chain = "TRA")
We can also look at clonotypes between samples and changes in dynamics by using the compareClonotypes()
function.
samples
Can be used to isolate specific samples based on the name of the list element
graph
number
The top number of clonotypes to graph, this will be calculated based on the frequency of the individual sample. This can also be left blank.
clonotypes
Can be used to isolate specific clonotype sequences, ensure the call matches the sequences you would like to visualize.
compareClonotypes(combined,
numbers = 10,
samples = c("PX_P", "PX_T"),
cloneCall="aa",
graph = "alluvial")
Last of the basic analysis visualizations is the relative usage of genes of the TCR or BCR, using vizGenes()
.
gene
chain
plot
y.axis
Variable to separate the counts along the y-axis
scale
order
vizGenes(combined, gene = "V",
chain = "TRB",
plot = "bar",
order = "variance",
scale = TRUE)
We can also use vizGenes()
to look at the usage of genes in a single chain. So for example say we are interesting in the diference in TRB V and J usage between tumor and peripheral blood samples - we can easily take a look at this using the following code:
#Peripheral Blood
vizGenes(combined[c(1,3,5)],
gene = "V",
chain = "TRB",
y.axis = "J",
plot = "heatmap",
scale = TRUE,
order = "gene")
#Tumor Infiltrating
vizGenes(combined[c(2,4,6)],
gene = "V",
chain = "TRB",
y.axis = "J",
plot = "heatmap",
scale = TRUE,
order = "gene")
After we have completed the basic processing and summary functions in scRepertoire, we can begin to explore the clonotypes of the single-cell data in more detail.
By examining the clonal space, we are effectively looking at the relative space occupied by clones at specific proportions. Another way to think about this would be thinking of the total immune receptor sequencing run as a measuring cup. In this cup, we will fill liquids of different viscosity - or a different number of clonal proportions. Clonal space homeostasis asks what percentage of the cup is filled by clones in distinct proportions (or liquids of different viscosity, to extend the analogy). The proportional cutpoints are set under the cloneType variable in the function and can be adjusted at baseline the bins are as follows:
cloneTypes
clonalHomeostasis(combined, cloneCall = "gene",
cloneTypes = c(Rare = 1e-04,
Small = 0.001,
Medium = 0.01,
Large = 0.1,
Hyperexpanded = 1))
clonalHomeostasis(combined, cloneCall = "aa")
Like clonal space homeostasis above, clonal proportion acts to place clones into separate bins. The key difference is instead of looking at the relative proportion of the clone to the total, the clonalProportion()
function will rank the clones by total number and place them into bins.
The split represents the ranking of clonotypes by copy or frequency of occurrence, meaning 1:10 are the top 10 clonotypes in each sample. The default bins are under the split variable in the function and can be adjusted, but they are as follows at baseline.
split
clonalProportion(combined, cloneCall = "gene",
split = c(10, 100, 1000, 10000, 30000, 1e+05))
clonalProportion(combined, cloneCall = "nt")
If you are interested in measures of similarity between the samples loaded into scRepertoire, using clonalOverlap()
can assist in the visualization. Three methods currently can be performed in clonalOverlap()
1) overlap coefficient, 2) Morisita index, or 3) Jaccard index. The former is looking at the overlap of clonotypes scaled to the length of unique clonotypes in the smaller sample. The Morisita index is more complex, it is an ecological measure of the dispersion of individuals within a population, incorporating the size of the population. The Jaccard Similarity Index is very similar to the overlap coefficient - instead of using the length of the smaller sample, the denominator for the Jaccard Index is the union of the two comparisons, leading to a much smaller number.
clonalOverlap(combined,
cloneCall = "gene+nt",
method = "morisita")
Another recent addition to scRepertoire is the ability to cluster the samples by the clone size distribution using clonesizeDistribution()
adapted from the powerTCR R package. If using this function, please read and cite the respective citation to analyze the similarities of sample clone size distributions. In this function, method refers to the hierarchical clustering approach that will be based on.
clonesizeDistribution(combined,
cloneCall = "gene+nt",
method="ward.D2")
## NULL
Diversity can also be measured for samples or by other variables. Diversity is calculated using five metrics: 1) Shannon, 2) inverse Simpson, 3) Chao1, 4) Abundance-based Coverage Estimator (ACE), and 5) inverse Pielou’s measure of species evenness. The former two are generally used to estimate baseline diversity and Chao/ACE indices are used to estimate the richness of the samples. New implementation of this function includes downsampling with 100 bootstraps (n.boots) using the minimum number of unique clonotypes, as a more robust diversity estimate.
clonalDiversity(combined,
cloneCall = "gene",
group.by = "sample",
x.axis = "ID",
n.boots = 100)
A number of users requested a visualization from the work of Wu, et al 2020, PMID: 32103181 that allows for the direct comparison of clonotypes between 2 samples. scatterClonotype()
will organize two samples from the combineTCR/BCR product, count the relative clonotypes, and produce a scatter plot comparing the two samples.
x.axis and y.axis
Names of the list element to place on the x-axis and y-axis - so for example “PX_P” and “PX_T”
dot.size
graph
scatterClonotype(combined,
cloneCall ="gene",
x.axis = "PY_P",
y.axis = "PY_T",
dot.size = "total",
graph = "proportion")
As mentioned previously, this data set is derived from work performed in the laboratory of Weizhou Zhang. We have elected to pair the workflow of scRepertoire with the excellent Seurat package, for greater usability. The built-in package vignette explores the use of SingleCellExperiment formats as well. The first step is to load the Seurat object and visualize the data.
seurat <- Seurat::UpdateSeuratObject(get(load("~/seurat2.rda")))
DimPlot(seurat, label = T) + NoLegend()
#Can directly download the seurat object using:
#seurat <- readRDS(url("https://drive.google.com/uc?export=download&id=1wqakP2JQz9B62ofMfjWD0MB2SyPPoDE-&confirm=t&uuid=d4b1a2bc-465b-4c41-8258-5d4b100f1cbb&at=ANzk5s7lfBxMcg-RPpDFo6zykmXv:1682179250290"))
Here you can see we have 12 total clusters (C1-12), which we have labeled as such for simplicity. We can also get a little more granular information on the number of cells by using the table()
function.
table(Idents(seurat))
##
## C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12
## 2293 2138 1746 1419 1167 1128 807 792 495 357 328 241
Next we can take the clonotypic information and attach it to our Seurat object using the combineExpression() function. Importantly, the major requirement for the attachment is matching contig cell barcodes and barcodes in the row names of the metadata of the Seurat or SCE object. If these do not match, the attachment will fail. Based on ease, we suggest you make the changes to the Seurat object row names.
We can call (cloneCall) the 4 variations of clonotypes: 1) VDJC genes, 2) CDR3 amino acid sequence, 3) CDR3 nucleotide sequence, or 4) VDJC genes and CDR3 nucleotide sequence. The attaching function will also calculate the frequency of the clonotype based on the group.by variable. If blank, group.by will calculate frequencies of clonotypes by individual run, but because we have 6 samples of paired peripheral and tumor T cells, we are actually going to use the group.by variable to call “sample” in order to calculate frequencies across both the peripheral blood and tumor T cells of the same patient.
In order to categorize the frequency, we have the variable proportion which if TRUE allows for the relative proportion or when FALSE will use absolute frequency to define clonotype groups cloneTypes acts as a bin to place labels. As a default, cloneTypes is set to equal cloneTypes=c(Rare = 1e-4, Small = 0.001, Medium = 0.01, Large = 0.1, Hyperexpanded = 1). However, below you can see an example of using total frequency as expansion assignments.
seurat <- combineExpression(combined, seurat,
cloneCall="gene",
group.by = "sample",
proportion = FALSE,
cloneTypes=c(Single=1, Small=5, Medium=20, Large=100, Hyperexpanded=500))
If you have performed TCR/BCR enrichment or want to add info for both gamma-delta and alpha-beta T cells, now we can just make a single list and use combineExpression()
. Major note if there are duplicate barcodes (if a cell has both Ig and TCR), the immune receptor information will not be added. As an anecdote, the testing data we used to improve this function had 5-6% of barcode overlap.
#This is an example of the process, will not be evaluated during knit
TCR <- combineTCR(...)
BCR <- combineBCR(...)
list.receptors <- c(TCR, BCR)
seurat <- combineExpression(list.receptors,
seurat,
cloneCall="gene",
proportion = TRUE)
We first want to look at the distribution of peripheral versus tumor T cells. We can use the same color scheme as the rest of the scRepertoire package by calling the object colorblind_vector using the following hex codes.
colorblind_vector <- colorRampPalette(rev(c("#0D0887FF", "#47039FFF",
"#7301A8FF", "#9C179EFF", "#BD3786FF", "#D8576BFF",
"#ED7953FF","#FA9E3BFF", "#FDC926FF", "#F0F921FF")))
DimPlot(seurat, group.by = "Type") + NoLegend() +
scale_color_manual(values=colorblind_vector(2)) +
theme(plot.title = element_blank())
Now we can look at the distribution of the clonotype bins by first ordering the clonoType as a factor, this prevents the coloring from being in alphabetical order. Next we use the DimPlot()
function call in Seurat with our scale_color_manual additional layer.
slot(seurat, "meta.data")$cloneType <- factor(slot(seurat, "meta.data")$cloneType,
levels = c("Hyperexpanded (100 < X <= 500)",
"Large (20 < X <= 100)",
"Medium (5 < X <= 20)",
"Small (1 < X <= 5)",
"Single (0 < X <= 1)", NA))
DimPlot(seurat, group.by = "cloneType") +
scale_color_manual(values = colorblind_vector(5), na.value="grey") +
theme(plot.title = element_blank())
Using the dimensional reduction graphs as a reference, we can also generate an overlay of the position of clonally expanded cells using clonalOverlay()
. Select the reduction for the visualization, default is “PCA” and the freq.cutpoint or lowest clonal frequency or proportion to generate the contour plot. We can modify the contours by selecting the number of bins or number of contours drawn. clonalOverlay()
can be used to look across all cells or faceted by a metadata variable using facet. As we facet, the overall dimensional reduction will be maintained, while the contour plots will adjust based on the facet variable. Coloring of the dot plot is based on the active identity of the single-cell object. This visualization was authored by Dr. Francesco Mazziotta from Johns Hopkins and inspired by Drs. Carmona and Andreatta and their work with ProjectTIL.
clonalOverlay(seurat,
reduction = "umap",
freq.cutpoint = 30,
bins = 10,
facet = "Patient") +
guides(color = "none")
Similar to clonalOverlay()
, we can look at the network interaction of clonotypes shared between clusters along the single-cell dimensional reduction using clonalNetwork()
. This function shows the relative proportion of clones that come from the starting node, with the ending node indicated by the arrow.
Filtering for clones can be accomplished using 3 methods:
filter.clones
filter.identity
For the identity chosen to visualize, show the to and from network connections for a single group
filter.proportion
Remove clones that comprise less than a certain proportion of clones in groups.
#ggraph needs to be loaded due to issues with ggplot
library(ggraph)
#No Identity filter
clonalNetwork(seurat,
reduction = "umap",
identity = "ident",
filter.clones = NULL,
filter.identity = NULL,
cloneCall = "aa")
#Examining Cluster 2 only
clonalNetwork(seurat,
reduction = "umap",
identity = "ident",
filter.identity = "C2",
cloneCall = "aa")
We can also use exportClones to quickly get clones that are shared across multiple identity groups, along with the total number of clones in the data set.
shared.clones <- clonalNetwork(seurat,
reduction = "umap",
identity = "ident",
cloneCall = "aa",
exportClones = TRUE)
head(shared.clones)
## # A tibble: 6 × 2
## clone sum
## <chr> <int>
## 1 CAVNGGSQGNLIF_CSAEREDTDTQYF 352
## 2 NA_CATSATLRVVAEKLFF 258
## 3 CAFMDSWDAGNKLTF_CASSPLEGVYTDTQYF 90
## 4 NA_CASSEGTSGRMDEQFF 90
## 5 CALSDLALIQGAQKLVF_CASSFFGTSVTDTQYF 74
## 6 CAVEDQDTGKLIF_CASSAGIDTDTQYF 69
We can also look at the clonotypes by calling specific sequences in the highlightClonotypes()
below. In order to highlight the clonotypes, we first need to use the cloneCall the type of sequence we will be using and then the specific sequences themselves using sequence. Below you can see the steps to highlight the two most prominent sequences “CAVNGGSQGNLIF_CSAEREDTDTQYF” with a frequency = 482 (clonotype 1) and “NA_CATSATLRVVAEKLFF” with a frequency = 287 (Clonotype2).
seurat <- highlightClonotypes(seurat,
cloneCall= "aa",
sequence = c("CAVNGGSQGNLIF_CSAEREDTDTQYF",
"NA_CATSATLRVVAEKLFF"))
DimPlot(seurat, group.by = "highlight") +
theme(plot.title = element_blank())
We can also look at the count of cells by cluster assigned into specific frequency ranges, by using the occupiedscRepertoire()
function and selecting the x.axis to display cluster or other variables in the metadata of the single-cell object.
occupiedscRepertoire(seurat, x.axis = "ident")
After the metadata has been modified, we can look at clonotypes across multiple categories using the alluvialClonotypes()
function. To understand the basic concepts of this graphing method, I’d highly recommend reading this post, essentially we are able to use the plots to examine the interchange of categorical variables. Because this function will produce a graph with each clonotype arranged by called stratifications, this will take some time depending on the size of your total cells. To expedite, we will actually subset the seurat object before using alluvialClonotypes()
.
alluvialClonotypes(seurat, cloneCall = "gene",
y.axes = c("Patient", "ident", "Type"),
color = "TRAV12-2.TRAJ42.TRAC_TRBV20-1.TRBD2.TRBJ2-3.TRBC2") +
scale_fill_manual(values = c("grey", colorblind_vector(2)[2]))
alluvialClonotypes(seurat, cloneCall = "gene",
y.axes = c("Patient", "ident", "Type"),
color = "ident")
Like alluvial graphs, we can also visualize the interconnection of clusters using the chord diagrams from the circlize R package. The first step is getting the data frame output to feed into the chordDiagram()
function in circlize, which can be done using getCirclize()
. This will calculate the relative number of clonotypes shared based on the group.by variable using the product of combineExpression()
. Important to note, getCirclize()
will create a matrix the size of the group.by variable and then simplify into set of instructions to be read by the circlize R package. The output of which is the total number of unique and shared clones by the group.by variable - this will be greater than the number of unique clonotypes due to the shared clone information.
library(circlize)
library(scales)
circles <- getCirclize(seurat,
group.by = "ident")
#Just assigning the normal colors to each cluster
grid.cols <- scales::hue_pal()(length(unique(seurat@active.ident)))
names(grid.cols) <- levels(seurat@active.ident)
#Graphing the chord diagram
circlize::chordDiagram(circles,
self.link = 1,
grid.col = grid.cols)
This can also be used if we wanted to explore just the tumor-specific T cells by just subsetting the single-cell object.
subset <- subset(seurat, Type == "T")
circles <- getCirclize(subset, group.by = "ident")
grid.cols <- scales::hue_pal()(length(unique(subset@active.ident)))
names(grid.cols) <- levels(subset@active.ident)
chordDiagram(circles, self.link = 1,
grid.col = grid.cols, directional = 1,
direction.type = "arrows",
link.arr.type = "big.arrow")
The nucleotide or amino acid sequences of the chains can be used to cluster clonotypes, by examining the edit distance of the sequences. This approach is underlying the combineBCR()
function, but now can be applied to the T cell receptors. It will add to the end of each list element a cluster by generating a network connected by the similarity in sequence. This network is directed by the threshold variable, where 0.85 is the normalized mean edit distance.
Clonotype Cluster by distance:
LD denotes if the cluster was called using the normalized Levenshtein distance
If performed over the number of samples, such as the list elements, group.by can be used to calculate only the clusters on the setting of patient sample (group.by = “Patient”) or tissue type (group.by = “Type”). This will add the selected group to the beginning of the cluster designation.
sub_combined <- clusterTCR(combined[[2]],
chain = "TRA",
sequence = "aa",
threshold = 0.85,
group.by = NULL)
We can also use the single-cell object directly to cluster cells within the single-cell object using the same approach as above.
seurat <- clusterTCR(seurat,
chain = "TRB",
group.by = "Patient",
sequence = "aa",
threshold = 0.85)
DimPlot(seurat, group.by = "TRB_cluster") +
scale_color_manual(values = colorblind_vector(length(unique(seurat@meta.data[,"TRB_cluster"])))) +
NoLegend()
From the excellent work by Lei Zhang, et al in Lineage tracking reveals dynamic relationships of T cells in colorectal cancer, the authors introduce new methods for looking at clonotypes by cellular origins and cluster identification. Their startrac software has been incorporated into scRepertoire. If you are using the output of this specific function, please cite their excellent work.
In order to use the StartracDiversity()
function, you will need to include the product of the combinedSeurat()
function. The second requirement is a column header in the metadata of the Seurat object that has tissue of origin. In the example data,type corresponds to the column “Type”, which includes the “P” and “T” classifier. The indices can be subsetted for a specific patient or examined overall using the by variable. Importantly, the function uses only the strict definition of clonotype of the VDJC genes and the CDR3 nucleotide sequence.
The indices output includes:
StartracDiversity(seurat,
type = "Type",
sample = "Patient",
by = "overall")
## [2023-07-25 05:53:07.183396] initialize Startrac ...
## [2023-07-25 05:53:07.253709] calculate startrac index ...
## [2023-07-25 05:53:07.260898] calculate pairwise index ...
## [2023-07-25 05:53:07.374484] calculate indices of each patient ...
## [2023-07-25 05:53:07.613355] collect result
## [2023-07-25 05:53:07.615393] return
A new metric proposed by Massimo et al, clonal bias, like Startac is a clonal metric that seeks to quantify how individual clones are skewed towards a specific cellular compartment or cluster.
split.by
Variable used to calculate the baseline frequencies
group.by
The compartment/variable used for the purpose of the comparison
clonotypeBias(seurat,
cloneCall = "aa",
split.by = "Patient",
group.by = "ident",
n.boots = 20,
min.expand =10)
For users that would like greater ability to use the metadata in the Seurat objects to perform the analysis that scRepertoire provides, there is also the option of using the expression2List()
function that will take the metadata and output the data as a list by cluster.
combined2 <- expression2List(seurat,
split.by = "ident")
length(combined2) #now listed by cluster
## [1] 12
clonalDiversity(combined2,
cloneCall = "nt")
clonalHomeostasis(combined2,
cloneCall = "nt")
clonalProportion(combined2,
cloneCall = "nt")
clonalOverlap(combined2,
cloneCall="aa",
method="overlap")
Alternatively, most of the visualizations can also be called directly using the single-cell object after combineExpression()
.
clonalDiversity(seurat,
split.by = "ident",
cloneCall = "nt")
This has been a general overview of the capabilities for scRepertoire from the initial processing and visualization to attach to the mRNA expression values in a Seurat object. If you have any questions, comments or suggestions, feel free to visit the GitHub repository or email me.
sessionInfo()
## R version 4.3.0 (2023-04-21)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Ventura 13.3.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/Chicago
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] scales_1.2.1 circlize_0.4.15 ggraph_2.1.0
## [4] Seurat_4.9.9.9058 SeuratObject_4.9.9.9091 sp_2.0-0
## [7] scRepertoire_1.11.0 ggplot2_3.4.2 BiocStyle_2.28.0
##
## loaded via a namespace (and not attached):
## [1] cubature_2.1.0 RcppAnnoy_0.0.21
## [3] splines_4.3.0 later_1.3.1
## [5] bitops_1.0-7 tibble_3.2.1
## [7] polyclip_1.10-4 fastDummies_1.7.3
## [9] lifecycle_1.0.3 doParallel_1.0.17
## [11] globals_0.16.2 lattice_0.21-8
## [13] MASS_7.3-60 magrittr_2.0.3
## [15] plotly_4.10.2 sass_0.4.7
## [17] rmarkdown_2.23 jquerylib_0.1.4
## [19] yaml_2.3.7 httpuv_1.6.11
## [21] sctransform_0.3.5 spam_2.9-1
## [23] spatstat.sparse_3.0-2 reticulate_1.30
## [25] cowplot_1.1.1 pbapply_1.7-2
## [27] RColorBrewer_1.1-3 abind_1.4-5
## [29] zlibbioc_1.46.0 Rtsne_0.16
## [31] GenomicRanges_1.52.0 purrr_1.0.1
## [33] BiocGenerics_0.46.0 RCurl_1.98-1.12
## [35] tweenr_2.0.2 evmix_2.12
## [37] GenomeInfoDbData_1.2.10 IRanges_2.34.1
## [39] S4Vectors_0.38.1 ggrepel_0.9.3
## [41] irlba_2.3.5.1 listenv_0.9.0
## [43] spatstat.utils_3.0-3 vegan_2.6-4
## [45] MatrixModels_0.5-2 goftest_1.2-3
## [47] RSpectra_0.16-1 spatstat.random_3.1-5
## [49] fitdistrplus_1.1-11 parallelly_1.36.0
## [51] permute_0.9-7 leiden_0.4.3
## [53] codetools_0.2-19 DelayedArray_0.26.6
## [55] ggforce_0.4.1 shape_1.4.6
## [57] tidyselect_1.2.0 farver_2.1.1
## [59] viridis_0.6.4 matrixStats_1.0.0
## [61] stats4_4.3.0 spatstat.explore_3.2-1
## [63] jsonlite_1.8.7 ellipsis_0.3.2
## [65] tidygraph_1.2.3 progressr_0.13.0
## [67] ggridges_0.5.4 ggalluvial_0.12.5
## [69] survival_3.5-5 iterators_1.0.14
## [71] foreach_1.5.2 tools_4.3.0
## [73] stringdist_0.9.10 ica_1.0-3
## [75] Rcpp_1.0.11 glue_1.6.2
## [77] gridExtra_2.3 xfun_0.39
## [79] mgcv_1.9-0 MatrixGenerics_1.12.2
## [81] GenomeInfoDb_1.36.1 dplyr_1.1.2
## [83] withr_2.5.0 BiocManager_1.30.21.1
## [85] fastmap_1.1.1 fansi_1.0.4
## [87] SparseM_1.81 digest_0.6.33
## [89] R6_2.5.1 mime_0.12
## [91] colorspace_2.1-0 scattermore_1.2
## [93] tensor_1.5 spatstat.data_3.0-1
## [95] utf8_1.2.3 tidyr_1.3.0
## [97] generics_0.1.3 data.table_1.14.8
## [99] graphlayouts_1.0.0 httr_1.4.6
## [101] htmlwidgets_1.6.2 S4Arrays_1.0.4
## [103] uwot_0.1.16 pkgconfig_2.0.3
## [105] gtable_0.3.3 lmtest_0.9-40
## [107] SingleCellExperiment_1.22.0 XVector_0.40.0
## [109] powerTCR_1.20.0 htmltools_0.5.5
## [111] dotCall64_1.0-2 bookdown_0.34
## [113] Biobase_2.60.0 png_0.1-8
## [115] knitr_1.43 rstudioapi_0.15.0
## [117] reshape2_1.4.4 nlme_3.1-162
## [119] GlobalOptions_0.1.2 cachem_1.0.8
## [121] zoo_1.8-12 stringr_1.5.0
## [123] KernSmooth_2.23-22 parallel_4.3.0
## [125] miniUI_0.1.1.1 pillar_1.9.0
## [127] grid_4.3.0 vctrs_0.6.3
## [129] RANN_2.6.1 VGAM_1.1-8
## [131] promises_1.2.0.1 xtable_1.8-4
## [133] cluster_2.1.4 evaluate_0.21
## [135] isoband_0.2.7 magick_2.7.4
## [137] truncdist_1.0-2 cli_3.6.1
## [139] compiler_4.3.0 rlang_1.1.1
## [141] crayon_1.5.2 future.apply_1.11.0
## [143] labeling_0.4.2 plyr_1.8.8
## [145] stringi_1.7.12 deldir_1.0-9
## [147] viridisLite_0.4.2 munsell_0.5.0
## [149] gsl_2.1-8 lazyeval_0.2.2
## [151] spatstat.geom_3.2-4 quantreg_5.96
## [153] Matrix_1.6-0 RcppHNSW_0.4.1
## [155] patchwork_1.1.2 future_1.33.0
## [157] shiny_1.7.4.1 highr_0.10
## [159] SummarizedExperiment_1.30.2 evd_2.3-6.1
## [161] ROCR_1.0-11 igraph_1.5.0.1
## [163] bslib_0.5.0