vignettes/articles/Repertoire_Summary.Rmd
Repertoire_Summary.Rmd
Quantify the proportion of amino acids along the cdr3 sequence with
percentAA()
. By default, the function will pad the
sequences with NAs up to the maximum of aa.length.
Sequences longer than aa.length will be removed before
visualization (default aa.length = 20).
percentAA(combined.TCR,
chain = "TRB",
aa.length = 20)
We can also quantify the level of entropy/diversity across amino acid
residues along the cdr3 sequence. positionalEntropy()
combines the quantification by residue of percentAA()
with
the diversity calls in clonalDiversity()
.
method
positionalEntropy(combined.TCR,
chain = "TRB",
aa.length = 20)
Like positionalEntropy()
, we can also examine a series
of amino acid properties along the cdr3 sequences using
positionalProperty()
. Important differences from the above
function for positionalProperty()
is dropping NA values as
they would void the mean calculation. positionalProperty()
also display a ribbon with the 95% confidence interval surrounding the
mean value for the selected properties.
method
positionalProperty(combined.TCR[c(1,2)],
chain = "TRB",
aa.length = 20,
method = "Atchley") +
scale_color_manual(values = hcl.colors(5, "inferno")[c(2,4)])
A visualization of the relative usage of genes of the TCR or BCR,
using vizGenes()
. There is some functional crossover
between vizGenes()
and two functions below called
percentGenes()
and percentVJ()
. But
vizGenes()
is more adaptable to allow for comparisons
across chains, scaling, etc.
x.axis
y.axis
plot
scale
order
vizGenes(combined.TCR,
x.axis = "TRBV",
y.axis = NULL,
plot = "barplot",
scale = TRUE)
vizGenes()
can also be used to look at the usage of
genes in a single chain. For example, say we are interested in the
difference in TRB V and J usage between lung and peripheral blood
samples - we can easily take a look at this using the following
code:
#Peripheral Blood
vizGenes(combined.TCR[c(1,3,5,7)],
x.axis = "TRBV",
y.axis = "TRBJ",
plot = "heatmap",
scale = TRUE)
#Lung
vizGenes(combined.TCR[c(2,4,6,8)],
x.axis = "TRBV",
y.axis = "TRBJ",
plot = "heatmap",
scale = TRUE)
For the P17 patient samples, what if we are interested in chain pairings, we can look at TRBV and TRAV at the same time using them as inputs to x.axis and y.axis.
Quantify the proportion of V or J gene usage with
percentGenes()
. Like percentAA()
, we select
the chain of interest and then indicate the gene of interest with the
gene parameter. Two major limitations of
percentGenes()
are, 1) the function quantifies only V or J
genes, and 2) the quantification of the genes are limited to all the V
or J genes seen across the samples, not all possible V or J genes.
percentGenes(combined.TCR,
chain = "TRB",
gene = "Vgene")
We can also use the output percentGenes()
for
dimensional reduction to summarise the gene usage by sample. This can be
done with a simple principal component analysis (below) or even more
complex reductions.
df.genes <- percentGenes(combined.TCR,
chain = "TRB",
gene = "Vgene",
exportTable = TRUE)
#Performing PCA
pc <- prcomp(df.genes)
#Getting data frame to plot from
df <- as.data.frame(cbind(pc$x[,1:2], rownames(df.genes)))
df$PC1 <- as.numeric(df$PC1)
df$PC2 <- as.numeric(df$PC2)
#Plotting
ggplot(df, aes(x = PC1, y = PC2)) +
geom_point(aes(fill =df[,3]), shape = 21, size = 5) +
guides(fill=guide_legend(title="Samples")) +
scale_fill_manual(values = hcl.colors(nrow(df), "inferno")) +
theme_classic()
Quantify the proportion of V and J gene usage with
percentVJ()
. Like percentGenes()
, this
function will quantify the percentage of V and J paired together across
individual repertoires. The output can be visualized using a heatmap or
as input for further dimensional reduction.
percentVJ(combined.TCR,
chain = "TRB")
df.genes <- percentVJ(combined.TCR,
chain = "TRB",
exportTable = TRUE)
#Performing PCA
pc <- prcomp(df.genes)
#Getting data frame to plot from
df <- as.data.frame(cbind(pc$x[,1:2], rownames(df.genes)))
df$PC1 <- as.numeric(df$PC1)
df$PC2 <- as.numeric(df$PC2)
#Plotting
ggplot(df, aes(x = PC1, y = PC2)) +
geom_point(aes(fill =df[,3]), shape = 21, size = 5) +
guides(fill=guide_legend(title="Samples")) +
scale_fill_manual(values = hcl.colors(nrow(df), "inferno")) +
theme_classic()
Another quantification of the composition of the CDR3 sequence is to define motifs by sliding across the amino acid or nucleotide sequences at set intervals resulting in substrings or kmers.
motif.length
top.motifs
percentKmer(combined.TCR,
cloneCall = "aa",
chain = "TRB",
motif.length = 3,
top.motifs = 25)
percentKmer(combined.TCR,
cloneCall = "nt",
chain = "TRB",
motif.length = 3,
top.motifs = 25)