This function calculates and visualizes the frequency of k-mer motifs for either nucleotide (nt) or amino acid (aa) sequences. It produces a heatmap showing the relative composition of the most variable motifs across samples or groups.
percentKmer(
input.data,
chain = "TRB",
cloneCall = "aa",
group.by = NULL,
order.by = NULL,
motif.length = 3,
min.depth = 3,
top.motifs = 30,
exportTable = FALSE,
palette = "inferno",
...
)The product of combineTCR(),
combineBCR(), or combineExpression()
The TCR/BCR chain to use. Accepted values: TRA, TRB, TRG,
TRD, IGH, or IGL (for both light chains).
Defines the clonal sequence grouping. Accepted values
are: nt (CDR3 nucleotide sequence) or aa (CDR3 amino acid sequence).
A column header in the metadata or lists to group the analysis
by (e.g., "sample", "treatment"). If NULL, data will be analyzed as
by list element or active identity in the case of single-cell objects.
A character vector defining the desired order of elements
of the group.by variable. Alternatively, use alphanumeric to sort groups
automatically.
The length of the kmer to analyze
Minimum count a motif must reach to be retained in the
output (>= 1). Default: 3.
Return the n most variable motifs as a function of median absolute deviation
If TRUE, returns a data frame or matrix of the results
instead of a plot.
Colors to use in visualization - input any hcl.pals
Additional arguments passed to the ggplot theme
A ggplot object displaying a heatmap of motif percentages.
If exportTable = TRUE, a matrix of the raw data is returned.
The function first calculates k-mer frequencies for each sample/group. By default, it then identifies the 30 most variable motifs based on the Median Absolute Deviation (MAD) across all samples and displays their frequencies in a heatmap.
# Making combined contig data
combined <- combineTCR(contig_list,
samples = c("P17B", "P17L", "P18B", "P18L",
"P19B","P19L", "P20B", "P20L"))
# Using percentKmer()
percentKmer(combined,
chain = "TRB",
motif.length = 3)