What if there are more variables to add than just sample and ID? We can add them by using the addVariable() function. For each element, the function adds a column (labeled by variable.name) containing the corresponding value from variables. The length of the variables parameter needs to match the length of the combined object, that is, one value per list element.
Key Parameter(s) for addVariable()
* variable.name: A character string that defines the new variable to add (e.g., “Type”, “Treatment”).
* variables: A character vector defining the desired column value for each list element. Its length must match the number of elements in the input.data list.

As an example, here we add a Type variable describing how the samples were processed and sequenced to the combined.TCR object:
# One value per list element: rep() yields "B", "L", "B", "L", ... for the 8 samples
combined.TCR <- addVariable(combined.TCR,
variable.name = "Type",
variables = rep(c("B", "L"), 4))
head(combined.TCR[[1]])
## barcode sample TCR1 cdr3_aa1
## 1 P17B_AAACCTGAGTACGACG-1 P17B TRAV25.TRAJ20.TRAC CGCSNDYKLSF
## 3 P17B_AAACCTGCAACACGCC-1 P17B TRAV38-2/DV8.TRAJ52.TRAC CAYRSAQAGGTSYGKLTF
## 5 P17B_AAACCTGCAGGCGATA-1 P17B TRAV12-1.TRAJ9.TRAC CVVSDNTGGFKTIF
## 7 P17B_AAACCTGCATGAGCGA-1 P17B TRAV12-1.TRAJ9.TRAC CVVSDNTGGFKTIF
## 9 P17B_AAACGGGAGAGCCCAA-1 P17B TRAV20.TRAJ8.TRAC CAVRGEGFQKLVF
## 10 P17B_AAACGGGAGCGTTTAC-1 P17B TRAV12-1.TRAJ9.TRAC CVVSDNTGGFKTIF
## cdr3_nt1
## 1 TGTGGGTGTTCTAACGACTACAAGCTCAGCTTT
## 3 TGTGCTTATAGGAGCGCGCAGGCTGGTGGTACTAGCTATGGAAAGCTGACATTT
## 5 TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT
## 7 TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT
## 9 TGTGCTGTGCGAGGAGAAGGCTTTCAGAAACTTGTATTT
## 10 TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT
## TCR2 cdr3_aa2
## 1 TRBV5-1.None.TRBJ2-7.TRBC2 CASSLTDRTYEQYF
## 3 TRBV10-3.None.TRBJ2-2.TRBC2 CAISEQGKGELFF
## 5 TRBV9.None.TRBJ2-2.TRBC2 CASSVRRERANTGELFF
## 7 TRBV9.None.TRBJ2-2.TRBC2 CASSVRRERANTGELFF
## 9 <NA> <NA>
## 10 TRBV9.None.TRBJ2-2.TRBC2 CASSVRRERANTGELFF
## cdr3_nt2
## 1 TGCGCCAGCAGCTTGACCGACAGGACCTACGAGCAGTACTTC
## 3 TGTGCCATCAGTGAACAGGGGAAAGGGGAGCTGTTTTTT
## 5 TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
## 7 TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
## 9 <NA>
## 10 TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
## CTgene
## 1 TRAV25.TRAJ20.TRAC_TRBV5-1.None.TRBJ2-7.TRBC2
## 3 TRAV38-2/DV8.TRAJ52.TRAC_TRBV10-3.None.TRBJ2-2.TRBC2
## 5 TRAV12-1.TRAJ9.TRAC_TRBV9.None.TRBJ2-2.TRBC2
## 7 TRAV12-1.TRAJ9.TRAC_TRBV9.None.TRBJ2-2.TRBC2
## 9 TRAV20.TRAJ8.TRAC_NA
## 10 TRAV12-1.TRAJ9.TRAC_TRBV9.None.TRBJ2-2.TRBC2
## CTnt
## 1 TGTGGGTGTTCTAACGACTACAAGCTCAGCTTT_TGCGCCAGCAGCTTGACCGACAGGACCTACGAGCAGTACTTC
## 3 TGTGCTTATAGGAGCGCGCAGGCTGGTGGTACTAGCTATGGAAAGCTGACATTT_TGTGCCATCAGTGAACAGGGGAAAGGGGAGCTGTTTTTT
## 5 TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT_TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
## 7 TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT_TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
## 9 TGTGCTGTGCGAGGAGAAGGCTTTCAGAAACTTGTATTT_NA
## 10 TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT_TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
## CTaa
## 1 CGCSNDYKLSF_CASSLTDRTYEQYF
## 3 CAYRSAQAGGTSYGKLTF_CAISEQGKGELFF
## 5 CVVSDNTGGFKTIF_CASSVRRERANTGELFF
## 7 CVVSDNTGGFKTIF_CASSVRRERANTGELFF
## 9 CAVRGEGFQKLVF_NA
## 10 CVVSDNTGGFKTIF_CASSVRRERANTGELFF
## CTstrict
## 1 TRAV25.TRAJ20.TRAC;TGTGGGTGTTCTAACGACTACAAGCTCAGCTTT_TRBV5-1.None.TRBJ2-7.TRBC2;TGCGCCAGCAGCTTGACCGACAGGACCTACGAGCAGTACTTC
## 3 TRAV38-2/DV8.TRAJ52.TRAC;TGTGCTTATAGGAGCGCGCAGGCTGGTGGTACTAGCTATGGAAAGCTGACATTT_TRBV10-3.None.TRBJ2-2.TRBC2;TGTGCCATCAGTGAACAGGGGAAAGGGGAGCTGTTTTTT
## 5 TRAV12-1.TRAJ9.TRAC;TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT_TRBV9.None.TRBJ2-2.TRBC2;TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
## 7 TRAV12-1.TRAJ9.TRAC;TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT_TRBV9.None.TRBJ2-2.TRBC2;TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
## 9 TRAV20.TRAJ8.TRAC;TGTGCTGTGCGAGGAGAAGGCTTTCAGAAACTTGTATTT_NA;NA
## 10 TRAV12-1.TRAJ9.TRAC;TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT_TRBV9.None.TRBJ2-2.TRBC2;TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
## Type
## 1 B
## 3 B
## 5 B
## 7 B
## 9 B
## 10 B
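To make the length requirement concrete, here is a minimal base-R sketch of the behavior described for addVariable(). The helper add_variable_sketch() and the toy list are hypothetical illustrations, not scRepertoire's implementation; the sketch only assumes that the combined object is a list of data frames, one per sample.

```r
# Hypothetical helper mirroring the behavior described for addVariable();
# this is NOT scRepertoire's implementation.
add_variable_sketch <- function(input.data, variable.name, variables) {
  # Exactly one value is needed per list element
  if (length(variables) != length(input.data)) {
    stop("length(variables) must equal the number of list elements")
  }
  # Element i receives variables[i] as a new column (recycled over its rows);
  # Map() keeps the element names of input.data
  Map(function(df, value) {
    df[[variable.name]] <- value
    df
  }, input.data, variables)
}

# Toy stand-in for a combineTCR() result: two samples, two barcodes each
toy <- list(
  P17B = data.frame(barcode = c("P17B_A-1", "P17B_B-1"), sample = "P17B"),
  P17L = data.frame(barcode = c("P17L_A-1", "P17L_B-1"), sample = "P17L")
)
toy <- add_variable_sketch(toy, variable.name = "Type", variables = c("B", "L"))
toy$P17L
```

Passing a variables vector of the wrong length stops with an error instead of silently recycling values across samples.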
Likewise, we can remove specific list elements after combineTCR() or combineBCR() using the subsetClones() function. To subset, we need to identify the column header to use for subsetting (name) and the specific values to include (variables).
Key Parameter(s) for subsetClones()
* name: The column header/name in the metadata of input.data to use for subsetting (e.g., “sample”, “Type”).
* variables: A character vector of the specific values within the chosen name column to retain in the subsetted data.

Below, we isolate just the two sequencing results from the “P18L” and “P18B” samples:
subset1 <- subsetClones(combined.TCR,
name = "sample",
variables = c("P18L", "P18B"))
head(subset1[[1]][,1:4])
## barcode sample TCR1 cdr3_aa1
## 1 P18B_AAACCTGAGGCTCAGA-1 P18B TRAV26-1.TRAJ37.TRAC CIVRGGSSNTGKLIF
## 3 P18B_AAACCTGCATGACATC-1 P18B TRAV3.TRAJ20.TRAC CAVQRSNDYKLSF
## 5 P18B_AAACCTGGTATGCTTG-1 P18B TRAV26-1.TRAJ53.TRAC CIGSSGGSNYKLTF
## 8 P18B_AAACGGGCAGATGGGT-1 P18B <NA> <NA>
## 9 P18B_AAACGGGTCTTACCGC-1 P18B TRAV20.TRAJ9.TRAC CAVQAKRYTGGFKTIF
## 12 P18B_AAAGATGAGTTACGGG-1 P18B TRAV8-3.TRAJ8.TRAC CAVGGDTGFQKLVF
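One way to picture subsetClones() is as a row filter on the chosen column followed by dropping list elements that end up empty. The base-R sketch below assumes that reading of the parameters above; subset_clones_sketch() and the toy data are hypothetical and are not the package's code.

```r
# Hypothetical helper illustrating one reading of subsetClones();
# this is NOT scRepertoire's implementation.
subset_clones_sketch <- function(input.data, name, variables) {
  # Keep only the rows whose `name` column matches one of `variables`
  kept <- lapply(input.data, function(df) {
    df[df[[name]] %in% variables, , drop = FALSE]
  })
  # Drop list elements that have no matching rows left
  kept[vapply(kept, nrow, integer(1)) > 0]
}

# Toy stand-in for a combineTCR() result (hypothetical barcodes)
toy <- list(
  P17B = data.frame(barcode = c("P17B_A-1", "P17B_B-1"), sample = "P17B"),
  P18B = data.frame(barcode = c("P18B_A-1", "P18B_B-1"), sample = "P18B"),
  P18L = data.frame(barcode = "P18L_A-1", sample = "P18L")
)
toy.subset <- subset_clones_sketch(toy, name = "sample",
                                   variables = c("P18L", "P18B"))
names(toy.subset)
```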
Alternatively, we can simply select the desired list elements from the output of combineTCR() or combineBCR() using standard R list indexing.
## barcode sample TCR1 cdr3_aa1
## 1 P18B_AAACCTGAGGCTCAGA-1 P18B TRAV26-1.TRAJ37.TRAC CIVRGGSSNTGKLIF
## 3 P18B_AAACCTGCATGACATC-1 P18B TRAV3.TRAJ20.TRAC CAVQRSNDYKLSF
## 5 P18B_AAACCTGGTATGCTTG-1 P18B TRAV26-1.TRAJ53.TRAC CIGSSGGSNYKLTF
## 8 P18B_AAACGGGCAGATGGGT-1 P18B <NA> <NA>
## 9 P18B_AAACGGGTCTTACCGC-1 P18B TRAV20.TRAJ9.TRAC CAVQAKRYTGGFKTIF
## 12 P18B_AAAGATGAGTTACGGG-1 P18B TRAV8-3.TRAJ8.TRAC CAVGGDTGFQKLVF
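Selecting elements directly relies on ordinary R list indexing, either by position or, assuming the list elements are named by sample, by name. Single brackets return a shorter list rather than a single data frame. A small self-contained sketch with a hypothetical toy list:

```r
# Toy list standing in for a combineTCR() result (hypothetical data)
toy <- list(
  P17B = data.frame(barcode = "P17B_A-1", sample = "P17B"),
  P17L = data.frame(barcode = "P17L_A-1", sample = "P17L"),
  P18B = data.frame(barcode = "P18B_A-1", sample = "P18B"),
  P18L = data.frame(barcode = "P18L_A-1", sample = "P18L")
)

# Select by position: single brackets keep a (shorter) list
by.position <- toy[c(3, 4)]

# Select by name, assuming the elements are named by sample
by.name <- toy[c("P18B", "P18L")]

identical(by.position, by.name)
```

Selecting by name is less fragile than by position, because it does not depend on the order in which samples were combined.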
After assigning the clone by barcode, we can export the clonal information using exportClones() to save it for later use or to integrate it with other bioinformatics pipelines. This function supports several output formats tailored to different analytical needs.
Key Parameter(s) for exportClones()
* format: The desired output format for the clonal data.
  * airr: Exports data in an Adaptive Immune Receptor Repertoire (AIRR) Community-compliant format, with each row representing a single receptor chain.
  * immunarch: Exports a list containing a data frame and metadata formatted for use with the immunarch package.
  * paired: Exports a data frame with paired chain information (amino acid, nucleotide, genes) per barcode. This is the default.
  * TCRMatch: Exports a data frame specifically for the TCRMatch algorithm, containing TRB chain amino acid sequence and clonal frequency.
  * tcrpheno: Exports a data frame compatible with the tcrpheno pipeline, with TRA and TRB chains in separate columns.
* write.file: If TRUE (default), saves the output to a CSV file. If FALSE, returns the data frame or list to the R environment.
* dir: The directory where the output file will be saved. Defaults to the current working directory.
* file.name: The name of the CSV file to be saved.
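As an illustration of what "TRB chain amino acid sequence and clonal frequency" means, the sketch below counts barcodes per TRB CDR3 amino acid sequence from a toy paired table. The column names (cdr3_aa2 mirrors the combined output shown earlier; cdr3.beta and frequency are hypothetical) and the use of raw counts rather than proportions are assumptions; this is not the exact layout exportClones() writes.

```r
# Toy paired table: one row per barcode; cdr3_aa2 holds the TRB CDR3 (hypothetical data)
paired <- data.frame(
  barcode  = c("bc1", "bc2", "bc3", "bc4"),
  cdr3_aa2 = c("CASSLTDRTYEQYF", "CASSVRRERANTGELFF", "CASSVRRERANTGELFF", NA),
  stringsAsFactors = FALSE
)

# Drop cells without a TRB chain, then count barcodes per TRB sequence
trb <- paired$cdr3_aa2[!is.na(paired$cdr3_aa2)]
counts <- table(trb)

# Hypothetical output columns: one row per unique TRB sequence
trb.freq <- data.frame(
  cdr3.beta = names(counts),
  frequency = as.integer(counts),
  stringsAsFactors = FALSE
)
trb.freq
```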
To export the combined clonotypes as a paired data frame and save it to a specified directory:
# format = "paired" is the default, so it is not set explicitly
exportClones(combined.TCR,
write.file = TRUE,
dir = "~/Documents/MyExperiment/Sample1/",
file.name = "clones.csv")
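Because write.file = TRUE produces a CSV, the saved clones can be reloaded in a later session with base R's read.csv(). The sketch below shows that round trip on a toy table written to a temporary directory; it does not call exportClones(), and it assumes the exported file is a standard comma-separated file with a header row.

```r
# Toy stand-in for an exported paired table (hypothetical data)
clones <- data.frame(
  barcode = c("P17B_A-1", "P17B_B-1"),
  CTaa = c("CVVSDNTGGFKTIF_CASSVRRERANTGELFF", "CAVRGEGFQKLVF_NA"),
  stringsAsFactors = FALSE
)

# Write to a temporary directory instead of a real project folder
out.file <- file.path(tempdir(), "clones.csv")
write.csv(clones, out.file, row.names = FALSE)

# In a later session, reload the saved clones as a data frame
reloaded <- read.csv(out.file, stringsAsFactors = FALSE)
str(reloaded)
```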
To return an immunarch-formatted list (data and metadata) directly to your R environment without saving a file:
immunarch <- exportClones(combined.TCR,
format = "immunarch",
write.file = FALSE)
head(immunarch[[1]][[1]])
## Clones Proportion CDR3.nt
## 1 1 0.0003565062 TGCGCCAGCAGTCGGGGACTAGCGGGATACAATGAGCAGTTCTTC;NA
## 2 1 0.0003565062 TGTGCCATCAGCGCGGACCCCCGCTACAATGAGCAGTTCTTC;NA
## 3 1 0.0003565062 TGTGCCAGCAGCTTGAGGGACAGCTATCGGTACTATGGCTACACCTTC;NA
## 4 2 0.0007130125 TGTGCCAGCAGCCGGCAGGGCGCAGATACGCAGTATTTT;NA
## 5 1 0.0003565062 TGTGCCAGCAGTCCCTTTACAGGGTTCTATGGCTACACCTTC;NA
## 6 1 0.0003565062 TGTGCCAGCTCATCCGGGATCAATCAGCCCCAGCATTTT;NA
## CDR3.aa V.name D.name J.name C.name
## 1 CASSRGLAGYNEQFF;NA TRBV10-2;NA None;NA TRBJ2-1;NA TRBC2;NA
## 2 CAISADPRYNEQFF;NA TRBV10-3;NA None;NA TRBJ2-1;NA TRBC2;NA
## 3 CASSLRDSYRYYGYTF;NA TRBV11-3;NA None;NA TRBJ1-2;NA TRBC1;NA
## 4 CASSRQGADTQYF;NA TRBV11-3;NA None;NA TRBJ2-3;NA TRBC2;NA
## 5 CASSPFTGFYGYTF;NA TRBV12-4;NA None;NA TRBJ1-2;NA TRBC1;NA
## 6 CASSSGINQPQHF;NA TRBV18;NA None;NA TRBJ1-5;NA TRBC1;NA
## Barcode
## 1 P17B_AGCGGTCCAAAGGAAG-1
## 2 P17B_GGCTCGAGTCGCGGTT-1
## 3 P17B_CGCGTTTTCGGCTACG-1
## 4 P17B_CACCAGGGTTCCTCCA-1;P17B_TCTATTGCAGGTGCCT-1
## 5 P17B_GCTGGGTGTACGAAAT-1
## 6 P17B_AGGGTGACATTGGTAC-1
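Reading the output above, Clones appears to count the barcodes listed in the Barcode column (row 4 lists two barcodes, has Clones = 2, and has twice the Proportion of the other rows), and Proportion appears to be Clones divided by the sample total. A toy base-R sketch of that relationship, inferred from the printed values rather than from documented behavior:

```r
# Toy table mimicking the Barcode column above (hypothetical barcodes)
toy <- data.frame(
  Barcode = c("P17B_A-1", "P17B_B-1;P17B_C-1", "P17B_D-1"),
  stringsAsFactors = FALSE
)

# Clones: number of ";"-separated barcodes in each row
toy$Clones <- lengths(strsplit(toy$Barcode, ";", fixed = TRUE))

# Proportion: each clone's share of all barcodes in the sample
toy$Proportion <- toy$Clones / sum(toy$Clones)
toy
```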
The annotateInvariant() function enables the identification of mucosal-associated invariant T (MAIT) cells and invariant natural killer T (iNKT) cells in single-cell sequencing datasets. These specialized T-cell subsets are defined by their characteristic TCR usage, which makes them distinguishable within single-cell immune profiling data. The function extracts TCR chain information from the provided single-cell dataset and evaluates it against known invariant TCR criteria for either MAIT or iNKT cells. Each cell is assigned a score indicating the presence (1) or absence (0) of the specified invariant T-cell population.
Key Parameter(s) for annotateInvariant()
* type: Character string specifying the type of invariant T cell to annotate (MAIT or iNKT).
* species: Character string specifying the species (mouse or human).

For example, to annotate both populations in human data:
# Score each cell for a human MAIT TCR
combined <- annotateInvariant(combined,
type = "MAIT",
species = "human")
# Score each cell for a human iNKT TCR
combined <- annotateInvariant(combined,
type = "iNKT",
species = "human")
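To illustrate the 1/0 scoring idea, the sketch below checks TRA gene strings (formatted like the TCR1 column shown earlier, e.g. TRAV12-1.TRAJ9.TRAC) against a rule. The rule is a hypothetical, simplified MAIT-like criterion (TRAV1-2 paired with TRAJ33, TRAJ12 or TRAJ20) chosen for illustration only; it ignores the TRB chain entirely and is not the set of criteria annotateInvariant() actually uses.

```r
# Hypothetical, simplified rule for illustration only (not the package's criteria)
mait.rule <- list(trav = "TRAV1-2", traj = c("TRAJ33", "TRAJ12", "TRAJ20"))

score_invariant_sketch <- function(tcr1, rule) {
  # tcr1 strings look like "TRAV12-1.TRAJ9.TRAC"; split into V, J, C parts
  parts <- strsplit(tcr1, ".", fixed = TRUE)
  hit <- vapply(parts, function(p) {
    # Missing chains (NA) split into a single NA and never match
    length(p) >= 2 && p[1] %in% rule$trav && p[2] %in% rule$traj
  }, logical(1))
  as.integer(hit)  # 1 = matches the rule, 0 = does not
}

tcr1 <- c("TRAV1-2.TRAJ33.TRAC", "TRAV12-1.TRAJ9.TRAC", NA)
scores <- score_invariant_sketch(tcr1, mait.rule)
scores
```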