scRepertoire works primarily with the filtered_contig_annotations.csv output generated by 10x Genomics Cell Ranger. This file is typically found in the ./outs/ directory of your VDJ alignment folder.
To prepare data for scRepertoire from 10x Genomics outputs, load the filtered_contig_annotations.csv file for each of your samples, then combine the loaded data frames into a list:
S1 <- read.csv(".../Sample1/outs/filtered_contig_annotations.csv")
S2 <- read.csv(".../Sample2/outs/filtered_contig_annotations.csv")
S3 <- read.csv(".../Sample3/outs/filtered_contig_annotations.csv")
S4 <- read.csv(".../Sample4/outs/filtered_contig_annotations.csv")
contig_list <- list(S1, S2, S3, S4)
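With many samples, the same list can be built in a loop instead of one read.csv() call per sample. The sketch below shows one way to do that in base R. The temporary directory, the mock two-row files, and the names root, samples, and paths are stand-ins for illustration, assuming each sample has its own folder with an outs/ subdirectory as described above.

```r
# Build a throwaway directory tree that mimics <root>/<Sample>/outs/ (illustration only)
root <- file.path(tempdir(), "vdj_demo")
samples <- c("Sample1", "Sample2", "Sample3", "Sample4")
for (s in samples) {
  out_dir <- file.path(root, s, "outs")
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  # Mock contig table with two columns; real Cell Ranger files have many more
  mock <- data.frame(barcode = paste0("AAAC", 1:2, "-1"),
                     chain = c("TRA", "TRB"))
  write.csv(mock, file.path(out_dir, "filtered_contig_annotations.csv"),
            row.names = FALSE)
}

# One path per sample, in a fixed order (file.path() is vectorized)
paths <- file.path(root, samples, "outs", "filtered_contig_annotations.csv")
stopifnot(all(file.exists(paths)))

# Read each file and name the list elements by sample
contig_list <- lapply(paths, read.csv)
names(contig_list) <- samples
```

Naming the elements is optional, but it keeps the sample labels attached to each data frame when the list is inspected later.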
Beyond the default 10x Genomics Cell Ranger pipeline outputs, scRepertoire supports various other single-cell immune receptor sequencing formats through the loadContigs() function.
Supported formats and their expected file names:
10X: “filtered_contig_annotations.csv”
AIRR: “airr_rearrangement.tsv”
BD: “Contigs_AIRR.tsv”
Dandelion: “all_contig_dandelion.tsv”
Immcantation: “_data.tsv” (or similar)
JSON: “.json”
MiXCR: “clones.tsv”
ParseBio: “barcode_report.tsv”
TRUST4: “barcode_report.tsv”
WAT3R: “barcode_results.csv”
Key Parameter(s) for loadContigs()
input: A directory path containing your contig files (the function will search it recursively), or a list/data frame of pre-loaded contig data.
format: A string specifying the data format (e.g., 10X, TRUST4, WAT3R). If set to “auto”, the function will attempt to detect the format automatically from file names or data structure.
You can provide loadContigs() with a directory where your sequencing experiments are located, and it will recursively load and process the contig data based on the file names:
# Directory example
contig.output <- "~/Documents/MyExperiment"
contig.list <- loadContigs(input = contig.output,
                           format = "TRUST4")
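Before pointing loadContigs() at a large directory, it can help to see which files a recursive search by file name would pick up. The following is a rough base-R sketch of that idea, not scRepertoire's internal implementation. The expected_files vector transcribes a subset of the format list above, and find_contig_files() is a hypothetical helper written for this example.

```r
# Expected file-name fragments for a subset of formats, transcribed from the list above
expected_files <- c(
  "10X"    = "filtered_contig_annotations.csv",
  "MiXCR"  = "clones.tsv",
  "TRUST4" = "barcode_report.tsv",
  "WAT3R"  = "barcode_results.csv"
)

# Hypothetical helper: recursively list files whose names contain the fragment
find_contig_files <- function(dir, format) {
  stopifnot(format %in% names(expected_files))
  all_files <- list.files(dir, recursive = TRUE, full.names = TRUE)
  all_files[grepl(expected_files[[format]], basename(all_files), fixed = TRUE)]
}

# Mock experiment folder with two TRUST4-style outputs and one unrelated file
demo_dir <- file.path(tempdir(), "MyExperiment_demo")
for (s in c("Sample1", "Sample2")) {
  dir.create(file.path(demo_dir, s), recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(demo_dir, s, "barcode_report.tsv"))
}
file.create(file.path(demo_dir, "notes.txt"))

trust4_files <- find_contig_files(demo_dir, "TRUST4")
print(basename(trust4_files))
```

Note that ParseBio and TRUST4 share the same expected file name in the list above, so a file name alone cannot tell them apart; passing format explicitly avoids that ambiguity.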
Alternatively, loadContigs() can be given a list of pre-loaded data frames and process the contig data based on the specified format:
# List of data frames example
S1 <- read.csv("~/Documents/MyExperiment/Sample1/outs/barcode_results.csv")
S2 <- read.csv("~/Documents/MyExperiment/Sample2/outs/barcode_results.csv")
S3 <- read.csv("~/Documents/MyExperiment/Sample3/outs/barcode_results.csv")
S4 <- read.csv("~/Documents/MyExperiment/Sample4/outs/barcode_results.csv")
contig.list <- list(S1, S2, S3, S4)
contig.list <- loadContigs(input = contig.list,
                           format = "WAT3R")
To create the contig list from a multiplexed experiment, first generate a single-cell RNA object (either Seurat or SingleCellExperiment), load the filtered contig file, and then call createHTOContigList(). This function returns a list separated by the group.by variable(s).
To create a contig list separated by HTO (Hash Tag Oligo) IDs from a single-cell object:
contigs <- read.csv(".../outs/filtered_contig_annotations.csv")
contig.list <- createHTOContigList(contigs,
                                   Seurat.Obj,
                                   group.by = "HTO_maxID")
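Conceptually, the result is the contig table split by a per-cell metadata column, with cells linked through their barcodes. The base-R sketch below illustrates that idea on mock data. It is not the implementation of createHTOContigList(), and the objects contigs_demo and meta_demo, along with the hashtag labels, are invented for the example.

```r
# Mock contig table: two contigs per cell, keyed by barcode
contigs_demo <- data.frame(
  barcode = rep(c("AAAC-1", "AAAG-1", "AACC-1"), each = 2),
  chain   = rep(c("TRA", "TRB"), times = 3)
)

# Mock per-cell metadata, standing in for the metadata of a single-cell object
meta_demo <- data.frame(
  barcode   = c("AAAC-1", "AAAG-1", "AACC-1"),
  HTO_maxID = c("HTO1", "HTO2", "HTO1")
)

# Look up each contig's hashtag via its barcode, then split into a list.
# Contigs whose barcode is absent from the metadata get NA and are dropped by split().
hto <- meta_demo$HTO_maxID[match(contigs_demo$barcode, meta_demo$barcode)]
demo_list <- split(contigs_demo, hto)

print(names(demo_list))
print(sapply(demo_list, nrow))
```

Because the link in this sketch is made by barcode, the barcodes in the contig table have to match those in the metadata for any contigs to end up in the list.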
scRepertoire includes a built-in example dataset to demonstrate the functionality of the R package. This dataset consists of T cells derived from four patients with acute respiratory distress, each with paired peripheral blood (B) and bronchoalveolar lavage (L) samples, giving 8 distinct runs for T cell receptor (TCR) enrichment. More information on the dataset can be found in the corresponding manuscript.
The built-in example data is derived from the 10x Cell Ranger pipeline, so it is ready to go for downstream processing and analysis.
To load and preview the example data built into scRepertoire:
## barcode is_cell contig_id high_confidence length
## 1 AAACCTGAGTACGACG-1 True AAACCTGAGTACGACG-1_contig_1 True 500
## 2 AAACCTGAGTACGACG-1 True AAACCTGAGTACGACG-1_contig_2 True 478
## 4 AAACCTGCAACACGCC-1 True AAACCTGCAACACGCC-1_contig_1 True 506
## 5 AAACCTGCAACACGCC-1 True AAACCTGCAACACGCC-1_contig_2 True 470
## 6 AAACCTGCAGGCGATA-1 True AAACCTGCAGGCGATA-1_contig_1 True 558
## 7 AAACCTGCAGGCGATA-1 True AAACCTGCAGGCGATA-1_contig_2 True 505
## chain v_gene d_gene j_gene c_gene full_length productive
## 1 TRA TRAV25 None TRAJ20 TRAC True True
## 2 TRB TRBV5-1 None TRBJ2-7 TRBC2 True True
## 4 TRA TRAV38-2/DV8 None TRAJ52 TRAC True True
## 5 TRB TRBV10-3 None TRBJ2-2 TRBC2 True True
## 6 TRA TRAV12-1 None TRAJ9 TRAC True True
## 7 TRB TRBV9 None TRBJ2-2 TRBC2 True True
## cdr3 cdr3_nt
## 1 CGCSNDYKLSF TGTGGGTGTTCTAACGACTACAAGCTCAGCTTT
## 2 CASSLTDRTYEQYF TGCGCCAGCAGCTTGACCGACAGGACCTACGAGCAGTACTTC
## 4 CAYRSAQAGGTSYGKLTF TGTGCTTATAGGAGCGCGCAGGCTGGTGGTACTAGCTATGGAAAGCTGACATTT
## 5 CAISEQGKGELFF TGTGCCATCAGTGAACAGGGGAAAGGGGAGCTGTTTTTT
## 6 CVVSDNTGGFKTIF TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT
## 7 CASSVRRERANTGELFF TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
## reads umis raw_clonotype_id raw_consensus_id
## 1 8344 4 clonotype123 clonotype123_consensus_2
## 2 65390 38 clonotype123 clonotype123_consensus_1
## 4 18372 8 clonotype124 clonotype124_consensus_1
## 5 34054 9 clonotype124 clonotype124_consensus_2
## 6 5018 2 clonotype1 clonotype1_consensus_2
## 7 25110 11 clonotype1 clonotype1_consensus_1
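The preview above shows the first rows of one run. A minimal sketch of how such a preview might be produced follows, assuming the bundled dataset is exposed as an object named contig_list. The requireNamespace() guard is only there so the snippet runs harmlessly when scRepertoire is not installed.

```r
if (requireNamespace("scRepertoire", quietly = TRUE)) {
  # Load the example data shipped with the package (object name assumed: contig_list)
  data("contig_list", package = "scRepertoire")
  # Number of runs in the list, then the first rows of the first run
  print(length(contig_list))
  print(head(contig_list[[1]]))
}
```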