Loading and Processing Contig Data

What data to load into scRepertoire?

scRepertoire works with the filtered_contig_annotations.csv output from 10x Genomics Cell Ranger. This file is located in the ./outs/ directory of the VDJ alignment folder. To generate a list of contigs to use with scRepertoire:

  • load the filtered_contig_annotations.csv for each of the samples.
  • make a list in the R environment.
S1 <- read.csv(".../Sample1/outs/filtered_contig_annotations.csv")
S2 <- read.csv(".../Sample2/outs/filtered_contig_annotations.csv")
S3 <- read.csv(".../Sample3/outs/filtered_contig_annotations.csv")
S4 <- read.csv(".../Sample4/outs/filtered_contig_annotations.csv")

contig_list <- list(S1, S2, S3, S4)
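
With many samples, the same list can be built without one read.csv() call per sample. The following is a minimal base-R sketch, not part of scRepertoire itself. It assumes each sample sits in its own folder with an outs/ subdirectory; the temporary directory, the sample names, and the toy file contents are all made up so the example runs on its own.

```r
# Build a throwaway directory tree that mimics <run>/<sample>/outs/
# (illustrative only; replace run_dir with your own experiment folder).
run_dir <- file.path(tempdir(), "vdj_runs")
samples <- c("Sample1", "Sample2")
for (s in samples) {
  outs <- file.path(run_dir, s, "outs")
  dir.create(outs, recursive = TRUE, showWarnings = FALSE)
  # Toy two-row contig table standing in for real Cell Ranger output.
  toy <- data.frame(barcode = paste0("AAAC", s, "-1"),
                    chain = c("TRA", "TRB"))
  write.csv(toy, file.path(outs, "filtered_contig_annotations.csv"),
            row.names = FALSE)
}

# Find every filtered contig file below the run directory.
contig_files <- list.files(run_dir,
                           pattern = "^filtered_contig_annotations\\.csv$",
                           recursive = TRUE,
                           full.names = TRUE)

# Read each file and name the list elements after the sample folder.
contig_list <- lapply(contig_files, read.csv)
names(contig_list) <- basename(dirname(dirname(contig_files)))
```

Naming the list elements after their folders makes it easier to confirm later that the samples are in the order you expect.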

Other alignment workflows

Beyond the default 10x Genomics Cell Ranger pipeline outputs, scRepertoire supports other single-cell formats through loadContigs(), including TRUST4 and WAT3R.

loadContigs() can be given a directory where the sequencing experiments are located, and it will recursively load and process the contig data based on the file names. Alternatively, loadContigs() can be given a list of data frames to process.

# Directory example
contig.output <- "~/Documents/MyExperiment"
contig.list <- loadContigs(input = contig.output, 
                           format = "TRUST4")

# List of data frames example
S1 <- read.csv("~/Documents/MyExperiment/Sample1/outs/barcode_results.csv")
S2 <- read.csv("~/Documents/MyExperiment/Sample2/outs/barcode_results.csv")
S3 <- read.csv("~/Documents/MyExperiment/Sample3/outs/barcode_results.csv")
S4 <- read.csv("~/Documents/MyExperiment/Sample4/outs/barcode_results.csv")

contig_list <- list(S1, S2, S3, S4)
# Pass the list of data frames, not the directory path
contig.list <- loadContigs(input = contig_list, 
                           format = "WAT3R")

Multiplexed Experiment

It is now easy to create the contig list from a multiplexed experiment by first generating a single-cell RNA object (either Seurat or SingleCellExperiment), loading the filtered contig file, and then using createHTOContigList(). This function returns a list separated by the group.by variable(s).

This function depends on matching barcodes between the single-cell object and the contigs. If a prefix or a different suffix has been added to the barcodes, no contigs will be recovered. Currently, it is recommended that you do this step before integration, as integration workflows commonly alter the barcodes. There is a multi.run argument that can be used on the integrated object. However, it assumes you have modified the barcodes with the Seurat pipeline (automatic addition of _# to the end) and that your contig list is in the same order.

contigs <- read.csv(".../outs/filtered_contig_annotations.csv")

contig.list <- createHTOContigList(contigs, 
                                   Seurat.Obj, 
                                   group.by = "HTO_maxID")
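
Because a barcode mismatch returns no contigs, it can help to check the overlap before calling createHTOContigList(). The following base-R sketch only illustrates the idea and does not show how the function matches barcodes internally. The vectors are hypothetical stand-ins for contigs$barcode and the cell names of the single-cell object, and the "_1" suffix is an assumed example of a barcode altered during integration.

```r
# Hypothetical barcodes from the contig file.
contig_barcodes <- c("AAACCTGAGTACGACG-1",
                     "AAACCTGCAACACGCC-1",
                     "AAACCTGCAGGCGATA-1")

# Hypothetical cell names from an object whose barcodes gained a "_1" suffix.
object_barcodes <- c("AAACCTGAGTACGACG-1_1",
                     "AAACCTGCAACACGCC-1_1",
                     "TTTGTCATCGGCTACG-1_1")

# Fraction of contig barcodes found in the object as-is.
raw_match <- mean(contig_barcodes %in% object_barcodes)

# Strip a trailing "_<digits>" suffix and compare again.
stripped <- sub("_[0-9]+$", "", object_barcodes)
clean_match <- mean(contig_barcodes %in% stripped)
```

A raw_match of zero alongside a non-zero clean_match suggests the barcodes differ only by an added suffix, which is the situation described above.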

Example Data in scRepertoire

scRepertoire comes with a data set of T cells derived from four patients with acute respiratory distress to demonstrate the functionality of the R package. More information on the data set can be found in the corresponding manuscript. The samples consist of paired peripheral blood (B) and bronchoalveolar lavage (L) samples, effectively creating 8 distinct runs for T cell receptor (TCR) enrichment. We can preview the elements in the list by using the head() function and looking at the first contig annotation.

The built-in example data is derived from the 10x Cell Ranger pipeline, so it is ready to go for downstream processing and analysis.

data("contig_list") #the data built into scRepertoire

head(contig_list[[1]])
##              barcode is_cell                   contig_id high_confidence length
## 1 AAACCTGAGTACGACG-1    True AAACCTGAGTACGACG-1_contig_1            True    500
## 2 AAACCTGAGTACGACG-1    True AAACCTGAGTACGACG-1_contig_2            True    478
## 4 AAACCTGCAACACGCC-1    True AAACCTGCAACACGCC-1_contig_1            True    506
## 5 AAACCTGCAACACGCC-1    True AAACCTGCAACACGCC-1_contig_2            True    470
## 6 AAACCTGCAGGCGATA-1    True AAACCTGCAGGCGATA-1_contig_1            True    558
## 7 AAACCTGCAGGCGATA-1    True AAACCTGCAGGCGATA-1_contig_2            True    505
##   chain       v_gene d_gene  j_gene c_gene full_length productive
## 1   TRA       TRAV25   None  TRAJ20   TRAC        True       True
## 2   TRB      TRBV5-1   None TRBJ2-7  TRBC2        True       True
## 4   TRA TRAV38-2/DV8   None  TRAJ52   TRAC        True       True
## 5   TRB     TRBV10-3   None TRBJ2-2  TRBC2        True       True
## 6   TRA     TRAV12-1   None   TRAJ9   TRAC        True       True
## 7   TRB        TRBV9   None TRBJ2-2  TRBC2        True       True
##                 cdr3                                                cdr3_nt
## 1        CGCSNDYKLSF                      TGTGGGTGTTCTAACGACTACAAGCTCAGCTTT
## 2     CASSLTDRTYEQYF             TGCGCCAGCAGCTTGACCGACAGGACCTACGAGCAGTACTTC
## 4 CAYRSAQAGGTSYGKLTF TGTGCTTATAGGAGCGCGCAGGCTGGTGGTACTAGCTATGGAAAGCTGACATTT
## 5      CAISEQGKGELFF                TGTGCCATCAGTGAACAGGGGAAAGGGGAGCTGTTTTTT
## 6     CVVSDNTGGFKTIF             TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT
## 7  CASSVRRERANTGELFF    TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
##   reads umis raw_clonotype_id         raw_consensus_id
## 1  8344    4     clonotype123 clonotype123_consensus_2
## 2 65390   38     clonotype123 clonotype123_consensus_1
## 4 18372    8     clonotype124 clonotype124_consensus_1
## 5 34054    9     clonotype124 clonotype124_consensus_2
## 6  5018    2       clonotype1   clonotype1_consensus_2
## 7 25110   11       clonotype1   clonotype1_consensus_1
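
To make the structure of the preview concrete, the sketch below rebuilds a few columns of the six rows shown above as a small data frame and summarizes them with base R. It is only an illustration of how the table is organized (one row per contig, usually one TRA and one TRB contig per barcode) and is not how scRepertoire processes contigs; the object names preview, tra, trb, and paired are made up for this example.

```r
# Selected columns copied from the six rows printed above.
preview <- data.frame(
  barcode = rep(c("AAACCTGAGTACGACG-1",
                  "AAACCTGCAACACGCC-1",
                  "AAACCTGCAGGCGATA-1"), each = 2),
  chain = rep(c("TRA", "TRB"), times = 3),
  cdr3 = c("CGCSNDYKLSF", "CASSLTDRTYEQYF",
           "CAYRSAQAGGTSYGKLTF", "CAISEQGKGELFF",
           "CVVSDNTGGFKTIF", "CASSVRRERANTGELFF"),
  umis = c(4, 38, 8, 9, 2, 11)
)

# Number of contigs per chain and per cell barcode.
chain_counts <- table(preview$chain)
contigs_per_cell <- table(preview$barcode)

# Put the TRA and TRB CDR3 sequences of each barcode side by side.
tra <- preview[preview$chain == "TRA", c("barcode", "cdr3")]
trb <- preview[preview$chain == "TRB", c("barcode", "cdr3")]
paired <- merge(tra, trb, by = "barcode", suffixes = c("_TRA", "_TRB"))
```

The same calls can be run on contig_list[[1]] itself after data("contig_list"), where some barcodes may carry more or fewer than two contigs.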