There are varying definitions of clonotypes or clones in the literature. For the purposes of scRepertoire, we will use the term clone and define it as the cells with shared/trackable complementarity-determining region 3 (CDR3) sequences. Within this definition, one might use the amino acid (aa) sequences of one or both chains to define a clone. Alternatively, we could use the nucleotide (nt) sequences or the V(D)JC genes (genes). The gene-based definition is the more permissive definition of “clones”, as multiple amino acid or nucleotide sequences can result from the same gene combination. Another option is to use both the V(D)JC genes and the nucleotide sequence (strict). scRepertoire supports all of these definitions and allows users to examine both chains or individual chains.
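To make the difference in permissiveness concrete, the sketch below counts distinct clones in a toy table under each definition. The gene labels and CDR3 sequences are hypothetical values made up for illustration, and a single chain is used for brevity; this is not scRepertoire code.

```r
# Hypothetical toy data: five cells, one chain each (made-up values).
cells <- data.frame(
  genes = c("TRAV1.TRAJ1.TRAC", "TRAV1.TRAJ1.TRAC", "TRAV1.TRAJ1.TRAC",
            "TRAV1.TRAJ1.TRAC", "TRAV2.TRAJ1.TRAC"),
  nt    = c("TGTGCATTT", "TGTGCATTT", "TGTGCGTTT", "TGTGTATTT", "TGTGCATTT"),
  aa    = c("CAF", "CAF", "CAF", "CVF", "CAF")
)

# The strict definition combines the gene call and the nucleotide sequence.
cells$strict <- paste(cells$genes, cells$nt, sep = ";")

# Number of distinct clones under each definition.
n_clones <- sapply(cells[c("genes", "aa", "nt", "strict")],
                   function(x) length(unique(x)))
print(n_clones)
```

In this toy table the gene-based definition merges the most cells, while the strict definition separates every cell that differs in either genes or nucleotide sequence.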

The first step in getting clones is to use the single-cell barcodes to organize cells into paired sequences. This is accomplished using combineTCR() and combineBCR().

combineTCR

input.data

  • List of filtered_contig_annotations.csv data frames from the 10x Cell Ranger.
  • List of data processed using loadContigs().

samples and ID

  • Grouping variables for downstream analysis (optional). They are added as prefixes to the cell barcodes to prevent issues with duplicate barcodes.

removeNA

  • TRUE - Filter to remove any cell barcode with an NA value in at least one of the chains.
  • FALSE - Keep cells with an NA value in one of the chains (default).

removeMulti

  • TRUE - Filter to remove any cell barcode with more than 2 immune receptor chains.
  • FALSE - Keep cells with > 2 chains (default).

filterMulti

  • TRUE - Isolate the top 2 expressed chains in cell barcodes with multiple chains.
  • FALSE - Keep cells with > 2 chains (default).
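The three filters can be pictured on a small contig-level table. The sketch below uses base R on hypothetical data (the barcodes, the `chain` and `reads` columns, and their values are all assumptions) and takes the parameter descriptions above literally; it illustrates the idea only and is not how combineTCR() is implemented.

```r
# Hypothetical contig table: one row per contig with its read support.
contigs <- data.frame(
  barcode = c("c1", "c1", "c2", "c2", "c2", "c3"),
  chain   = c("TRA", "TRB", "TRA", "TRA", "TRB", "TRB"),
  reads   = c(120, 300, 80, 15, 200, 90)
)

# removeMulti-like filter: drop barcodes with more than 2 chains.
n_chains <- table(contigs$barcode)
multi    <- names(n_chains)[n_chains > 2]
no_multi <- contigs[!contigs$barcode %in% multi, ]

# filterMulti-like filter: keep the 2 contigs with the most reads per barcode.
ord          <- contigs[order(contigs$barcode, -contigs$reads), ]
rank_in_cell <- ave(ord$reads, ord$barcode, FUN = seq_along)
top2         <- ord[rank_in_cell <= 2, ]

# removeNA-like filter: keep only barcodes that have both a TRA and a TRB chain.
has_both <- tapply(contigs$chain, contigs$barcode,
                   function(x) all(c("TRA", "TRB") %in% x))
paired   <- contigs[contigs$barcode %in% names(has_both)[has_both], ]
```

Here barcode c2 carries three contigs, so it is dropped by the first filter and trimmed to two contigs by the second, while c3 lacks a TRA chain and is dropped by the third.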

The output of combineTCR() is a list of contig data frames, each reduced to the reads associated with a single cell barcode per row. It also combines the multiple reads into clone calls by the nucleotide sequence (CTnt), the amino acid sequence (CTaa), the V(D)JC genes (CTgene), or the combination of the nucleotide sequence and genes (CTstrict).

combined.TCR <- combineTCR(contig_list, 
                           samples = c("P17B", "P17L", "P18B", "P18L", 
                                            "P19B","P19L", "P20B", "P20L"),
                           removeNA = FALSE, 
                           removeMulti = FALSE, 
                           filterMulti = FALSE)

head(combined.TCR[[1]])
##                    barcode sample                     TCR1           cdr3_aa1
## 1  P17B_AAACCTGAGTACGACG-1   P17B       TRAV25.TRAJ20.TRAC        CGCSNDYKLSF
## 3  P17B_AAACCTGCAACACGCC-1   P17B TRAV38-2/DV8.TRAJ52.TRAC CAYRSAQAGGTSYGKLTF
## 5  P17B_AAACCTGCAGGCGATA-1   P17B      TRAV12-1.TRAJ9.TRAC     CVVSDNTGGFKTIF
## 7  P17B_AAACCTGCATGAGCGA-1   P17B      TRAV12-1.TRAJ9.TRAC     CVVSDNTGGFKTIF
## 9  P17B_AAACGGGAGAGCCCAA-1   P17B        TRAV20.TRAJ8.TRAC      CAVRGEGFQKLVF
## 10 P17B_AAACGGGAGCGTTTAC-1   P17B      TRAV12-1.TRAJ9.TRAC     CVVSDNTGGFKTIF
##                                                  cdr3_nt1
## 1                       TGTGGGTGTTCTAACGACTACAAGCTCAGCTTT
## 3  TGTGCTTATAGGAGCGCGCAGGCTGGTGGTACTAGCTATGGAAAGCTGACATTT
## 5              TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT
## 7              TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT
## 9                 TGTGCTGTGCGAGGAGAAGGCTTTCAGAAACTTGTATTT
## 10             TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT
##                           TCR2          cdr3_aa2
## 1   TRBV5-1.None.TRBJ2-7.TRBC2    CASSLTDRTYEQYF
## 3  TRBV10-3.None.TRBJ2-2.TRBC2     CAISEQGKGELFF
## 5     TRBV9.None.TRBJ2-2.TRBC2 CASSVRRERANTGELFF
## 7     TRBV9.None.TRBJ2-2.TRBC2 CASSVRRERANTGELFF
## 9                         <NA>              <NA>
## 10    TRBV9.None.TRBJ2-2.TRBC2 CASSVRRERANTGELFF
##                                               cdr3_nt2
## 1           TGCGCCAGCAGCTTGACCGACAGGACCTACGAGCAGTACTTC
## 3              TGTGCCATCAGTGAACAGGGGAAAGGGGAGCTGTTTTTT
## 5  TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
## 7  TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
## 9                                                 <NA>
## 10 TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
##                                                  CTgene
## 1         TRAV25.TRAJ20.TRAC_TRBV5-1.None.TRBJ2-7.TRBC2
## 3  TRAV38-2/DV8.TRAJ52.TRAC_TRBV10-3.None.TRBJ2-2.TRBC2
## 5          TRAV12-1.TRAJ9.TRAC_TRBV9.None.TRBJ2-2.TRBC2
## 7          TRAV12-1.TRAJ9.TRAC_TRBV9.None.TRBJ2-2.TRBC2
## 9                                  TRAV20.TRAJ8.TRAC_NA
## 10         TRAV12-1.TRAJ9.TRAC_TRBV9.None.TRBJ2-2.TRBC2
##                                                                                              CTnt
## 1                    TGTGGGTGTTCTAACGACTACAAGCTCAGCTTT_TGCGCCAGCAGCTTGACCGACAGGACCTACGAGCAGTACTTC
## 3  TGTGCTTATAGGAGCGCGCAGGCTGGTGGTACTAGCTATGGAAAGCTGACATTT_TGTGCCATCAGTGAACAGGGGAAAGGGGAGCTGTTTTTT
## 5  TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT_TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
## 7  TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT_TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
## 9                                                      TGTGCTGTGCGAGGAGAAGGCTTTCAGAAACTTGTATTT_NA
## 10 TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT_TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
##                                CTaa
## 1        CGCSNDYKLSF_CASSLTDRTYEQYF
## 3  CAYRSAQAGGTSYGKLTF_CAISEQGKGELFF
## 5  CVVSDNTGGFKTIF_CASSVRRERANTGELFF
## 7  CVVSDNTGGFKTIF_CASSVRRERANTGELFF
## 9                  CAVRGEGFQKLVF_NA
## 10 CVVSDNTGGFKTIF_CASSVRRERANTGELFF
##                                                                                                                                               CTstrict
## 1                           TRAV25.TRAJ20.TRAC;TGTGGGTGTTCTAACGACTACAAGCTCAGCTTT_TRBV5-1.None.TRBJ2-7.TRBC2;TGCGCCAGCAGCTTGACCGACAGGACCTACGAGCAGTACTTC
## 3  TRAV38-2/DV8.TRAJ52.TRAC;TGTGCTTATAGGAGCGCGCAGGCTGGTGGTACTAGCTATGGAAAGCTGACATTT_TRBV10-3.None.TRBJ2-2.TRBC2;TGTGCCATCAGTGAACAGGGGAAAGGGGAGCTGTTTTTT
## 5          TRAV12-1.TRAJ9.TRAC;TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT_TRBV9.None.TRBJ2-2.TRBC2;TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
## 7          TRAV12-1.TRAJ9.TRAC;TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT_TRBV9.None.TRBJ2-2.TRBC2;TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
## 9                                                                                      TRAV20.TRAJ8.TRAC;TGTGCTGTGCGAGGAGAAGGCTTTCAGAAACTTGTATTT_NA;NA
## 10         TRAV12-1.TRAJ9.TRAC;TGTGTGGTCTCCGATAATACTGGAGGCTTCAAAACTATCTTT_TRBV9.None.TRBJ2-2.TRBC2;TGTGCCAGCAGCGTAAGGAGGGAAAGGGCGAACACCGGGGAGCTGTTTTTT
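The printed output suggests how the clone-call columns relate to the per-chain columns. The sketch below rebuilds them for rows 1 and 9 of the output above using `paste()`. The separators ("_" between chains, ";" between genes and nucleotide sequence) are inferred from the printed values rather than taken from the package source, so treat this as a reading aid.

```r
# Per-chain columns copied from rows 1 and 9 of the output above.
x <- data.frame(
  TCR1     = c("TRAV25.TRAJ20.TRAC", "TRAV20.TRAJ8.TRAC"),
  cdr3_aa1 = c("CGCSNDYKLSF", "CAVRGEGFQKLVF"),
  cdr3_nt1 = c("TGTGGGTGTTCTAACGACTACAAGCTCAGCTTT",
               "TGTGCTGTGCGAGGAGAAGGCTTTCAGAAACTTGTATTT"),
  TCR2     = c("TRBV5-1.None.TRBJ2-7.TRBC2", NA),
  cdr3_aa2 = c("CASSLTDRTYEQYF", NA),
  cdr3_nt2 = c("TGCGCCAGCAGCTTGACCGACAGGACCTACGAGCAGTACTTC", NA)
)

# paste() turns a missing chain into the literal string "NA",
# which matches the "_NA" suffixes seen in the printed output.
rebuilt <- data.frame(
  CTgene   = paste(x$TCR1, x$TCR2, sep = "_"),
  CTnt     = paste(x$cdr3_nt1, x$cdr3_nt2, sep = "_"),
  CTaa     = paste(x$cdr3_aa1, x$cdr3_aa2, sep = "_"),
  CTstrict = paste0(x$TCR1, ";", x$cdr3_nt1, "_", x$TCR2, ";", x$cdr3_nt2)
)
print(rebuilt$CTaa)
```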

combineBCR

combineBCR() is analogous to combineTCR() with 2 major changes: 1) Each barcode can have a maximum of 2 sequences; if more exist, the 2 with the highest reads are selected. 2) The strict definition of a clone is based on the normalized Levenshtein edit distance of CDR3 nucleotide sequences and V-gene usage. For more information on this approach, please see the respective citation. This definition allows for the grouping of BCRs that derive from the same progenitor and have diverged through somatic hypermutation and affinity maturation.

threshold

  • The minimum normalized similarity (defined below) at which two CDR3 nucleotide sequences are grouped together. Default is 0.85.

\[ \text{threshold}(s, t) = 1-\frac{\text{Levenshtein}(s, t)}{\frac{\text{length}(s) + \text{length}(t)}{2}} \]
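The formula above can be evaluated directly in base R. The sketch below uses `utils::adist()`, which with its default costs computes the Levenshtein distance. The helper name `norm_similarity` is hypothetical and this is not the package's internal code.

```r
# Normalized similarity: 1 - Levenshtein(s, t) / mean(length(s), length(t)).
norm_similarity <- function(s, t) {
  d <- as.numeric(utils::adist(s, t))   # Levenshtein edit distance
  1 - d / ((nchar(s) + nchar(t)) / 2)
}

# Identical sequences have similarity 1.
same <- norm_similarity("TGTGCC", "TGTGCC")

# One substitution in six bases gives 1 - 1/6, which falls below 0.85.
one_off <- norm_similarity("TGTGCC", "TGTGCA")

# Two light-chain CDR3 sequences taken from the combineBCR() output in this
# section, compared against the default threshold.
lc <- norm_similarity("TGCAACTCATATACAAGCAGCAGCACTCTGGTCTTC",
                      "TGCAACTCATATACAAGCTTCGGCACCTCGGTCTTC")
print(lc >= 0.85)
```

Because the distance is divided by the mean sequence length, a single mutation counts for less in a long CDR3 than in a short one.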

call.related.clones

  • TRUE - Calculate the normalized edit distance.
  • FALSE - Skip the calculation. This may save time, especially for large data sets, but is not recommended.
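One way to picture the grouping step is single-linkage clustering on the pairwise normalized distances. The sketch below is an assumption-laden illustration in base R: the sequences are hypothetical, V-gene usage is ignored, and `hclust()` with `cutree()` stands in for whatever grouping procedure combineBCR() uses internally. It also shows why the calculation gets expensive: every pair of sequences is compared.

```r
# Hypothetical CDR3 nucleotide sequences (made-up values).
seqs <- c(
  a = "TGTGCGAGAGTGACATATCA",
  b = "TGTGCGAGAGTGACATATCG",   # 1 substitution away from a
  c = "TGTGCGAGAGTGACATATGG",   # 1 substitution away from b, 2 from a
  d = strrep("A", 20)           # unrelated sequence
)

# Pairwise Levenshtein distances, normalized by the mean length of each pair.
lev      <- utils::adist(seqs)
len      <- nchar(seqs)
mean_len <- outer(len, len, "+") / 2
sim      <- 1 - lev / mean_len

# Single linkage joins any two sequences connected by a chain of pairs whose
# similarity meets the threshold, i.e. whose distance is at most 1 - threshold.
threshold <- 0.85
tree      <- hclust(as.dist(1 - sim), method = "single")
clusters  <- cutree(tree, h = 1 - threshold)
print(clusters)
```

In this toy example a, b, and c fall into one group and d forms its own.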

BCR.contigs <- read.csv("https://www.borch.dev/uploads/contigs/b_contigs.csv")
combined.BCR <- combineBCR(BCR.contigs, 
                           samples = "P1", 
                           threshold = 0.85)

head(combined.BCR[[1]])
##                 barcode sample                             IGH
## 1 P1_CGAACATTCCCTTGTG-1     P1                            <NA>
## 2 P1_CCTTCGAAGACGCTTT-1     P1                            <NA>
## 3 P1_CAGCTAACAGCTTAAC-1     P1                            <NA>
## 4 P1_CCCAATCAGAGGGCTT-1     P1 IGHV1-69-2.IGHD5-12.IGHJ6.IGHG1
## 5 P1_TACAGTGGTAAGGGCT-1     P1   IGHV1-18.IGHD3-10.IGHJ5.IGHA1
## 6 P1_GACAGAGGTTGGTAAA-1     P1   IGHV1-18.IGHD3-10.IGHJ5.IGHA1
##                cdr3_aa1
## 1                  <NA>
## 2                  <NA>
## 3                  <NA>
## 4  CASGGLVAIRHYYYYGMDVW
## 5 CARVTYHHGSGSLVGGWFDPW
## 6 CARVTYHHGSGSLVGGWFDPW
##                                                          cdr3_nt1
## 1                                                            <NA>
## 2                                                            <NA>
## 3                                                            <NA>
## 4    TGTGCGAGCGGGGGATTAGTGGCTATTAGGCACTACTACTACTACGGTATGGACGTCTGG
## 5 TGTGCGAGAGTGACATATCACCATGGTTCGGGGAGCCTTGTCGGGGGCTGGTTCGACCCCTGG
## 6 TGTGCGAGAGTGACATATCACCATGGTTCGGGGAGCCTTGTCGGGGGCTGGTTCGACCCCTGG
##                   IGLC      cdr3_aa2                                cdr3_nt2
## 1 IGLV2-14.IGLJ6.IGLC2  CNSKGAGGTAVF    TGCAACTCAAAAGGAGCCGGAGGCACTGCGGTTTTC
## 2 IGLV2-14.IGLJ6.IGLC2  CNSKGAGGTAVF    TGCAACTCAAAAGGAGCCGGAGGCACTGCGGTTTTC
## 3 IGLV2-14.IGLJ2.IGLC2 CNSYTTSGTLWVF TGCAACTCATACACAACCAGCGGCACTCTCTGGGTATTC
## 4 IGLV2-14.IGLJ2.IGLC2  CNSYTSSSTLVF    TGCAACTCATATACAAGCAGCAGCACTCTGGTCTTC
## 5 IGLV2-14.IGLJ1.IGLC1  CNSYTSFGTSVF    TGCAACTCATATACAAGCTTCGGCACCTCGGTCTTC
## 6 IGLV2-14.IGLJ1.IGLC1  CNSYTSFGTSVF    TGCAACTCATATACAAGCTTCGGCACCTCGGTCTTC
##                                                 CTgene
## 1                              NA_IGLV2-14.IGLJ6.IGLC2
## 2                              NA_IGLV2-14.IGLJ6.IGLC2
## 3                              NA_IGLV2-14.IGLJ2.IGLC2
## 4 IGHV1-69-2.IGHD5-12.IGHJ6.IGHG1_IGLV2-14.IGLJ2.IGLC2
## 5   IGHV1-18.IGHD3-10.IGHJ5.IGHA1_IGLV2-14.IGLJ1.IGLC1
## 6   IGHV1-18.IGHD3-10.IGHJ5.IGHA1_IGLV2-14.IGLJ1.IGLC1
##                                                                                                   CTnt
## 1                                                              NA_TGCAACTCAAAAGGAGCCGGAGGCACTGCGGTTTTC
## 2                                                              NA_TGCAACTCAAAAGGAGCCGGAGGCACTGCGGTTTTC
## 3                                                           NA_TGCAACTCATACACAACCAGCGGCACTCTCTGGGTATTC
## 4    TGTGCGAGCGGGGGATTAGTGGCTATTAGGCACTACTACTACTACGGTATGGACGTCTGG_TGCAACTCATATACAAGCAGCAGCACTCTGGTCTTC
## 5 TGTGCGAGAGTGACATATCACCATGGTTCGGGGAGCCTTGTCGGGGGCTGGTTCGACCCCTGG_TGCAACTCATATACAAGCTTCGGCACCTCGGTCTTC
## 6 TGTGCGAGAGTGACATATCACCATGGTTCGGGGAGCCTTGTCGGGGGCTGGTTCGACCCCTGG_TGCAACTCATATACAAGCTTCGGCACCTCGGTCTTC
##                                 CTaa
## 1                    NA_CNSKGAGGTAVF
## 2                    NA_CNSKGAGGTAVF
## 3                   NA_CNSYTTSGTLWVF
## 4  CASGGLVAIRHYYYYGMDVW_CNSYTSSSTLVF
## 5 CARVTYHHGSGSLVGGWFDPW_CNSYTSFGTSVF
## 6 CARVTYHHGSGSLVGGWFDPW_CNSYTSFGTSVF
##                                      CTstrict
## 1                     NA.NA_IGLC.549.IGLV2-14
## 2                     NA.NA_IGLC.549.IGLV2-14
## 3                     NA.NA_IGLC.535.IGLV2-14
## 4 IGH.481.IGHV1-69-2_IGLC:Cluster.56.IGLV2-14
## 5   IGH.781.IGHV1-18_IGLC:Cluster.56.IGLV2-14
## 6   IGH.781.IGHV1-18_IGLC:Cluster.56.IGLV2-14