
Getting Started with deepMatchR
Nick Borcherding
Washington University in St. Louis, School of Medicine, St. Louis, MO, USACompiled: January 27, 2026
Source:vignettes/deepMatchR.Rmd
deepMatchR.RmdIntroduction
Human leukocyte antigen (HLA) molecules are central to both innate and adaptive immune responses. Characterizing HLA antibodies and typing plays a pivotal role in bone marrow and solid organ transplantation.
deepMatchR provides a collection of tools that use advanced statistical approaches and deep learning to better characterize HLA immunogenicity for clinical and research applications.
Key capabilities include:
- Sequence-level mismatch quantification (amino acid and eplet)
- Peptide-MHC binding prediction
- Visualization of antibody reactivity data
- Integration with standard HLA nomenclature
For more information, visit the deepMatchR GitHub repository.
Installation
# Install from Bioconductor (when available)
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("deepMatchR")
# Or install the development version from GitHub
# remotes::install_github("BorchLab/deepMatchR")Working with HLA Data
Built-in Example Data
deepMatchR includes simulated assay data for learning and testing:
- Class I SAB - Single Antigen Bead assay results
- Class II SAB - Single Antigen Bead assay results
- PRA - Panel Reactive Antibody assay results
data("deepMatchR_example")Data Format Requirements
For downstream analysis, your data must include these columns:
| Column | Description | Example |
|---|---|---|
BeadID |
Unique bead identifier | 1, 2, 3… |
SpecAbbr |
Abbreviated specificity | A01, B35 |
Specificity |
Full specificity string | “A01:01, B07:01” |
NormalValue |
Measured MFI value | 15234 |
# Class I SAB example
head(deepMatchR_example[[1]], 3)
#> BeadID SpecAbbr Specificity NormalValue RawData CountValue
#> 1 1 -,-,-,-,-,-,-,- -,-,-,-,-,- NA 48.04 77
#> 2 2 -,-,-,-,-,-,-,- -,-,-,-,-,- NA 14068.42 78
#> 3 3 A1,-,-,-,-,-,-,- A*01:01,-,-,-,-,- 0 37.41 73
# Class II SAB example
head(deepMatchR_example[[2]], 3)
#> BeadID SpecAbbr Specificity RawData
#> 1 1 -,-,-,-,-,-,-,- -,-,-,-,-,-,-,-,-,-,-,- 15.51
#> 2 2 -,-,-,-,-,-,-,- -,-,-,-,-,-,-,-,-,-,-,- 13759.52
#> 3 3 DR1,-,-,-,-,-,-,- DRB1*01:01,-,-,-,-,-,-,-,-,-,-,- 131.86
#> NormalValue CountValue
#> 1 NA 128
#> 2 NA 56
#> 3 75.22 117
# PRA example
head(deepMatchR_example[[3]], 3)
#> BeadID SpecAbbr Specificity
#> 98 1
#> 99 2
#> 100 3 A2,-,B46,-,Bw6,-,Cw1,- A*02:01,A*02:07,B*46:01,-,C*01:02,-
#> NormalValue RawData CountValue
#> 98 NA 12.13 272
#> 99 NA 9209.35 277
#> 100 551.9 602.17 152Creating HLA Genotypes
The hlaGeno() function creates standardized genotype
objects used throughout the package.
# Define recipient and donor genotypes
recipient <- data.frame(
A_1 = "A*01:01", A_2 = "A*02:01",
B_1 = "B*07:02", B_2 = "B*08:01",
DQA1_1 = "DQA1*02:01", DQA1_2 = "DQA1*05:05",
DQB1_1 = "DQB1*02:02", DQB1_2 = "DQB1*03:01"
)
donor <- data.frame(
A_1 = "A*03:01", A_2 = "A*24:02",
B_1 = "B*44:02", B_2 = "B*51:01",
DQA1_1 = "DQA1*05:05", DQA1_2 = "DQA1*01:02",
DQB1_1 = "DQB1*06:02", DQB1_2 = "DQB1*03:01"
)
# Create genotype objects
rgeno <- hlaGeno(recipient)
dgeno <- hlaGeno(donor)Sequence-Level Analysis
Retrieving Allele Sequences
The getAlleleSequence() function retrieves protein
sequences from the IMGT/HLA database.
# Get the amino acid sequence for A*01:01
a0101_seq <- getAlleleSequence("A*01:01")
# View first 50 amino acids
substr(a0101_seq, 1, 50)
#> [1] "MAVMAPRTLLLLLSGALALTQTWAGSHSMRYFFTSVSRPGRGEPRFIAVG"Quantifying Amino Acid Mismatches
The quantifyMismatch() function compares two protein
sequences and counts differences. You can optionally filter by
biophysical properties.
Basic Usage
seq1 <- "YFAMYGEKVAHTHVDTLYVRYHY"
seq2 <- "YFDMYGEKVAHTHVDTLYVRFHY"
# Count all mismatches
quantifyMismatch(seq1, seq2)
#> [1] 2Filtering by Biophysical Properties
deepMatchR uses these classification models:
- Charge: K/R/H = positive; D/E = negative; others = neutral
- Polarity: Nonpolar {A,V,L,I,P,M,F,W,G}; Polar {S,T,N,Q,Y,C,H,K,R,D,E}
# Only charge-changing mismatches
quantifyMismatch(seq1,
seq2,
filter_charge = TRUE)
#> [1] 1
# Only polarity-changing mismatches
quantifyMismatch(seq1,
seq2,
filter_polarity = TRUE)
#> [1] 2
# Both charge AND polarity changing
quantifyMismatch(seq1,
seq2,
filter_charge = TRUE,
filter_polarity = TRUE)
#> [1] 1Detailed Position-by-Position Analysis
details <- quantifyMismatch(seq1,
seq2,
return = "detail")
head(details)
#> alignment_position ref alt is_gap_ref is_gap_alt is_mismatch charge_ref
#> 1 1 Y Y FALSE FALSE FALSE neu
#> 2 2 F F FALSE FALSE FALSE neu
#> 3 3 A D FALSE FALSE TRUE neu
#> 4 4 M M FALSE FALSE FALSE neu
#> 5 5 Y Y FALSE FALSE FALSE neu
#> 6 6 G G FALSE FALSE FALSE neu
#> charge_alt charge_change polarity_ref polarity_alt polarity_change counted
#> 1 neu FALSE polar polar FALSE FALSE
#> 2 neu FALSE nonpolar nonpolar FALSE FALSE
#> 3 neg TRUE nonpolar polar TRUE TRUE
#> 4 neu FALSE nonpolar nonpolar FALSE FALSE
#> 5 neu FALSE polar polar FALSE FALSE
#> 6 neu FALSE nonpolar nonpolar FALSE FALSEThe detail output includes:
-
is_mismatch: Whether amino acids differ at this position -
charge_change/polarity_change: Whether that property changed -
counted: Whether this position counts toward the final tally
Function Reference
| Argument | Default | Description |
|---|---|---|
sequence1, sequence2
|
– | Two protein sequences (same length) |
filter_charge |
NULL |
Filter by charge change
(TRUE/FALSE/NULL) |
filter_polarity |
NULL |
Filter by polarity change
(TRUE/FALSE/NULL) |
return |
"count" |
Output type: "count" or "detail"
|
na_action |
"exclude" |
Handle unknown residues: "exclude",
"error", "count"
|
Donor-Recipient Matching
Calculating Mismatch Load
The calculateMismatchLoad() function aggregates amino
acid mismatches across HLA loci between a donor and recipient.
Total Mismatch Load
calculateMismatchLoad(rgeno,
dgeno,
parallel = FALSE)
#> [1] 378Per-Locus Breakdown
per_locus <- calculateMismatchLoad(
rgeno, dgeno,
return = "per_locus",
parallel = FALSE
)
per_locus
#> locus mismatch_load
#> 1 A 96
#> 2 B 128
#> 3 DQA1 71
#> 4 DQB1 83Pairwise Allele Matrix
View mismatches between each recipient-donor allele pair at a specific locus:
mB <- calculateMismatchLoad(
rgeno, dgeno,
return = "pairwise",
pairwise_locus = "B",
parallel = FALSE
)
mB
#> donor
#> recipient B*44:02 B*51:01
#> B*07:02 37 33
#> B*08:01 31 27Applying Filters
# Charge-changing mismatches only
calculateMismatchLoad(rgeno,
dgeno,
filter_charge = TRUE,
parallel = FALSE)
#> [1] 140
# Polarity-changing mismatches only
calculateMismatchLoad(rgeno,
dgeno,
filter_polarity = TRUE,
parallel = FALSE)
#> [1] 138
# Restrict to specific loci
calculateMismatchLoad(rgeno,
dgeno,
loci = "A",
parallel = FALSE)
#> [1] 96Function Reference
| Argument | Default | Description |
|---|---|---|
recipient_geno, donor_geno
|
– | HLA genotypes from hlaGeno()
|
loci |
NULL |
Restrict to specific loci (e.g., c("A","B")) |
filter_charge, filter_polarity
|
NULL |
Biophysical filters |
return |
"total" |
"total", "per_locus", or
"pairwise"
|
pairwise_locus |
NULL |
Locus for pairwise matrix |
parallel |
TRUE |
Use parallel processing |
Eplet Analysis
Eplets are short amino acid sequences that form antibody-binding epitopes on HLA molecules. They are key determinants of HLA immunogenicity.
Quantifying Eplet Mismatches
The quantifyEpletMismatch() function calculates the
number of non-shared eplets between two alleles.
# Basic eplet comparison (A1/A2 evidence level)
quantifyEpletMismatch("A*01:01",
"A*02:01")
#> [1] 13
# Filter by exposition level
quantifyEpletMismatch("B*07:02",
"B*44:02",
exposition_filter = "High")
#> [1] 6
# Filter by reactivity confirmation
quantifyEpletMismatch("C*07:01",
"C*06:02",
evidence_level = NULL,
reactivity_filter = "Confirmed")
#> [1] 9Calculating Eplet Load
The calculateEpletLoad() function calculates total
donor-specific eplets across all loci.
Total Eplet Load
calculateEpletLoad(rgeno,
dgeno)
#> [1] 19Per-Locus Summary
per <- calculateEpletLoad(rgeno,
dgeno,
return = "per_locus")
per
#> locus eplet_load
#> 1 A 4
#> 2 B 6
#> 3 DQA1 1
#> 4 DQB1 8Pairwise Allele Matrix
mB <- calculateEpletLoad(
rgeno, dgeno,
return = "pairwise",
pairwise_locus = "B"
)
mB
#> donor
#> recipient B*44:02 B*51:01
#> B*07:02 5 4
#> B*08:01 4 3Applying Evidence and Exposition Filters
# High-exposition eplets only
calculateEpletLoad(rgeno,
dgeno,
exposition_filter = "High")
#> [1] 15
# Antibody-confirmed eplets only
calculateEpletLoad(rgeno,
dgeno,
reactivity_filter = "Confirmed")
#> [1] 19
# All evidence levels (no filter)
calculateEpletLoad(rgeno,
dgeno,
evidence_level = NULL)
#> [1] 79Peptide Binding Prediction
MHCnuggets Integration
The predictMHCnuggets() function provides an R interface
to the MHCnuggets deep learning model for peptide-MHC binding
prediction.
Note: This section requires Python dependencies and is skipped on Windows due to path length limitations with the TensorFlow/MHCnuggets installation.
# Define peptides and allele
peptides <- c("SIINFEKL", "LLFGYPVYV")
allele <- "A*02:01"
# Predict binding affinity (IC50)
binding_results <- predictMHCnuggets(
peptides = peptides,
allele = allele,
mhc_class = "I",
rank_output = TRUE
)
#> Predicting for 2 peptides
#> Number of peptides skipped/total due to length 0 / 0
#> Building model
#> Closest allele found HLA-A02:01
#> BA_to_HLAp model found, predicting with BA_to_HLAp model...
#> Rank output selected, computing peptide IC50 ranks against human proteome peptides...
#> Writing output files...
print(binding_results)
#> peptide ic50
#> 1 SIINFEKL 5600.06
#> 2 LLFGYPVYV 535.92Calculating Peptide Binding Load
The calculatePeptideBindingLoad() function predicts
transplant risk by analyzing peptide-HLA binding between recipient
molecules and donor-derived peptides.
Workflow:
- Input processing - Standardizes recipient/donor allele formats
- Peptide derivation - Generates mismatched peptides from sequence differences
- Binding prediction - Predicts IC50 using PWM, NetMHCpan, or MHCflurry
- Risk calculation - Aggregates predictions into risk scores
Total Risk Score
total_risk <- calculatePeptideBindingLoad(
recipient = rgeno,
donor = dgeno,
return = "total"
)
print(total_risk)
#> [1] 2366Per-Allele Summary
summary_load <- calculatePeptideBindingLoad(
recipient = rgeno,
donor = dgeno,
return = "summary"
)
print(summary_load)
#> hla_allele n_peptides n_strong n_weak risk_contribution
#> 1 A*01:01 1085 0 652 350.25
#> 2 A*02:01 1085 0 652 350.25
#> 3 B*07:02 1085 0 582 132.25
#> 4 B*08:01 1085 0 582 132.25
#> 5 DQA1*02:01 1085 0 652 350.25
#> 6 DQA1*05:05 1085 0 652 350.25
#> 7 DQB1*02:02 1085 0 652 350.25
#> 8 DQB1*03:01 1085 0 652 350.25Detailed Peptide-Level Results
detailed_load <- calculatePeptideBindingLoad(
recipient = rgeno,
donor = dgeno,
return = "detail"
)
head(detailed_load)
#> peptide hla_allele predicted_ic50 binding_level contribution
#> 1 FDSDAASQR A*01:01 10000 non_binder 0.0
#> 2 DSDAASQRM A*01:01 2500 weak 0.5
#> 3 SDAASQRME A*01:01 10000 non_binder 0.0
#> 4 DAASQRMEP A*01:01 2500 weak 0.5
#> 5 AASQRMEPR A*01:01 2500 weak 0.5
#> 6 ASQRMEPRA A*01:01 2500 weak 0.5Visualizing Antibody Data
Antibody Plots with plotAntibodies()
The plotAntibodies() function creates
publication-quality visualizations of SAB and PRA assay data.
SAB Bar Plot
plotAntibodies(
result_file = deepMatchR_example[[1]],
type = "SAB",
bead_cutoffs = c(2000, 1000, 500, 250),
add_table = TRUE,
palette = "spectral",
highlight_antigen = c("Bw4")
)
PRA Bar Plot
plotAntibodies(
result_file = deepMatchR_example[[3]],
type = "PRA",
class = "I",
add_table = TRUE,
palette = "spectral"
)
Time-Series Trend Plot
Track antibody levels over multiple time points:
# Simulate longitudinal data
set.seed(123)
sab_data_list <- list(
"01/01/2023" = deepMatchR_example[[1]],
"02/01/2023" = deepMatchR_example[[1]] %>%
mutate(NormalValue = NormalValue * runif(n(), 0.5, 0.8)),
"03/01/2023" = deepMatchR_example[[1]] %>%
mutate(NormalValue = NormalValue * runif(n(), 1, 3))
)
plotAntibodies(
result_file = sab_data_list,
type = "SAB",
plot_trend = TRUE,
highlight_threshold = 5000,
vline_dates = c("2023-02-15")
)
Eplet Reactivity Analysis with epletAUC()
The epletAUC() function calculates the Area Under the
Curve (AUC) for eplet reactivity across multiple MFI thresholds. This
provides a more robust measure than a single cutoff.
Visualize Reactivity Curves
epletAUC(
result_file = deepMatchR_example[[1]],
group_by = "evidence",
evidence_level = c("A1", "A2", "B", "D"),
plot_results = TRUE,
cut_min = 250,
cut_max = 10000,
cut_step = 250,
palette = "inferno"
)
Get AUC Values as Data
auc_result <- epletAUC(
result_file = deepMatchR_example[[1]],
plot_results = FALSE,
cut_min = 250,
cut_max = 10000,
cut_step = 250
)
head(auc_result)
#> eplet AUC total_count loci norm_AUC
#> <char> <num> <int> <char> <num>
#> 1: 82LR 7817.708 24 A; B 0.7817708
#> 2: 80I 7022.059 17 A; B 0.7022059
#> 3: 69AA 5641.667 15 B 0.5641667
#> 4: 163EW 6691.667 15 A; B 0.6691667
#> 5: 41T 9583.333 12 B 0.9583333
#> 6: 65QIA 4761.364 11 B 0.4761364Advanced Filtering
epletAUC(
result_file = deepMatchR_example[[1]],
label = FALSE,
plot_results = TRUE,
eplet_filter = 10, # Minimum bead count
percPos_filter = 1.0 # Minimum positivity threshold
)
Eplet Visualization with plotEplets()
The plotEplets() function offers three visualization
types for eplet reactivity data.
Treemap View
Ideal for an overview of eplet prominence. Tile area represents importance (positive beads x positivity rate):
plotEplets(
result_file = deepMatchR_example[[1]],
plot_type = "treemap",
cutoff = 2000,
evidence_level = c("A1", "A2", "B"),
percPos_filter = 0.4,
palette = "spectral"
)
Bar Plot
Ranks eplets by positivity rate at a specific threshold:
plotEplets(
result_file = deepMatchR_example[[1]],
plot_type = "bar",
group_by = "loci",
cutoff = 2000,
evidence_level = c("A1", "A2", "B"),
percPos_filter = 0.4,
top_eplets = 20,
palette = "spectral"
)
AUC Bar Plot
Ranks eplets by performance across multiple thresholds:
plotEplets(
result_file = deepMatchR_example[[1]],
plot_type = "AUC",
percPos_filter = 0.4,
group_by = "evidence_level",
cut_min = 250,
cut_max = 10000,
cut_step = 250,
top_eplets = 20,
palette = "spectral"
)
Session Information
sessionInfo()
#> R version 4.5.2 (2025-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] ggplot2_4.0.1 dplyr_1.1.4 deepMatchR_0.99.0 BiocStyle_2.38.0
#>
#> loaded via a namespace (and not attached):
#> [1] ggfittext_0.10.3 gtable_0.3.6 dir.expiry_1.18.0
#> [4] xfun_0.56 bslib_0.10.0 lattice_0.22-7
#> [7] quadprog_1.5-8 vctrs_0.7.1 tools_4.5.2
#> [10] generics_0.1.4 stats4_4.5.2 parallel_4.5.2
#> [13] tibble_3.3.1 pkgconfig_2.0.3 Matrix_1.7-4
#> [16] data.table_1.18.0 RColorBrewer_1.1-3 S7_0.2.1
#> [19] desc_1.4.3 S4Vectors_0.48.0 readxl_1.4.5
#> [22] lifecycle_1.0.5 compiler_4.5.2 farver_2.1.2
#> [25] textshaping_1.0.4 Biostrings_2.78.0 Seqinfo_1.0.0
#> [28] htmltools_0.5.9 sass_0.4.10 yaml_2.3.12
#> [31] pillar_1.11.1 pkgdown_2.2.0 crayon_1.5.3
#> [34] jquerylib_0.1.4 cachem_1.1.0 basilisk_1.22.0
#> [37] tidyselect_1.2.1 rvest_1.0.5 digest_0.6.39
#> [40] stringi_1.8.7 bookdown_0.46 labeling_0.4.3
#> [43] fastmap_1.2.0 grid_4.5.2 treemapify_2.6.0
#> [46] cli_3.6.5 magrittr_2.0.4 patchwork_1.3.2
#> [49] withr_3.0.2 filelock_1.0.3 scales_1.4.0
#> [52] rmarkdown_2.30 pwalign_1.6.0 XVector_0.50.0
#> [55] httr_1.4.7 reticulate_1.44.1 cellranger_1.1.0
#> [58] ragg_1.5.0 png_0.1-8 memoise_2.0.1
#> [61] evaluate_1.0.5 knitr_1.51 IRanges_2.44.0
#> [64] immReferent_0.99.6 rlang_1.1.7 Rcpp_1.1.1
#> [67] glue_1.8.0 BiocManager_1.30.27 xml2_1.5.2
#> [70] directlabels_2025.6.24 BiocGenerics_0.56.0 svglite_2.2.2
#> [73] jsonlite_2.0.0 R6_2.6.1 systemfonts_1.3.1
#> [76] fs_1.6.6Getting Help
If you have questions, suggestions, or encounter issues:
- GitHub Issues: github.com/BorchLab/deepMatchR/issues