[{"content":"I aim to be transparent about financial relationships that could be seen as conflicts of interest. I currently disclose the following.\nPrevious employment. Santa Ana Bio and Omniscope. I hold no equity in either. Scientific advisor, Epana Bio. I hold equity. Consultant, Columbus Instruments. I receive consulting fees and royalties from licensed software. I hold no equity. Licensed software. ClamBake, my tool for metabolic estimation from CLAM cages, is licensed to Columbus Instruments. These relationships are separate from my academic work at Washington University and are disclosed under institutional policy.\nLast updated June 2026.\n","date":"22 June 2026","externalUrl":null,"permalink":"/disclosures/","section":"Nick Borcherding","summary":"","title":"Disclosures","type":"page"},{"content":"","date":"15 June 2026","externalUrl":null,"permalink":"/tags/artificial-immune-systems/","section":"Tags","summary":"","title":"Artificial Immune Systems","type":"tags"},{"content":"","date":"15 June 2026","externalUrl":null,"permalink":"/tags/immunology/","section":"Tags","summary":"","title":"Immunology","type":"tags"},{"content":"","date":"15 June 2026","externalUrl":null,"permalink":"/tags/machine-learning/","section":"Tags","summary":"","title":"Machine Learning","type":"tags"},{"content":" A germinal center is a search loop. B cells enter, mutate their receptors, and compete for limited help from follicular helper T cells. Winners cycle back and mutate again. Losers die. Over a week or two, average binding affinity climbs by orders of magnitude. Strip away the anatomy and the cytokines, and what remains is a population of candidate solutions, a fitness function, a mutation operator, and a selection rule. That is an optimizer.\nThe last post argued that Artificial Immune Systems left most of modern immunology on the table. This post takes one of those mechanisms, somatic hypermutation, and shows what happens when you import it for real. 
In bHIVE, somatic hypermutation is not one trick. It is five, and each one is a search algorithm you already know.\nThe operator classical AIS forgot to vary # Clonal selection algorithms have always mutated their antibodies. The classic recipe is one line. Take a candidate, add Gaussian noise scaled inversely to its affinity, and keep the result if it binds better. Good binders barely change. Bad binders scatter widely. It works, and it works without a gradient.\nThe problem is that this is the only operator most implementations ever use. One mutation rule, applied to every problem, at every stage of maturation. Real somatic hypermutation is not like that. The enzyme that drives it, activation-induced cytidine deaminase (AID), does not strike uniformly. It concentrates on hotspot motifs. The mutation load a cell carries scales with its state. The whole process runs in cycles, and later cycles behave differently from early ones. The biology has a structure that just adding Gaussian noise throws away.\nThat structure is the interesting part. Because once you write down the different ways a real cell can mutate, each one turns out to match a different optimizer.\nFive strategies, one ladder # bHIVE exposes mutation through an SHMEngine with five methods. Lined up from simplest to most sophisticated, they form a ladder. The bottom rung is unstructured noise. The top rung is Adam.\nTwo questions decide every one of these strategies. How big should the step be? And in which direction? The immune system answers both, and the answers are not arbitrary. They track the same two quantities every gradient method tracks. Distance from the goal sets the step size. The shape of the error sets the direction.\nAffinity sets the step size # Start with step size. An antibody that already binds well should not throw itself across the search space. An antibody that binds poorly has nothing to lose by exploring. Affinity, in other words, behaves like a learning-rate schedule. 
High affinity means small steps. Low affinity means large ones. Two of the five strategies encode this, and they do it in two different ways.\nThe airs strategy, named for the Artificial Immune Recognition System of Watkins and Timmis, scales the noise variance directly. The mutation rate is\nr = c · e^(−a/T)\nwhere a is affinity, T is a temperature, and c is a scale. A poorly matched antibody (a → 0) mutates at the full rate c. A well-matched one (a → 1) settles toward c · e^(−1/T). The temperature controls how sharply the rate falls off. This is a fitness-dependent learning rate, the same idea as decaying your step size as the loss improves, written in the vocabulary of binding.\nThe energy strategy reinforces the same intuition with a hard ceiling instead of a soft variance. It gives each antibody a mutational budget\nE = E₀ (1 − a)²\nand draws a step whose length cannot exceed √E = √E₀ · (1 − a). The direction is random and the magnitude is uniform up to that cap. This is a trust region. The cell is allowed to move, but only within a ball whose radius shrinks as it matures. The quadratic form is deliberate. It echoes models in which the metabolic cost of hypermutation grows with the square of the mutation load, so a high-affinity cell that has already paid for many mutations can afford very few more.\nThe difference between the two is the difference between a soft and a hard constraint. AIRS lets a low-affinity antibody, on rare draws, take a small step or a large one, because variance only sets the spread. Energy guarantees the step stays inside the ball. If you care about controlling worst-case moves, the trust region is the safer object. If you want smooth annealing, the variance schedule is cleaner. Same biology, two engineering choices.\nThe gradient points the way # Step size is only half the answer. The other half is direction, and this is where the richest biology lives. Real hypermutation is not isotropic. 
AID does not mutate every position with equal probability. It targets WRCY hotspot motifs, concentrating change on a fraction of the sequence. The cell mutates where mutation is most likely to matter.\nbHIVE\u0026rsquo;s hotspot strategy ports this idea directly. It computes the gradient\ng = x − a\nthe vector from the antibody a toward the data point x it is trying to recognize. Features where the antibody is far from the target have large |gᵢ|. The strategy turns those magnitudes into per-feature mutation rates, so the rate on feature i rises with |gᵢ|. Coordinates that are already correct barely move. Coordinates that are wrong get hammered. Computationally, this is feature-weighted, coordinate-wise mutation. Biologically, it is AID on a hotspot. The match is exact in spirit. Spend your mutational budget where the error is.\nThe adaptive strategy goes one step further and remembers. Instead of reacting to the current gradient alone, it keeps two running averages across maturation rounds, a first moment for direction and a second moment for per-feature scale:\nm₁ ← β₁ m₁ + (1 − β₁) g\nm₂ ← β₂ m₂ + (1 − β₂) g²\nthen bias-corrects them and takes the step\nΔ = lr · m̂₁ / (√m̂₂ + ε), lr = (1 − a) · base_rate\nThis is not like Adam. It is Adam, the optimizer Kingma and Ba published in 2015, with affinity supplying the learning rate. The first moment smooths the direction so a single noisy round does not throw the cell off course. The second moment rescales each feature by how variable its gradient has been, so stable directions take confident steps and noisy ones take cautious ones. bHIVE threads these moment matrices through every iteration of clonal selection, one per antibody, so each cell carries its own optimizer state.\nThere is a real subtlety worth naming. The hotspot and adaptive strategies use the data point as a target, which means they peek at the answer. AID does not. The enzyme has no idea which mutations will improve binding. 
It biases where it mutates based on sequence context, not on whether the result will bind better. So bHIVE\u0026rsquo;s gradient-informed strategies are a stronger claim than the biology supports. They are what hypermutation would look like if a cell could see its own fitness landscape. That is a useful fiction for an optimizer and an overreach as a model of AID. Keeping the distinction honest matters more than the metaphor.\nSelection is the other half of the loop # Every strategy above proposes a mutated antibody. None of them decides whether to keep it. That job belongs to selection, and bHIVE\u0026rsquo;s rule is strict. A mutation survives only if it raises affinity. In code it is a single comparison. If the mutant binds better than the parent, replace the parent. Otherwise discard.\nThis makes the whole engine a greedy hill-climber. Mutation proposes, selection accepts only improvements. It is the immune system\u0026rsquo;s version of the germinal center light zone, where B cells that fail to capture enough antigen or win enough T cell help simply die. The strictness is the point. Affinity maturation works because the bar to survive keeps rising. The same strictness is also the engine\u0026rsquo;s main limitation. Greedy acceptance cannot climb out of a local optimum on its own. The real germinal center hedges against this with permissive early selection and with clonal diversity that keeps many lineages alive at once. A future strategy could borrow that too, accepting the occasional neutral or slightly worse mutation the way simulated annealing does. The room to grow is right there in the biology.\nWhy expose five instead of one # The argument from the last post was that AIS should import more immunology. This is what importing looks like in practice. Not a single richer mutation rule, but a menu, because the immune system itself does not commit to one. Different problems want different search. 
A noisy, high-dimensional landscape wants the per-feature caution of adaptive. A problem where you trust the geometry wants the directed aggression of hotspot. A setting where worst-case moves are dangerous wants the trust region of energy. The classical Gaussian still has its place as a baseline, and uniform keeps it.\nExposing the choice is the honest move. When you pick a strategy in bHIVE, you are choosing an optimizer, and you can say which one and why. That traceability is the same property the last post argued AIS gives you for free. A firing detector has a lineage. Now its mutations have a named search algorithm behind them too. The glass box extends all the way down to how each antibody learned to bind.\nSomatic hypermutation spent millions of years discovering that good search means adapting your step to your progress and your direction to your error. We rediscovered the same thing and called it Adam. bHIVE just writes both down in the same place, and lets you pick which version of the immune system you want to run.\nThese strategies live in the SHMEngine of bHIVE, an open-source R package bringing AIS methods into modern immunology.\n","date":"15 June 2026","externalUrl":null,"permalink":"/posts/shm-optimizer/","section":"Posts","summary":"From random noise to Adam. The immune system’s mutation engine maps onto the search algorithms we already use.","title":"Somatic Hypermutation Is an Optimizer","type":"posts"},{"content":"","date":"15 June 2026","externalUrl":null,"permalink":"/tags/","section":"Tags","summary":"","title":"Tags","type":"tags"},{"content":"On revisiting Artificial Immune Systems and the case for borrowing more biology, not less.\nLong before backpropagation and transformers, an adaptive learning system was already running inside every vertebrate on the planet. It identifies threats it has never seen. It does this without labeled data. It tells self from non-self in a feature space larger than any image dataset we have ever built. 
It also remembers. The immune system is the original adaptive learning machine. For a brief window in the late 1990s and early 2000s, it inspired its own corner of computer science: Artificial Immune Systems, or AIS.\nFor a moment, AIS looked like a real contender alongside neural networks and genetic algorithms. Negative selection algorithms offered principled anomaly detection. Clonal selection algorithms gave a clean evolutionary metaphor for optimization. Idiotypic networks hinted at self-organizing memory. Then deep learning ate the world. AIS receded into specialty journals. A generation of ML researchers grew up barely aware the field existed.\nThat was a mistake. The way back is not to defend the AIS canon as it stood in 2003. The way back is to modernize it. The immune system we model in AIS is the immune system as we understood it thirty years ago. Our fundamental understanding of immunology has moved on. Our algorithms should too.\nWhat AIS got right # The intellectual core of AIS has aged well. Strip away the implementation details, and three durable ideas remain.\nThe first is clonal selection. When a candidate solution (an \u0026ldquo;antibody\u0026rdquo;) matches a target (an \u0026ldquo;antigen\u0026rdquo;) well, you copy it and mutate the copies. The cloning rate scales with affinity. The mutation rate scales inversely with affinity. Promising regions of the search space get exploited. Mediocre ones get explored. It works without a gradient.\nThe second is negative selection. To learn what is anomalous, you generate detectors at random and discard any that match a corpus of \u0026ldquo;self.\u0026rdquo; What survives is, by construction, sensitive to non-self. The model never sees a positive example of an attack. It can still flag one. For domains where anomalies are rare or adversarial, that is exactly the right inductive bias.\nThe third is the idiotypic network. Niels Jerne argued that antibodies recognize each other, not just antigens. 
The repertoire becomes a self-referential graph that can stabilize, oscillate, and remember without external input. Translated to ML, this is regularization by topology. Representations are constrained by their relationships to other representations, not just by labels.\nThese ideas are good. They are also half a century old.\nWhere AIS got stuck # The honest critique runs roughly like this.\nThe repertoires are uninformatively random. Most AIS implementations sample detectors from a uniform feature space. Real B and T cell repertoires are nothing like this. The body builds them from a gene library of V, D, and J segments, recombining and editing under a strongly non-uniform prior. Hundreds of millions of years of evolution shape that prior. Randomness is one ingredient. It is not the whole recipe.\nActivation is a single threshold. In an AIS detector, a match either crosses an affinity cutoff or it doesn\u0026rsquo;t. Real lymphocytes need two (or more) signals: antigen recognition plus a costimulatory signal from another cell. Without the second signal, recognition produces tolerance, not response. Sometimes it produces cell death. That distinction is the heart of how the immune system avoids attacking its own host. We rarely model it.\nThere is no notion of context, only content. The immune system does not respond to antigens. It responds to antigens in context. Antigen-presenting cells (APCs) work as filters that suppress molecular noise. They also work as lenses that focus lymphocyte attention on what matters. A serum protein floating in plasma is harmless. The same protein on a stressed dendritic cell next to inflammatory cytokines is a threat. AIS algorithms have, almost universally, no analog of an APC (although dendritic cell-based networks have been tried).\nOverfitting controls are weak. AIS prunes detectors by lifespan or hit count. The real immune system is far stricter. Lymphocytes that bind self too strongly die in the thymus. 
B cells whose mutated descendants perform worse than their parents lose the competition for follicular helper T cells. Our algorithms do almost none of this.\nAnomaly detection is benchmarked, not understood. Most evaluations of negative selection treat the problem as standard classification. How often did the detector flag a non-self example? Real immune anomaly detection is dynamic, adversarial, and context-sensitive. Holding the test set constant is the wrong abstraction.\nThese limitations are not fatal. They are an invitation.\nThe immunology we have not yet imported # The case is simple. The most interesting work in AIS for the next decade will not come from clever new optimizers. It will come from importing the parts of immunology we left on the table.\nThe deeper argument is one of functional isomorphism. Each mechanism below maps cleanly onto a computational primitive that modern ML either uses awkwardly or lacks entirely. Two-signal activation is conjunctive gating with asymmetric tolerance. Germinal center selection is population-based training with niche preservation. The idiotypic network is a graph neural network with signed edges. Anergy is a negative update on unconfirmed activations. This mapping is not metaphorical hand-waving. It is a tight correspondence between mechanisms evolution has tuned and primitives we are still figuring out. The reason to import this biology is not nostalgia. The immune system has, by trial over evolutionary time, solved problems we are still actively solving.\nTwo-signal activation as a costimulation prior # The two-signal model of lymphocyte activation is one of the most useful concepts in modern immunology. Signal one is antigen recognition. Signal two is a confirming cue, either costimulation from a helper T cell or a danger-associated molecular pattern from a stressed neighbor. Signal one alone is not enough. By itself, it produces tolerance. Signal two alone gets ignored.\nThis maps onto modern ML. 
A recognition module proposes. A separate discriminator confirms. Only the conjunction triggers an update. We already do something like this in adversarial training and in mixture-of-experts gating. We rarely do it with the asymmetry the immune system enforces. Recognition without confirmation should teach the system to ignore that pattern in the future. Unconfirmed activation is not neutral. It anergizes the detector. A real regularization story sits buried in there that classical AIS never tells.\nDanger theory and the death of self/non-self # Polly Matzinger\u0026rsquo;s danger theory reframed the immune system\u0026rsquo;s central question. The system is not primarily asking \u0026ldquo;is this self or non-self?\u0026rdquo; That question is poorly posed for a body whose cells turn over constantly and whose gut is full of friendly bacteria. The system is asking \u0026ldquo;is this dangerous?\u0026rdquo; Injured cells, not pathogens, release tissue distress signals that push APCs into a state where they can deliver signal two.\nFor AIS, danger theory suggests something concrete. Replace static \u0026ldquo;self\u0026rdquo; sets with dynamic signals from the environment. An anomaly is not a point that fails a self-membership test. It is a point that arrives in the same neighborhood as a distress signal. This generalizes negative selection without abandoning it. It also handles the case where the underlying distribution drifts.\nGerminal center dynamics # The germinal center is where modern affinity maturation happens. It is a beautiful piece of computational machinery. B cells enter, mutate their receptors, and compete for limited help from follicular T cells. The competition is not over a single objective. It is over multiple kinds of help, on a cycle, in a structured anatomical niche. Cells that lose the competition die. 
Winners go on to mutate further or differentiate into memory or plasma cells.\nThis is much richer than \u0026ldquo;mutate proportionally to inverse affinity\u0026rdquo; of older AIS algorithms. It is iterative, competitive, niche-structured, and decision-bound. It looks more like population-based training with ranked selection and multi-objective fitness than hill-climbing. The process echoes stochastic gradient descent, with mutation rate as the step size and affinity as the loss function. AIS implementations of clonal selection have barely scratched what germinal-center-style dynamics offer. Cycling between exploration and consolidation. Niche-based diversity preservation. A graduation step that turns short-lived effectors into long-lived memory.\nGene libraries instead of random init # This one matters more than it sounds. Real receptor repertoires are combinatorial, not uniform. A human can express on the order of $10^{13}$ distinct antibodies. Estimates of T cell diversity range from $10^{14}$ to $10^{20}$. Recombination and editing of a few hundred gene segments produces all of them. The space is structured. The prior is non-uniform. That structure carries evolutionary information about which shapes have historically been worth recognizing. In computational terms, shuffling gene segments is latent space sampling.\nFor AIS, this argues for replacing random repertoire initialization with learned or curated gene libraries. A finite set of building blocks, recombined to produce candidate detectors. Foundation models give us exactly this. Pretrained representations as the starting prior. It is one of the clearest places where the new ML toolkit and the actual immunology agree.\nIdiotypic networks as a GNN inspiration # A confession is in order here. Niels Jerne\u0026rsquo;s idiotypic network theory holds that antibodies form a self-referential graph in which they recognize each other\u0026rsquo;s variable regions. In immunology, the theory has not aged well. 
Forty years of follow-up produced little experimental support for the network as a primary regulator of immune behavior. Modern textbooks rarely invoke it. The field quietly retired it.\nThat does not mean we should discard the idea. Stripped of its claim to literal biological truth, the idiotypic network remains one of the clearest blueprints we have for self-organizing regularization in a population of learners. Each node is a detector. Edges carry both activating and suppressing signals. The system stabilizes not because any external loss tells it to, but because the topology forces a dynamic equilibrium. Perturb one node and the perturbation propagates along signed edges, gets damped by suppression, and the network settles into a new fixed point.\nThis is the substrate of a graph neural network with signed edges. The function it computes is unsupervised regularization that maintains diverse, stable, mutually-compatible representations. That is exactly what we want for repertoires that have to cover open-world distributions without supervision. Detectors evaluate each other, not just data. Neighbors suppress redundant representations before they bloat the population. Idiotypic networks may not run the immune system, but they remain the right metaphor for how a population of unsupervised detectors can regulate itself.\nLifecycle states beyond \u0026ldquo;active\u0026rdquo; # A real lymphocyte moves through states. Immature. Mature. Anergic. Effector. Memory. Exhausted. Annihilated. Each state has different rules for activation, mutation, and death. Most AIS detectors live in two states (active or deleted) or three (immature, mature, memory). That is a lot of biology thrown away. Memory in particular deserves a richer treatment than a counter that decrements over time. 
The ability to retain a long-lived, low-frequency representation that can be reactivated quickly is one of the most interesting properties of the immune system.\nA map of the isomorphisms # Pulled together, the picture looks like this. Each immune mechanism on the left names a computational primitive in the middle, with the concrete capability we get on the right.\nWhat we get out of doing this # Why bother? Deep learning works. Foundation models work. Why import this baroque biology at all?\nInterpretability by construction # A modernized AIS is a distributed sensor network of named, lineaged, individually meaningful detectors. Every firing detector has an identity. It came from a specific recombination event. It survived a specific selection process. It recognizes a specific pattern. When the system makes a call, you can ask which detectors contributed and why. You get a real answer with traceable provenance.\nDeep neural networks do not have this property. Their representations are entangled across layers, distributed across millions of weights, and only legible after extensive post-hoc analysis. Mechanistic interpretability is a heroic effort to recover what an immune-style architecture would expose for free. Saliency maps approximate which inputs mattered. Linear probes guess at what hidden layers encode. Sparse autoencoders try to extract interpretable features from activations after the fact. AIS gives you a glass box where the population is the explanation.\nFor some application domains, that distinction is not aesthetic. It is a hard requirement. Medical decision support, fraud detection, regulatory monitoring, and scientific discovery all benefit from architectures whose components engineers can inspect, audit, and reason about. Any setting where a model\u0026rsquo;s output triggers consequential action wants this. A modernized AIS is not only competitive on accuracy. 
It is structurally honest about how it got there.\nThe empirical case # The immune system is empirically good at the problems classical ML struggles with. Open-world recognition. Sample-efficient learning of novel patterns. Graceful handling of distribution shift. Adversarial robustness against inputs designed to fool it. These are exactly the failure modes of current systems. A real biological system has solved them, even imperfectly. Its design is worth studying.\nMore broadly, AIS gives us a vocabulary for thinking about populations of models rather than single ones. Ensembles, mixtures of experts, and agent collectives are growing more important. The immune system is one of the few well-studied biological examples of a population of specialists that coordinate without a central controller. Clonal competition, niche-based diversity, and idiotypic regulation transfer directly.\nThe biology is finally legible # Single-cell sequencing has finally made the real immune system computationally legible. We can now profile gene expression and receptor sequences at single-cell resolution, longitudinally, with antigen specificity. The magnitude of these data sets is often rivaled only by the heterogeneity found across immune cells. For the first time in the history of AIS, we can fit our algorithms to the actual dynamics of the immune system rather than a stylized 1990s sketch of it. Biology is no longer the bottleneck. Our willingness to model it is.\nThe argument, in one paragraph # Artificial Immune Systems are not a failed paradigm. They are an unfinished one. The classical canon captured the easy parts of immunology. Affinity. Mutation inversely proportional to affinity. Negative selection. It left the hard parts on the table. Two-signal activation, danger theory, germinal center competition, gene-library priors, idiotypic regularization, and the full richness of the lymphocyte life cycle are all there, well-characterized and ready to be ported into algorithms. 
The immune system is the original adaptive learner. If we want algorithms that learn the way it does, we should stop modeling it the way it was understood before most of its interesting machinery was discovered.\nIt is time to update the metaphor.\nSome of these ideas are being explored in bHIVE, an open-source R project bringing AIS methods into modern immunology.\n","date":"30 April 2026","externalUrl":null,"permalink":"/posts/updating-the-ais-metaphor/","section":"Posts","summary":"Revisiting Artificial Immune Systems through the lens of modern immunology and what we get out of doing it.","title":"The Immune System Was the Original Learning Machine. Why Don't Our Algorithms Act Like It?","type":"posts"},{"content":"","date":"4 August 2023","externalUrl":null,"permalink":"/tags/screpertoire/","section":"Tags","summary":"","title":"ScRepertoire","type":"tags"},{"content":"","date":"4 August 2023","externalUrl":null,"permalink":"/tags/single-cell/","section":"Tags","summary":"","title":"Single-Cell","type":"tags"},{"content":"","date":"4 August 2023","externalUrl":null,"permalink":"/tags/tcr/","section":"Tags","summary":"","title":"TCR","type":"tags"},{"content":"I happened upon this preprint the other day examining latent cell fate information within the TCR sequences. Naturally, my first thought was to apply the approach from Lagattuta et al. to the single-cell objects I have. Here is an example of how to easily apply tcrpheno. 
Please check out the preprint and code repository - there are some really interesting findings on memory formation.\nLoading Libraries and Functions # We will first need to load scRepertoire and tcrpheno and make a function to organize the TCR sequences into a compatible format for tcrpheno.\nThe tcrpheno models can be installed using:\nremotes::install_github(\u0026#34;kalaga27/tcrpheno\u0026#34;) We can load tcrpheno and the rest of the packages and functions we need with:\nlibrary(scRepertoire) library(tcrpheno) library(Seurat) library(stringr) library(dplyr) library(viridis) library(scCustomize) library(patchwork) convert.contigs \u0026lt;- function(data) { #extracting TCR chain info from single-cell object meta data if (inherits(x=data, what =\u0026#34;Seurat\u0026#34;) || inherits(x=data, what =\u0026#34;SummarizedExperiment\u0026#34;)) { if (inherits(x=data, what =\u0026#34;Seurat\u0026#34;)) { dat \u0026lt;- data[[]] } else if (inherits(x=data, what =\u0026#34;SummarizedExperiment\u0026#34;)){ dat \u0026lt;- data.frame(colData(data)) rownames(dat) \u0026lt;- data@colData@rownames } dat$cdr3_aa1 \u0026lt;- str_split(dat$CTaa, \u0026#34;_\u0026#34;, simplify = TRUE)[,1] dat$cdr3_aa2 \u0026lt;- str_split(dat$CTaa, \u0026#34;_\u0026#34;, simplify = TRUE)[,2] dat$cdr3_nt1 \u0026lt;- str_split(dat$CTnt, \u0026#34;_\u0026#34;, simplify = TRUE)[,1] dat$cdr3_nt2 \u0026lt;- str_split(dat$CTnt, \u0026#34;_\u0026#34;, simplify = TRUE)[,2] dat$TCR1 \u0026lt;- str_split(dat$CTgene, \u0026#34;_\u0026#34;, simplify = TRUE)[,1] dat$TCR2 \u0026lt;- str_split(dat$CTgene, \u0026#34;_\u0026#34;, simplify = TRUE)[,2] dat \u0026lt;- list(dat) } else { dat \u0026lt;- data dat \u0026lt;- if(is(dat)[1] != \u0026#34;list\u0026#34;) list(dat) else dat } #Reorganizing the data frame for tcrpheno contigs \u0026lt;- lapply(dat, function(x) { cell \u0026lt;- x[,\u0026#34;barcode\u0026#34;] TCRA_cdr3aa \u0026lt;- x[,\u0026#34;cdr3_aa1\u0026#34;] TCRA_vgene \u0026lt;- str_split(x[,\u0026#34;TCR1\u0026#34;], 
\u0026#34;[.]\u0026#34;, simplify = TRUE)[,1] TCRA_jgene \u0026lt;- str_split(x[,\u0026#34;TCR1\u0026#34;], \u0026#34;[.]\u0026#34;, simplify = TRUE)[,2] TCRA_cdr3nt \u0026lt;- x[,\u0026#34;cdr3_nt1\u0026#34;] TCRB_cdr3aa \u0026lt;- x[,\u0026#34;cdr3_aa2\u0026#34;] TCRB_vgene \u0026lt;- str_split(x[,\u0026#34;TCR2\u0026#34;], \u0026#34;[.]\u0026#34;, simplify = TRUE)[,1] TCRB_jgene \u0026lt;- str_split(x[,\u0026#34;TCR2\u0026#34;], \u0026#34;[.]\u0026#34;, simplify = TRUE)[,2] TCRB_cdr3nt \u0026lt;- x[,\u0026#34;cdr3_nt2\u0026#34;] tmp \u0026lt;- cbind.data.frame(cell, TCRA_cdr3aa, TCRA_vgene, TCRA_jgene, TCRA_cdr3nt, TCRB_cdr3aa, TCRB_vgene, TCRB_jgene, TCRB_cdr3nt) tmp[tmp == \u0026#34;\u0026#34;] \u0026lt;- NA tmp }) contigs \u0026lt;- bind_rows(contigs) return(contigs) } Loading the Contigs and Seurat Object # We will use the data set that is built into scRepertoire - 3 ccRCC patients with paired tumor and peripheral blood.\n#Grab the Seurat Object seurat \u0026lt;- Seurat::UpdateSeuratObject(get(load(\u0026#34;~/seurat2.rda\u0026#34;))) #Can directly download the seurat object using: #seurat \u0026lt;- readRDS(url(\u0026#34;https://drive.google.com/uc?export=download\u0026amp;id=1wqakP2JQz9B62ofMfjWD0MB2SyPPoDE-\u0026amp;confirm=t\u0026amp;uuid=d4b1a2bc-465b-4c41-8258-5d4b100f1cbb\u0026amp;at=ANzk5s7lfBxMcg-RPpDFo6zykmXv:1682179250290\u0026#34;)) #Get contigs contig_list \u0026lt;- scRepertoire::contig_list Processing contigs and combining with Single-cell object # Attaching TCRs to single-cell experiments is a two-step process with scRepertoire: 1) combineTCR() organizes the contigs by barcode, and 2) combineExpression() attaches the resulting clonotypes to the single-cell object. When calling combineTCR(), we will also change the default parameters of filterMulti to TRUE and removeNA to TRUE. This will return barcodes with clonotypes assigned by the highest expressing chain and remove any barcodes that are missing 1 or more chains. 
We can then look at the distribution and level of expansion along our UMAP.\ncombined.TCRs \u0026lt;- combineTCR(contig_list, samples = rep(c(\u0026#34;PX\u0026#34;, \u0026#34;PY\u0026#34;, \u0026#34;PZ\u0026#34;), each=2), ID = rep(c(\u0026#34;P\u0026#34;, \u0026#34;T\u0026#34;), 3), filterMulti = TRUE, removeNA = TRUE) seurat \u0026lt;- combineExpression(combined.TCRs, seurat) DimPlot(seurat, group.by = \u0026#34;cloneType\u0026#34;) + scale_color_viridis(discrete = TRUE, direction = -1) Running tcrpheno # I ran into an issue here about the way beta chains are converted - for now we can apply tcrpheno on just the alpha chains.\nextracted.TCRs \u0026lt;- convert.contigs(seurat) extracted.TCRs \u0026lt;- na.omit(extracted.TCRs) tcrpheno.results \u0026lt;- score_tcrs(extracted.TCRs, \u0026#34;a\u0026#34;) seurat \u0026lt;- AddMetaData(seurat, tcrpheno.results) FeaturePlot(seurat, c(\u0026#34;TCRalpha.CD8\u0026#34;, \u0026#34;TCRalpha.reg\u0026#34;), cols = viridis_pal()(10), order = TRUE) Comparing with Gene Expression # Here you can see the TCRalpha.CD8 model appears to preferentially identify CD8A-positive cells in the upper dense cluster.\nFeaturePlot(seurat, c(\u0026#34;CD8A\u0026#34;, \u0026#34;FOXP3\u0026#34;), cols = viridis_pal()(10), order = TRUE) We can look at the overlay of both CD8A gene expression and the predicted CD8 score based on the alpha chain analysis using scCustomize function Plot_Density_Joint_Only(). 
What is really interesting to me is the CD8+ portion of C3 that tcrpheno identifies with the CD8 alpha model.\ncells \u0026lt;- rownames(seurat[[]])[!is.na(seurat@meta.data$TCRalpha.CD8)] seurat.subset \u0026lt;- subset(seurat, cells = cells) plot1 \u0026lt;- Plot_Density_Joint_Only(seurat_object = seurat.subset, features = c(\u0026#34;CD8A\u0026#34;, \u0026#34;TCRalpha.CD8\u0026#34;), viridis_palette = \u0026#34;viridis\u0026#34;) plot2 \u0026lt;- DimPlot(seurat.subset) plot1 + plot2 ","date":"4 August 2023","externalUrl":null,"permalink":"/posts/tcrpheno/","section":"Posts","summary":"Applying TCRpheno to score latent cell-fate information from TCR sequences within single-cell objects using scRepertoire.","title":"TCRpheno","type":"posts"},{"content":"Using R does not have to be for work alone. There are a number of individuals using R for generative art - Thomas Lin Pedersen is probably the best example I have seen. Here is a small contribution to enjoying data science for no other reason than making something interesting.\nGetting Target Sequence # We will pull the coding sequence for the human BRCA1 gene using the Ensembl/biomaRt pipeline.\nlibrary(biomaRt) mart \u0026lt;- useMart(\u0026#34;ensembl\u0026#34;, dataset=\u0026#34;hsapiens_gene_ensembl\u0026#34;) seq = getSequence(id = \u0026#34;BRCA1\u0026#34;, type = \u0026#34;hgnc_symbol\u0026#34;, seqType = \u0026#34;coding\u0026#34;, mart = mart) seq1 \u0026lt;- seq$coding[1] # First Sequence num.char \u0026lt;- nchar(seq1) seq1 \u0026lt;- strsplit(seq1, \u0026#34;\u0026#34;)[[1]] Translating the Sequence to Binary # Using the DNA sequence, we can translate each nucleotide to 2 bits of data. For sanity I am assigning the codes in alphabetical order, but any order for the binary translation would work. 
In the end, the DNA sequence will be a series of 0s and 1s that we will plot.\n# Binary Cipher A = \u0026#34;00\u0026#34; C = \u0026#34;01\u0026#34; G = \u0026#34;10\u0026#34; T = \u0026#34;11\u0026#34; translator \u0026lt;- list(\u0026#34;A\u0026#34;=A,\u0026#34;C\u0026#34; = C,\u0026#34;G\u0026#34; = G,\u0026#34;T\u0026#34; = T) for (i in seq_len(num.char)) { tmp.bin \u0026lt;- unlist(translator[seq1[i]]) if(i == 1) { bin.sequence \u0026lt;- tmp.bin } else { bin.sequence \u0026lt;- c(bin.sequence, tmp.bin) } } Defining Plot Coordinates # In order to plot the binarized gene sequence into a square, we need to define x and y coordinates along the sequence.\ndivisors \u0026lt;- function(x){ # Vector of numbers to test against y \u0026lt;- seq_len(x) # Modulo division. If remainder is 0 that number is a divisor of x so return it y[ x%%y == 0 ] } ###################### # Plotting data frame ###################### set.seed(42) #For Reproducibility x \u0026lt;- strsplit(paste(bin.sequence, collapse = \u0026#34;\u0026#34;), \u0026#34;\u0026#34;)[[1]] position \u0026lt;- seq(1,length(x)) #Specific Nucleotide #Added Texture by varying stroke and size of dots stroke \u0026lt;- sample(seq(0.05,5,0.05), length(x), replace = TRUE) size = sample(seq(0.05,1,0.05), length(x), replace = TRUE) df \u0026lt;- data.frame(x,position,stroke, size) df$row \u0026lt;- NA df$column \u0026lt;- NA ################################################ #Getting X and Y coordinates for Each Nucleotide ############################################### div \u0026lt;- divisors(length(x)) div.position \u0026lt;- round(length(divisors(length(x)))/2) divider \u0026lt;- div[div.position] x.pos \u0026lt;- seq(1, length(x), divider) num.column \u0026lt;- length(x)/divider #How far in the x position to go y.pos \u0026lt;- seq(1, length(x), num.column) col.ref \u0026lt;- seq_len(num.column) #X position calculation for (i in seq_len(num.column)) { if(i == num.column) { df$row[x.pos[i]:c(length(x))] \u0026lt;- i 
} else { df$row[x.pos[i]:c(x.pos[i+1]-1)] \u0026lt;- i } } for (i in seq_len(divider)) { pos \u0026lt;- c(x.pos + i -1) df$column[(pos)] \u0026lt;- i } #Binary designation - plot if x = 1 df$plot \u0026lt;- ifelse(df$x == 1, 1, NA) Plotting the Gene # Now we can finally plot the gene using the column and row positions we have calculated above. Notice we are actually plotting the subset of the data frame that does not have NA values (these NAs correspond to 0s).\nlibrary(ggplot2) ggplot(subset(df, !is.na(plot)), aes(x = column, y = row)) + geom_point(aes(size = size, stroke = stroke), shape = 21) + guides(size = \u0026#34;none\u0026#34;) + theme_void() ","date":"31 January 2023","externalUrl":null,"permalink":"/posts/binary-genetics/","section":"Posts","summary":"Generative art from a gene coding sequence: pulling BRCA1 with biomaRt and rendering its bases as a binary mosaic in R.","title":"Binary Genetics","type":"posts"},{"content":"","date":"31 January 2023","externalUrl":null,"permalink":"/tags/data-science/","section":"Tags","summary":"","title":"Data Science","type":"tags"},{"content":"","date":"31 January 2023","externalUrl":null,"permalink":"/tags/generative-art/","section":"Tags","summary":"","title":"Generative Art","type":"tags"},{"content":"","date":"31 January 2023","externalUrl":null,"permalink":"/tags/r/","section":"Tags","summary":"","title":"R","type":"tags"},{"content":"","externalUrl":null,"permalink":"/authors/","section":"Authors","summary":"","title":"Authors","type":"authors"},{"content":"","externalUrl":null,"permalink":"/categories/","section":"Categories","summary":"","title":"Categories","type":"categories"},{"content":"This file holds the structured copy for the editorial landing page. The layout in layouts/index.html renders the hero and the five numbered sections from these front-matter fields. 
Visual styling lands in a later phase.\n","externalUrl":null,"permalink":"/","section":"Nick Borcherding","summary":"","title":"Nick Borcherding","type":"page"},{"content":"","externalUrl":null,"permalink":"/posts/","section":"Posts","summary":"","title":"Posts","type":"posts"},{"content":"Complete list of peer-reviewed work, grouped by year. Use the filter to search by title, author, or journal.\n","externalUrl":null,"permalink":"/publications/","section":"Publications","summary":"","title":"Publications","type":"publications"},{"content":"","externalUrl":null,"permalink":"/series/","section":"Series","summary":"","title":"Series","type":"series"},{"content":"A selection of recent and upcoming talks on computational immunology, single-cell analysis, and clinical immunogenetics.\n2026 # Pre-Transplant T Cell Receptor Network Topology Predicts Kidney Allograft Outcome Independent of HLA Mismatch — AIRR Community Meeting VIII, New Haven, CT. June 2026. Slides Pre-transplant TCR Network Topology Predicts Kidney Allograft Rejection — Academy of Clinical Laboratory Physicians and Scientists Annual Meeting, St. Louis, MO. June 2026. Slides 2025 # Celiac Disease: A Tail of Two HLA Alleles — Clinical Conference of Histocompatibility \u0026amp; Immunogenetics, Washington University, St. Louis, MO. August 2025. Slides Predicting TCR Specificity in the Age of Single-Cell Sequencing — American Society of Histocompatibility and Immunogenetics, Orlando, FL. October 2025. Slides Bridging Biology and Bytes: The World of Computational Immunology — Department of Immunology Grand Rounds, Mayo Clinic, Rochester, MN. June 2025. Computational Approaches to the Immune Synapse — Rheumatology Translational Research Conference, Washington University in St. Louis, MO. March 2025. 2023 # Transcriptional Heterogeneity in Cancer-Associated Regulatory T Cells is Predictive of Survival — Single-Cell RNA Sequencing Symposium, Iowa Institute of Genetics, University of Iowa, Iowa City, IA. 
April 2023. Single-Cell Characterization of the T Follicular Immune Response in COVID-19 Vaccination Using Deep Learning — Physician-Scientist Symposium, St. Louis, MO. June 2023. Using Deep Learning to Characterize Immune Response — Single Cell Club Meeting, McGill University, Montreal, Canada. January 2023. Slides 2022 # Departmental Grand Rounds: Diagnostic Utility of TCR Sequencing in Cutaneous T Cell Lymphoma — Washington University Department of Pathology and Immunology, St. Louis, MO. September 2022. Slides ","externalUrl":null,"permalink":"/talks/","section":"Talks","summary":"","title":"Talks","type":"talks"}]