Zero-shot antibody diversification with Cradle
Scientists kicking off a campaign using Cradle often have little to start with: a weakly functional lead and a target product profile. The gap between the two seems huge. Here's how we get better binding—often alongside additional gains—without a single labeled training datapoint.

Jonathan

One of the most common questions scientists ask before starting with Cradle is whether they have enough data. They might be working without any campaign-specific assay data at all, and want to kick off a project with only a weakly functional starting sequence and a target product profile. It seems daunting when you're doing things the old way, but over dozens of zero-shot diversification campaigns using Cradle, we have often exceeded the level of binding that’s typical from single-mutant scanning—with some achieving affinity leaps as large as 350-fold. Here, we look under the hood to show how this worked for a few of them.
Over the last two years, our approach to antibody optimization has matured, going from exploratory experiments to an automated pipeline that uses our substantial background in protein optimization while accounting for the specific features of the antibody at hand. As we developed this capability, we made three important observations:
1. Our pipeline generates variants of a starting antibody that preserve binding and biophysical properties of the original.
We observe behavior similar to that of other diversification rounds, with both up- and downward variations and a low rate of binding loss. This means that our approach is suitable for generating initial data for supervised training.
2. Many antibodies possess headroom that is easily accessible.
Often our experiments yield several variants with substantial affinity improvements and notable improvements of orthogonal properties, such as expression and melting temperature. We don't optimize for improvement over the lead explicitly, yet repeated results across targets confirm that our success in the Adaptyv competition was not a one-off.
3. Our mutation strategy matches the functional rates of single-mutant site-scanning mutagenesis (SSM) campaigns from the literature, while producing greater sequence diversity, and often hits with higher binding affinity.
SSM is the traditional alternative, in which candidate positions for substitutions in the CDRs (the hypervariable loops of antibodies that contribute most to binding) are identified and exhaustively explored in the single-mutant setting. Sampling small combinations of mutations from protein language models gives access to bigger functional jumps and a richer per-variant training signal, at no meaningful cost in functional rate.
How we got there is a great illustration of the value of quick in vitro feedback for generative protein engineering pipelines. As we discuss in our post on our benchmarking philosophy, the only viable way to build reliable pipelines is to produce variants in the lab, and test them.
In case you missed it
Cradle is a software platform for active-learning driven protein optimization. A scientist uploads a lead sequence or dataset, the platform proposes a set of variants, the scientist characterizes them in their own assays, and the platform trains models on the resulting data to propose the next round of designs. Each round pushes molecules into regions of property space that evolution has not selected for, typically along several competing axes at once.
The process of generating new variants around your molecule of interest is called “diversification.” The process of generating new variants around your molecule of interest without any prior examples of what works is called “zero-shot” diversification. If you’re looking for novel sequences but you haven’t done any training yet, this is the most dependable path that lies before you–and we've had remarkable success with it. This post walks through what round 1 looks like in that situation, across seven antibody campaigns.
Affinity headroom around leads
Last year, we participated in the Adaptyv competition where the goal was to design the strongest possible binders to the EGFR protein (a cancer target), with the constraint that any submission had to be at least 10 amino acids away from any known binder. It felt like a good opportunity for an external test of our approach: We could quite easily start from a known binder and generate variants using our diversification pipeline, which allows for mutation count constraints. We were not yet fully confident in our ability to generate CDR mutations, so we limited the editable region to the frameworks. Note that our diversification strategy does not model anything about binding or epitopes: It is simply a well-honed pseudo-homolog-generation machine, i.e. it generates plausible evolutionarily-related proteins that have a high chance of having the same function as the starting point. From a practical perspective, designing this library with the necessary constraints takes about 10 minutes of human interaction, and around 10 hours of training + inference time.
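The two constraints described above (a minimum mutation count and an editable region limited to the frameworks) can be expressed as a simple filter. The following is a minimal sketch, not Cradle's pipeline: the function names, the toy sequences, and the frozen position set are all hypothetical, and it uses a substitution-only (Hamming) distance to a single template rather than an edit distance to every known binder.

```python
def mutated_positions(template: str, variant: str) -> list[int]:
    """Return 0-based positions where two equal-length sequences differ."""
    if len(template) != len(variant):
        raise ValueError("substitution-only comparison needs equal lengths")
    return [i for i, (a, b) in enumerate(zip(template, variant)) if a != b]


def satisfies_constraints(
    template: str, variant: str, frozen: set[int], min_mutations: int
) -> bool:
    """True if the variant has enough mutations and none in frozen positions."""
    positions = mutated_positions(template, variant)
    return len(positions) >= min_mutations and not (set(positions) & frozen)


# Toy 20-residue "template"; positions 5-8 stand in for a CDR (hypothetical).
TEMPLATE = "ACDEFGHIKLMNPQRSTVWY"
FROZEN_CDR = {5, 6, 7, 8}

FRAMEWORK_ONLY = "GADEFGHIKLMNPQRATVWY"  # mutations at positions 0, 1, 15
TOUCHES_CDR = "GADEFGAIKLMNPQRATVWY"     # additionally mutates position 6

print(satisfies_constraints(TEMPLATE, FRAMEWORK_ONLY, FROZEN_CDR, 3))
print(satisfies_constraints(TEMPLATE, TOUCHES_CDR, FROZEN_CDR, 3))
```

In a real antibody, the frozen set would come from a CDR annotation of the template, and the minimum count would be set to the competition's threshold of 10.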
This approach was highly successful: Not only did we win the competition, improving on the starting protein (an scFv version of Cetuximab), but all 12 of our submitted designs would have won the competition!
Impressive, but was it a fluke? Given that we weren’t even trying to explicitly optimize affinity, there might have been something special about our starting point in this case.
But no: The competition was actually our third experiment with this kind of result (x10 improvement), and since then we have conducted an experiment where the affinity improvement was even more drastic (x350).
By now we have run quite a few more such zero-shot experiments, both internally and with CROs, along with tens more zero-shot campaigns for our customers.
Here, we will present results from our zero-shot process on the following:
the receptor-binding domain (RBD) of the SARS-CoV-2 spike protein;
the human epidermal growth factor receptor 2 (HER2, a protein that is over-expressed in 30% of breast cancer tumors and has been a successful target for cancer immunotherapy);
and the epidermal growth factor receptor (EGFR).
For each of these, we simulated what a typical zero-shot lead optimization setting would look like: We start from a confirmed lead binder and assume that no variants have been evaluated in binding assays which would otherwise provide useful training data for predictive models. As a starting point, we focused on these single-chain antibody fragment binders:
Ty1, a VHH fragment identified as a single-digit nanomolar binder to SARS-CoV-2 RBD by Hanke et al in an alpaca immunization campaign;
Ty2, a novel VHH fragment we identified from the phage panning data produced by Hanke et al. We selected this variant as a more challenging starting point since we determined it to be 100x weaker than Ty1 as a SARS-CoV-2 RBD binder;
Trastuzumab-scFv, an scFv version of Trastuzumab, a humanized mouse antibody with a strong binding affinity to HER2. The scFv has the format VH-(GGGGS)x3-VL;
Cetuximab-scFv, the scFv version of Cetuximab, a chimeric antibody (mouse variable domains, human constant regions) with a strong binding affinity to EGFR. We used the scFv as a strong binder reference in the Adaptyv competition (VH-(GGGGS)x3-VL).
We later expanded to C102, a Fab fragment version of a SARS-CoV-2 RBD-targeting binder, which was identified by Robbiani et al.
Note that while these are well-studied targets–chosen deliberately so that our results can be benchmarked against public data and reproduced–our customers run campaigns that frequently involve harder targets with less prior signal in pre-training. While we cannot share details, we consistently see hit rates above 50% in those settings as well.
We can sort our experiments into four tiers:
Framework-only mutants. Our first experiments restricted mutations to framework regions only, excluding CDRs. This was motivated by two factors: uncertainty about our ability to generate affinity-preserving CDR mutations in a zero-shot context, and existing literature showing that framework mutations informed by evolutionary information alone can improve binding affinity (Shanker et al). This covers our very first experiment on Ty1, as well as our Cetuximab-scFv variants against EGFR.
Framework+H1+H2 mutants. Hanke et al also provide sequencing data for their phage display panning experiments. We selected enriched VHH variants from this dataset beyond Ty1, which we used as unsupervised training data for our generative models, expecting them to provide useful signals about allowed mutations in CDRs. However, since this dataset had very low variability in CDRH3, we limited designs in our second experiment to CDRH1, CDRH2 and framework mutations.
Full-sequence mutants. To test our hypothesis that CDR mutations could be generated reliably, we ran an experiment on Trastuzumab-scFv (against HER2), allowing mutations across the full sequence. We also applied this strategy to Ty2, the weak anti-SARS-CoV-2 RBD binder we identified internally. This was a riskier setting: Ty2's affinity of 400nM sits close to the edge of the SPR assay's measurement range (1μM), meaning even mild affinity degradation would produce binders indistinguishable from non-binders.
Full-sequence+indels mutants. Our anti-SARS-CoV-2 experiments provided an interesting opportunity to use MSA transformer models to generate variable-length sequences. These models take as context a sub-sampled MSA and can generate deletions and insertions by inserting or deleting gap characters. We used variants selected in the phage panning experiment of Hanke et al as inference-time context to generate Ty1 variants, which led to drastic diversification.
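To make the gap-character mechanism concrete: in an alignment, a gap in the template row opposite a residue in the variant row is an insertion, and the reverse is a deletion. The sketch below only illustrates that bookkeeping on a toy pair of aligned rows. It does not call any MSA transformer, and the function names and sequences are made up for illustration.

```python
GAP = "-"


def ungap(aligned: str) -> str:
    """Strip gap characters to recover the actual, variable-length sequence."""
    return aligned.replace(GAP, "")


def classify_edits(aligned_template: str, aligned_variant: str) -> dict[str, int]:
    """Count substitutions, insertions and deletions between two aligned rows."""
    if len(aligned_template) != len(aligned_variant):
        raise ValueError("aligned rows must have the same number of columns")
    counts = {"substitutions": 0, "insertions": 0, "deletions": 0}
    for t, v in zip(aligned_template, aligned_variant):
        if t == GAP and v == GAP:
            continue  # column is empty in both rows
        if t == GAP:
            counts["insertions"] += 1  # variant has a residue the template lacks
        elif v == GAP:
            counts["deletions"] += 1   # variant drops a template residue
        elif t != v:
            counts["substitutions"] += 1
    return counts


# Toy aligned rows (hypothetical, not real Ty1 sequence).
template_row = "QVQLVE-SGGG"
variant_row = "QVKLVEASG-G"

print(classify_edits(template_row, variant_row))
print(ungap(template_row), ungap(variant_row))
```

Here the variant has one substitution, one insertion, and one deletion, so its ungapped length happens to match the template's even though the sequence has shifted.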
Our experiments and their results are summarized in the table below. The last three rows are baseline site-scanning mutagenesis (systematic single-mutant scan) campaigns from the literature, which highlight the efficiency of targeted multi-mutant exploration.
| Antigen | Template binder | Template binding affinity | Template format | Edit regions | # Mutations | Functional rate |
|---|---|---|---|---|---|---|
| SARS-CoV-2 RBD | Ty1 | 4.8nM | VHH | FW-only | 2-8 | 69% |
| EGFR | Cetuximab | 5nM | scFv | FW-only | 10 | 100% |
| SARS-CoV-2 RBD | Ty1 | 4.8nM | VHH | FW+H1+H2 | 2-8 | 62% |
| SARS-CoV-2 RBD | Ty1 | 4.8nM | VHH | All+indels | 2-52 | 98% |
| SARS-CoV-2 RBD | C102 | 18nM | Fab | All | 2-8 | 80% |
| HER2 | Trastuzumab | 3nM | scFv | All | 2-8 | 92% |
| SARS-CoV-2 RBD | Ty2 | 400nM | VHH | All | 2-8 | 56% |
| From literature | | | | | | |
| EGFR | | 8.5nM | IgG | CDR SSM | 1 | 91% |
| SARS-CoV-2 | | 0.3nM | scFv | H2 SSM | 1 | NA |
| Digoxin | | 0.2nM | scFv | CDR SSM (6 positions) | 1 | 75% |
For 4/7 of our experiments and 2/3 targets, we found significant affinity improvements. This includes a whopping 350-fold reduction in the dissociation constant of our novel weak binder to SARS-CoV-2, Ty2. The one exception is our Trastuzumab diversification attempt, where none of the designed variants beat the starting Trastuzumab-scFv in terms of affinity to HER2. Our Fab diversification experiment, as well as our experiment with indels, both led to improvements on paper, but the difference in binding affinity is not high enough to confidently claim a hit over the template.
The positive results reported by Shanker et al using similar principles had made us confident that our approach would succeed in proposing functional variants with evolution-informed models, but the magnitude of the hits was a genuine surprise (their analogous round 1 has milder hit magnitudes of x1.2 to x8). Our results reinforce the observation that there is easily-accessible affinity improvement headroom around natural and traditionally-optimized antibodies, especially given that affinity optimization has historically been focused on CDRs.
A high-magnitude affinity hit in round 1 can be valuable on its own, sometimes enough to advance a candidate directly. But affinity is rarely the only property of interest in biologics development: Developability, expression, thermostability, immunogenicity, and others need to land together–a variant that wins on affinity while significantly degrading on any of the others doesn’t really move the needle. The odds of any single zero-shot round producing a variant that satisfies every axis of a multi-property optimization problem are low, which is why we treat zero-shot diversification as an exploratory data-generation step for the multi-property models that drive subsequent rounds.
This is nicely illustrated by the variation in melting temperature and expression of our variants: Despite not finding any affinity hits over our starting Trastuzumab-scFv, we did generate variants with improved thermostability. One of these variants showed +30% expression and +4°C melting temperature. So variants that might be discarded if looking at just one category of attributes can be bookmarked for later rounds that have a desired outcome defined across multiple properties. (Of course, this works both ways: The best Ty2 variants had such low expression in our E. coli vector that we had to resort to a CRO using a TX/TL expression system to perform an affinity assay.)
Efficient model training requires functional diversity
One of the reasons we focus on active-learning optimization at Cradle is that we frequently encounter these functional tradeoffs. Since the goal is to push molecules into regions of property space that evolution has not selected for, we view diversification in this context primarily as a way to generate an initial training dataset for models tuned to a specific target product profile (TPP). In contrast to the classical approach of serially performing rational design on a set of properties, we have found that generating function-preserving data in this fashion translates directly to explicit multi-property optimization: Each added property expands the search space. (Which is awesome because joint optima are rarely reachable one property at a time.) The right yardstick for a method, we believe, should therefore be the total number of rounds needed to satisfy the full TPP (and, more broadly, the number of successfully completed TPPs). These are broad metrics, so for fast feedback on diversification strategies it is still useful to also qualify success at the single-round level.
Since we are building a training dataset for targeted models, a key metric is the rate of functional designs, which directly determines the size of the dataset. This is especially relevant for antibodies, where optimization attempts can lead to a total loss of binding. Enzymes, by contrast, nearly always preserve function, except when low initial expression combined with downward fluctuation yields insufficient protein to run assays (a problem that also affects antibodies).
An ideal dataset for our models has diversity in both sequence and function space. Mutations that produce different tradeoffs among measured properties, with variants both better and worse than the starting point, typically give models the training signal they need to identify promising directions for optimization.
In that regard, our Trastuzumab-scFv diversification campaign was successful, achieving a binding-preservation rate of 92%, and exploring expression and thermostability quite broadly.
Figure: HER2 affinity, zero-shot from a strongly binding scFv. Hit rate 0%, best hit affinity x1, functional rate 92%.
Figure: HER2 Tm/expression, zero-shot from a strongly binding scFv.

Generally, we expect to observe both up- and downward fluctuations centered around a starting point. That's also what we observed for affinity measurements in our very first experiment of this sort, in which we generated framework mutations around Ty1 (the strong binder against SARS-CoV-2 RBD). Note that in most cases where variants couldn't be assayed, it was due to expression issues in a TX/TL system. Indeed, our zero-shot models tend to have higher success in cell-based expression systems.
Figure: Ty1 KD distribution, zero-shot from a strongly binding nanobody (FW only). Hit rate 36%, best hit affinity x15, functional rate 69%.

This general trend is why we approached the diversification of Ty2 (the weak binder we discovered) with some trepidation: With its affinity of 400nM quite close to the edge of the affinity range our CRO partner could measure (around 1μM), we anticipated a rather high failure rate. Instead, this round fell into our preferred pattern with a good balance of up- and downward affinity fluctuations: 56% of our designs preserved their binding properties, and 24 of 48 improved on the template (some drastically), for a hit rate of exactly 50%.
Ty2 KD Distribution: Zero-Shot from a weakly binding Nanobody: Hit rate 50%, best hit affinity x357
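The arithmetic behind the Ty2 numbers is worth spelling out. The sketch below uses only figures quoted in this post (a 400nM template, a measurement ceiling of roughly 1μM, and a roughly 350-fold improvement); the function names are ours, and the implied best-variant KD is derived from those rounded figures rather than a reported measurement.

```python
def fold_improvement(kd_template_nm: float, kd_variant_nm: float) -> float:
    """Fold improvement in affinity: a lower KD means tighter binding."""
    return kd_template_nm / kd_variant_nm


def kd_after_fold(kd_template_nm: float, fold: float) -> float:
    """KD implied by a given fold improvement over the template."""
    return kd_template_nm / fold


TY2_KD_NM = 400.0          # template affinity quoted in the post
ASSAY_CEILING_NM = 1000.0  # approximate upper limit of the measurable range

# How much a variant can weaken before it looks like a non-binder.
degradation_headroom = ASSAY_CEILING_NM / TY2_KD_NM

# KD implied by the ~350-fold improvement of the best variant.
best_kd_nm = kd_after_fold(TY2_KD_NM, 350.0)

print(f"Tolerable weakening before the assay limit: {degradation_headroom:.1f}x")
print(f"Implied KD of best variant: {best_kd_nm:.2f} nM")
```

In other words, a Ty2 variant only has to get about 2.5 times weaker to drop out of the measurable range, while the best variant lands at roughly 1nM, in the same range as the strong binder Ty1.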

The pattern of both up- and downward variation around the template binder is also observed for our C102 anti-SARS-CoV-2 Fab experiment, as well as for our experiment introducing indels and large mutation loads on Ty1.
Figure: C102 Fab KD distribution.

Figure: Ty1 indel KD distribution.

Despite making more drastic changes to our starting binder (Ty1) during the indel generation round, the resulting distribution of affinities was much narrower than in our other experiments. We hypothesize that this is due to the explicit conditioning of the model with an alignment of binders identified through panning: Even with large edits, the "context" binders constrain the functional space much more tightly than the semi-supervised conditioning provided by pre-training in our typical experiments. In practice, this gives us a useful lever: When panning or NGS reads on related binders are available, alignment-based conditioning can extend the safe edit budget at the not-horrible cost of narrower affinity exploration.
Let's take another look at our results in terms of binding affinity for our different experiments:
| Antigen | Template binder | Template binding affinity | Template format | Edit regions | # Mutations | Hit rate | Affinity improvement | Functional rate |
|---|---|---|---|---|---|---|---|---|
| SARS-CoV-2 RBD | Ty1 | 4.8nM | VHH | FW-only | 2-8 | 35/96 | x15 | 69% |
| EGFR | Cetuximab | 5nM | scFv | FW-only | 10 | 11/12 | x18 | 100% |
| SARS-CoV-2 RBD | Ty1 | 4.8nM | VHH | FW+H1+H2 | 2-8 | 8/96 | x7 | 62% |
| SARS-CoV-2 RBD | Ty1 | 4.8nM | VHH | All+indels | 2-52 | 69/384 | x1.2 | 98% |
| SARS-CoV-2 RBD | C102 | 18nM | Fab | All | 2-8 | 30/384 | x2 | 80% |
| HER2 | Trastuzumab | 3nM | scFv | All | 2-8 | 0/192 | NA | 92% |
| SARS-CoV-2 RBD | Ty2 | 400nM | VHH | All | 2-8 | 24/48 | x350 | 56% |
| From literature | | | | | | | | |
| EGFR | | 8.5nM | IgG | CDR SSM | 1 | 66/1060 | x4.8 | 91% |
| SARS-CoV-2 | | 0.3nM | scFv | H2 SSM | 1 | 9/190 | x2.6 | NA |
| Digoxin | | 0.2nM | scFv | CDR SSM (6 positions) | 1 | 17/114 | x1.5 | 75% |
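Because the hit rates in the table are given as raw counts, it can help to convert them to percentages before comparing campaigns with the SSM baselines. The snippet below does just that with the counts from the table; the labels and the helper name are ours.

```python
# (label, hits, variants tested), copied from the table above.
CAMPAIGNS = [
    ("Ty1 FW-only", 35, 96),
    ("Cetuximab FW-only", 11, 12),
    ("Ty1 FW+H1+H2", 8, 96),
    ("Ty1 All+indels", 69, 384),
    ("C102 All", 30, 384),
    ("Trastuzumab All", 0, 192),
    ("Ty2 All", 24, 48),
]
SSM_BASELINES = [
    ("EGFR CDR SSM", 66, 1060),
    ("SARS-CoV-2 H2 SSM", 9, 190),
    ("Digoxin CDR SSM", 17, 114),
]


def hit_rate_percent(hits: int, tested: int) -> float:
    """Hit rate as a percentage of variants tested."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return 100.0 * hits / tested


for label, hits, tested in CAMPAIGNS + SSM_BASELINES:
    rate = hit_rate_percent(hits, tested)
    print(f"{label:<20} {hits:>3}/{tested:<5} {rate:5.1f}%")
```

Read this way, the literature SSM campaigns sit at roughly 5% to 15% hits per variant tested, while the zero-shot campaigns range from 0% (Trastuzumab) to about 92% (Cetuximab, on only 12 designs).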
Note the fair amount of variation across experiments. Excluding the 100% functional rate observed in the low-N Adaptyv competition candidates, functional rates for strong binders range from 62% to 98%, while the functional rate of our weak binder experiment was 56%. This provides a useful guideline for the throughput of these exploratory rounds: Given that we typically recommend 90 unique variants as the minimum training dataset for targeted models, evaluating 190 variants is a reasonable target to efficiently kick off an optimization campaign using Cradle.
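The sizing guideline follows from dividing the number of functional variants needed by the expected functional rate. The sketch below works through that arithmetic for the functional rates reported in this post; the helper name is ours, and treating every functional variant as a usable training point is a simplifying assumption.

```python
import math


def variants_to_test(functional_needed: int, functional_rate: float) -> int:
    """Smallest library size expected to yield `functional_needed` functional variants."""
    if not 0 < functional_rate <= 1:
        raise ValueError("functional_rate must be in (0, 1]")
    return math.ceil(functional_needed / functional_rate)


MIN_TRAINING_SET = 90  # recommended minimum number of unique variants
PLANNED_LIBRARY = 190  # recommended number of variants to evaluate

for rate in (0.56, 0.62, 0.69, 0.80, 0.92):
    needed = variants_to_test(MIN_TRAINING_SET, rate)
    print(f"functional rate {rate:.0%}: test at least {needed} variants")

# Lowest functional rate at which 190 variants still yield 90 functional ones.
break_even_rate = MIN_TRAINING_SET / PLANNED_LIBRARY
print(f"190 variants suffice down to a functional rate of {break_even_rate:.1%}")
```

At the weak-binder rate of 56%, about 161 variants are needed, so a library of 190 leaves some margin and still reaches the minimum if the functional rate drops to around 47%.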
Our experiment involving indels stands out in terms of results: While it only yielded similar binders to the starting point Ty1, a vast majority of variants maintained affinity despite high mutation loads of up to 50% of the VHH, many without resolvable decrease in affinity. This highlights that there is a lot of available functional exploration room around natural strong binders, which can be utilized for optimizing other properties involved in developability.
What's next?
As we have highlighted, zero-shot diversification is primarily a tool for initiating an active-learning optimization campaign when sequence-function data on close variants of the lead are not yet available. This process is remarkably effective at finding high-affinity hits–despite being optimized for something else, namely to generate guidance for subsequent multi-property optimization rounds.
Some of the presented results have gone on to be the first round of multi-round case studies, demonstrating the utility of datasets generated in this way. For instance, we have performed a 3-round multi-property optimization case study on both the Ty1 anti-SARS-CoV-2 VHH and the anti-HER2 scFv presented above, comparing our approach to open-source ESM2-based tools and classical bioinformatics approaches. These case studies can be found in our recently published whitepaper.
A key challenge for our full-sequence, multi-mutant approaches, however, is the risk of unknowingly introducing immunogenic mutations, particularly in designs with a high mutation load in framework regions. While our models are biased toward suggesting mutations found in the human proteome, we expect that targeted optimization signals could still create such liabilities. We have shown that our designs do not degrade in silico scores such as the T20 humanness score and the number of MHCII-binding core peptides (details in our blog post), and we continue to actively explore efficient ways to incorporate immunogenicity constraints into our design process.
Conclusion: Moving fast with zero-shot design
At the outset, the path from a weak lead to a clinical candidate can seem dark, narrow and steep. But our results across seven campaigns show that you don't need a ton of data to start making meaningful progress. By leveraging zero-shot diversification, scientists can bypass the initial "data gap," generating high-quality variants and significant affinity jumps even in the first round of testing. From there, the slope levels off and the path widens and divides, leading to additional attractive landscapes. This approach doesn't just find better molecules; it builds the functional diversity required to train the targeted models that solve complex, multi-property challenges. It means you can start your campaign with confidence, even when you're starting with very little.
One of the most common questions scientists ask before starting with Cradle is whether they have enough data. They might be working without any campaign-specific assay data at all, and want to kick off a project with only a weakly functional starting sequence and a target product profile. It seems daunting when you're doing things the old way, but over dozens of zero-shot diversification campaigns using Cradle, we have often exceeded the level of binding that’s typical from single-mutant scanning—with some achieving affinity leaps as large as 350-fold. Here, we look under the hood to show how this worked for a few of them.
Over the last two years, our approach to antibody optimization has matured, going from exploratory experiments to an automated pipeline that uses our substantial background in protein optimization while accounting for the specific features of the antibody at hand. As we developed this capability, we made three important observations:
1. Our pipeline generates variants of a starting antibody that preserve binding and biophysical properties of the original.
We observe behavior similar to that of other diversification rounds, with both up- and downward variations and a low rate of binding loss. This means that our approach is suitable for generating initial data for supervised training.
2. Many antibodies possess headroom that is easily accessible.
Often our experiments yield several variants with substantial affinity improvements and notable improvements of orthogonal properties, such as expression and melting temperature. We don't optimize for improvement over the lead explicitly, yet repeated results across targets confirm that our success in the Adaptyv competition was not a one-off.
3. Our mutation strategy matches the functional rates of single-mutant site-scanning mutagenesis (SSM) campaigns from the literature, while producing greater sequence diversity, and often hits with higher binding affinity.
SSM is the traditional alternative, in which candidate positions for substitutions in the CDRs–the hypervariable loops of antibodies most attributed to binding–are identified and exhaustively explored in the single-mutant setting. Sampling small combinations of mutations from protein language models accesses bigger functional jumps and richer per-variant training signal, at no meaningful cost in functional rate.
How we got there is a great illustration of the value of quick in vitro feedback for generative protein engineering pipelines. As we discuss in our post on our benchmarking philosophy, the only viable way to build reliable pipelines is to produce variants in the lab, and test them.
In case you missed it
Cradle is a software platform for active-learning driven protein optimization. A scientist uploads a lead sequence or dataset, the platform proposes a set of variants, the scientist characterizes them in their own assays, and the platform trains models on the resulting data to propose the next round of designs. Each round pushes molecules into regions of property space that evolution has not selected for, typically along several competing axes at once.
The process of generating new variants around your molecule of interest is called “diversification.” The process of generating new variants around your molecule of interest without any prior examples of what works is called “zero-shot” diversification. If you’re looking for novel sequences but you haven’t done any training yet, this is the most dependable path that lies before you–and we've had remarkable success with it. This post walks through what round 1 looks like in that situation, across seven antibody campaigns.
Affinity headroom around leads
Last year, we participated in the Adaptyv competition where the goal was to design the strongest possible binders to the EGFR protein (a cancer target), with the constraint that any submission had to be at least 10 amino acids away from any known binder. It felt like a good opportunity for an external test of our approach: We could quite easily start from a known binder and generate variants using our diversification pipeline, which allows for mutation count constraints. We were not yet fully confident in our ability to generate CDR mutations, so we limited the editable region to the frameworks. Note that our diversification strategy does not model anything about binding or epitopes: It is simply a well-honed pseudo-homolog-generation machine, i.e. it generates plausible evolutionarily-related proteins that have a high chance of having the same function as the starting point. From a practical perspective, designing this library with the necessary constraints takes about 10 minutes of human interaction, and around 10 hours of training + inference time.
This approach was highly successful: Not only did we win the competition, improving on the starting protein (an scFv version of Cetuximab), but all 12 of our submitted designs would have won the competition!
Impressive, but was it a fluke? Given that we weren’t even trying to explicitly optimize affinity, there might have been something special about our starting point in this case.
But no: The competition was actually our third experiment with this kind of result (x10 improvement), and since then we have conducted an experiment where the affinity improvement was even more drastic (x350).
By now we have run quite a few more such zero-shot experiments, both internally and with CROs, along with tens more zero-shot campaigns for our customers.
Here, we will present results from our zero-shot process on the following:
the receptor-binding domain (RBD) of the SARS-CoV-2 spike protein;
the human epidermal growth factor receptor 2 (HER2, a protein that is over-expressed in 30% of breast cancer tumors and has been a successful target for cancer immunotherapy);
and the epidermal growth factor receptor (EGFR).
For each of these, we simulated what a typical zero-shot lead optimization setting would look like: We start from a confirmed lead binder and assume that no variants have been evaluated in binding assays which would otherwise provide useful training data for predictive models. As a starting point, we focused on these single-chain antibody fragment binders:
Ty1, a VHH fragment identified as a single-digit nanomolar binder to SARS-CoV-2 RBD by Hanke et al in an alpaca immunization campaign;
Ty2, a novel VHH fragment we identified from the phage panning data produced by Hanke et al. We selected this variant as a more challenging starting point since we determined it to be 100x weaker than Ty1 as a SARS-CoV-2 RBD binder;
Trastuzumab-scFv, an scFv version of Trastuzumab, a humanized mouse antibody with a strong binding affinity to HER2. The scFv has the format VH-(GGGGS)x3-VL;
Cetuximab-scFv, the scFv version of Cetuximab, a chimeric antibody (human framework, mouse CDR) with a strong binding affinity to EGFR. We used the scFv as a strong binder reference in the Adaptyv competition (VH-(GGGGS)x3-VL).
We later expanded to C102, a Fab fragment version of a SARS-CoV-2 RBD-targeting binder, which was identified by Robbiani et al.
Note that while these are well-studied targets–chosen deliberately so that our results can be benchmarked against public data and reproduced–our customers run campaigns that frequently involve harder targets with less prior signal in pre-training. While we cannot share details, we consistently see hit rates above 50% in those settings as well.
We can sort our experiments into four tiers:
Framework-only mutants. Our first experiments restricted mutations to framework regions only, excluding CDRs. This was motivated by two factors: uncertainty about our ability to generate affinity-preserving CDR mutations in a zero-shot context, and existing literature showing that framework mutations informed by evolutionary information alone can improve binding affinity (Shanker et al). This covers our very first experiment on Ty1, as well as our Cetuximab-scFv variants against EGFR.
Framework+H1+H2 mutants. Hanke et al also provide sequencing data for their phage display panning experiments. We selected enriched VHH variants from this dataset beyond Ty1, which we used as unsupervised training data for our generative models, expecting them to provide useful signals about allowed mutations in CDRs. However, since this dataset had very low variability in CDRH3, we limited designs in our second experiment to CDRH1, CDRH2 and framework mutations.
Full-sequence mutants. To test our hypothesis that CDR mutations could be generated reliably, we ran an experiment on Trastuzumab-scFv (against HER2), allowing mutations across the full sequence. We also applied this strategy to Ty2, the weak anti-SARS-CoV-2 RBD binder we identified internally. This was a riskier setting: Ty2's affinity of 400nM sits close to the edge of SPR assay's measurement range (1μM), meaning even mild affinity degradation would produce binders indistinguishable from non-binders.
Full-sequence+indels mutants. Our anti-SARS-CoV-2 experiments provided an interesting opportunity to use MSA transformer models to generate variable-length sequences. These models take as context a sub-sampled MSA and can generate deletions and insertions by inserting or deleting gap characters. We used variants selected in the phage panning experiment of Hanke et al as inference-time context to generate Ty1 variants, which led to drastic diversification.
Our experiments and their results are summarized in the table below. The last three rows are baseline site-scanning mutagenesis (systematic single-mutant scan) campaigns from the literature, which highlight the efficiency of targeted multi-mutant exploration.
| Antigen | Template binder | Template binding affinity | Template format | Edit regions | # Mutations | Functional rate |
|---|---|---|---|---|---|---|
| SARS-CoV-2 RBD | Ty1 | 4.8nM | VHH | FW-only | 2-8 | 69% |
| EGFR | Cetuximab | 5nM | scFv | FW-only | 10 | 100% |
| SARS-CoV-2 RBD | Ty1 | 4.8nM | VHH | FW+H1+H2 | 2-8 | 62% |
| SARS-CoV-2 RBD | Ty1 | 4.8nM | VHH | All+indels | 2-52 | 98% |
| SARS-CoV-2 RBD | C102 | 18nM | Fab | All | 2-8 | 80% |
| HER2 | Trastuzumab | 3nM | scFv | All | 2-8 | 92% |
| SARS-CoV-2 RBD | Ty2 | 400nM | VHH | All | 2-8 | 56% |
| From literature | | | | | | |
| EGFR | | 8.5nM | IgG | CDR SSM | 1 | 91% |
| SARS-CoV-2 | | 0.3nM | scFv | H2 SSM | 1 | NA |
| Digoxin | | 0.2nM | scFv | CDR SSM (6 positions) | 1 | 75% |
For 4/7 of our experiments and 2/3 targets, we found significant affinity improvements. This includes a whopping 350-fold reduction in the dissociation constant of our novel weak binder to SARS-CoV-2, Ty2. The one exception is our Trastuzumab diversification attempt, where none of the designed variants beat the starting Trastuzumab-scFv in terms of affinity to HER2. Our Fab diversification experiment, as well as our experiment with indels, both led to improvements on paper, but the difference in binding affinity is not high enough to confidently claim a hit over the template.
The positive results reported by Shanker et al using similar principles had made us confident that our approach would succeed in proposing functional variants with evolution-informed models, but the magnitude of the hits was a genuine surprise (their analogous round 1 has milder hit magnitudes of x1.2 to x8). Our results reinforce the observation that there is easily-accessible affinity improvement headroom around natural and traditionally-optimized antibodies, especially given that affinity optimization has historically been focused on CDRs.
A high-magnitude affinity hit in round 1 can be valuable on its own, sometimes enough to advance a candidate directly. But affinity is rarely the only property of interest in biologics development: Developability, expression, thermostability, immunogenicity, and others need to land together. A variant that wins on affinity while significantly degrading any of the others doesn’t really move the needle. The odds of any single zero-shot round producing a variant that satisfies every axis of a multi-property optimization problem are low, which is why we treat zero-shot diversification as an exploratory data-generation step for the multi-property models that drive subsequent rounds.
This is nicely illustrated by observing the variation in melting temperature and expression of our variants: Despite not finding any hits in affinity over our starting Trastuzumab-scFv, we did generate variants with improved thermostability. One of the hits displayed +30% expression and +4°C melting temperature. So variants that might be discarded if looking at just one category of attributes can be bookmarked for later rounds that have a desired outcome defined across multiple properties. (Of course, this works both ways: The best Ty2 variants had such low expression in our E. coli vector that we had to resort to a CRO using a TX/TL expression system to perform an affinity assay.)
Efficient model training requires functional diversity
One of the reasons we focus on active-learning optimization at Cradle is that we frequently encounter these functional tradeoffs. Since the goal is to push molecules into regions of property space that evolution has not selected for, we view diversification in this context primarily as a way to generate an initial training dataset for models tuned to a specific target product profile (TPP). In contrast to the classical approach of serially performing rational design on a set of properties, we have found that generating function-preserving data in this fashion translates directly to explicit multi-property optimization: Each added property expands the search space. (Which is awesome because joint optima are rarely reachable one property at a time.) The right baseline criterion for a method, we believe, should therefore be the total number of rounds needed to satisfy the full TPP (and, more broadly, the number of successfully completed TPPs). These are broad metrics, though, and for fast feedback on diversification strategies it is still useful to additionally qualify success at the single-round level.
Since we are building a training dataset for targeted models, a key metric is the rate of functional designs, which directly determines the size of the dataset. This is especially relevant for antibodies, where optimization attempts can lead to a total loss of binding. Enzymes, by contrast, nearly always preserve function, except when low initial expression combined with downward fluctuation yields insufficient protein to run assays (a problem that also affects antibodies).
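To make the single-round metrics used in this post concrete, here is a minimal Python sketch of how they could be computed from one round of KD measurements. The definitions are assumptions made for illustration: a variant counts as functional if any KD could be measured, and as a hit if its KD is lower than the template's by more than a factor `min_fold` (a hypothetical parameter; in practice the threshold would reflect assay noise). The function name and the toy KD values are likewise hypothetical and are not data from our experiments.

```python
from typing import Dict, Optional, Sequence


def summarize_round(
    template_kd_nm: float,
    variant_kds_nm: Sequence[Optional[float]],
    min_fold: float = 1.0,
) -> Dict[str, Optional[float]]:
    """Summarize one diversification round.

    variant_kds_nm holds one entry per tested design; None marks a design
    with no measurable binding (or one that could not be assayed).
    """
    n_tested = len(variant_kds_nm)
    if n_tested == 0:
        raise ValueError("need at least one tested design")
    measured = [kd for kd in variant_kds_nm if kd is not None]
    # Lower KD means tighter binding, so fold improvement = template KD / variant KD.
    folds = [template_kd_nm / kd for kd in measured]
    hits = [f for f in folds if f > min_fold]
    return {
        "functional_rate": len(measured) / n_tested,
        "hit_rate": len(hits) / n_tested,
        "best_fold_improvement": max(folds) if folds else None,
    }


# Toy example: a 400 nM template and eight hypothetical designs.
toy_kds = [None, 800.0, 400.0, 100.0, 4.0, None, 2.0, 50.0]
summary = summarize_round(400.0, toy_kds)
print(summary)
```

Both rates are computed over all tested designs, which matches how hit rates are reported as fractions such as 24/48 in this post.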
An ideal dataset for our models has diversity in both sequence and function space. Mutations that produce different tradeoffs among measured properties, with variants both better and worse than the starting point, typically give models the training signal they need to identify promising directions for optimization.
In that regard, our Trastuzumab-scFv diversification campaign was successful, achieving a binding-preservation rate of 92%, and exploring expression and thermostability quite broadly.
HER2 affinity: Zero-shot from a strongly binding scFv: Hit rate 0%, best hit affinity x1, functional rate 92%
HER2 Tm/Expression: Zero-shot from a strongly binding scFv

Generally, we expect to observe both up- and downward fluctuations centered around a starting point. That's also what we observed for affinity measurements in our very first experiment of this sort, in which we generated framework mutations around Ty1 (the strong binder against SARS-CoV-2 RBD). Note that in most cases where variants couldn't be assayed, it was due to expression issues in a TX/TL system. Indeed, our zero-shot models tend to have higher success in cell-based expression systems.
Ty1 KD Distribution: Zero-Shot from a strongly binding Nanobody (FW only): Hit rate 36%, best hit affinity x15, Functional rate 69%

This general trend is why we approached the diversification of Ty2 (the weak binder we discovered) with some trepidation: With its affinity of 400nM quite close to the edge of the affinity range our CRO partner could measure (around 1μM), we anticipated a rather high failure rate. Instead, this round fell into our preferred pattern, with a good balance of up- and downward affinity fluctuations: 56% of our designs retained measurable binding, and the hit rate was exactly 50% (24 of 48 designs), including some drastic improvements.
Ty2 KD Distribution: Zero-Shot from a weakly binding Nanobody: Hit rate 50%, best hit affinity x357

The pattern of both up- and downward variation around the template binder is also observed for our C102 anti-SARS-CoV-2 Fab experiment, as well as for our experiment introducing indels and large mutation loads on Ty1.
C102 Fab KD Distribution

Ty1 Indel KD Distribution

Despite making more drastic changes to our starting binder (Ty1) during the indel generation round, the resulting distribution of affinities was much narrower than in our other experiments. We hypothesize that this is due to the explicit conditioning of the model with an alignment of binders identified through panning: Even with large edits, the "context" binders constrain the functional space much more tightly than the semi-supervised conditioning provided by pre-training in our typical experiments. In practice, this gives us a useful lever: When panning or NGS reads on related binders are available, alignment-based conditioning can extend the safe edit budget at the not-horrible cost of narrower affinity exploration.
Let's take another look at our results in terms of binding affinity for our different experiments:
| Antigen | Template binder | Template binding affinity | Template format | Edit regions | # Mutations | Hit rate | Affinity improvement | Functional rate |
|---|---|---|---|---|---|---|---|---|
| SARS-CoV-2 RBD | Ty1 | 4.8nM | VHH | FW-only | 2-8 | 35/96 | x15 | 69% |
| EGFR | Cetuximab | 5nM | scFv | FW-only | 10 | 11/12 | x18 | 100% |
| SARS-CoV-2 RBD | Ty1 | 4.8nM | VHH | FW+H1+H2 | 2-8 | 8/96 | x7 | 62% |
| SARS-CoV-2 RBD | Ty1 | 4.8nM | VHH | All+indels | 2-52 | 69/384 | x1.2 | 98% |
| SARS-CoV-2 RBD | C102 | 18nM | Fab | All | 2-8 | 30/384 | x2 | 80% |
| HER2 | Trastuzumab | 3nM | scFv | All | 2-8 | 0/192 | NA | 92% |
| SARS-CoV-2 RBD | Ty2 | 400nM | VHH | All | 2-8 | 24/48 | x350 | 56% |
| From literature | | | | | | | | |
| EGFR | | 8.5nM | IgG | CDR SSM | 1 | 66/1060 | x4.8 | 91% |
| SARS-CoV-2 | | 0.3nM | scFv | H2 SSM | 1 | 9/190 | x2.6 | NA |
| Digoxin | | 0.2nM | scFv | CDR SSM (6 positions) | 1 | 17/114 | x1.5 | 75% |
Note the fair amount of variation across experiments. Excluding the 100% functional rate observed in the low-N Adaptyv competition candidates, functional rates for strong binders range from 62% to 98%, while the functional rate of our weak-binder experiment was 56%. This provides a useful guideline for the throughput of these exploratory rounds: Given that we typically recommend 90 unique variants as the minimum training dataset for targeted models, evaluating 190 variants is a reasonable target to efficiently kick off an optimization campaign using Cradle.
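The arithmetic behind this guideline can be sketched in a few lines of Python. This is a back-of-the-envelope illustration, not our planning tool: the function names are hypothetical, and the binomial calculation assumes that each design is functional independently with the same probability, which is a simplification. The inputs (90 functional variants, functional rates of 56% and 98%, 190 designs) are the figures quoted in this post.

```python
import math


def designs_needed(min_functional: int, functional_rate: float) -> int:
    """Designs to test so that the expected number of functional ones reaches min_functional."""
    if not 0.0 < functional_rate <= 1.0:
        raise ValueError("functional_rate must be in (0, 1]")
    return math.ceil(min_functional / functional_rate)


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), assuming independent designs."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


# Expected-value view: how many designs for 90 functional variants?
for rate in (0.56, 0.98):
    print(f"functional rate {rate:.0%}: test at least {designs_needed(90, rate)} designs")

# Safety-margin view: chance that 190 designs yield at least 90 functional
# variants at the weak-binder functional rate of 56%.
print(f"P(>= 90 functional out of 190 at 56%): {prob_at_least(90, 190, 0.56):.3f}")
```

Under these assumptions, the expected-value calculation alone gives a smaller number than 190 even at the lowest functional rate we observed; the extra designs act as a margin against round-to-round variation.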
Our experiment involving indels stands out in terms of results: While it only yielded binders similar to the starting point Ty1, the vast majority of variants retained binding despite high mutation loads of up to 50% of the VHH, many without a resolvable decrease in affinity. This highlights that there is a lot of available functional exploration room around natural strong binders, which can be utilized for optimizing other properties involved in developability.
What's next?
As we have highlighted, zero-shot diversification is primarily a tool for initiating an active-learning optimization campaign when sequence-function data on close variants of the lead are not yet available. This process is remarkably effective at finding high-affinity hits, despite being optimized for something else, namely generating guidance for subsequent multi-property optimization rounds.
Some of the presented results have gone on to be the first round of multi-round case studies, demonstrating the utility of datasets generated in this way. For instance, we have performed a 3-round multi-property optimization case study on both the Ty1 anti-SARS-CoV-2 VHH and the anti-HER2 scFv presented above, comparing our approach to open source ESM2-based tools and classical bioinformatics approaches. This case study can be found as part of our recently published whitepaper.
A key challenge for our full-sequence, multi-mutant approaches, however, is the risk of unknowingly introducing immunogenic mutations, particularly in designs with a high mutation load in framework regions. While our models are biased toward suggesting mutations found in the human proteome, we expect that targeted optimization signals could still create such liabilities. We have shown that our designs do not degrade in silico scores such as the T20 humanness score and the number of MHCII-binding core peptides (details in our blog post), and we continue actively exploring efficient ways to incorporate immunogenicity constraints into our design process.
Conclusion: Moving fast with zero-shot design
At the outset, the path from a weak lead to a clinical candidate can seem dark, narrow and steep. But our results across seven campaigns show that you don't need a ton of data to start making meaningful progress. By leveraging zero-shot diversification, scientists can bypass the initial "data gap," generating high-quality variants and significant affinity jumps even in the first round of testing. From there, the slope levels off and the path widens and divides, leading to additional attractive landscapes. This approach doesn't just find better molecules; it builds the functional diversity required to train the targeted models that solve complex, multi-property challenges. It means you can start your campaign with confidence, even when you're starting with very little.