How does AI change the first round of antibody optimization?


The first round of antibody optimization is typically a slog of site-saturation mutagenesis and single-mutant steps, producing incremental gains. But what if round 1 could explore multi-mutant combinations? What if it could preserve binding while exploring other properties that matter? Seven Cradle workflows across three antigens show what that looks like.

Nicolas

Jonathan

Every antibody optimization campaign starts with the same gap. You have a lead, maybe from immunization, maybe from phage display. You know it binds. But you also know it's not ready: The affinity isn't tight enough, or expression is too low, or stability needs work, or some other developability property doesn’t yet meet the bar. Between your lead and your target product profile stretches a series of rounds of variant generation and wet-lab experiments, and the first one shapes everything that follows. 

For many teams, that first round is some version of site-saturation mutagenesis (SSM). You pick positions in the CDRs, make every single-point substitution, and screen. It's reliable and well understood, and because it maps the local fitness landscape one position at a time, it makes improvements easy to track. The tradeoff is that each variant carries just one mutation, so the improvements you find tend to be incremental, and the data can't capture non-additive effects: You don't learn much about how mutations interact.

Editor’s note

Further reading

We described the computational methodology behind these campaigns, including how evolutionary models are trained, how mutation strategies are selected, and how variant libraries are constructed, in the companion technical post.

What if that first round could do more? 

Instead of scanning single positions exhaustively, you’d generate variants carrying multiple mutations, each exploring a different combination and each telling you more about how changes in affinity, expression, and stability relate to one another. You’d give up the systematic single-position map, but in return you’d get richer information per variant, with a realistic shot at bigger gains in one round.
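A quick count shows why multi-mutant libraries have to be sampled rather than enumerated. The sketch below assumes a 120-residue VHH (a typical length, not one of the campaign sequences) and counts variants carrying exactly k substitutions, ignoring insertions and deletions; the helper name is illustrative.

```python
from math import comb

AMINO_ACIDS = 20  # standard amino acids; each position has 19 alternatives


def n_variants(seq_len: int, k: int) -> int:
    """Number of distinct variants with exactly k substitutions (no indels)."""
    # Choose which k positions to mutate, then one of 19 alternatives at each.
    return comb(seq_len, k) * (AMINO_ACIDS - 1) ** k


# Hypothetical 120-residue VHH; the length is an assumption for illustration.
vhh_length = 120
for k in (1, 2, 8):
    print(f"{k} mutation(s): {n_variants(vhh_length, k):.3e} possible variants")
```

For this hypothetical length, a complete single-mutant scan is 2,280 variants, but there are already about 2.6 million possible double mutants, and the 8-mutant space is many orders of magnitude larger still.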

Over the past two years, biologics teams at our wet lab in Amsterdam have been using Cradle to run first-round variant libraries this way: generating multi-mutant variants (typically 2-8 mutations each) with models trained on evolutionary sequence data, then testing them in their own assays. (Dozens of similar campaigns run on Cradle every year at large pharma companies, biotechs, and CROs.)

In machine-learning language, generating new variants around your molecule of interest without any prior examples of what works is called “zero-shot” diversification. The model has to make useful proposals without being shown a single labeled example of a better variant, like an image-recognition system asked to pick out pictures of cars without ever being told what a car looks like. It has, however, seen a great deal about things similar to and associated with cars, so it has built representations of their common characteristics. The same holds for a program that starts with just a weakly functional sequence and a target product profile (TPP), the spec sheet of properties you want to optimize for. The model doesn’t need your examples of better sequences, because it has already learned from vast numbers of sequences and their functions, and it can carry that knowledge over to your lead.

The power of such a model in protein engineering is that it can propose variants carrying far more mutations than a human could reason about, while keeping enough of them functional that the round still teaches you something useful. A trained bench scientist is good at judging where mutations make sense and what a given mutation will probably do, and can make rational changes toward a specific goal. But when the goals expand beyond affinity to other properties, or when mutations don’t behave as hypothesized, Cradle can suggest complementary mutations and explore the full sequence at scale. Maximizing learning is the real purpose of the zero-shot round; any early improvements are a bonus. Because the data from a zero-shot round is fed back into the model for the next optimization round, and because the model can generate high mutation loads while maintaining protein function, we can explore the functional landscape deeply.
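To make the zero-shot idea concrete, here is a minimal sketch of one common approach, not Cradle’s actual pipeline: score each candidate multi-mutant by summing per-position log-odds, log p(mutant) - log p(wild type), from a model of evolutionary sequence variation, then keep the best-scoring candidates at each mutation load. The profile is a hypothetical stand-in (`toy_profile`) with made-up probabilities; a real workflow would take these from a trained model, and the additive score ignores the interactions between mutations that richer models can capture.

```python
import math
import random
from collections import defaultdict

AAS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def toy_profile(wt, conserved, p_conserved=0.95, p_variable=0.4):
    """Hypothetical per-position amino-acid probabilities standing in for a trained model."""
    profile = []
    for i, wt_aa in enumerate(wt):
        p_wt = p_conserved if i in conserved else p_variable
        p_other = (1.0 - p_wt) / (len(AAS) - 1)  # spread the rest over the 19 alternatives
        profile.append({aa: (p_wt if aa == wt_aa else p_other) for aa in AAS})
    return profile


def log_odds_score(wt, mutations, profile):
    """Additive zero-shot score: sum of log p(mutant) - log p(wild type) over mutated positions."""
    return sum(math.log(profile[pos][aa]) - math.log(profile[pos][wt[pos]])
               for pos, aa in mutations.items())


def propose_library(wt, profile, per_load=5, loads=range(2, 9), n_candidates=20000, seed=0):
    """Sample random multi-mutants, then keep the top-scoring ones at each mutation load."""
    rng = random.Random(seed)
    loads = list(loads)
    by_load = defaultdict(dict)  # mutation load -> {sorted (position, aa) tuple: score}
    for _ in range(n_candidates):
        k = rng.choice(loads)
        positions = rng.sample(range(len(wt)), k)
        muts = {p: rng.choice([a for a in AAS if a != wt[p]]) for p in positions}
        by_load[k][tuple(sorted(muts.items()))] = log_odds_score(wt, muts, profile)
    library = []
    for k in sorted(by_load):
        ranked = sorted(by_load[k].items(), key=lambda item: item[1], reverse=True)
        library.extend((dict(key), score) for key, score in ranked[:per_load])
    return library


if __name__ == "__main__":
    wt = "EVQLVESGGGLVQPGGSLRL"  # toy 20-residue fragment, for illustration only
    profile = toy_profile(wt, conserved={0, 1, 2, 3, 17, 18, 19})
    for muts, score in propose_library(wt, profile)[:3]:
        print(f"{score:.2f}", muts)
```

Ranking within each mutation load, rather than globally, is a deliberate choice here: an additive log-odds score penalizes every extra mutation, so a single global ranking would collapse the library onto 2-mutants.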

Here, we illustrate our workflow, using well-studied targets (picked so the results could be benchmarked against public data). 

Across seven campaigns, binding was preserved in 56% to 100% of designed variants, depending on the campaign. In five of seven, we found affinity improvements of 2-fold or more, up to 350-fold, even though we were not optimizing for affinity. Those improvements came as a side effect of generating diverse, functional variants around the lead.

Lessons from seven antibody diversification campaigns

To show what this looks like in practice, we benchmarked the approach on well-characterized targets, choosing starting binders spanning a range of affinities and formats:

  • Ty1: A VHH nanobody against the SARS-CoV-2 RBD, with strong binding (K_D = 4.8 nM), identified from an alpaca immunization campaign.

  • Ty2: A second VHH against the same target, but with much weaker binding (K_D = 400 nM). We chose it deliberately as a stress test: At that affinity, even a modest loss pushes variants below the detection limit of most binding assays.

  • Trastuzumab-scFv: An scFv reformatting of Trastuzumab, targeting HER2 with a K_D of 3 nM. This is a familiar and well-optimized clinical antibody.

  • Cetuximab-scFv: An scFv of Cetuximab, targeting EGFR, with a K_D of 5 nM. This was our entry in the Adaptyv binder design competition, where the goal was to design strong EGFR binders at least 10 mutations away from any known binder.

  • C102: A Fab fragment against the SARS-CoV-2 RBD, with a K_D of 18 nM.

Standard practice in most antibody optimization focuses mutations on the CDRs. That makes sense, since the CDRs form most of the binding interface, but we deliberately started elsewhere. The early campaigns allowed mutations only in framework regions, a conservative strategy that, although it may surprise teams accustomed to CDR-focused campaigns, is backed by published evidence that framework changes alone can improve binding affinity. As confidence grew, we allowed mutations beyond the framework to see whether changes outside established strategies could be beneficial: first in CDR H1 and H2, then across the full sequence, and in one experiment with insertions and deletions at mutation loads of up to 50% of the VHH.
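The progression from framework-only to full-sequence mutation amounts to widening the set of positions the generator is allowed to touch. A minimal sketch of that restriction, assuming hypothetical region boundaries on a toy 120-residue VHH (real boundaries would come from an antibody numbering scheme such as IMGT or Kabat) and leaving indels out; the strategy names are illustrative labels, not Cradle settings.

```python
from random import Random

# Hypothetical region boundaries (0-based, half-open) on a toy 120-residue VHH.
# Real boundaries would come from an antibody numbering scheme such as IMGT or Kabat.
REGIONS = {
    "FR1": (0, 25), "CDR-H1": (25, 33), "FR2": (33, 50), "CDR-H2": (50, 58),
    "FR3": (58, 96), "CDR-H3": (96, 108), "FR4": (108, 120),
}
FRAMEWORK = {"FR1", "FR2", "FR3", "FR4"}

# The three substitution strategies, from most to least conservative.
STRATEGIES = {
    "framework_only": FRAMEWORK,
    "framework_cdr_h1_h2": FRAMEWORK | {"CDR-H1", "CDR-H2"},
    "full_sequence": set(REGIONS),
}


def allowed_positions(strategy):
    """Sorted positions that the given strategy permits to mutate."""
    regions = STRATEGIES[strategy]
    return sorted(
        i for name, (start, end) in REGIONS.items() if name in regions for i in range(start, end)
    )


def sample_positions(strategy, k, seed=0):
    """Choose k distinct positions to mutate, drawn only from the allowed regions."""
    return sorted(Random(seed).sample(allowed_positions(strategy), k))


for name in STRATEGIES:
    print(name, len(allowed_positions(name)), sample_positions(name, k=4))
```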

Here’s what came back from the lab

| Antigen | Starting Binder | Starting K_D | Format | Where We Mutated | Mutations per Variant | Functional Rate | Best Affinity Improvement | Hit Rate |
|---|---|---|---|---|---|---|---|---|
| SARS-CoV-2 RBD | Ty1 | 4.8 nM | VHH | Framework only | 2-8 | 69% | 15× | 36% |
| EGFR | Cetuximab-scFv | 5 nM | scFv | Framework only | 10 | 100% | >8× | 100% |
| SARS-CoV-2 RBD | Ty1 | 4.8 nM | VHH | Framework + CDR H1/H2 | 2-8 | 62% | 7× | 8% |
| SARS-CoV-2 RBD | Ty1 | 4.8 nM | VHH | Full sequence + indels | 2-52 | 98% | 1.2× | 18% |
| SARS-CoV-2 RBD | C102 | 18 nM | Fab | Full sequence | 2-8 | 80% | 2× | 8% |
| HER2 | Trastuzumab-scFv | 3 nM | scFv | Full sequence | 2-8 | 92% | None | 0% |
| SARS-CoV-2 RBD | Ty2 | 400 nM | VHH | Full sequence | 2-8 | 56% | 350× | 50% |

The biggest surprise came from the weakest starting point. Ty2, at 400 nM, was barely within assay range, yet the best variant came back at 1.6 nM: an improvement of more than two orders of magnitude in a single round, from a lead many teams would have deprioritized. More than half of the designed variants (56%) preserved binding, and the distribution showed the mix of improvements and deteriorations you'd want to see in a productive diversification round.

Strong binders had room to move, too. Ty1 (4.8 nM) produced variants as tight as 320 pM, a 15-fold improvement, even though the round was constrained to framework mutations. (In the Adaptyv competition, which Cradle won, all 12 Cetuximab-scFv framework variants improved on the starting antibody, so much so that any one of them would have won.)
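Fold-improvement figures like these are ratios of dissociation constants, and because a lower K_D means tighter binding, the parent’s K_D goes in the numerator. A small sketch of the arithmetic, checked against the Ty1 numbers in the preceding paragraph; the helper names are illustrative, not part of any Cradle API.

```python
# Affinity fold-improvement from dissociation constants. Lower K_D means tighter
# binding, so the improvement is the ratio of the starting K_D to the variant K_D.
UNITS_M = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def to_molar(value, unit):
    """Convert a K_D given in the named unit to molar."""
    return value * UNITS_M[unit]


def fold_improvement(kd_start, kd_variant):
    """Fold-improvement in affinity; values above 1 mean the variant binds more tightly."""
    if kd_start <= 0 or kd_variant <= 0:
        raise ValueError("K_D values must be positive")
    return kd_start / kd_variant


# Ty1: 4.8 nM parent, 320 pM best framework-only variant.
ty1 = fold_improvement(to_molar(4.8, "nM"), to_molar(320, "pM"))
print(f"Ty1: {ty1:.1f}-fold")  # prints "Ty1: 15.0-fold"
```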

The sole campaign that missed on affinity turned out to be informative in a different way. Trastuzumab-scFv against HER2 didn't yield any affinity improvement, but 92% of variants still bound the target, and among them were variants with 30% higher expression and 4 °C higher melting temperature. If your campaign needs to move stability or expression alongside affinity, a round like this produces the diversity you need even if it doesn’t move the needle on what you (thought you) were looking for. With Cradle, it turns out you're not optimizing one property at a time, but mapping the tradeoffs between all of them. That clears the path to the full TPP in later rounds.
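One way to make “mapping the tradeoffs” concrete is to keep the variants on the Pareto front: those that no other variant beats on every property at once. A minimal sketch with hypothetical numbers (not measurements from these campaigns), treating higher as better for fold-change in affinity, expression relative to the parent, and change in melting temperature; the field names are illustrative.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    name: str
    affinity_fold: float   # K_D(parent) / K_D(variant); higher is better
    expression_rel: float  # expression relative to parent; higher is better
    delta_tm_c: float      # change in melting temperature (°C); higher is better


def dominates(a, b):
    """True if a is at least as good as b on every property and strictly better on one."""
    pa, pb = a[1:], b[1:]  # skip the name field
    return all(x >= y for x, y in zip(pa, pb)) and any(x > y for x, y in zip(pa, pb))


def pareto_front(variants):
    """Variants not dominated by any other: the multi-property tradeoff frontier."""
    return [v for v in variants if not any(dominates(o, v) for o in variants if o is not v)]


# Hypothetical measurements, for illustration only.
variants = [
    Variant("parent", 1.0, 1.0, 0.0),
    Variant("v1", 0.9, 1.3, 1.0),   # better expression and stability, slightly weaker binding
    Variant("v2", 1.0, 1.0, 4.0),   # same binding and expression, more stable
    Variant("v3", 0.8, 1.1, 0.5),   # beaten by v1 on every property
    Variant("v4", 1.2, 0.7, -1.0),  # tighter binding at a cost in expression and stability
]
for v in pareto_front(variants):
    print(v.name)
```

Every variant on the front represents a different compromise, which is exactly the information a later, targeted round needs when the TPP asks for several properties at once.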

A practical note on wet-lab assays: The best Ty2 binders by affinity expressed poorly in the E. coli system we initially used, and in the end we had to characterize them in a cell-free system (via a CRO). The expression system you use determines which variants you can evaluate, and strong binders that express poorly in your system will show up as false negatives. That’s worth factoring into an assay plan.
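In practice this means recording “not evaluable” as its own outcome rather than folding low expressers into the non-binders, so they can be re-tested in another expression system, as with the cell-free follow-up for Ty2. A minimal sketch; the function name, thresholds, and units are assumptions that depend entirely on your assay.

```python
from enum import Enum


class Call(Enum):
    BINDER = "binder"
    NON_BINDER = "non-binder"
    NOT_EVALUABLE = "not evaluable"


def classify(expression_yield, binding_signal, min_yield, min_signal):
    """Separate 'did not bind' from 'could not be tested' before calling a variant."""
    # Without enough protein, a missing binding signal says nothing about affinity.
    if expression_yield < min_yield:
        return Call.NOT_EVALUABLE
    return Call.BINDER if binding_signal >= min_signal else Call.NON_BINDER
```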

What this changes about planning a first round

The conventional logic for round 1 is well established: Map single positions systematically, understand which substitutions are tolerated, and build from there. SSM campaigns on comparable targets (including hu225 against EGFR, 42A1 against SARS-CoV-2, dig 26-10 against digoxin) have validated this approach. 

This strategy assumes that the first round exists only to find positions with beneficial mutations that can be combined in later rounds, not to make gains in several properties at once. That assumption is worth questioning. Across our campaigns, multi-mutant libraries delivered functional rates from 56% (weak binder, hardest starting point) to 100% (Adaptyv competition, framework-only), with a typical strong binder landing at 60-90%. You lose the clean per-position map, since attributing an improvement to a specific substitution is harder when variants carry 2-8 changes. But you gain something SSM doesn't offer: richer data on multi-property tradeoffs across mutation combinations, and a first round that strengthens subsequent rounds by feeding that data back into the model.
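Attribution is harder with multi-mutants, but not lost: when mutations recur across variants in different combinations, a regression over the whole library can estimate each one’s contribution. A minimal sketch, assuming a purely additive model and hypothetical mutation labels and scores; it illustrates the general idea, not the model Cradle trains.

```python
import numpy as np


def fit_additive_effects(library, scores, mutations):
    """Least-squares estimate of per-mutation effects from multi-mutant measurements.

    Assumes each variant's score (e.g. log10 of the fold-change in K_D) is the sum
    of independent per-mutation effects; systematic residuals would hint at epistasis.
    """
    column = {m: j for j, m in enumerate(mutations)}
    X = np.zeros((len(library), len(mutations)))  # one-hot design matrix
    for i, variant in enumerate(library):
        for m in variant:
            X[i, column[m]] = 1.0
    coef, *_ = np.linalg.lstsq(X, np.asarray(scores, dtype=float), rcond=None)
    return dict(zip(mutations, coef))


# Hypothetical mutation labels and scores, for illustration only.
mutations = ["S30T", "A50G", "K75R"]
library = [("S30T", "A50G"), ("S30T", "K75R"), ("A50G", "K75R"), ("S30T", "A50G", "K75R")]
scores = [0.3, 0.8, 0.1, 0.6]  # generated from additive effects 0.5, -0.2, and 0.3
print({m: round(float(e), 3) for m, e in fit_additive_effects(library, scores, mutations).items()})
```

With noise-free additive data, four varied combinations recover the three effects exactly. Real measurements are noisy and mutations interact, so in practice this needs many more variants, which is one reason a library of this size matters.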

In practical terms, assaying around 190 variants should yield at least 90 functional data points, even at the lowest functional rate we observed. That’s plenty to characterize the local landscape around your lead and, if you're running a Cradle workflow, plenty to train campaign-specific models for targeted optimization in subsequent rounds. The library design takes about 10 minutes of interaction and a few hours of compute. The experiment is whatever your lab already runs; the only thing that changes is what you can expect from it.
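The arithmetic behind that estimate: at the lowest functional rate in the table (56%), 190 variants give about 106 functional ones on average. Treating each variant as an independent trial with the same success probability (a simplification, since real functional rates vary across a library), the binomial distribution gives the chance of clearing 90, and the smallest library that does so with 95% confidence. The helper names are illustrative.

```python
from math import comb


def prob_at_least(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p): chance that at least k of n variants are functional."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_library_size(p, k, confidence=0.95):
    """Smallest library size n with P(at least k functional variants) >= confidence."""
    n = k
    while prob_at_least(n, p, k) < confidence:
        n += 1
    return n


# Lowest functional rate in the table (56%) and the ~190-variant library from the text.
print(f"Expected functional variants: {190 * 0.56:.0f}")
print(f"P(at least 90 functional): {prob_at_least(190, 0.56, 90):.3f}")
print(f"Library size for 95% confidence: {min_library_size(0.56, 90)}")
```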

From first round to full campaign 

And after round 1? What happens next depends on how you're running your campaign. Some variants may be worth advancing directly: A 350-fold affinity improvement like the one we saw with Ty2, for example, might change a program’s trajectory on its own.

But the broader value is in what the full dataset enables: When you've characterized 90+ variants across multiple properties, you have enough signal to move from exploratory diversification to targeted, multi-property optimization. We've published multi-round case studies on both the Ty1 VHH and the Trastuzumab-scFv campaigns, comparing outcomes to open-source and classical approaches.

We chose well-studied targets for these case studies so the results could be benchmarked and reproduced. But the approach isn't limited to model antigens. These campaigns run routinely on proprietary targets with less prior characterization, and the latest version of our diversification pipeline, deployed in February, has shown a functional rate of 83±3%, though so far across only a few experiments. There seems to be accessible optimization room around most antibodies, especially in regions that traditional approaches haven't exhaustively explored. Framework mutations have historically been underused for affinity optimization because CDR-focused campaigns are standard practice. Full-sequence multi-mutant exploration opens a much larger space, and its tradeoffs (e.g., a modestly higher risk of losing binding) are manageable at functional rates in the range reported here.

The optimization room is there. The question is whether your first round is designed to gain from it.
