BoltzGen: Validated de novo binder design for diverse targets

October 26th, 2025

BOLTZGEN: A Universal All-Atom Generative Model for Advanced Binder Design

BoltzGen is an all-atom generative model designed for creating novel proteins and peptides across all modalities to bind a diverse range of biomolecular targets. It is characterized by its unification of binder design and structure prediction into a single model. This capability allows BoltzGen to build strong structural reasoning about target-binder interactions directly into its generative process. BoltzGen is released under the MIT License, including weights, training code, inference code, and all designs, putting state-of-the-art biomolecular design capabilities directly into the hands of any researcher. For full details, see the manuscript or the repository.

Technical Core: Unified Design and Structure Prediction

At its core, BoltzGen employs an all-atom diffusion model that operates on a continuous space, enabling efficient, joint training for both structure prediction and design tasks. The comprehensive design pipeline is built on top of the Boltz model family, utilizing three key components:

BoltzGen (the core diffusion model) for initial structure generation.
BoltzIF for inverse folding (sequence prediction).
Boltz-2 for structure validation and affinity prediction (co-folding).

Geometric Encoding of Residue Type

A key technical innovation is the purely geometry-based representation of designed amino-acid types, which sidesteps the challenge of mixing discrete and continuous representations.

Representation: Designed residues are represented by a fixed-size set of 14 atoms. The first four atoms are fixed as the backbone N, C-α, C, and O atoms.
Encoding: The model determines the residue identity by learning to superpose a subset of the remaining virtual atoms onto designated backbone atoms. For example, threonine is encoded by placing three virtual atoms on the backbone nitrogen and four on the oxygen; the remaining atoms are interpreted as the side chain.
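
The encoding can be illustrated with a toy decoder. This is a minimal sketch under stated assumptions, not BoltzGen's implementation: the distance tolerance, the function name, and the one-entry lookup table (only the threonine example from the text) are hypothetical, and the atom ordering after the four backbone atoms is assumed not to matter.

```python
import math

BACKBONE = ("N", "CA", "C", "O")  # the first four of the 14 atoms
SUPERPOSE_TOL = 0.5               # hypothetical tolerance in angstroms

# Hypothetical lookup keyed by (virtual atoms on N, virtual atoms on O).
# Only the threonine entry comes from the text; other residues are not filled in.
CODEBOOK = {(3, 4): "THR"}


def decode_residue(coords):
    """Decode a residue type from 14 (x, y, z) atom positions."""
    if len(coords) != 14:
        raise ValueError("expected exactly 14 atoms")
    backbone = dict(zip(BACKBONE, coords[:4]))
    counts = {name: 0 for name in BACKBONE}
    side_chain = []
    for atom in coords[4:]:
        # Nearest backbone atom and its distance to this atom.
        name, dist = min(
            ((n, math.dist(atom, pos)) for n, pos in backbone.items()),
            key=lambda pair: pair[1],
        )
        if dist <= SUPERPOSE_TOL:
            counts[name] += 1        # virtual atom parked on a backbone atom
        else:
            side_chain.append(atom)  # real side-chain atom
    residue = CODEBOOK.get((counts["N"], counts["O"]), "UNK")
    return residue, counts, side_chain


# Made-up coordinates: 4 backbone atoms, 3 side-chain atoms,
# 3 virtual atoms on N and 4 on O (the threonine pattern from the text).
n_pos, ca_pos, c_pos, o_pos = (0.0, 0.0, 0.0), (1.46, 0.0, 0.0), (2.0, 1.4, 0.0), (1.3, 2.4, 0.0)
side = [(1.9, -0.8, 1.2), (3.3, -0.6, 1.3), (1.2, -0.3, 2.4)]
thr = [n_pos, ca_pos, c_pos, o_pos] + side + [n_pos] * 3 + [o_pos] * 4

residue, counts, side_atoms = decode_residue(thr)
print(residue, counts, len(side_atoms))
```

Because identity is read off from where atoms sit, a diffusion model that only outputs coordinates can express a discrete residue choice without a separate categorical output.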

Flexible Design Specification Language

BoltzGen's generation process is controlled via a rich design specification language. This allows users to impose various requirements to steer the design:

Covalent Bonds: Specify bonds between individual atoms (e.g., for cyclic peptides, disulfide-stapled peptides, or helicons).
Structure Conditioning: Fix parts of the structure via pairwise distances (e.g., fixing the nanobody framework structure while leaving the CDR loops unconstrained).
Binding Site: Label residues on the target as binding or not-binding to guide the model toward or away from specific regions.
Secondary Structure: Specify designed residues to be part of α-helices, β-sheets, or coils.
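
To make the four kinds of requirement concrete, the sketch below models them as a small Python data structure with a sanity check. It is purely illustrative: the class, field names, and validation rules are hypothetical and do not reproduce BoltzGen's actual specification format, which is documented in the repository.

```python
from dataclasses import dataclass, field

ALLOWED_SS = {"helix", "sheet", "coil"}


@dataclass
class DesignSpec:
    """Hypothetical container; NOT BoltzGen's real specification schema."""
    binder_length: int
    # Covalent bonds as ((residue, atom), (residue, atom)) pairs on the binder.
    bonds: list = field(default_factory=list)
    # Structure conditioning: (residue_i, residue_j) -> fixed distance in angstroms.
    fixed_distances: dict = field(default_factory=dict)
    # Target residue index -> True (binding) or False (not binding).
    binding_site: dict = field(default_factory=dict)
    # Designed residue index -> "helix", "sheet", or "coil".
    secondary_structure: dict = field(default_factory=dict)

    def _in_binder(self, residue):
        return 1 <= residue <= self.binder_length

    def validate(self):
        """Return a list of human-readable problems (empty if none)."""
        problems = []
        for (res_a, _), (res_b, _) in self.bonds:
            for res in (res_a, res_b):
                if not self._in_binder(res):
                    problems.append(f"bond residue {res} is outside the binder")
        for (res_i, res_j), dist in self.fixed_distances.items():
            if dist <= 0:
                problems.append(f"distance {res_i}-{res_j} must be positive")
        for res, label in self.secondary_structure.items():
            if label not in ALLOWED_SS:
                problems.append(f"unknown secondary structure '{label}'")
            if not self._in_binder(res):
                problems.append(f"residue {res} is outside the binder")
        return problems


# Example: a 12-residue disulfide-stapled peptide with a helical stretch.
spec = DesignSpec(
    binder_length=12,
    bonds=[((2, "SG"), (11, "SG"))],
    binding_site={45: True, 46: True, 90: False},
    secondary_structure={i: "helix" for i in range(4, 9)},
)
print(spec.validate())
```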

Design Pipeline and Filtering

The core generative model is augmented by a multi-stage computational pipeline to filter and rank candidates:

Inverse Folding: Optionally re-sequences the designed binders to create sequences more likely to refold into the desired structure and improve solubility.
Refolding: Predicts the structure of the design-target complex using the integrated structure prediction capabilities (Boltz-2) to assess how similar the refolded structure is to the designed structure (a proxy for likelihood of binding). An additional "designfolding" step assesses stability of the binder alone.
Ranking: Designs are ranked using a worst-case weighted metric rank (Algorithm 2) across metrics like Predicted TM-score (pTMs), Predicted Aligned Error (PAE), and interaction-type scores (e.g., number of hydrogen bonds, salt bridges, buried surface area).
Quality-Diversity Selection: A greedy algorithm selects a final set of candidates that maximizes a combination of the quality score and structural/sequence diversity.
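
The ranking and selection steps can be sketched as follows. This is an illustration under assumptions, not the manuscript's Algorithm 2: it assumes each metric's rank is divided by a weight (so a larger weight makes that metric less decisive) and that a design's score is its worst scaled rank; the selection objective, the `alpha` trade-off, the quality transform, and all metric values are hypothetical.

```python
def worst_case_rank(designs, weights, higher_is_better):
    """Score each design by its worst weight-scaled per-metric rank (lower is better)."""
    scores = {name: 0.0 for name in designs}
    for metric, weight in weights.items():
        ordered = sorted(
            designs,
            key=lambda name: designs[name][metric],
            reverse=higher_is_better[metric],
        )
        for rank, name in enumerate(ordered, start=1):
            scores[name] = max(scores[name], rank / weight)
    return scores


def greedy_quality_diversity(quality, similarity, k, alpha=0.5):
    """Greedily pick k designs, trading quality against similarity to earlier picks."""
    selected = []
    remaining = set(quality)
    while remaining and len(selected) < k:
        def gain(name):
            max_sim = max((similarity(name, s) for s in selected), default=0.0)
            return (1 - alpha) * quality[name] + alpha * (1 - max_sim)
        best = max(sorted(remaining), key=gain)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


# Made-up metric values for three designs.
designs = {
    "a": {"ptm": 0.9, "pae": 6.0},
    "b": {"ptm": 0.8, "pae": 3.0},
    "c": {"ptm": 0.6, "pae": 9.0},
}
weights = {"ptm": 1.0, "pae": 2.0}
higher_is_better = {"ptm": True, "pae": False}
scores = worst_case_rank(designs, weights, higher_is_better)

# Hypothetical pairwise similarities; "a" and "b" are near-duplicates.
pair_sim = {frozenset("ab"): 0.9, frozenset("ac"): 0.1, frozenset("bc"): 0.1}
quality = {name: 1.0 / score for name, score in scores.items()}
picked = greedy_quality_diversity(quality, lambda x, y: pair_sim[frozenset((x, y))], k=2)
print(scores, picked)
```

In this toy example the second pick is "c" rather than the better-ranked "b", because "b" is nearly identical to the already selected "a".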

Summary of Experimental Wetlab Results

BoltzGen was experimentally validated across eight diverse wetlab design campaigns with functional and affinity readouts, spanning 26 targets.

Nanobodies and Proteins against 9 Novel Targets

This challenging experiment targeted proteins highly dissimilar (< 30% sequence identity) to any protein in the Protein Data Bank (PDB) with a known bound structure.

Nanobody Designs: Testing 15 nanobody designs per target yielded nanomolar (nM) binders for 6 out of 9 targets (67% success rate). Best Kd values included 7.8 nM, 6.1 nM (PMVK), and 8.8 nM (RFK).
Protein Designs: Testing 15 protein binders per target also yielded nM binders for 6 out of 9 targets (67% success rate). Best Kd values included 9.8 nM (MZB1), 10 nM (PMVK), and 1.9 nM (AMBP).

Proteins against Bioactive Peptides

Designs were generated against three antimicrobial/cytotoxic peptides with diverse structures: protegrin (beta hairpin), melittin (helix), and indolicidin (polyproline II or amphipathic conformation).

Results: Testing only six binders per target yielded nM-affinity binders for two targets (indolicidin and melittin) and µM binders for the third (protegrin). The best affinities were 180 nM for indolicidin and 410 nM for melittin.

Peptides to Bind Disordered Regions (NPM1)

BoltzGen was tasked with modeling how the disordered region of NPM1-c mutant (a driver of Acute Myeloid Leukemia) would fold while simultaneously designing a binding peptide.

Results: One out of five tested peptide designs localized to the nucleoli in live human cells, co-localizing with endogenous NPM1. This represents the first evidence of a de novo designed protein binding a disordered protein in live cells.

Peptides against RagC GTPase and RagA:RagC Dimer

RagC Peptides: Testing 29 short, linear peptides against RagC GTPase yielded 7 binders, with the highest affinity at 3.5 µM.
RagA:RagC Dimer Peptides: Testing 24 disulfide-bonded cyclic peptides against the RagA:RagC dimer yielded 14 binders, with the highest affinity at 80 µM.

Nanobodies against Recently Deposited Viral Targets

Designs were tested against Penguinpox cGAMP PDE and Filamentous Hemagglutinin (FhaB).

Results: Yeast surface display showed a binding signal for 1 out of 7 designs against Penguinpox and 7 out of 7 designs against Hemagglutinin. Binding affinity was characterized as 2 µM at best.

Proteins against Small Molecules

Designs targeted the small molecules rucaparib and a rhodamine derivative.

Results: For rucaparib, five out of six tested designs showed binding with affinities between 43.0 µM and 151.5 µM. The design process utilized chemical fragmentation to prioritize hydrogen bonds with the essential carboxamide group.

Antimicrobial Peptides (AMPs) against GyrA

Peptides were designed to inhibit the GyrA-GyrA interaction, a target for new antibiotics.

Results: Of the 1808 designs tested, 352 (19.5%) substantially inhibited E. coli growth (by > 4). A subset of 54 designs (3.0% of total) was confirmed to be highly specific to the designed binding interface, losing activity when key interface residues were mutated to alanine.

Nanobodies and Proteins against 5 Benchmark Targets

Against benchmark targets with numerous known binders in training data (PD-L1, TNFα, PDGFR, IL-7Rα, InsulinR), both modalities achieved a high success rate:

Nanobodies: Achieved nM-affinity binders for 4 out of 5 targets (80% success rate). Best Kd values included 14.4 nM (IL-7Rα) and 13 nM (PD-L1).
Proteins: Achieved binders for 4 out of 5 targets (80% success rate), including a pM hit on PDGFR. Best Kd values included 1.9 nM (IL-7Rα) and 0.81 nM (PDGFR).

Comparative and Computational Results

Structure Prediction Performance

BoltzGen's folding performance on a diverse test set matches its predecessor, Boltz-2. This suggests that the generative model's dual capabilities—design and structure prediction—are robust, supporting the hypothesis that strong structural reasoning is essential for binder design.

Target Conditioning and Diversity

In a computational comparison designed to quantify a model's dependence on the target molecule (versus generating target-independent structures), BoltzGen demonstrated higher diversity of successful designs compared to RFdiffusion and RFdiffusion AA in several modalities:

Modality	BoltzGen Vendi Score	RFdiffusion Vendi Score	RFdiffusion AA Vendi Score
Length 150 Protein-Protein	18.0	14.6	-
Length 15 Peptide-Protein	31.2	11.0	-
Length 150 Protein-Small Molecule	22.5	-	19.4

A higher Vendi score, which uses TM-score as the similarity kernel, indicates that BoltzGen is more successfully generating diverse structures that are specifically conditioned on the input target.
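
For reference, the Vendi score of n items with similarity matrix K is the exponential of the Shannon entropy of the eigenvalues of K/n. The sketch below computes it in plain Python. It assumes the pairwise similarities (for example TM-scores) are already available as a symmetric matrix with ones on the diagonal; the Jacobi eigenvalue routine and the function names are illustrative choices, not the evaluation code behind the table above.

```python
import math


def symmetric_eigenvalues(matrix, max_sweeps=50, tol=1e-18):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [list(row) for row in matrix]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def vendi_score(similarity):
    """exp(Shannon entropy of the eigenvalues of K / n)."""
    n = len(similarity)
    scaled = [[value / n for value in row] for row in similarity]
    entropy = 0.0
    for lam in symmetric_eigenvalues(scaled):
        if lam > 1e-12:  # treat tiny or negative round-off values as zero
            entropy -= lam * math.log(lam)
    return math.exp(entropy)


all_distinct = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
all_identical = [[1.0] * 3 for _ in range(3)]
print(vendi_score(all_distinct), vendi_score(all_identical))
```

The two extremes show how to read the score: three mutually dissimilar designs give a score of 3 (up to round-off), while three identical designs give 1, so the score behaves like an effective number of distinct designs.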

A Note on Practical Considerations and Limitations

Computational Cost: A production-scale binder-design campaign may require generating between 5,000 and 60,000 designs. BoltzGen takes roughly 30–60 seconds per design for systems with a few hundred amino acids, meaning a production run is a non-trivial investment of computational resources (typically <100 designs per GPU-hour).
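
A quick back-of-the-envelope calculation shows the scale involved. The sketch assumes designs are generated one at a time on a single GPU and counts only the per-design time quoted above; the function name is illustrative.

```python
def gpu_hours(num_designs, seconds_per_design):
    """GPU-hours needed if designs are generated sequentially on one GPU."""
    return num_designs * seconds_per_design / 3600.0


low = gpu_hours(5_000, 30)    # smallest campaign at the fastest quoted rate
high = gpu_hours(60_000, 60)  # largest campaign at the slowest quoted rate
print(f"{low:.1f} to {high:.1f} GPU-hours")
```

Under these assumptions a campaign spans roughly 42 to 1,000 GPU-hours.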

Memorization Issue: A specific limitation identified during design campaigns is a memorization issue for binders of length 73-76 amino acids. In this length range, the model's generation diversity collapses and it nearly exclusively samples sequences highly similar to ubiquitin. This is hypothesized to stem from ubiquitin's overrepresentation in the PDB training data (>1000 entries).

How BoltzGen Works

BoltzGen supports four default protocols:

Protocol (design-target)	Appropriate for	Major config differences
protein-anything	Design proteins to bind proteins or peptides	Includes `design folding` step.
peptide-anything	Design peptides (including helicons, cyclic peptides) to bind proteins	No Cys are generated in inverse folding. No `design folding` step. Don't compute largest hydrophobic patch.
protein-small_molecule	Design proteins to bind small molecules	Includes binding affinity prediction. Includes `design folding` step.
nanobody-anything	Design nanobodies (single-domain antibodies)	No Cys are generated in inverse folding. No `design folding` step. Don't compute largest hydrophobic patch.
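
The table can be read as a set of per-protocol switches. The sketch below restates it as a Python lookup; the class, field names, and step labels are hypothetical, and it assumes that any setting the table does not mention for a protocol keeps its default (Cys allowed, hydrophobic patch computed, no affinity prediction).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtocolConfig:
    """Hypothetical summary of the table; not BoltzGen's config objects."""
    design_folding: bool
    allow_cys_in_inverse_folding: bool = True
    compute_hydrophobic_patch: bool = True
    predict_affinity: bool = False


PROTOCOLS = {
    "protein-anything": ProtocolConfig(design_folding=True),
    "peptide-anything": ProtocolConfig(
        design_folding=False,
        allow_cys_in_inverse_folding=False,
        compute_hydrophobic_patch=False,
    ),
    "protein-small_molecule": ProtocolConfig(design_folding=True, predict_affinity=True),
    "nanobody-anything": ProtocolConfig(
        design_folding=False,
        allow_cys_in_inverse_folding=False,
        compute_hydrophobic_patch=False,
    ),
}


def pipeline_steps(protocol):
    """List the (illustratively named) steps a protocol would run."""
    config = PROTOCOLS[protocol]
    steps = ["design", "inverse folding", "refolding"]
    if config.design_folding:
        steps.append("design folding")
    if config.predict_affinity:
        steps.append("affinity prediction")
    steps += ["ranking", "quality-diversity selection"]
    return steps


for name in PROTOCOLS:
    print(name, pipeline_steps(name))
```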

Running BoltzGen on Tamarind Bio

Access the Platform: Begin by logging in to the tamarind.bio website.

Select BoltzGen: From the list of available computational models, choose the BoltzGen tool.
Input Target Information: Provide the sequence and/or structure of the target molecule you wish to bind via UI form or YAML file.
Tamarind offers five example design experiments. This walkthrough uses the Protein Binding Small Molecule example, which selects the Protein-Small Molecule protocol and prepopulates the entities and constraints. Before submitting, you can Add Protein Sequences, Add Ligand, Add Structure Cif, Add Binder, Add Bond Constraint, or Add Length Constraint.

Define Design Specifications: Use the platform's interface to define the required design constraints (e.g., specific covalent bonds, structural motifs, or binding requirements) using the flexible specification language.
Generate All-Atom Designs: The platform runs BoltzGen to generate a pool of novel all-atom protein and peptide sequences that meet the specified structural and binding criteria.
Analyze Functionality: Analyze the generated sequences for optimal design, relying on the model's inherent structural reasoning to ensure favorable target-binder interactions.

Feel free to check out the manuscript for yourself here.
