Use EnzyGen2 Online

Commercially Available EnzyGen2 No-Code Web Server

EnzyGen2: Ligand-Guided Functional Enzyme Co-Design

Co-design novel protein sequences and 3D backbone structures optimized for small-molecule ligand interactions.

EnzyGen2 is a 730-million-parameter protein foundation model for the simultaneous de novo co-design of enzyme sequences and 3D structures under ligand-guided functional targeting. Instead of following a rigid, sequential design pipeline, EnzyGen2 models the reciprocal dependencies between sequence and structure, generating novel, stable biocatalysts far faster than prior diffusion-based pipelines.

Est. run time: ~1–2 minutes (400x faster than prior diffusion pipelines).
Input Requirements: Small-molecule ligand, functionally important residues (or template scaffold), and an evolutionary taxonomy identifier.
Outputs: Complete de novo protein sequences and self-consistent 3D backbone structures.

How EnzyGen2 Works

Traditional enzyme design relies on a fragmented, two-stage paradigm: first generating structural backbones, then designing sequences to stabilize that target fold. This workflow often fails to capture the intricate, reciprocal relationships required for true enzymatic function and substrate binding.

EnzyGen2 introduces an interleaved neural network architecture that solves these limitations through simultaneous sequence-structure co-design under explicit ligand-binding constraints.

1. Interleaved Architecture

The architecture interleaves Transformer layers, which track long-range sequence dependencies, with k-nearest-neighbor equivariant graph neural network (kNN-EGNN) layers, which model 3D structural geometry.
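To make the "kNN" part of a kNN-EGNN layer concrete, here is a minimal sketch of building a k-nearest-neighbor graph over residue coordinates, so that each residue exchanges messages only with its closest spatial neighbors. It is an illustration only, not EnzyGen2's implementation: the function name `knn_neighbors`, the toy coordinates, and the choice of k = 2 are all assumptions.

```python
import math

def knn_neighbors(coords, k):
    """For each point, return the indices of its k nearest other points."""
    neighbors = []
    for i, ci in enumerate(coords):
        # Distance from point i to every other point, sorted ascending.
        dists = sorted(
            (math.dist(ci, cj), j) for j, cj in enumerate(coords) if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

# Toy C-alpha coordinates in angstroms for five residues (made-up values).
ca_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0),
    (20.0, 0.0, 0.0),
]

# graph[i] lists the residues that residue i would receive messages from.
graph = knn_neighbors(ca_coords, k=2)
for i, nbrs in enumerate(graph):
    print(f"residue {i} -> neighbors {nbrs}")
```

In a full model, a graph like this would be rebuilt as the coordinates are updated, while the Transformer layers attend over the whole sequence regardless of spatial distance.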

2. Multi-Task Learning Objective

EnzyGen2 is optimized across a weighted multi-task framework spanning three core loss components:

Masked Sequence Prediction (Lseq): Ensures evolutionary plausibility.
Masked Backbone Structure Reconstruction (Lstr): Governs structural geometries.
Protein-Ligand Interaction Prediction (Lbind): Enforces ligand-binding affinity and specificity.
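A weighted multi-task objective of this kind is commonly written as a weighted sum of its component losses. The sketch below shows that combination only; the weight names and default values are hypothetical, and the weights EnzyGen2 actually uses are not given here.

```python
def total_loss(l_seq, l_str, l_bind, w_seq=1.0, w_str=1.0, w_bind=1.0):
    """Combine the three component losses into one scalar training objective.

    l_seq  : masked sequence prediction loss (Lseq)
    l_str  : masked backbone structure reconstruction loss (Lstr)
    l_bind : protein-ligand interaction prediction loss (Lbind)
    The weights are illustrative placeholders, not published values.
    """
    return w_seq * l_seq + w_str * l_str + w_bind * l_bind

# Example with made-up loss values, doubling the weight on the structure term.
loss = total_loss(l_seq=2.0, l_str=0.5, l_bind=0.25, w_str=2.0)
print(loss)
```

Raising one weight relative to the others pushes training to prioritize that objective, which is the usual lever for trading sequence plausibility against structural accuracy or binding.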

3. Evolutionary and Functional Steering

Rather than navigating a massive, random sequence space, EnzyGen2 leverages heterogeneous inputs to steer generation into evolutionarily sound regions:

NCBI Taxonomic Identifiers: Restricts the combinatorial search space to organism-specific, evolutionarily plausible boundaries.
Functionally Important Residues: Utilizes automatically extracted or user-specified patterns from multiple sequence alignments (MSAs) to target intended catalytic folds.

4. Massive Data Scale

A major barrier to ligand-aware AI has been the scarcity of training data: historically, only ~20,000 complexes were available. To address this, the authors curated an expanded dataset of 720,993 unique protein-ligand pairs, combining experimentally validated crystal structures from the PDB with predicted folds from Swiss-Prot and documented ligands from UniProtKB.

Performance and Wet-Lab Validation

In rigorous in silico benchmarks and experimental wet-lab assays, EnzyGen2 delivers superior structural fidelity, high fold stability, and catalytic parameters that rival or exceed natural enzymes.

State-of-the-Art In Silico Benchmarks

Across the ten most frequent enzyme families, EnzyGen2 consistently outperforms established baselines including Inpainting, RFdiffusion/ProteinMPNN, and the ligand-aware RFdiffusion2/3 + LigandMPNN pipelines:

Structural Fidelity: Produced a higher proportion of designs with an RMSD < 2 Å to the target reference than the baselines.
High Stability: Reached a mean AlphaFold2 confidence score (pLDDT) of 85.10, above the threshold of 80 used here as the cutoff for high-stability folding.
Unparalleled Throughput: Generates viable enzyme candidates 400x faster than prior diffusion models, accelerating production to multiple samples per second.
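The two filters named above (RMSD < 2 Å and pLDDT of at least 80) can be applied to candidate designs along the following lines. This is a simplified sketch: the RMSD function assumes the two coordinate sets are already superimposed and in the same residue order (no alignment step is performed), and the function names and the inclusive pLDDT comparison are assumptions.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-superimposed coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

def passes_filters(design_rmsd, mean_plddt, rmsd_cutoff=2.0, plddt_cutoff=80.0):
    """Keep designs that are close to the reference and confidently folded."""
    return design_rmsd < rmsd_cutoff and mean_plddt >= plddt_cutoff

# Toy example: every atom of the design is displaced 3 A along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
design = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
print(rmsd(reference, design))       # 3.0
print(passes_filters(1.5, 85.10))    # True
```

In practice the structures would first be superimposed (for example with the Kabsch algorithm) before the RMSD is computed.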

Wet-Lab and Biochemical Validation

Beyond computational prediction, EnzyGen2 designs have been validated in vitro and in vivo across three diverse enzyme families, with functional designs sharing as little as 51.6% sequence identity with their natural counterparts:

Chloramphenicol Acetyltransferase (CAT): De novo designed variants enabled robust E. coli growth in selective media. Notably, candidate CAT-17 protected cells at up to 500 µg/mL of chloramphenicol, a concentration at which cells expressing the natural wild-type enzyme failed to survive.
Aminoglycoside Adenylyltransferase (AadA): De novo variant AadA-2 conferred antibiotic resistance, allowing robust cell survival in media supplemented with up to 2400 µg/mL of spectinomycin.
Thiopurine S-Methyltransferase (TPMT): Designed to serve as biocatalytic S-adenosylmethionine (SAM)-regeneration enzymes via halomethyltransferase pathways. The generated enzymes TPMT-2, TPMT-9, and TPMT-10 displayed dramatically higher initial reaction velocities than the widely utilized natural baseline enzyme ac/HMT.
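For readers unfamiliar with the sequence identity figure quoted above, the sketch below shows one common way to compute percent identity from two already-aligned sequences. Identity definitions vary (especially in how gaps and the denominator are handled), so the convention used here, which skips any column containing a gap, is an assumption, as are the toy sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns where the two residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where either sequence has a gap.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)

# Toy alignment: 5 gap-free columns, 4 of which match.
print(percent_identity("MKT-AY", "MRTLAY"))  # 80.0
```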

What is Tamarind Bio?

Tamarind Bio is a pioneering no-code bioinformatics platform built to democratize access to powerful computational tools for life scientists and researchers. Recognizing that cutting-edge machine learning models in structural biology are often difficult to deploy, Tamarind provides an intuitive, web-based environment that completely abstracts away high-performance computing management, complex software dependencies, and command-line interfaces.

By handling the technical heavy lifting, GPU orchestration, and data parallelization, Tamarind Bio empowers biologists, chemists, and pharmaceutical researchers to immediately run advanced generative pipelines from any standard browser.

How to Use EnzyGen2 on Tamarind Bio

Tamarind Bio provides a streamlined interface for running EnzyGen2 with no coding required. To engineer functional de novo enzymes, follow these steps:

Access the Tool: Log in to the Tamarind Bio platform and select EnzyGen2 from the De Novo Design toolkit.
Provide the Functional Input Scaffold: Upload a structure (PDB file) or define the 3D spatial coordinates and residue identity types of your target enzyme's active site / functionally important residues.
Specify the Target Ligand: Input the chemical features or structural format of the binding small-molecule substrate or cofactor.
Select a Taxonomic Identifier: Input your desired NCBI taxonomic ID (e.g., 562 for E. coli) to constrain the generation to organism-compatible sequence spaces.
Adjust Generation Parameters: Choose your structural sampling settings, such as the cumulative probability threshold p for Nucleus Sampling (e.g., p = 0.4 or 0.6) to balance structural diversity and design fidelity.
Submit and Evaluate: Click Submit Job. Within minutes, download your newly generated, co-designed protein sequence along with its corresponding 3D backbone structure file, ready for downstream wet-lab synthesis or structural ranking tools like AlphaFold.
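The nucleus sampling threshold p mentioned in the steps above controls how much of the probability distribution is eligible at each sampling step. The following is a generic sketch of nucleus (top-p) sampling over a toy amino-acid distribution, not Tamarind's or EnzyGen2's code; the function names and probabilities are hypothetical.

```python
import random

def nucleus_filter(probs, p):
    """Keep the smallest set of most-probable tokens whose cumulative
    probability reaches p, then renormalize so the kept set sums to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}

def nucleus_sample(probs, p, rng=random):
    """Draw one token from the renormalized nucleus."""
    filtered = nucleus_filter(probs, p)
    return rng.choices(list(filtered), weights=list(filtered.values()), k=1)[0]

# Toy distribution over four amino acids at one sequence position.
position_probs = {"A": 0.5, "G": 0.25, "L": 0.125, "V": 0.125}

print(nucleus_filter(position_probs, 0.4))  # {'A': 1.0}
print(sorted(nucleus_filter(position_probs, 0.6)))  # ['A', 'G']
print(nucleus_sample(position_probs, 0.6))
```

With this toy distribution, p = 0.4 leaves only the single most likely residue, while p = 0.6 also admits the second most likely one. A lower p therefore favors fidelity and a higher p favors diversity.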
