How to Use ZymCTRL Online

Commercially Available Online Web Server

ZymCTRL: A New Approach for Controllable Enzyme Design

ZymCTRL is a conditional language model that generates novel, artificial enzyme sequences for a user-specified enzymatic class. The model is trained on the BRENDA enzyme database to learn the sequence features that define different catalytic reactions. The enzymes it generates are structurally similar to natural ones but distant from them in sequence space, offering the potential for novel solutions to established problems.

How ZymCTRL Works

ZymCTRL is a transformer decoder model trained on a dataset of over 36 million enzyme sequences from the BRENDA database. Training uses an autoregressive objective: the model learns to predict each token from the tokens that precede it.

  • Conditional Generation: The model generates sequences based on a user-defined control tag, which is the Enzyme Commission (EC) number of a specific catalytic reaction.

  • Transfer Learning: To address the issue of underrepresented enzyme families in the training data, ZymCTRL tokenizes EC numbers by subclasses. This allows the model to transfer knowledge from populated classes to less-populated ones, enabling it to generate high-confidence sequences even for rare enzyme types.

  • Orthogonal Validation: The functionality of ZymCTRL-generated sequences was validated using an orthogonal function prediction method called ProteInfer. This method confirmed that the generated enzymes' predicted functions matched their intended catalytic reactions.
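The idea behind tokenizing EC numbers by subclass can be illustrated with a small sketch. The helper names below (ec_levels, shared_levels) are hypothetical and are not part of ZymCTRL; they only show how two EC numbers that share leading levels also share leading tokens, which is what lets a model reuse what it learned from a well-populated class when prompted with a sparsely populated neighbor.

```python
def ec_levels(ec_number: str) -> list[str]:
    """Split an EC number such as '3.5.5.1' into its four hierarchical levels."""
    parts = ec_number.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec_number!r}")
    return parts


def shared_levels(ec_a: str, ec_b: str) -> int:
    """Count how many leading EC levels two enzyme classes have in common."""
    count = 0
    for level_a, level_b in zip(ec_levels(ec_a), ec_levels(ec_b)):
        if level_a != level_b:
            break
        count += 1
    return count


# Illustrative comparison: the two classes differ only in the last level,
# so a per-level tokenization gives them three identical leading tokens.
print(ec_levels("1.1.1.1"))
print(shared_levels("1.1.1.1", "1.1.1.2"))
```

This is a simplification for illustration only; the model's actual tokenizer and prompt format may differ.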

What is Tamarind Bio?

Tamarind Bio is a pioneering no-code bioinformatics platform built to democratize access to powerful computational tools for life scientists and researchers. Recognizing that many cutting-edge machine learning models are difficult to deploy and use, Tamarind provides an intuitive, web-based environment that abstracts away the complexities of high-performance computing, software dependencies, and command-line interfaces.

The platform is designed to provide easy access for biologists, chemists, and other researchers who may not have a background in programming or cloud infrastructure but want to run experimental models on their own data. Key features include a user-friendly graphical interface for setting up and launching experiments, a robust API for integration into existing research pipelines, and an automated system for managing and scaling computational resources. By handling the technical heavy lifting, Tamarind empowers researchers to concentrate on their scientific questions and accelerate the pace of discovery.

Accelerating Discovery with ZymCTRL on Tamarind Bio

Integrating ZymCTRL on a platform like Tamarind would empower researchers to accelerate enzyme design and protein engineering campaigns.

  • Custom-Tailored Enzyme Design: Researchers can use ZymCTRL to generate novel enzyme sequences for a specific catalytic reaction, opening up possibilities in molecular medicine and environmental sciences.

  • Exploration of Novel Sequence Space: The model's ability to generate sequences distant from natural ones, while still being functional, allows researchers to explore novel solutions for enzymatic catalysis.

  • Accessible and Scalable: ZymCTRL is a computationally intensive model with 738 million parameters. By running it on a scalable platform like Tamarind, researchers can bypass the need for specialized hardware and perform high-throughput generation and screening of potential enzyme candidates.
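To put the 738 million parameter figure in perspective, here is a rough back-of-the-envelope estimate of the memory needed just to hold the model weights. It assumes 4 bytes per parameter in 32-bit precision and 2 bytes in 16-bit precision, and it deliberately ignores activations, generation caches, and framework overhead, so real usage will be higher.

```python
PARAMS = 738_000_000  # parameter count stated in the text above

# Assumed storage cost per parameter for two common numeric formats.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gib(n_params: int, dtype: str) -> float:
    """Return the memory needed for the weights alone, in GiB."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gib(PARAMS, dtype):.2f} GiB")
```

Under these assumptions the weights alone occupy just under 3 GB in 32-bit precision, before any batch of sequences is generated.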

How to Use ZymCTRL on Tamarind Bio

To leverage ZymCTRL's power, a researcher could follow this streamlined workflow:

  1. Access the Platform: Begin by logging in to the tamarind.bio website.

  2. Select ZymCTRL: From the list of available computational models, choose the ZymCTRL tool.

  3. Define the Catalytic Reaction: Provide the specific Enzyme Commission (EC) number for the desired chemical reaction.

  4. Generate Enzyme Sequences: The platform would use ZymCTRL to generate a library of novel enzyme sequences conditioned on the provided EC number.

  5. Validate and Refine: The generated sequences can be evaluated for predicted properties, such as globularity and structural confidence, using other tools on the platform like ESMFold or OmegaFold.

  6. Select Candidates: The best-performing candidates can then be selected for experimental validation, enabling a rapid transition from in silico design to in vitro testing.
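Steps 3, 5, and 6 of the workflow above can be sketched locally in a few lines. This is a minimal illustration and not Tamarind's API: the Candidate class, the mean_plddt field, the 70.0 cutoff, and the example sequences are all hypothetical placeholders. The only external fact relied on is that complete EC numbers have four numeric levels and that the top-level class currently runs from 1 to 7.

```python
import re
from dataclasses import dataclass

# A complete EC number: top-level class 1-7 followed by three numeric levels.
# Partial entries (e.g. '1.1.1.-') and preliminary numbers are rejected here.
EC_PATTERN = re.compile(r"[1-7]\.\d+\.\d+\.\d+")


def is_complete_ec(ec_number: str) -> bool:
    """Check that the input looks like a fully specified EC number."""
    return EC_PATTERN.fullmatch(ec_number.strip()) is not None


@dataclass
class Candidate:
    name: str
    sequence: str
    mean_plddt: float  # hypothetical 0-100 confidence from a structure predictor


def select_candidates(candidates, min_plddt=70.0, top_n=2):
    """Keep candidates above the confidence cutoff, best first."""
    kept = [c for c in candidates if c.mean_plddt >= min_plddt]
    return sorted(kept, key=lambda c: c.mean_plddt, reverse=True)[:top_n]


# Placeholder library: sequences and scores are invented for illustration.
library = [
    Candidate("gen_001", "MKTAYIAKQR", 82.4),
    Candidate("gen_002", "MSLLTEVETY", 55.1),
    Candidate("gen_003", "MAHHHHHHSS", 91.0),
    Candidate("gen_004", "MGSSHHHHHH", 73.9),
]

if is_complete_ec("1.1.1.1"):
    for candidate in select_candidates(library):
        print(candidate.name, candidate.mean_plddt)
```

The cutoff and the number of candidates to keep are project-specific choices; the values here are only there to make the example run.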
