Protein engineering has long relied on physics-based methods to design novel proteins. These methods often have low experimental success rates and can be computationally expensive. ProteinMPNN, a deep learning model developed by the Baker group at the University of Washington, addresses both problems by efficiently designing amino acid sequences that are predicted to fold into a given 3D protein structure. The model is purely structure-based: it takes a backbone as input and does not require functional information. It has performed well in both computational and experimental tests.
How ProteinMPNN Works
ProteinMPNN is a deep learning model for inverse protein folding, meaning it solves the reverse problem of protein structure prediction. Instead of predicting the structure from a sequence (as AlphaFold does), ProteinMPNN takes a known or desired 3D backbone structure as input and generates a compatible amino acid sequence.
Graph-based approach: The model treats the protein backbone as a graph, where each residue is a node and edges connect residues that are close to each other in space. A message-passing neural network (MPNN) then passes information along these edges, so that the representation of each residue reflects the geometry of its structural neighborhood.
Sequence Generation: It uses a masked autoregressive approach, predicting residues one at a time and conditioning each prediction on the backbone and on the residues that have already been assigned. Selected positions can be held fixed while the rest are designed, which is critical for retaining function (for example, keeping known active-site residues unchanged).
Performance: On native protein backbones, ProteinMPNN achieved a sequence recovery rate of 52.4%, compared with 32.9% for Rosetta, a traditional physics-based method. Sequence recovery is the fraction of positions at which the designed residue matches the native one. Designed sequences have also been reported to fold into their intended structures in experimental tests.
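The three points above can be illustrated with a small, self-contained Python sketch. It is not the ProteinMPNN implementation: `toy_scores` is a hypothetical stand-in for the trained network, the neighbor count `k` and the random decoding order are choices made for this illustration, and only C-alpha coordinates are used. The sketch shows how a nearest-neighbor graph can be built from a backbone, how residues can be decoded one at a time while fixed positions are left untouched, and how sequence recovery is computed.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def knn_graph(ca_coords, k):
    """For each residue, list the indices of its k nearest residues (C-alpha distance)."""
    neighbors = []
    for i, ci in enumerate(ca_coords):
        dists = sorted(
            (math.dist(ci, cj), j) for j, cj in enumerate(ca_coords) if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

def toy_scores(i, neighbor_ids, partial_seq):
    """Hypothetical stand-in for the trained network (NOT the real model).

    Returns one score per amino acid for position i. The scores depend only on
    the position and on neighbors that already have an identity, which mimics
    the conditioning structure of autoregressive decoding.
    """
    seed = i * 31 + sum(
        AMINO_ACIDS.index(partial_seq[j]) + 1
        for j in neighbor_ids
        if partial_seq[j] is not None
    )
    rng = random.Random(seed)
    return [rng.random() for _ in AMINO_ACIDS]

def design(ca_coords, fixed=None, k=3, seed=0):
    """Decode a sequence one residue at a time, leaving fixed positions unchanged.

    fixed maps residue index -> one-letter amino acid code to keep.
    """
    fixed = fixed or {}
    n = len(ca_coords)
    neighbors = knn_graph(ca_coords, min(k, n - 1))
    seq = [fixed.get(i) for i in range(n)]  # None marks positions still to design
    order = [i for i in range(n) if i not in fixed]
    random.Random(seed).shuffle(order)  # decoding order is an assumption of this sketch
    for i in order:
        scores = toy_scores(i, neighbors[i], seq)
        seq[i] = AMINO_ACIDS[scores.index(max(scores))]  # greedy choice
    return "".join(seq)

def sequence_recovery(designed, native):
    """Fraction of positions where the designed residue equals the native residue."""
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    return sum(a == b for a, b in zip(designed, native)) / len(native)

# Usage: five residues spaced along a line, with the first and last held fixed.
backbone = [(3.8 * i, 0.0, 0.0) for i in range(5)]
designed = design(backbone, fixed={0: "M", 4: "K"})
print(designed, sequence_recovery(designed, "MAAAK"))
```

With a real model, the placeholder scoring function would be replaced by the network's predicted amino acid probabilities, and residues would typically be sampled from those probabilities rather than chosen greedily.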
The Power of ProteinMPNN on Tamarind
While ProteinMPNN is a powerful tool, it requires specific technical expertise and computational resources to run effectively. For many researchers, this can be a major hurdle. Tamarind is a no-code bioinformatics platform that makes advanced computational tools like ProteinMPNN accessible to all life scientists. By handling the underlying computational infrastructure and complexity, Tamarind empowers researchers to accelerate their work without needing to write code or manage a high-performance computing environment.
Sequence Design at Scale: Tamarind enables researchers to use ProteinMPNN to design thousands of sequences simultaneously, which is essential for screening and optimizing protein variants.
Integrated Workflows: The platform allows users to combine ProteinMPNN with other state-of-the-art models. For example, RFdiffusion can generate novel protein backbones, and ProteinMPNN can then design sequences for them. Users can also predict the structure of each designed sequence with AlphaFold and check whether it matches the intended backbone.
Specialized Models: Tamarind offers several ProteinMPNN variants for specific applications, including:
SolubleMPNN: Optimized for designing soluble proteins.
AbMPNN: Trained specifically for antibody design.
HyperMPNN: Tailored for designing thermostable proteins.
LigandMPNN: Accounts for ligand atoms during the design process.
Accelerating Discovery
The combination of ProteinMPNN's design capabilities and Tamarind's user-friendly interface gives researchers an end-to-end workflow for protein engineering. Starting from a protein structure, they can obtain amino acid sequences predicted to fold into it, then screen those sequences for the desired function. This streamlined workflow reduces the time and resources needed for discovery and supports the creation of novel proteins for therapeutic, industrial, and biotechnological applications.