How to Use Profluent E1 Online
Commercially Available Online Web Server
Profluent-E1: Retrieval-Augmented Protein Language Models
Profluent-E1 is a family of retrieval-augmented protein language models (pLMs) designed to address key limitations of conventional pLMs, which suffer from parameter inefficiency, baked-in phylogenetic biases, and weaker performance on functional prediction tasks. Profluent-E1 addresses these limitations by explicitly conditioning on homologous sequences, achieving state-of-the-art performance on zero-shot fitness and unsupervised contact-map prediction benchmarks.
How Profluent-E1 Works
Profluent-E1 is a foundational model built to integrate evolutionary context directly into its sequence reasoning:
Retrieval Augmentation: The core innovation is conditioning the model on retrieved evolutionary context (homologous sequences) at inference time.
Contextual Attention: This is achieved through a mechanism called block-causal multi-sequence attention, which allows the model to capture both general and family-specific constraints directly from the retrieved homologs.
Zero-Shot Performance: The model captures these constraints from the retrieved context without fine-tuning on family-specific data, which is what drives its predictive power in zero-shot tasks.
Scale: E1 models were trained on four trillion tokens from the Profluent Protein Atlas, with model sizes ranging from 150M to 600M parameters, showing consistent scaling of performance with model size.
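The attention pattern described above can be illustrated with a small mask-building sketch. This is only one plausible reading of "block-causal": it assumes each token may attend to every token in its own sequence and in all earlier sequences of the context, with the query placed last. The text above does not spell out these details, so the function name and the masking rule are assumptions, not E1's actual implementation.

```python
from typing import List


def block_causal_mask(seq_lengths: List[int]) -> List[List[bool]]:
    """Return mask[i][j] == True when token i may attend to token j.

    Assumed rule (not confirmed by the source): a token sees its own
    sequence (block) and every block that comes before it.
    """
    # Label each token position with the index of the sequence it belongs to.
    block_ids = [b for b, n in enumerate(seq_lengths) for _ in range(n)]
    # Token i may attend to token j when j's block is not later than i's block.
    return [[bj <= bi for bj in block_ids] for bi in block_ids]


# Two retrieved homologs (lengths 2 and 3) followed by a query of length 2.
mask = block_causal_mask([2, 3, 2])
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Under this assumed rule, the query tokens in the last block can read from every homolog, while the homologs never see the query.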
What is Tamarind Bio?
Tamarind Bio is a pioneering no-code bioinformatics platform built to democratize access to powerful computational tools for life scientists and researchers. Recognizing that many cutting-edge machine learning models are often difficult to deploy and use, Tamarind provides an intuitive, web-based environment that completely abstracts away the complexities of high-performance computing, software dependencies, and command-line interfaces.
The platform is designed to give biologists, chemists, and other researchers easy access to experimental models, even if they have no background in programming or cloud infrastructure. Key features include a graphical interface for setting up and launching experiments, an API for integration into existing research pipelines, and an automated system for managing and scaling computational resources. By handling the technical heavy lifting, Tamarind lets researchers concentrate on their scientific questions and accelerate the pace of discovery.
Accelerating Discovery with Profluent-E1 on Tamarind Bio
Using Profluent-E1 on a platform like Tamarind Bio would accelerate protein engineering by providing an efficient, highly accurate, and flexible system for sequence analysis and design.
State-of-the-Art Variant Ranking: Researchers can use E1's zero-shot fitness prediction to rapidly score and rank thousands of protein variants, filtering out likely non-functional sequences without family-specific fine-tuning.
Overcoming Phylogenetic Bias: By conditioning on retrieved homologs rather than relying only on what is stored in its weights, the model counters the baked-in phylogenetic biases of earlier pLMs, yielding predictions that track function more closely.
Flexible Design Workflows: The platform would allow researchers to switch flexibly between single-sequence inference mode (for speed) and retrieval-augmented inference mode (for maximum accuracy) to optimize for their specific computational budget and confidence needs.
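Variant ranking of the kind described above can be sketched with a log-likelihood-ratio score, a common zero-shot heuristic for protein language models. Whether E1 or Tamarind Bio computes scores exactly this way is an assumption here. The `toy_probs` table is invented illustration data standing in for model output, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical per-position probabilities standing in for model output.
# Keys are 0-based positions; values map amino acids to probabilities.
Probs = Dict[int, Dict[str, float]]


def score_variant(wild_type: str, variant: str, probs: Probs) -> float:
    """Sum log p(mutant) - log p(wild type) over substituted positions."""
    if len(wild_type) != len(variant):
        raise ValueError("this sketch only supports substitutions")
    score = 0.0
    for pos, (wt, mt) in enumerate(zip(wild_type, variant)):
        if wt != mt:
            score += math.log(probs[pos][mt]) - math.log(probs[pos][wt])
    return score


def rank_variants(
    wild_type: str, variants: List[str], probs: Probs
) -> List[Tuple[str, float]]:
    """Return (variant, score) pairs, highest predicted fitness first."""
    scored = [(v, score_variant(wild_type, v, probs)) for v in variants]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


wild_type = "MKT"
toy_probs: Probs = {
    1: {"K": 0.6, "R": 0.3, "P": 0.01},
    2: {"T": 0.5, "S": 0.4},
}
ranked = rank_variants(wild_type, ["MRT", "MPT", "MKS"], toy_probs)
for variant, score in ranked:
    print(f"{variant}\t{score:.3f}")
```

A score near zero means the model finds the substitution about as plausible as the wild-type residue; strongly negative scores flag variants that are likely to be disruptive.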
How to Use Profluent-E1 on Tamarind Bio
To leverage Profluent-E1's power, a researcher could follow this streamlined workflow on Tamarind Bio:
Access our Platform: Begin by logging in to the tamarind.bio website.
Select Profluent E1: From the list of tiles, search for or select Profluent E1.
Input a Protein Sequence: Provide the amino acid sequence you wish to analyze.
Select Inference Mode: Choose one of the following:
Single-Sequence Mode: For quick fitness prediction and variant ranking.
Retrieval-Augmented Mode: For the highest zero-shot accuracy, enabling the model to retrieve and condition on homologous sequences for deeper context.
Generate Predictions: The model returns fitness scores for variant ranking, or embeddings for structural tasks.
Prioritize Leads: Use the predicted scores to prioritize the sequences with the highest predicted function and viability.
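Before pasting a sequence into the input step above, it can help to clean and validate it locally. The sketch below is a generic pre-check, not part of Tamarind Bio's interface or API: the function names, the mode strings, and the settings dictionary are all hypothetical, and the check assumes that only the 20 standard amino acids are accepted.

```python
from typing import Dict

STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

# Hypothetical labels mirroring the two inference modes described above.
INFERENCE_MODES = {"single_sequence", "retrieval_augmented"}


def clean_sequence(raw: str) -> str:
    """Drop FASTA header lines and whitespace, upper-case, and validate."""
    lines = [line.strip() for line in raw.strip().splitlines()]
    body = "".join(line for line in lines if not line.startswith(">"))
    seq = "".join(body.split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    bad = sorted(set(seq) - STANDARD_AMINO_ACIDS)
    if bad:
        raise ValueError(f"non-standard residues: {''.join(bad)}")
    return seq


def build_job_settings(raw_sequence: str, mode: str) -> Dict[str, str]:
    """Bundle a validated sequence and mode (hypothetical field names)."""
    if mode not in INFERENCE_MODES:
        raise ValueError(f"unknown mode: {mode}")
    return {"sequence": clean_sequence(raw_sequence), "mode": mode}


settings = build_job_settings(">query\nmkt ay\nLL", "retrieval_augmented")
print(settings)
```

Catching stray characters or an empty input locally avoids submitting a job that would fail or silently score the wrong sequence.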