How to Use ProFAM Online
ProFAM: The Open-Source Foundation Model for Protein Families
ProFAM (Protein Family Model) is an autoregressive foundation model designed to reason explicitly over evolutionary context. Developed by a collaborative team that includes researchers from UCL and TUM, ProFAM-1 models entire protein families rather than individual sequences. It is trained on concatenated sets of unaligned homologs, from which it learns the conservation and covariance patterns that characterize a family's biological function.
As an open-source alternative to proprietary family-based models, ProFAM provides a robust "evolution-aware" backbone for zero-shot fitness prediction and homology-guided protein design.
Key Innovations: Autoregressive Evolutionary Intelligence
ProFAM-1 is a 251-million-parameter Transformer that conditions on a whole family of sequences rather than on a single sequence.
Protein Family Language Modeling (pfLM): Unlike single-sequence protein language models (pLMs) such as ESM, ProFAM is trained to predict the next token across concatenated, unaligned sequences from the same family, so evolutionary trends are visible to it during the forward pass.
The ProFAM Atlas: Built on a massive, curated dataset derived from the AlphaFold Database (AFDB), TED FunFams, and UniRef90, encompassing millions of protein families.
Alignment-Free Context: The model leverages unaligned sequences, eliminating the need for computationally expensive and often error-prone Multiple Sequence Alignments (MSAs) during the core inference step.
Diverse Prompting: ProFAM can be "prompted" with known homologs to steer the generation of novel variants that preserve the specific structural and functional constraints of a target family.
Zero-Shot Scoring: Scores substitutions and indels without task-specific training by comparing variant log-likelihoods against family-level evolutionary priors.
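The concatenated, unaligned family context described in the list above can be illustrated with a minimal sketch. It parses FASTA text and joins the homologs into one prompt string, with the sequence to be scored or extended placed last. The separator token `[SEP]`, the function names, and the ordering are illustrative assumptions, not ProFAM's actual tokenization or API.

```python
from typing import Dict, List

SEP = "[SEP]"  # hypothetical separator token; ProFAM's real vocabulary may differ


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped lines."""
    records: Dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()
    return records


def build_family_prompt(homologs: List[str], target: str = "") -> str:
    """Concatenate unaligned homologs, then the (possibly empty) target."""
    # No gap characters are inserted: each sequence keeps its native length.
    return SEP.join(homologs + [target])


fasta = """>seqA
MKTAYIAK
QR
>seqB
MKSAYK
"""
family = parse_fasta(fasta)
prompt = build_family_prompt(list(family.values()), target="MKT")
print(prompt)  # MKTAYIAKQR[SEP]MKSAYK[SEP]MKT
```

Because the sequences are only concatenated, no alignment step is needed before the prompt is built, and homologs of different lengths can sit side by side in the same context.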
Performance Benchmarks: Leading the Way in Fitness & Design
ProFAM-1 matches or exceeds state-of-the-art sequence-only models, most notably when predicting the effects of insertions and deletions (indels).
| Task | Metric | ProFAM-1 Result | Comparison / Finding |
| --- | --- | --- | --- |
| Fitness Prediction | Spearman correlation (substitutions) | 0.47 | Competitive with SOTA sequence-only methods |
| Indel Prediction | Spearman correlation (indels) | 0.53 | Top-tier performance on ProteinGym |
| Sequence Diversity | Self-similarity index | High | Recapitulates natural family distribution patterns |
| Structural Fidelity | Predicted TM-score | > 0.70 | Generated sequences maintain fold integrity |
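The Spearman figures in the table measure how well the ranking of model scores agrees with the ranking of experimentally measured fitness. As a rough sketch of that metric, the pure-Python example below ranks both lists (ties get the average rank) and takes the Pearson correlation of the ranks. The numbers are made-up toy values, not ProteinGym data or ProFAM outputs.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length inputs with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Toy values for illustration only.
model_scores = [-2.1, -0.3, -1.2, 0.4, -3.0]
measured_fitness = [0.20, 0.90, 0.35, 0.80, 0.10]
rho = spearman(model_scores, measured_fitness)
print(round(rho, 3))  # 0.9
```

Since only ranks matter, the model's scores do not need to be on the same scale as the assay measurements.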
Scientific Breakthroughs in Functional Annotation
Homology-Guided Sequence Generation
ProFAM allows researchers to generate novel sequences that are "anchored" by evolutionary history. By providing a few representative sequences of a family as a prompt, the model generates diverse new members that maintain residue conservation and covariance, effectively exploring the "functional dark matter" within known protein folds.
High-Resolution Variant Scoring
By calculating the log-likelihood of a sequence given its family context, ProFAM can flag deleterious mutations that models looking at single sequences in isolation might miss. This makes it useful for applications such as clinical variant interpretation and directed evolution.
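One common way to turn such log-likelihoods into a variant score is a log-likelihood ratio between the variant and the wild type, both conditioned on the same family. The sketch below shows that arithmetic only. `toy_log_likelihood` is a deliberately crude stand-in (smoothed residue frequencies, no positional information) that mimics the interface of a family-conditioned model. It is not ProFAM, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_log_likelihood(family: List[str], sequence: str) -> float:
    """Stand-in for log p(sequence | family); NOT the ProFAM model.

    Uses add-one smoothed residue frequencies pooled over the family.
    """
    counts = Counter("".join(family))
    total = sum(counts[a] for a in AMINO_ACIDS) + len(AMINO_ACIDS)
    return sum(math.log((counts[res] + 1) / total) for res in sequence)


def variant_score(
    family: List[str],
    wild_type: str,
    variant: str,
    log_likelihood: Callable[[List[str], str], float] = toy_log_likelihood,
) -> float:
    """Log-likelihood ratio; negative means less family-like than wild type."""
    return log_likelihood(family, variant) - log_likelihood(family, wild_type)


family = ["MKTAYIAK", "MKSAYK", "MKTAYLAK"]
wild_type = "MKTAYIAK"

# A -> W at position 4: W never occurs in this toy family, so the score drops.
score = variant_score(family, wild_type, "MKTWYIAK")
print(score < 0)  # True
```

Summed log-likelihoods tend to favor shorter sequences, so for indels a length-normalized score is a common alternative. Which normalization ProFAM itself applies is not stated here.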
Interpretable Evolutionary Patterns
ProFAM implicitly captures the "grammar" of protein families. It can indicate which positions in a protein are evolutionarily coupled, offering structural insights, such as contact points and active sites, directly from sequence data.
ProFAM on Tamarind Bio: Scaling Evolutionary Insights
Tamarind Bio provides a seamless interface to harness ProFAM’s family-level reasoning without the overhead of complex PyTorch Lightning environments or GPU cluster management.
No-Code Inference: Upload your protein family FASTA and perform zero-shot fitness scoring or sequence generation through a simple web dashboard.
Managed Training & Fine-Tuning: Use Tamarind’s scalable A100/H100 infrastructure to fine-tune ProFAM on your proprietary family data or specific therapeutic targets.
How to Use ProFAM on Tamarind Bio
Access the Platform: Log in to tamarind.bio and navigate to the ProFAM tool.
Upload Your Family: Provide an unaligned FASTA file containing sequences from your protein family of interest.
Choose Your Workflow:
Variant Scoring: Upload a list of mutations (substitutions or indels) to receive zero-shot fitness scores.
Guided Generation: Use your uploaded sequences as a "prompt" to generate thousands of novel, diverse family members.
Set Sampling Parameters: Adjust temperature and top-p sampling to control the balance between sequence "naturalness" and novelty.
Run & Analyze: Execute the model to produce log-likelihood reports or new sequence libraries.
Export Results: Download high-confidence designs or fitness maps for wet-lab validation and downstream structural modeling.
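The temperature and top-p settings mentioned in the steps above control how adventurous each sampled residue is. As a minimal sketch of what those two knobs do in general, the example below applies temperature scaling and then nucleus (top-p) filtering to a toy set of next-residue logits. The logits and the `sample_top_p` function are illustrative assumptions and do not reflect how Tamarind Bio or ProFAM implement sampling.

```python
import math
import random
from typing import Dict, Optional


def sample_top_p(
    logits: Dict[str, float],
    temperature: float = 1.0,
    top_p: float = 0.9,
    rng: Optional[random.Random] = None,
) -> str:
    """Sample one token: temperature-scaled softmax, then nucleus filtering."""
    if temperature <= 0 or not 0 < top_p <= 1:
        raise ValueError("temperature must be > 0 and top_p in (0, 1]")
    rng = rng or random.Random()
    # Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(weights.values())
    probs = sorted(((w / total, tok) for tok, w in weights.items()), reverse=True)
    # Keep the smallest set of top tokens whose cumulative mass reaches top_p.
    nucleus, mass = [], 0.0
    for p, tok in probs:
        nucleus.append((p, tok))
        mass += p
        if mass >= top_p:
            break
    # Draw proportionally to probability within the nucleus.
    draw = rng.random() * mass
    for p, tok in nucleus:
        draw -= p
        if draw <= 0:
            return tok
    return nucleus[-1][1]


# Toy next-residue logits (hypothetical values).
logits = {"A": 2.0, "L": 1.0, "G": 0.0, "W": -3.0}
rng = random.Random(0)
picks = [sample_top_p(logits, temperature=0.1, rng=rng) for _ in range(5)]
print(picks)  # ['A', 'A', 'A', 'A', 'A']
```

A low temperature or a small top-p concentrates sampling on the most likely residues, which keeps sequences close to the family. Higher values admit rarer residues and so increase novelty.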