How to Use ProFam Online

ProFAM: The Open-Source Foundation Model for Protein Families

ProFAM (Protein Family Model) is a pioneering autoregressive foundation model designed to reason explicitly over evolutionary contexts. Developed by a collaborative team (including researchers from UCL and TUM), ProFAM-1 shifts the protein modeling paradigm from individual sequences to entire protein families. By training on concatenated sets of unaligned homologs, ProFAM learns to capture the deep conservation and covariance patterns that define biological function.

As an open-source alternative to proprietary family-based models, ProFAM provides a robust "evolution-aware" backbone for zero-shot fitness prediction and homology-guided protein design.

Key Innovations: Autoregressive Evolutionary Intelligence

ProFAM-1 is a 251-million-parameter Transformer that moves beyond single-sequence context to decode the "language of families."

  • Protein Family Language Modeling (pfLM): Unlike standard protein language models (pLMs) such as ESM, which see one sequence at a time, ProFAM is trained to predict the next token across concatenated, unaligned sequences from the same family, allowing it to "see" evolutionary trends during the forward pass.

  • The ProFAM Atlas: Built on a massive, curated dataset derived from the AlphaFold Database (AFDB), TED FunFams, and UniRef90, encompassing millions of protein families.

  • Alignment-Free Context: The model leverages unaligned sequences, eliminating the need for computationally expensive and often error-prone Multiple Sequence Alignments (MSAs) during the core inference step.

  • Diverse Prompting: ProFAM can be "prompted" with known homologs to steer the generation of novel variants that preserve the specific structural and functional constraints of a target family.

  • Zero-Shot Mastery: ProFAM scores substitutions and indels without task-specific training by comparing each variant's log-likelihood against family-level evolutionary priors.
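To make the idea of a family-level context concrete, here is a minimal sketch of how unaligned homologs might be concatenated ahead of a query sequence. The separator string `[SEP]`, the `max_residues` budget, and the function name are illustrative assumptions, not ProFAM's actual tokenizer or context length.

```python
from typing import List

# Hypothetical separator; the real ProFAM vocabulary may use a different token.
SEP = "[SEP]"


def build_family_context(homologs: List[str], query: str, max_residues: int = 8192) -> str:
    """Concatenate unaligned homologs, then the query, into one context string.

    Homologs are added in the given order until the (assumed) residue budget
    would be exceeded. No alignment or gap characters are introduced.
    """
    query = query.strip().upper()
    kept: List[str] = []
    used = len(query)
    for seq in homologs:
        seq = seq.strip().upper()
        if not seq:
            continue  # skip empty records
        if used + len(seq) > max_residues:
            break  # stop once the budget is exhausted
        kept.append(seq)
        used += len(seq)
    # The query goes last so that, in an autoregressive model, every homolog
    # precedes it and can inform its prediction.
    return SEP.join(kept + [query])
```

Placing the query last mirrors the autoregressive setup described above: each token of the query is predicted with all preceding family members already in view.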

Performance Benchmarks: Leading the Way in Fitness & Design

ProFAM-1 is reported to be competitive with state-of-the-art sequence-only models on substitution effects and to perform particularly well on insertions and deletions (indels).

| Task | Metric | ProFAM-1 Result | Comparison / Finding |
|---|---|---|---|
| Fitness Prediction | Spearman correlation (substitutions) | 0.47 | Competitive with SOTA sequence-only methods |
| Indel Prediction | Spearman correlation (indels) | 0.53 | Top-tier performance on ProteinGym |
| Sequence Diversity | Self-Similarity Index | High | Recapitulates natural family distribution patterns |
| Structural Fidelity | Predicted TM-score | > 0.70 | Generated sequences maintain fold integrity |
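For readers unfamiliar with the fitness metric, the sketch below shows how a Spearman correlation between model scores and measured fitness values can be computed: both lists are converted to ranks (ties share their average rank) and the Pearson correlation of the ranks is returned. The function names are illustrative, and this is not the benchmark's own evaluation code.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5
```

Because only ranks matter, a model is rewarded for ordering variants correctly even if its log-likelihood scale bears no linear relation to the assay readout.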

Scientific Breakthroughs in Functional Annotation

Homology-Guided Sequence Generation

ProFAM allows researchers to generate novel sequences that are "anchored" by evolutionary history. By providing a few representative sequences of a family as a prompt, the model generates diverse new members that maintain residue conservation and covariance, effectively exploring the "functional dark matter" within known protein folds.

High-Resolution Variant Scoring

By calculating the log-likelihood of a sequence given its family context, ProFAM identifies deleterious mutations that might be missed by models looking at single sequences in isolation. This makes it an invaluable tool for clinical variant interpretation and directed evolution.
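The scoring idea can be sketched as a log-likelihood difference between a variant and its wild type. In the example below, `make_toy_scorer` is a deliberately crude stand-in for the model (it only uses family-wide residue composition), and length normalization is an assumption added so that indel variants of different lengths remain comparable; neither reflects ProFAM's actual implementation.

```python
import math
from collections import Counter
from typing import Callable, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

TokenLogProbs = Callable[[str], List[float]]


def make_toy_scorer(family: List[str], pseudocount: float = 1.0) -> TokenLogProbs:
    """Toy stand-in for a family-conditioned model (NOT ProFAM itself).

    Returns a function mapping a sequence to per-residue log-probabilities
    estimated from the residue composition of the family.
    """
    counts = Counter(res for seq in family for res in seq if res in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    logp = {aa: math.log((counts[aa] + pseudocount) / total) for aa in AMINO_ACIDS}
    return lambda seq: [logp[res] for res in seq]


def mean_log_likelihood(seq: str, token_logprobs: TokenLogProbs) -> float:
    """Average per-residue log-probability of a sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    lps = token_logprobs(seq)
    return sum(lps) / len(lps)


def variant_score(wild_type: str, variant: str, token_logprobs: TokenLogProbs) -> float:
    """Positive when the variant is more likely than the wild type."""
    return mean_log_likelihood(variant, token_logprobs) - mean_log_likelihood(
        wild_type, token_logprobs
    )
```

With a real model, `token_logprobs` would be replaced by a call that returns the model's log-probabilities for the sequence conditioned on the family context; the subtraction step stays the same.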

Interpretable Evolutionary Patterns

Through training, ProFAM implicitly learns the "grammar" of protein families. It can identify which positions in a protein are evolutionarily coupled, providing structural insights, such as contact points and active sites, directly from sequence data.

ProFAM on Tamarind Bio: Scaling Evolutionary Insights

Tamarind Bio provides a seamless interface to harness ProFAM’s family-level reasoning without the overhead of complex PyTorch Lightning environments or GPU cluster management.

  • No-Code Inference: Upload your protein family FASTA and perform zero-shot fitness scoring or sequence generation through a simple web dashboard.

  • Managed Training & Fine-Tuning: Use Tamarind’s scalable A100/H100 infrastructure to fine-tune ProFAM on your proprietary family data or specific therapeutic targets.

How to Use ProFAM on Tamarind Bio

  1. Access the Platform: Log in to tamarind.bio and navigate to the ProFAM tool.

  2. Upload Your Family: Provide an unaligned FASTA file containing sequences from your protein family of interest.

  3. Choose Your Workflow:

    • Variant Scoring: Upload a list of mutations (substitutions or indels) to receive zero-shot fitness scores.

    • Guided Generation: Use your uploaded sequences as a "prompt" to generate thousands of novel, diverse family members.

  4. Set Sampling Parameters: Adjust temperature and top-p sampling to control the balance between sequence "naturalness" and novelty.

  5. Run & Analyze: Execute the model to produce log-likelihood reports or new sequence libraries.

  6. Export Results: Download high-confidence designs or fitness maps for wet-lab validation and downstream structural modeling.
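To illustrate what the temperature and top-p settings in step 4 control, here is a generic sketch of nucleus sampling over a next-token distribution. It is a standard formulation of the technique, not Tamarind's or ProFAM's code; the `logits` dictionary and function name are hypothetical.

```python
import math
import random
from typing import Dict, Optional


def sample_top_p(
    logits: Dict[str, float],
    temperature: float = 1.0,
    top_p: float = 0.9,
    rng: Optional[random.Random] = None,
) -> str:
    """Sample one token using temperature scaling and top-p (nucleus) filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    rng = rng or random.Random()

    # Temperature < 1 sharpens the distribution; > 1 flattens it.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}  # stable softmax
    norm = sum(exps.values())
    ranked = sorted(((tok, e / norm) for tok, e in exps.items()),
                    key=lambda item: item[1], reverse=True)

    # Keep the smallest set of top tokens whose cumulative probability reaches top_p.
    nucleus = []
    cumulative = 0.0
    for tok, prob in ranked:
        nucleus.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break

    tokens = [tok for tok, _ in nucleus]
    weights = [prob for _, prob in nucleus]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

Lower temperature and lower top-p concentrate sampling on the most probable residues, favoring "naturalness"; higher values admit rarer residues and increase novelty.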
