How to Use NetSolP Online

Commercially Available Online Web Server

NetSolP: Predicting Protein Solubility with Language Models

NetSolP is a deep learning predictor reported to achieve state-of-the-art performance in predicting the solubility and usability of proteins expressed in Escherichia coli directly from their amino acid sequence. It is built on transformer-based protein language models (pLMs), and its training data were curated with strict sequence-identity partitioning to minimize bias, which has been shown to improve extrapolation across datasets.

How NetSolP Works

NetSolP leverages the power of pLMs to predict a protein's solubility from its sequence alone, bypassing the need for computationally expensive multiple sequence alignments (MSAs).

  • Protein Language Model (pLM) Embeddings: The model uses contextual embeddings from pLMs such as ESM-1b and ProtT5, which are trained on massive corpora of protein sequences. These embeddings capture a wealth of information about a protein's biophysical characteristics and structure.

  • Data Curation: The model's success is also attributed to a rigorous data curation process that removes biases, such as those related to His-tags, which were found to be common in previous datasets. This ensures the model learns generalizable features rather than dataset-specific artifacts.

  • Ensemble and Distillation: The final predictor is an ensemble of fine-tuned ESM-1b models. A distilled version, NetSolP-D, is also provided, which preserves most of the performance but runs five times faster, making it highly efficient for large-scale predictions.
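The ensemble idea can be illustrated with a minimal sketch. It assumes the ensemble members are combined by averaging their predicted probabilities, which the description above does not spell out. The `Model` type and the toy members are hypothetical stand-ins, not NetSolP's actual fine-tuned ESM-1b models.

```python
from statistics import mean
from typing import Callable, Sequence

# Hypothetical stand-in: a "model" is any callable that maps an amino acid
# sequence to a solubility probability in [0, 1]. Real ensemble members
# would be fine-tuned pLMs; these are placeholders for illustration only.
Model = Callable[[str], float]


def ensemble_predict(models: Sequence[Model], sequence: str) -> float:
    """Average the per-model probabilities for one sequence.

    Averaging is an assumption made for this sketch; other ways of
    combining ensemble members exist.
    """
    if not models:
        raise ValueError("ensemble needs at least one model")
    return mean([model(sequence) for model in models])


# Toy members that return fixed probabilities regardless of the input.
def toy_member_a(sequence: str) -> float:
    return 0.5


def toy_member_b(sequence: str) -> float:
    return 0.75


def toy_member_c(sequence: str) -> float:
    return 1.0


toy_score = ensemble_predict([toy_member_a, toy_member_b, toy_member_c], "MKTAYIAK")
```

A distilled model would replace the list of members with a single smaller model trained to reproduce the ensemble's output, trading a little accuracy for speed.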

What is Tamarind Bio?

Tamarind Bio is a pioneering no-code bioinformatics platform built to democratize access to powerful computational tools for life scientists and researchers. Recognizing that many cutting-edge machine learning models are often difficult to deploy and use, Tamarind provides an intuitive, web-based environment that completely abstracts away the complexities of high-performance computing, software dependencies, and command-line interfaces.

The platform is designed to provide easy access for biologists, chemists, and other researchers who may not have a background in programming or cloud infrastructure but want to run experimental models on their data. Key features include a user-friendly graphical interface for setting up and launching experiments, a robust API for integration into existing research pipelines, and an automated system for managing and scaling computational resources. By handling the technical heavy lifting, Tamarind empowers researchers to concentrate on their scientific questions and accelerate the pace of discovery.

Accelerating Discovery with NetSolP on Tamarind Bio

Using NetSolP on a platform like Tamarind would accelerate protein engineering and biomanufacturing by providing a fast and accurate way to prioritize protein candidates.

  • High-Throughput Screening: Researchers could use the platform to rapidly screen thousands of protein candidates from a generative model or a large library to identify those with the highest probability of being soluble and expressed successfully in E. coli.

  • Improved Efficiency: By predicting solubility directly from the sequence, NetSolP helps to reduce the cost and time of wet-lab experiments, allowing researchers to focus their efforts on the most promising candidates.

  • Accessible Workflow: NetSolP is available as a web server and open-source code. Integrating this into a no-code platform would make advanced protein solubility prediction accessible to a broader community of researchers, democratizing access to cutting-edge tools.
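As a rough illustration of the screening step, the sketch below ranks candidates once their solubility scores are in hand. It assumes scores are probabilities between 0 and 1 held in a plain dictionary; the function name, the 0.5 cutoff, and the candidate names are illustrative choices, not part of NetSolP or Tamarind.

```python
from typing import Dict, List, Tuple


def prioritize(
    scores: Dict[str, float],
    top_n: int,
    min_score: float = 0.5,  # illustrative cutoff, not a NetSolP default
) -> List[Tuple[str, float]]:
    """Return up to top_n candidates with score >= min_score, best first."""
    kept = [(name, score) for name, score in scores.items() if score >= min_score]
    # Sort by score, highest first, so the most promising candidates lead.
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_n]


# Hypothetical candidate names and scores for illustration.
example_scores = {
    "design_001": 0.9,
    "design_002": 0.2,
    "design_003": 0.7,
    "design_004": 0.6,
}
shortlist = prioritize(example_scores, top_n=2)
```

In practice the cutoff and shortlist size would depend on how many constructs the wet lab can test in one round.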

How to Use NetSolP on Tamarind Bio

To leverage NetSolP's power, a researcher could follow this streamlined workflow on Tamarind:

  1. Access the Platform: Begin by logging in to the tamarind.bio website.

  2. Select NetSolP: From the list of available computational models, choose the NetSolP tool.

  3. Input a Protein Sequence: Provide the amino acid sequence of the protein you want to analyze.

  4. Run Prediction: The platform would run the NetSolP model to predict a solubility score for your protein.

  5. Analyze and Prioritize: The model provides a score that can be used to prioritize protein candidates. You can then select the proteins with the highest scores for experimental validation, increasing your chances of success.
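Before step 3, it can help to check the sequence locally. The sketch below is one way to do that, assuming input is restricted to the 20 standard amino acids and supplied as plain text or FASTA; the formats the platform actually accepts are not specified above, and the helper names are hypothetical.

```python
# One-letter codes for the 20 standard amino acids.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Strip whitespace, uppercase, and reject non-standard residues."""
    sequence = "".join(raw.split()).upper()
    if not sequence:
        raise ValueError("empty sequence")
    invalid = sorted(set(sequence) - STANDARD_AMINO_ACIDS)
    if invalid:
        raise ValueError(f"non-standard residues: {''.join(invalid)}")
    return sequence


def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Format one record as FASTA, wrapping the sequence at `width` columns."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


# Hypothetical record name and sequence for illustration.
record = to_fasta("candidate_1", clean_sequence("mktay iakqr\n"))
```

Rejecting ambiguous codes such as X or B is a conservative choice made for this sketch; a given tool may handle them differently.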
