This project aims to develop a unified framework for functional protein design across diverse protein families. Specifically, we propose a generative model that jointly designs protein sequence and backbone structure, conditioned on automatically detected functional motifs. The model is trained with a joint objective that combines a sequence prediction loss, a structure prediction loss, and a protein-ligand binding prediction loss. We collect all available protein sequences from UniProt, together with the corresponding structures where they are available. Each protein is annotated with an NCBI category indicating its specific function. We will evaluate the designed proteins for selected categories, with a focus on enzymes. Students with a background in chemistry or biology are welcome to join the project to help analyze the designed proteins and select stronger candidates for subsequent wet-lab validation.
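
As a rough illustration of how a joint objective over sequence, structure, and binding terms might be combined, here is a minimal sketch in plain Python. The specific loss forms (negative log-likelihood, mean squared coordinate error, binary cross-entropy), the function names, and the weights `w_seq`, `w_struct`, and `w_bind` are all assumptions made for illustration; the project description does not specify them.

```python
import math


def sequence_loss(probs, targets):
    """Mean negative log-likelihood of the target residues (assumed form)."""
    return -sum(math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)


def structure_loss(pred_coords, true_coords):
    """Mean squared distance between predicted and reference backbone atoms (assumed form)."""
    total = sum(
        sum((a - b) ** 2 for a, b in zip(p, q))
        for p, q in zip(pred_coords, true_coords)
    )
    return total / len(true_coords)


def binding_loss(pred_prob, label):
    """Binary cross-entropy for a binds / does-not-bind label (assumed form)."""
    return -(label * math.log(pred_prob) + (1 - label) * math.log(1 - pred_prob))


def joint_loss(l_seq, l_struct, l_bind, w_seq=1.0, w_struct=1.0, w_bind=1.0):
    """Weighted sum of the three terms; the weights are hypothetical hyperparameters."""
    return w_seq * l_seq + w_struct * l_struct + w_bind * l_bind


# Toy example: two residues, two backbone atoms, one binding label.
probs = [{"A": 0.5, "G": 0.5}, {"A": 0.25, "G": 0.75}]
targets = ["A", "G"]
pred_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
true_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]

l_seq = sequence_loss(probs, targets)
l_struct = structure_loss(pred_coords, true_coords)
l_bind = binding_loss(0.5, 1)
total = joint_loss(l_seq, l_struct, l_bind, w_struct=0.5)
print(f"seq={l_seq:.3f} struct={l_struct:.3f} bind={l_bind:.3f} total={total:.3f}")
```

In a real training setup the three terms would be computed on batched tensors by an automatic differentiation framework, and the relative weights would typically be tuned so that no single term dominates the gradient.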

Lei Li - Language Technologies Institute

Zoey Song - Language Technologies Institute