Molecular latent space simulators
Chemical Science
2020
Proteins make living things work. Our mission is to make novel proteins that can solve critical problems. To achieve this, we turned to evolution for guidance and found a simple set of principles that explain how proteins work. This transformed what we know and can now do with proteins. The rules we've distilled allow us to analyze what Nature has built, generate what Nature has built, and then extrapolate from what has been built to solve challenges Nature has never faced. Because of this, Evozyne can make adaptive, high-performance proteins that can solve long-standing challenges in therapeutics and sustainability.
Physics-based design:
Uses computational chemistry to modify the protein structure and guide new protein design.
Does not always require high-throughput assays and is effective at producing stable proteins.
Requires an understanding of protein structures and how they fold, which remains a challenge for many protein families.
Is computationally intensive.
Directed evolution:
Introduces successive rounds of random changes to the protein to steadily improve function.
Is effective at gradually improving function starting from specific sequences.
Explores sequence space only very locally.
Is difficult to optimize over multiple parameters simultaneously.
Our approach:
Uncovers how Nature designs proteins to guide a rational search of a vast design space to build novel proteins.
Explores the impact of changes throughout a protein to unlock the full potential of the structure.
Optimizes multiple parameters simultaneously.
Generates a large space of functional protein solutions with extraordinary diversity.
Any difficult problem that you want proteins to solve requires a model.
We challenge proteins with a novel problem, much as Nature might have done. We accelerate the search for solutions using our models. With every iteration, the models learn what might otherwise have taken millions of years to emerge. This lets us leapfrog into new spaces of protein solutions that nobody has searched before.
We use the evolutionary history of a protein, not just a single protein.
Rather than varying a single natural sequence (directed evolution) or manipulating a single atomic structure (physics-based design), we capture information from genome databases representing a deep evolutionary record of a target protein. Merging this with experimental data, we learn the non-intuitive hidden rules of protein function.
We rationally search the entire design space, not just around the active site.
Protein sequence space is impossibly large – much larger than the number of atoms in the known universe – and cannot be searched comprehensively by any existing approach. Our generative machine learning models help find the vanishingly small subset of this space – the design manifold – that encodes folded, functional, and evolvable proteins. This unlocks the natural potential of evolved proteins for functional innovation.
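The size comparison above can be checked with a few lines of arithmetic. The sketch below assumes the 20 canonical amino acids and uses the commonly cited order-of-magnitude estimate of about 10^80 atoms in the observable universe; both constants and the function names are illustrative assumptions, not figures taken from this page.

```python
import math

NUM_AMINO_ACIDS = 20           # assumption: the 20 canonical amino acids
ATOMS_IN_UNIVERSE_LOG10 = 80   # assumption: commonly cited rough estimate


def log10_sequence_space(length: int) -> float:
    """Return log10 of the number of distinct sequences of the given length."""
    return length * math.log10(NUM_AMINO_ACIDS)


def min_length_exceeding_atoms() -> int:
    """Return the shortest length whose sequence count exceeds the atom estimate."""
    return math.floor(ATOMS_IN_UNIVERSE_LOG10 / math.log10(NUM_AMINO_ACIDS)) + 1


if __name__ == "__main__":
    print(f"log10(#sequences) for length 100: {log10_sequence_space(100):.1f}")
    print(f"shortest length exceeding the atom estimate: {min_length_exceeding_atoms()}")
```

Under these assumptions, a protein of only 100 residues already has roughly 10^130 possible sequences, which is why exhaustive search is not an option.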
We learn from every sequence generated in our design process, not just what works.
Rather than stepwise variation through single mutations, our models quickly evolve through rounds of iterative optimization, using thousands of measurements per iteration. Genotype-phenotype mapping for every sequence generated provides unbiased insights as feedback to our models. This leads us to our desired solutions faster and with greater efficiency.
We tune multiple parameters simultaneously instead of one at a time.
Successful protein engineering often involves a difficult multidimensional optimization over potentially correlated properties – catalytic power, substrate specificity, stability, environmental sensitivity, etc. Traditional methods go about this by optimizing one property at a time. The design rules uncovered by Evozyne enable simultaneous optimization over complex objective functions. This dramatically accelerates protein engineering campaigns and enables novel solutions that cannot be reached through serial optimization.
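One generic way to picture simultaneous optimization is Pareto filtering: keep every candidate that no other candidate beats on all properties at once. The sketch below is only an illustration of that idea and is not a description of Evozyne's models; the candidate names, property names, and scores are made up, and higher scores are assumed to be better.

```python
from typing import Dict, List

Scores = Dict[str, float]


def dominates(a: Scores, b: Scores, objectives: List[str]) -> bool:
    """True if a is at least as good as b on every objective and better on one."""
    at_least_as_good = all(a[k] >= b[k] for k in objectives)
    strictly_better = any(a[k] > b[k] for k in objectives)
    return at_least_as_good and strictly_better


def pareto_front(candidates: Dict[str, Scores], objectives: List[str]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, scores in candidates.items()
        if not any(dominates(other, scores, objectives) for other in candidates.values())
    ]


# Hypothetical measurements for four designed variants (illustrative values only).
variants = {
    "A": {"activity": 0.9, "stability": 0.2},
    "B": {"activity": 0.6, "stability": 0.7},
    "C": {"activity": 0.5, "stability": 0.6},  # worse than B on both properties
    "D": {"activity": 0.2, "stability": 0.9},
}

front = pareto_front(variants, ["activity", "stability"])
```

Here `front` keeps A, B, and D. Ranking by activity alone would single out A, which has the lowest stability of the four, while the filter retains the balanced variant B alongside the two specialists.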
We provide a library of functional proteins instead of a single optimized molecule.
Just like natural evolution, our models produce a great diversity of sequences, all of which are solutions to a given design problem. This can facilitate downstream screens for specialized environments, proprietary host strains, and manufacturability. This enables our solutions to scale up from the lab to commercial operation with ease.
Our platform is based on decades of fundamental research into how and why proteins are built the way they are through the process of evolution. Below are several key papers on the development of evolutionarily consistent models for protein structure and function.
Chemical Science, 2020
Science, 2020
Physical Biology, 2018
PNAS, 2016
Cell, 2016
Cell, 2015
Nature, 2012
Molecular Systems Biology, 2011
Cell, 2009
Science, 2008
Nature, 2005
Nature, 2005
Science, 1999