PROTEIN ENGINEERING DESIGN & SELECTION

The shortest path method (SPM) webserver for computational enzyme design
Casadevall G, Casadevall J, Duran C and Osuna S
SPMweb is the online webserver of the Shortest Path Map (SPM) tool for identifying the key conformationally relevant positions of a given enzyme from its structure and dynamics. The server is built on top of the DynaComm.py code and enables the calculation and visualization of the SPM pathways. SPMweb is easy to use, as it requires only three input files: the three-dimensional structure of the protein of interest, and the two matrices (distance and correlation) previously computed from a Molecular Dynamics simulation. In this publication, we explain how to generate the files needed for SPM construction, even for non-expert users, and discuss the most relevant parameters that can be modified. The tool is extremely fast (each job takes less than one minute), thus allowing the rapid identification of distal positions connected to the active site pocket of the enzyme. SPM applications range from computational enzyme design, especially when combined with other tools that identify the preferred substitution at the identified position, to rationalizing allosteric regulation and even identifying cryptic pockets for drug discovery. The simple user interface and setup make the SPM tool accessible to the whole scientific community. SPMweb is freely available for academia at http://spmosuna.com/.
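A minimal sketch can illustrate the SPM construction that SPMweb automates. It follows the common SPM recipe. Each residue is a graph node. Residue pairs whose mean distance (from the distance matrix) falls below a contact cutoff are joined by edges of length -log|C_ij|, where C_ij comes from the correlation matrix. The edges crossed most often by all-pairs shortest paths are then kept. The function name `shortest_path_map`, the assumed 6 Å default cutoff and the normalization to the most-used edge are illustrative assumptions. This is not the DynaComm.py implementation.

```python
import heapq
import math
from collections import Counter


def shortest_path_map(dist, corr, cutoff=6.0):
    """Hypothetical SPM-style sketch: returns {(i, j): normalized edge usage}."""
    n = len(dist)
    graph = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            c = abs(corr[i][j])
            if dist[i][j] < cutoff and c > 0:
                w = -math.log(c)  # strongly correlated pairs become short edges
                graph[i].append((j, w))
                graph[j].append((i, w))
    usage = Counter()
    for src in range(n):
        # Dijkstra from src, remembering each node's predecessor on its shortest path
        best, prev, heap = {src: 0.0}, {}, [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > best[u]:
                continue
            for v, w in graph[u]:
                nd = d + w
                if nd < best.get(v, math.inf):
                    best[v], prev[v] = nd, u
                    heapq.heappush(heap, (nd, v))
        # count every edge on the shortest path from src to each higher-index node
        for dst in best:
            if dst <= src:
                continue
            node = dst
            while node != src:
                usage[tuple(sorted((prev[node], node)))] += 1
                node = prev[node]
    top = max(usage.values()) if usage else 1
    return {edge: count / top for edge, count in usage.items()}
```

In a real analysis, both matrices would come from the MD trajectory. Edges with values near 1 correspond to the most central communication paths, which an SPM drawing would typically show as the thickest lines.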
Interactive computational and experimental approaches improve the sensitivity of periplasmic binding protein-based nicotine biosensors for measurements in biofluids
Haloi N, Huang S, Nichols AL, Fine EJ, Friesenhahn NJ, Marotta CB, Dougherty DA, Lindahl E, Howard RJ, Mayo SL and Lester HA
We developed fluorescent protein sensors for nicotine with improved sensitivity. For iNicSnFR12 at pH 7.4, the proportionality constant for ΔF/F₀ vs [nicotine] (δ-slope, 2.7 μM⁻¹) is 6.1-fold higher than that of the previously reported iNicSnFR3a. The activated state of iNicSnFR12 has a fluorescence quantum yield of at least 0.6. We measured similar dose-response relations for the nicotine-induced absorbance increase and fluorescence increase. This suggests that the absorbance increase leads to the fluorescence increase via the previously described nicotine-induced conformational change, the 'candle snuffer' mechanism. Molecular dynamics (MD) simulations identified a binding pose for nicotine, which had been indeterminate from experimental data. MD simulations also showed that Helix 4 of the periplasmic binding protein (PBP) domain appears tilted in iNicSnFR12 relative to iNicSnFR3a, likely altering allosteric network(s) that link the ligand binding site to the fluorophore. In thermal melt experiments, nicotine stabilized the PBP of the tested iNicSnFR variants. iNicSnFR12 resolved nicotine in diluted mouse and human serum at 100 nM, the peak [nicotine] that occurs during smoking or vaping, and possibly at the decreasing levels during intervals between sessions. iNicSnFR12 was also partially activated by unidentified endogenous ligand(s) in biofluids. Improved iNicSnFR12 variants could become the molecular sensors in continuous nicotine monitors for animal and human biofluids.
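The δ-slope is the initial, linear slope of the dose-response relation. The sketch below shows two ways one might estimate it:

- a least-squares fit through the origin over low-concentration points;
- the limit of a single-site hyperbola, whose slope at zero concentration is (ΔF/F₀)max / EC50.

The function names and the assumed Hill coefficient of 1 are illustrative. This is not the authors' analysis code.

```python
def delta_slope_fit(concs_uM, dff0):
    """Least-squares slope through the origin of dF/F0 vs [ligand], in uM^-1.

    Only meaningful for points in the low-concentration, linear part of the curve."""
    num = sum(c * y for c, y in zip(concs_uM, dff0))
    den = sum(c * c for c in concs_uM)
    return num / den


def delta_slope_from_hyperbola(dff0_max, ec50_uM):
    """Initial slope of dF/F0 = dff0_max * c / (EC50 + c), i.e. dff0_max / EC50.

    Assumes a Hill coefficient of 1."""
    return dff0_max / ec50_uM


# Arithmetic from the abstract: 2.7 uM^-1 is 6.1-fold the iNicSnFR3a value,
# which implies a delta-slope of roughly 0.44 uM^-1 for iNicSnFR3a.
implied_inicsnfr3a = 2.7 / 6.1
```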
Sequence-activity mapping via depletion reveals striking mutational tolerance and elucidates functional motifs in Tur1a antimicrobial peptide
Collins J and Hackel BJ
Proline-rich antimicrobial peptides (PrAMPs) are attractive antibiotic candidates that target the ribosomes of Gram-negative bacteria. We elucidated the sequence-function landscape of 43 000 variants of a recently discovered family member, Tur1a, using the validated SAMP-Dep platform, which measures intracellular AMP potency in a high-throughput manner via self-depletion of the cellular host. The platform exhibited high replicate reproducibility (ρ = 0.81) and correlation between synonymous genetic variants (R² = 0.93). Only two segments within Tur1a exhibited stringent mutational requirements to sustain potency: residues ⁹YLP¹¹ and ¹⁹FP²⁰. These segments include the aromatic residue in the hypothesized binding domain but not the PRP domain. Along with the unexpected mutational tolerance of PRP, the data contrast with the hypothesized importance of the ¹RRIR⁴ motif and of arginines in general. In addition to the mutational tolerance of residue segments with presumed significance, 77% of mutations are functionally neutral. Multimutant performance mainly shows compounding effects from removing combinations of prolines and arginines, in addition to the two segments of residues showing individual importance. Several variants identified as active by SAMP-Dep were produced externally and retained activity when applied exogenously to susceptible species.
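SAMP-Dep infers potency from how strongly each AMP variant depletes its own host, and replicate reproducibility is summarized as a rank correlation. A minimal sketch follows, assuming sequencing read counts are available for each variant before and after AMP expression is induced, and assuming ρ denotes Spearman's rank correlation. The log2 frequency-change score, the 0.5 pseudocount and all function names are hypothetical and need not match the published scoring.

```python
import math


def depletion_scores(counts_before, counts_after, pseudocount=0.5):
    """Hypothetical log2 change in library frequency for each variant.

    More negative scores mean the variant depleted its host more strongly,
    i.e. (under this assumption) a more potent intracellular AMP."""
    total_before = sum(counts_before.values())
    total_after = sum(counts_after.values())
    scores = {}
    for variant, n_before in counts_before.items():
        n_after = counts_after.get(variant, 0)
        freq_before = (n_before + pseudocount) / total_before
        freq_after = (n_after + pseudocount) / total_after
        scores[variant] = math.log2(freq_after / freq_before)
    return scores


def _average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = _average_ranks(x), _average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

Applied to the scores of two replicate selections, `spearman_rho` gives a value that can be compared with the reported replicate reproducibility.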
Design of functional intrinsically disordered proteins
Garg A, González-Foutel NS, Gielnik MB and Kjaergaard M
Many proteins do not fold into a fixed three-dimensional structure, but rather function in a highly disordered state. These intrinsically disordered proteins pose a unique challenge to protein engineering and design: How can proteins be designed de novo if not by tailoring their structure? Here, we review the nascent field of design of intrinsically disordered proteins, with a focus on applications in biotechnology and medicine. The design goals should not necessarily be the same as for de novo design of folded proteins, as disordered proteins have unique functional strengths and limitations. We focus on functions for which intrinsically disordered proteins are uniquely suited, including disordered linkers, desiccation chaperones, sensors of the chemical environment, delivery of pharmaceuticals, and constituents of biomolecular condensates. Design of functional intrinsically disordered proteins relies on a combination of computational tools and heuristics gleaned from sequence-function studies. There are few cases where intrinsically disordered proteins have made it into industrial applications. However, we argue that disordered proteins can perform many roles currently performed by organic polymers, and that these proteins might be more designable due to their modularity.
Improving plastic degrading enzymes via directed evolution
Joho Y, Vongsouthi V, Gomez C, Larsen JS, Ardevol A and Jackson CJ
Plastic-degrading enzymes have immense potential for use in industrial applications. Protein engineering efforts over the last decade have considerably enhanced many properties of these enzymes. Directed evolution, a protein engineering approach that mimics the natural process of evolution in a laboratory, has been particularly useful in overcoming some of the challenges of structure-based protein engineering. For example, directed evolution has been used to improve the catalytic activity and thermostability of polyethylene terephthalate (PET)-degrading enzymes, although its use for improving other desirable properties, such as solvent tolerance, has been less studied. In this review, we aim to identify some of the knowledge gaps and current challenges, and to highlight recent studies related to the directed evolution of plastic-degrading enzymes.
Correction to: Growing ecosystem of deep learning methods for modeling protein-protein interactions
Sequence-developability mapping of affibody and fibronectin paratopes via library-scale variant characterization
Nielsen GH, Schmitz ZD and Hackel BJ
Protein developability is a prerequisite for use in therapeutic, diagnostic, or industrial applications. Many developability assays are low throughput, which limits their utility to the later stages of protein discovery and evolution. Recent approaches enable experimental or computational assessment of many more variants, yet the breadth of their applicability across protein families and developability metrics is uncertain. Here, three library-scale assays (on-yeast protease, split green fluorescent protein [GFP], and non-specific binding) were evaluated for their ability to predict two key developability outcomes, thermal stability and recombinant expression, for the small protein scaffolds affibody and fibronectin. The assays' predictive capabilities were assessed via both linear correlation and machine learning models trained on the library-scale assay data. The on-yeast protease assay is highly predictive of thermal stability for both scaffolds, and the split-GFP assay is informative of affibody thermal stability and expression. The library-scale data were used to map sequence-developability landscapes for affibody and fibronectin binding paratopes, which guide the future design of variants and libraries.
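The simpler of the two predictive checks is a linear correlation between a library-scale assay score and a measured developability outcome. It amounts to an ordinary least-squares fit and its coefficient of determination. In this sketch, the inputs (for example, per-variant protease-resistance scores and melting temperatures) are placeholders. The machine-learning models mentioned above are not shown.

```python
def linear_fit_r2(x, y):
    """Ordinary least-squares line y = slope * x + intercept and its R^2."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # residual and total sums of squares give the coefficient of determination
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot
```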
Protein sequence design on given backbones with deep learning
Liu Y and Liu H
Deep learning methods for protein sequence design focus on modeling and sampling the high-dimensional distribution of amino acid sequences conditioned on the backbone structure. To produce physically foldable sequences, inter-residue couplings need to be considered properly. These couplings are treated explicitly in iterative or autoregressive methods. Non-autoregressive models, which treat these couplings implicitly, are computationally more efficient but still await experimental testing. Currently, sequence design methods are evaluated mainly using native sequence recovery rate and native sequence perplexity. These metrics can be complemented by sequence-structure compatibility metrics obtained from energy calculations or structure prediction. However, existing computational metrics have important limitations that may make it unwarranted to generalize computational test results to performance in real applications. Validation of design methods by wet-lab experiments should be encouraged.
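Both evaluation metrics named above are straightforward to compute:

- Recovery is the fraction of positions where the designed residue matches the native one.
- Perplexity is the exponential of the mean negative log-probability that the model assigns to the native residue at each position. A perplexity of 20 corresponds to a uniform guess over the 20 amino acids.

The input format used here, a list of per-position probability dictionaries, is an assumption.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_recovery(designed, native):
    """Fraction of positions where the designed residue equals the native one."""
    if len(designed) != len(native):
        raise ValueError("sequences must be aligned and of equal length")
    return sum(a == b for a, b in zip(designed, native)) / len(native)


def native_perplexity(position_probs, native):
    """exp(mean negative log-probability assigned to the native residue).

    position_probs[i] maps each amino acid to the model's probability at position i."""
    nll = -sum(math.log(probs[aa]) for probs, aa in zip(position_probs, native))
    return math.exp(nll / len(native))
```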
High titer expression of antibodies using linear expression cassettes for early-stage functional screening
Wu S, Tsukuda J, Chiang N, Hao T, Chen Y, Hötzel I, Balasubramanian S, Nakamura G and Kelly RL
Antibody discovery processes are continually advancing, with an ever-increasing number of potential binding sequences being identified from in vivo, in vitro, and in silico sources. In this work, we describe a rapid system for high-yield recombinant antibody (IgG and Fab) expression using Gibson-assembled linear DNA fragments (GLFs). The purified recombinant antibody yields from 1 ml expressions with this process are approximately five- to tenfold higher than with previous methods, largely owing to the novel use of protective flanking sequences on the 5' and 3' ends of the GLF. This method is adaptable to small-scale (1 ml) expression and purification for rapid evaluation of binding and activity, as well as to larger scales (30 ml) for more sensitive assays requiring milligram quantities of antibody purified over two columns (Protein A and size exclusion chromatography). When compared with plasmid-based expression, these methods provide nearly equivalent yields of high-quality material across multiple applications, reducing costs and turnaround times to enhance the antibody discovery process.
Computational methods for protein design
Ferruz N and Stein A
An engineered NKp46 antibody for construction of multi-specific NK cell engagers
Lee RB, Maddineni S, Landry M, Diaz C, Tashfeen A, Yamada-Hunter SA, Mackall CL, Beinat C, Sunwoo JB and Cochran JR
Recent developments in cancer immunotherapy have highlighted the potential of harnessing natural killer (NK) cells in the treatment of neoplastic malignancies. Among such approaches, bispecific antibodies, and NK cell engager (NKCE) protein therapeutics in particular, have attracted interest. Here, we used phage display and yeast surface display to engineer RLN131, a unique cross-reactive antibody that binds to human, mouse, and cynomolgus NKp46, an activating receptor found on NK cells. RLN131 induced proliferation and activation of primary NK cells and was used to create bispecific NKCE constructs of varying configurations and valencies. All NKCEs promoted greater NK cell cytotoxicity against tumor cells than an unmodified anti-CD20 monoclonal antibody, and activity was observed irrespective of whether the constructs contained a functional Fc domain. Competition binding and fine epitope mapping studies demonstrated that RLN131 binds to a conserved epitope on NKp46, which underlies its cross-species reactivity.
Engineered FHA domains can bind to a variety of Phosphothreonine-containing peptides
Thota SS, Allen GL, Grahn AK and Kay BK
Antibodies play a crucial role in monitoring post-translational modifications such as phosphorylation, which regulates protein activity and localization. However, commercial polyclonal and monoclonal antibodies have limitations in renewability and engineering compared with recombinant affinity reagents. A scaffold based on the Forkhead-associated (FHA) domain has potential as a selective affinity reagent for this post-translational modification. Engineered FHA domains with limited cross-reactivity, termed phosphothreonine-binding domains (pTBDs), were isolated from an M13 bacteriophage display library by affinity selection with phosphopeptides corresponding to the human mTOR, Chk2, 53BP1, and Akt1 proteins. To determine the specificity of representative pTBDs, we focused on binders to the pT543 phosphopeptide (536-IDEDGENpTQIEDTEP-551) of the DNA repair protein 53BP1. ELISA and western blot experiments showed that the pTBDs are specific to phosphothreonine, demonstrating the potential utility of pTBDs for monitoring the phosphorylation of specific threonine residues in clinically relevant human proteins.
Supercharged Phosphotriesterase for improved Paraoxon activity
Kronenberg J, Britton D, Halvorsen L, Chu S, Kulapurathazhe MJ, Chen J, Lakshmi A, Renfrew PD, Bonneau R and Montclare JK
Phosphotriesterases (PTEs) are a class of enzymes capable of efficiently neutralizing organophosphates (OPs), a dangerous class of neurotoxic chemicals. PTEs suffer from low catalytic activity, particularly at higher temperatures, owing to low thermostability and low solubility. Supercharging is a protein engineering approach that selectively mutates surface residues to charged residues. It has been used successfully to generate proteins with increased solubility and thermostability by promoting charge-charge repulsion between protein molecules. We set out to improve PTE activity against OPs by employing a computational protein supercharging algorithm in Rosetta. Here, we identify two supercharged PTE variants, one negatively supercharged (net charge -14) and one positively supercharged (net charge +12), and characterize their thermodynamic stability and catalytic activity. We find that positively supercharged PTE shows a slight but significant loss of thermostability, which correlates with losses in catalytic efficiency at all temperatures. In contrast, negatively supercharged PTE shows increased catalytic activity across 25–55 °C while retaining thermostability similar to that of the parent PTE. The impact of supercharging on catalytic efficiency will inform the design of shelf-stable PTE and criteria for enzyme engineering.
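The net charges quoted for the two supercharged variants (-14 and +12) refer to the designed proteins. A crude way to see how surface mutations shift net charge is to count ionizable side chains. The sketch below assumes +1 per Lys/Arg and -1 per Asp/Glu at neutral pH, and ignores His, the chain termini and pKa shifts. Its values therefore need not match the charges reported by the authors or by Rosetta. The mutation notation (e.g. `S3R`) and the function names are illustrative.

```python
def approx_net_charge(sequence):
    """Crude net charge at neutral pH: +1 per Lys/Arg, -1 per Asp/Glu.

    Ignores His, the chain termini and any pKa shifts."""
    positive = sum(sequence.count(aa) for aa in "KR")
    negative = sum(sequence.count(aa) for aa in "DE")
    return positive - negative


def apply_mutations(sequence, mutations):
    """Apply substitutions written as e.g. 'S3R' (wild type, 1-based position, mutant)."""
    residues = list(sequence)
    for mutation in mutations:
        wild_type, position, mutant = mutation[0], int(mutation[1:-1]), mutation[-1]
        if residues[position - 1] != wild_type:
            raise ValueError(f"{mutation}: position {position} is {residues[position - 1]}")
        residues[position - 1] = mutant
    return "".join(residues)
```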
Correction to: De novo design of a polycarbonate hydrolase
DexDesign: an OSPREY-based algorithm for designing de novo D-peptide inhibitors
Guerin N, Childs H, Zhou P and Donald BR
With over 270 unique occurrences in the human genome, peptide-recognizing PDZ domains play a central role in modulating polarization, signaling, and trafficking pathways. Mutations in PDZ domains lead to diseases such as cancer and cystic fibrosis, making PDZ domains attractive targets for therapeutic intervention. D-peptide inhibitors offer unique advantages as therapeutics, including increased metabolic stability and low immunogenicity. Here, we introduce DexDesign, a novel OSPREY-based algorithm for computationally designing de novo D-peptide inhibitors. DexDesign leverages three novel techniques that are broadly applicable to computational protein design: the Minimum Flexible Set, K*-based Mutational Scan, and Inverse Alanine Scan. We apply these techniques and DexDesign to generate novel D-peptide inhibitors of two biomedically important PDZ domain targets: CAL and MAST2. We introduce a framework for analyzing de novo peptides, namely evaluation along a replication/restitution axis, and apply it to the DexDesign-generated D-peptides. Notably, the peptides we generated are predicted to bind their targets more tightly than their targets' endogenous ligands, supporting the peptides' potential as lead inhibitors. We also provide an implementation of DexDesign in the free and open-source computational protein design software OSPREY.
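OSPREY ranks sequences by K*, an approximation of the binding constant. K* is the ratio of Boltzmann-weighted partition functions, K* = q_PL / (q_P q_L), taken over the conformations of the complex, the unbound protein and the unbound ligand. A K*-based mutational scan repeats this calculation for each candidate substitution. The sketch below computes log10 K* from lists of conformational energies, using log-sum-exp for numerical stability. The energies, the 298.15 K default and the function names are assumptions. OSPREY itself computes provable ε-approximations of these partition functions rather than summing over a hand-supplied list.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def log_partition_function(energies_kcal, temperature=298.15):
    """ln of sum_i exp(-E_i / RT), computed with log-sum-exp to avoid overflow."""
    rt = R_KCAL * temperature
    exponents = [-e / rt for e in energies_kcal]
    top = max(exponents)
    return top + math.log(sum(math.exp(x - top) for x in exponents))


def log10_kstar(complex_energies, protein_energies, ligand_energies, temperature=298.15):
    """log10 of K* = q_PL / (q_P * q_L)."""
    ln_kstar = (log_partition_function(complex_energies, temperature)
                - log_partition_function(protein_energies, temperature)
                - log_partition_function(ligand_energies, temperature))
    return ln_kstar / math.log(10)
```

In a scan, each hypothetical variant's log10 K* would be compared with that of the starting sequence. Higher values indicate tighter predicted binding.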
TIMED-Design: flexible and accessible protein sequence design with convolutional neural networks
Castorina LV, Ünal SM, Subr K and Wood CW
Sequence design is a crucial step in the process of designing or engineering proteins. Traditionally, physics-based methods have been used to solve for optimal sequences, their main disadvantage being that they are computationally intensive for the end user. Deep learning-based methods offer an attractive alternative, outperforming physics-based methods at a significantly lower computational cost. In this paper, we explore the application of convolutional neural networks (CNNs) to sequence design. We describe the development and benchmarking of a range of networks, as well as reimplementations of previously described CNNs. We demonstrate the flexibility of representing proteins in a three-dimensional voxel grid by encoding additional design constraints into the input data. Finally, we describe TIMED-Design, a web application and command line tool for exploring and applying the models described in this paper. The user interface will be available at https://pragmaticproteindesign.bio.ed.ac.uk/timed. The source code for TIMED-Design is available at https://github.com/wells-wood-research/timed-design.
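Representing a protein as a three-dimensional voxel grid means mapping the atoms around a target residue onto a cubic lattice, with one channel per atom type. Extra design constraints can then be appended as further channels. In this minimal NumPy sketch, the 21-voxel box, the 1 Å resolution, the four element channels and the constant-valued constraint channel are assumptions chosen for illustration. They are not the TIMED-Design input specification.

```python
import numpy as np

# Hypothetical channel layout: one binary channel per heavy-atom element.
ELEMENT_CHANNELS = {"C": 0, "N": 1, "O": 2, "S": 3}


def voxelize(coords, elements, center, box_size=21, resolution=1.0):
    """Place atoms on a (channels, box, box, box) grid centred on `center` (Angstrom)."""
    grid = np.zeros((len(ELEMENT_CHANNELS), box_size, box_size, box_size), dtype=np.float32)
    offsets = (np.asarray(coords, dtype=float) - np.asarray(center, dtype=float)) / resolution
    indices = np.round(offsets).astype(int) + box_size // 2
    for (i, j, k), element in zip(indices, elements):
        channel = ELEMENT_CHANNELS.get(element)
        inside = all(0 <= idx < box_size for idx in (i, j, k))
        if channel is not None and inside:  # unknown elements and distant atoms are skipped
            grid[channel, i, j, k] = 1.0
    return grid


def add_constraint_channel(grid, value):
    """Append a constant-valued channel encoding one extra design constraint."""
    extra = np.full((1,) + grid.shape[1:], value, dtype=grid.dtype)
    return np.concatenate([grid, extra], axis=0)
```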
Modular and integrative activity reporters enhance biochemical studies in the yeast ER
Martinusen SG, Slaton EW, Nelson SE, Pulgar MA, Besu JT, Simas CF and Denard CA
The yeast endoplasmic reticulum sequestration and screening (YESS) system is a broadly applicable platform for performing high-throughput biochemical studies of post-translational modification enzymes (PTM-enzymes). This system enables researchers to profile and engineer the activity and substrate specificity of PTM-enzymes and to discover inhibitor-resistant enzyme mutants. In this study, we expand the capabilities of YESS by transferring its functional components to integrative plasmids. The YESS integrative system yields uniform protein expression and protease activities in various configurations. It also allows activity reporters to be integrated at two independent loci and allows the system to be split between integrative and centromeric plasmids. We characterize these integrative reporters with two viral proteases, tobacco etch virus protease (TEVp) and 3-chymotrypsin-like protease (3CLpro), in terms of coefficient of variation, signal-to-noise ratio, and fold-activation. Overall, we provide a modular framework for chromosome-based studies, enabling rigorous high-throughput assays of PTM-enzymes in yeast.
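The three reporter metrics named above have common textbook forms, sketched below using Python's `statistics` module. The sketch assumes that "active" and "inactive" are replicate reporter signals measured with an active protease and with an inactive control, respectively. These definitions are assumptions, and the study's exact formulas (for example, its signal-to-noise definition) may differ.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (e.g. across replicate cultures)."""
    return statistics.stdev(values) / statistics.mean(values)


def fold_activation(active, inactive):
    """Mean reporter signal with active protease over mean signal with the inactive control."""
    return statistics.mean(active) / statistics.mean(inactive)


def signal_to_noise(active, inactive):
    """Separation of the means in units of the background (inactive) standard deviation."""
    return (statistics.mean(active) - statistics.mean(inactive)) / statistics.stdev(inactive)
```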
Correction to: Variable heavy-variable light domain and Fab-arm CrossMabs with charged residue exchanges to enforce correct light chain assembly
De novo design of a polycarbonate hydrolase
Holst LH, Madsen NG, Toftgård FT, Rønne F, Moise IM, Petersen EI and Fojan P
Enzymatic degradation of plastics is currently limited to the use of engineered natural enzymes. As yet, all engineering approaches applied to plastic-degrading enzymes retain the natural α/β-fold. While mutations can be used to increase thermostability, an inherent maximum likely exists for the α/β-fold. It is thus of interest to introduce catalytic activity toward plastics into a different protein fold, to escape the sequence space of plastic-degrading enzymes. Here, a method for designing highly thermostable enzymes that can degrade plastics is described. With the help of Rosetta, an active site catalysing the hydrolysis of polycarbonate is introduced into a set of thermostable scaffolds. Through computational evaluation, a potential polycarbonate hydrolase (PCase) was selected and produced recombinantly in Escherichia coli. Thermal analysis suggests that the design has a melting temperature of >95 °C. Activity toward polycarbonate was confirmed using atomic force microscopy (AFM), demonstrating the successful design of a PCase.
abYpap: improvements to the prediction of antibody VH/VL packing using gradient boosted regression
Boron VA and Martin ACR
The Fv region of the antibody (comprising the VH and VL domains) is responsible for target binding and thus for the antibody's specificity. The orientation, or packing, of these two domains relative to each other influences the topography of the Fv region and can therefore influence the antibody's binding affinity. We present abYpap, an improved method for predicting the packing angle between the VH and VL domains. With the large data set now available, we were able to greatly expand the number of features that could be used compared with our previous work. The machine-learning model was tuned for improved performance using 37 selected residues (previously 13) and by including the lengths of the most variable complementarity-determining regions (CDR-L1, CDR-L2 and CDR-H3). Our method shows large improvements over the previous version, and also over other modeling approaches, when predicting the packing angle.
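The model inputs described above can be pictured as a fixed-length feature vector. It combines an encoding of the residue at each selected position (37 in abYpap) with the lengths of CDR-L1, CDR-L2 and CDR-H3. A gradient-boosted regressor, for example scikit-learn's `GradientBoostingRegressor`, could then be fitted to such vectors against observed packing angles. The sketch below only builds the vector. The one-hot encoding, the position labels and the function name are assumptions, and the published method may encode residues differently.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_fv(residues_by_position, positions, cdr_lengths):
    """One-hot encode the residue at each selected position, then append CDR lengths.

    residues_by_position: e.g. {"L38": "Q", "H39": "Q"} (hypothetical labels); positions
    absent from the dict (gaps) are encoded as an all-zero block."""
    features = []
    for position in positions:
        residue = residues_by_position.get(position, "-")
        features.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    features.extend(float(length) for length in cdr_lengths)
    return features
```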
Growing ecosystem of deep learning methods for modeling protein-protein interactions
Rogers JR, Nikolényi G and AlQuraishi M
Numerous cellular functions rely on protein-protein interactions. However, efforts to characterize them comprehensively remain challenged by the diversity of molecular recognition mechanisms employed within the proteome. Deep learning has emerged as a promising approach for tackling this problem by exploiting both experimental data and basic biophysical knowledge about protein interactions. Here, we review the growing ecosystem of deep learning methods for modeling protein interactions, highlighting the diversity of these biophysically informed models and their respective trade-offs. We discuss recent successes in three areas: representation learning to capture complex features pertinent to predicting protein interactions and interaction sites, geometric deep learning to reason over protein structures and predict complex structures, and generative modeling to design de novo protein assemblies. We also outline some of the outstanding challenges and promising new directions. Opportunities abound to discover novel interactions, elucidate their physical mechanisms, and engineer binders to modulate their functions using deep learning and, ultimately, unravel how protein interactions orchestrate complex cellular behaviors.