mGPfusion: Predicting protein stability changes with Gaussian process kernel learning and data fusion.

Emmi Jokinen, Markus Heinonen, Harri Lähdesmäki

Proteins are commonly used by the biochemical industry for numerous processes. Refining these proteins' properties via mutations causes stability effects as well. Accurate computational methods to predict how mutations affect protein stability are necessary to facilitate efficient protein design. However, the accuracy of predictive models is ultimately constrained by the limited availability of experimental data. We have developed mGPfusion, a novel Gaussian process (GP) method for predicting protein stability changes upon single and multiple mutations. This method complements the limited experimental data with large amounts of molecular simulation data. We introduce a Bayesian data fusion model that re-calibrates the experimental and in silico data sources and then learns a predictive GP model from the combined data. Our protein-specific model requires experimental data only regarding the protein of interest, and performs well even with few experimental measurements. mGPfusion models proteins by contact maps and infers the stability effects caused by mutations with a mixture of graph kernels. Our results show that mGPfusion outperforms state-of-the-art methods in predicting protein stability on a dataset of 15 different proteins and that incorporating molecular simulation data improves the model learning and prediction accuracy.
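
To make the data fusion idea concrete, the following is a minimal sketch of GP regression trained on two data sources with different noise levels, so that a few trusted experimental measurements outweigh many noisier simulated ones. It is not the mGPfusion model: the abstract describes graph kernels over contact maps and a Bayesian re-calibration step, whereas this sketch assumes a plain squared-exponential kernel on hypothetical feature vectors and fixed, hand-picked noise levels. The names rbf_kernel, gp_fusion_predict, noise_exp, and noise_sim are illustrative assumptions.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel between the rows of A and B (unit prior variance)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def gp_fusion_predict(X_exp, y_exp, X_sim, y_sim, X_new,
                      noise_exp=0.1, noise_sim=1.0, lengthscale=1.0):
    """GP posterior mean and variance from pooled experimental and simulated data.

    Assumption: each source gets its own fixed noise standard deviation
    (noise_exp, noise_sim) instead of a learned re-calibration.
    """
    X = np.vstack([X_exp, X_sim])
    y = np.concatenate([y_exp, y_sim])
    # Per-observation noise variances: small for experiments, large for simulations.
    noise_var = np.concatenate([
        np.full(len(y_exp), noise_exp**2),
        np.full(len(y_sim), noise_sim**2),
    ])
    K = rbf_kernel(X, X, lengthscale) + np.diag(noise_var)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    K_star = rbf_kernel(X_new, X, lengthscale)
    mean = K_star @ alpha
    v = np.linalg.solve(L, K_star.T)
    var = 1.0 - np.sum(v**2, axis=0)  # prior variance k(x, x) is 1 for this kernel
    return mean, var


# Toy data: the two sources disagree at the same input point.
X_exp = np.array([[0.0]])
y_exp = np.array([1.0])    # hypothetical experimental stability change
X_sim = np.array([[0.0]])
y_sim = np.array([-1.0])   # hypothetical simulated stability change
mean, var = gp_fusion_predict(X_exp, y_exp, X_sim, y_sim, np.array([[0.0]]))
```

In this toy case the prediction at the shared input follows the low-noise experimental value rather than the simulated one, which is the intended effect of weighting the sources differently.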
