Data drives better categorization of patients

Subtypes of Parkinson’s disease patients with different risks of progression, and perhaps needing different approaches to treatment, can be identified by applying machine learning techniques to clinical and genetic data. The studies require wider replication, Dr Andrew Singleton (National Institute on Aging, Bethesda, USA) cautioned, but such developments make for exciting times.

In what is likely to prove a landmark study, Dr Singleton and colleagues took data on 140 clinical variables from people with Parkinson’s disease (PD) and healthy controls and looked at how their condition changed over time. Machine learning techniques collapsed these 140 variables into the smallest number of dimensions that best explained the way PD evolved.

There were three dimensions: vector 1 consisted mostly of measures related to cognition; vector 2 related essentially to motor function; and vector 3 related to sleep. Within this three-dimensional space, four subgroups were identified: controls without PD, and three groups with the condition whose progression was slow, moderate or rapid. Dr Singleton was surprised to find that sleep was one of the three dimensions related to speed of progression.

Perhaps surprisingly, sleep related to speed of progression

Different subtypes of disease?

Being able to predict disease onset and progression more accurately has implications for the management of individual patients. It also has application to clinical trials, where it would be helpful to be able to specify subtypes likely to experience fast or slow progression. This is especially so if, as seems possible, these subtypes differ in etiology, pattern of genetic risk factors and response to management.

Andrew Singleton began his presentation by describing advances in our ability to predict the onset of disease. The starting point is to take all the available baseline data – clinical and genetic – and use machine learning techniques to build a model that best discriminates between people with PD and those who do not have the disease.

Improving categorization increases confidence in PD diagnosis and could tailor management to risk of progression

In defining the architecture of genetic risk, participants in the Parkinson's Progression Markers Initiative (PPMI), who have now had their full genome sequenced, have proved an invaluable resource. They are probably the best characterized group of people with PD in the world.

Dr Singleton and colleagues produced a model that distinguished between people with PD and those who did not have the disease based on the five most strongly predictive factors. These were hyposmia, family history of PD, sex, age and a genetic risk score which, at that stage, was based on 28 common genetic variants supplemented by a few that were rarer. The 2015 model distinguished PD patients from healthy controls with an area under the curve (AUC) of 0.92.
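
For readers unfamiliar with the measure, an AUC can be read as the probability that a randomly chosen person with PD receives a higher model score than a randomly chosen healthy control. The sketch below computes it that way in plain Python. The function name `auc` and the scores are made up for illustration; they are not data from the study, and this is not the authors' code.

```python
def auc(case_scores, control_scores):
    """Fraction of (case, control) pairs in which the case scores higher.

    Ties count as half a win. This pairwise definition is equivalent to
    the area under the ROC curve.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical model outputs (illustrative only, not study data).
cases = [0.9, 0.8, 0.7, 0.4]
controls = [0.6, 0.3, 0.2, 0.1]
example_auc = auc(cases, controls)
print(example_auc)
```

An AUC of 0.5 corresponds to guessing, and 1.0 to perfect separation of cases from controls.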

Predicting disease onset

Interestingly, people with SWEDD (scans without evidence of dopaminergic deficit) whom the model assigned to the PD category had a greater than average risk of converting to typical PD within a year.

Further work has shown that genetics can be used to predict risk without knowing the identity of the individual risk loci. Based on longitudinal cohorts, it seems that – out of around twenty million variants – a hundred thousand each contribute a small amount to the variance. Applying this new knowledge to the PPMI cohort increases the AUC for discriminating people with PD from healthy controls to 0.97.
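
One common way to combine many small genetic effects into a single number is a weighted sum of risk-allele counts, often called a polygenic score. The sketch below assumes that approach; the article does not say which method Dr Singleton's group used, and the function name `polygenic_score`, the weights and the genotypes are all hypothetical.

```python
def polygenic_score(allele_counts, weights):
    """Weighted sum of risk-allele counts across variants.

    allele_counts: copies of the risk allele a person carries at each
                   variant (0, 1 or 2).
    weights:       assumed per-variant effect sizes, in the same order.
    """
    if len(allele_counts) != len(weights):
        raise ValueError("allele_counts and weights must have equal length")
    return sum(count * weight for count, weight in zip(allele_counts, weights))


# Three hypothetical variants; a real score would span many thousands.
weights = [0.10, 0.05, 0.20]
person_a = [2, 1, 0]
person_b = [0, 1, 2]

score_a = polygenic_score(person_a, weights)
score_b = polygenic_score(person_b, weights)
print(score_a, score_b)
```

In this toy example person B ends up with the higher score because they carry two copies of the most heavily weighted variant.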

Dr Singleton foresees a time when genetic techniques of this kind, allied to established baseline clinical risk factors, will be able to predict who will receive a PD diagnosis over the next three years with an AUC of around 0.88. And that would certainly be clinically useful.