Introducing biomarkers for the diagnosis and treatment of psychiatric diseases into routine clinical practice requires moving from proof-of-concept studies to large-scale clinical validation and deployment. Involving clinicians, machine learning (ML) technology and Big Data initiatives will help provide high-quality models for clinical use. Exciting developments in this field were presented at ECNP2018.
For more information on Big Data in psychiatry and neurology, please see "Big Data in Healthcare" in the Lundbeck Institute Campus.
Dr Janaina Mourao-Miranda (University College London, UK) addressed the question: “how can we combine machine learning and neuroimaging techniques to improve diagnosis and outcome in psychiatry?”
Train, test, predict
In this framework, brain images from patients and healthy controls are split into training and test sets. The training images are used to teach the ML algorithm a predictive function, which is then applied to previously unseen test images to classify each subject as patient or healthy control.
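As a minimal sketch of this train/test framework (not the specific pipeline presented at the session), the following Python example uses scikit-learn with synthetic data standing in for vectorized brain images; the linear support vector machine is an illustrative model choice.

```python
# Minimal train/test sketch: synthetic features stand in for brain images.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 500))      # 100 subjects, 500 voxel-like features
y = rng.integers(0, 2, size=100)     # 0 = healthy control, 1 = patient

# Keep training and test images strictly separate
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = SVC(kernel="linear").fit(X_train, y_train)  # learn predictive function
y_pred = model.predict(X_test)                      # predict on unseen images
print(f"Test accuracy: {accuracy_score(y_test, y_pred):.2f}")
```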
This technique has been applied extensively to many clinical questions in psychiatry, including identifying subjects at risk of developing a condition, identifying treatment responders, and predicting disease outcome. One of the main issues faced in developing this technology on a larger scale, however, is reproducibility. Increasing sample size seems to decrease accuracy, a limitation attributed to the binary nature of categorical classification: a wide range of clinical assessments is condensed into a single label, assuming within-group homogeneity, when real-world groups are very heterogeneous.
Hence the benefit of using continuous clinical measures. In regression models, images are given a continuous score rather than a binary classification, and in normative models the algorithm is trained on healthy data so that it can detect outliers.
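The contrast can be sketched as follows; the clinical score, the data, and the z > 2 outlier threshold are all illustrative assumptions.

```python
# Regression vs. normative modelling, on synthetic placeholder data.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
X_controls = rng.normal(size=(80, 50))                       # healthy-control features
y_controls = X_controls[:, 0] + rng.normal(0, 0.5, size=80)  # continuous clinical score

# Regression: predict a continuous clinical measure, not a binary label
model = Ridge().fit(X_controls, y_controls)

# Normative model: fit on healthy controls only, then flag large
# deviations from the predicted norm as outliers
sigma = (y_controls - model.predict(X_controls)).std()
x_new = rng.normal(size=(1, 50))
y_new = 3.0                                   # new subject's observed score
z = (y_new - model.predict(x_new)[0]) / sigma
print(f"z = {z:.1f} ->", "outlier" if abs(z) > 2 else "within normal range")
```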
Multiple sources are better than one
Even so, much of the available clinical information remains under-used. Multiple-source learning combines several modalities of data, which is useful for identifying biomarkers, but it is still limited if the output is a single score that may not capture the full clinical picture.
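One common multiple-source strategy is simple feature concatenation, sketched below; weighted approaches such as multiple-kernel learning are alternatives, and both modalities here are placeholders.

```python
# Combining two data sources (e.g. imaging + clinical assessments).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X_mri = rng.normal(size=(100, 200))       # modality 1: imaging features
X_clinical = rng.normal(size=(100, 20))   # modality 2: clinical assessments
y = rng.integers(0, 2, size=100)

X_combined = np.hstack([X_mri, X_clinical])   # fuse the sources
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
print(f"CV accuracy: {cross_val_score(clf, X_combined, y, cv=5).mean():.2f}")
```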
Stratification models are more transformative, identifying subgroups with particular risks or response patterns. Determining a ‘brain-behavior latent space’ allows identification of subjects who are moving from a ‘healthy’ to a ‘disease’ state, based on their expression of certain brain-behavior associations, with the potential of targeting early intervention.
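One established way to estimate such a latent space is canonical correlation analysis (CCA), which finds paired brain and behavior dimensions that are maximally correlated; the sketch below is a generic illustration on placeholder data, not the specific model presented.

```python
# Estimating a shared brain-behavior latent space with CCA.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(3)
brain = rng.normal(size=(100, 150))       # imaging features per subject
behavior = rng.normal(size=(100, 30))     # clinical/behavioral measures

cca = CCA(n_components=2).fit(brain, behavior)
brain_scores, behavior_scores = cca.transform(brain, behavior)
# Each subject now has coordinates in the shared latent space, which can
# be inspected for drift from 'healthy' toward 'disease' patterns
print(brain_scores.shape, behavior_scores.shape)   # (100, 2) (100, 2)
```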
What is a good predictive model?
Application to clinical practice requires moving from data, to group-level inference, to predictive modelling [1]. Professor Tim Hahn (Goethe-University Frankfurt, Germany) proposed guidelines for assessing what makes a good predictive model. How can clinicians or researchers judge the quality of the explosion of papers in artificial intelligence (AI)/ML, and evaluate their clinical utility and translational potential? As John Naisbitt said: “We are drowning in information but starved for knowledge” [2].
Introducing AI Transparency
To help clinicians, he proposed a conceptual framework of ‘AI Transparency’, examining generalization, model scope and risk profile. Understanding the algorithm itself is not necessary, and this best-practice checklist is easy to use.
Is the model capable of generalization when applied to previously unseen data? It is important to keep training and test data independent; otherwise the model simply becomes very good at describing the training data but cannot be applied to new data. Professor Hahn encouraged the use of online repositories, where researchers can upload data and receive predictions back [3]. If data sets cannot be kept fully independent, cross-validation is essential, especially for hyperparameter optimization, recognizing the trade-off between bias and variance.
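Nested cross-validation is the standard way to combine hyperparameter optimization with an honest performance estimate: the inner loop tunes, the outer loop evaluates, and the two never share data. A minimal sketch on synthetic data:

```python
# Nested cross-validation: tuning (inner) kept separate from evaluation (outer).
import numpy as np
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 50))
y = rng.integers(0, 2, size=100)

inner = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=5)  # tunes C
outer = cross_val_score(inner, X, y, cv=5)   # evaluates on untouched folds
print(f"Unbiased accuracy estimate: {outer.mean():.2f} +/- {outer.std():.2f}")
```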
The second element is model scope. A model can only make high-quality predictions about people who are sufficiently represented in the training set, and it needs to be evaluated in the context of its later use. Ideally, studies should avoid exclusion criteria; otherwise it is difficult to demonstrate that the model works in real-life settings.
The final important element is assessing the risk profile and distribution of errors. Can the model provide information beyond what is already known (incremental utility)? Is there algorithmic bias? A model will learn what it is taught, and there is the potential to introduce bias, even if unintentionally. Safety and security also matter: training data need to be guarded, and adversarial attacks can be run to ‘immunize’ models. Once this is in place, the next step will be addressing compliance issues.
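Incremental utility can be checked by comparing a model built only on routinely available information against one that adds the candidate biomarker data; in this hedged sketch the features are synthetic and AUC is an illustrative choice of metric.

```python
# Does imaging add predictive value beyond known clinical information?
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
X_clinical = rng.normal(size=(200, 5))     # what is already known
X_imaging = rng.normal(size=(200, 100))    # candidate new information
y = rng.integers(0, 2, size=200)

base = cross_val_score(LogisticRegression(max_iter=1000),
                       X_clinical, y, cv=5, scoring="roc_auc").mean()
full = cross_val_score(LogisticRegression(max_iter=1000),
                       np.hstack([X_clinical, X_imaging]), y,
                       cv=5, scoring="roc_auc").mean()
print(f"Baseline AUC {base:.2f} vs combined AUC {full:.2f}")
```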