Harnessing complex health data through machine learning
Associate Professor Laurent Billot, Director of the Biostatistics & Data Science Division, talks about the recent establishment of the Global Data Analytics Program at the Institute and the role machine learning can play in improving health outcomes.
Why has The George decided to establish the Global Data Analytics Program?
There's been a significant expansion of health data relatively recently, and our ability to access large data sets, such as hospitalisation records collected by the health system, has expanded with it.
Although artificial intelligence and, more specifically, machine learning methods have been around for some time, their application to health research is relatively recent. Their use is growing rapidly, and it is important for the Institute to remain competitive and take advantage of the latest developments in that space to improve health outcomes.
What is machine learning and what role can it play in improving health?
Machine learning presents a real opportunity to harness the deluge of health data we can now access to improve health outcomes across a range of different areas. Some health data can be really messy and made up of different types of data mixed together, like electronic medical records, which include text notes and can be fairly incomplete. Traditional analysis methods have a limited ability to handle this kind of unstructured or complex data. However, some machine learning methods, such as neural networks, have the ability to deal with non-traditional data sources.
Let me give you an example. Brain scans are images - exactly the type of complex data I'm talking about. Images are not data sets with a clear structure, like the variables in an Excel file or a database. So how can we analyse an image efficiently, correctly and consistently to predict health outcomes, such as the risk of stroke recurrence?
That's where machine learning is really useful. Usually, images are reviewed manually by specialists who assess them individually. But with neural networks, or more specifically deep learning, you can train an algorithm by showing it images of what you want to identify over and over again until it has seen enough to be able to decide by itself. The algorithm learns by being exposed to examples and then reapplies this understanding to new images to predict outcomes, such as the probability of experiencing a cardiovascular event in the next five years.
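To make that training loop concrete, here is a minimal Python sketch using PyTorch. The framework choice, the tiny network and the stand-in data are illustrative assumptions, not a description of the Institute's actual models: the point is simply that the network is shown labelled examples repeatedly and is then applied to a new, unseen image.

```python
# A minimal sketch of the "show it images over and over" idea in PyTorch.
# Architecture, labels and data are invented for illustration.
import torch
import torch.nn as nn

class ScanClassifier(nn.Module):
    """A small convolutional network mapping a scan to a risk logit."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)

    def forward(self, x):
        x = self.features(x).flatten(1)
        return self.head(x)  # logit; sigmoid turns it into a probability

model = ScanClassifier()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

# Stand-in data: 8 single-channel 64x64 "scans" with binary outcome
# labels (e.g. stroke recurrence yes/no). Real work loads labelled images.
scans = torch.randn(8, 1, 64, 64)
labels = torch.randint(0, 2, (8, 1)).float()

for epoch in range(5):  # repeated exposure to the same examples
    optimiser.zero_grad()
    loss = loss_fn(model(scans), labels)
    loss.backward()
    optimiser.step()

# Applying the trained model to a new scan yields a predicted probability.
with torch.no_grad():
    new_scan = torch.randn(1, 1, 64, 64)
    print(torch.sigmoid(model(new_scan)).item())
```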
What kind of activities will the new program undertake?
The Global Data Analytics Program will act as a hub, providing methodological guidance, services and training focused on machine learning and visual analytics methods that are aligned with our overall Strategy 2025 goals.
One area we'll be looking at is predicting health events more accurately. If you can identify that someone has a strong predictor for stroke, you can act on that prediction and potentially avoid the event. We're not trying to replace doctors, but to help the health system accurately monitor the risks faced by patients. For example, by triaging and identifying patients at risk, you can develop a special protocol to look after them so that they have better outcomes.
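A hedged sketch of that triage idea follows, using scikit-learn (an assumed library choice). The predictor variables, the synthetic cohort and the 20% risk threshold are all invented for illustration; the point is that patients whose predicted risk exceeds a threshold are flagged for closer monitoring.

```python
# Risk-based triage sketch: fit a simple model on routine patient
# variables and flag high-risk patients for a special protocol.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in cohort: age, systolic blood pressure and a prior-stroke flag.
age = rng.normal(65, 10, 500)
sbp = rng.normal(140, 20, 500)
prior_stroke = rng.integers(0, 2, 500)
X = np.column_stack([age, sbp, prior_stroke])

# Synthetic outcome loosely driven by the predictors (stroke in follow-up).
score = 0.04 * (age - 65) + 0.03 * (sbp - 140) + 1.5 * prior_stroke
y = (score + rng.normal(0, 1, 500) > 1.0).astype(int)

model = LogisticRegression().fit(X, y)

# Predicted risk for new patients; those above the threshold would be
# triaged into closer monitoring, supporting rather than replacing
# clinical judgement.
new_patients = np.array([[72, 160, 1], [55, 120, 0]])
risk = model.predict_proba(new_patients)[:, 1]
print(risk, risk > 0.2)  # illustrative 20% threshold
```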
Going back to the brain scan example, if you can automate the measurement of things like the volume of bleeding in the brain to assess the impact of the stroke, then you're better able to know how to deal with that patient. It just makes things much more efficient.
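As a toy illustration of automating such a measurement, the sketch below estimates bleed volume by counting voxels above an intensity cut-off. The synthetic scan, the threshold and the 1 mm³ voxel size are invented assumptions; real haemorrhage segmentation relies on far more sophisticated models.

```python
# Toy volumetry: count voxels above an intensity cut-off and convert
# the count to millilitres. Data and thresholds are invented.
import numpy as np

rng = np.random.default_rng(2)
scan = rng.normal(30, 8, size=(32, 32, 32))   # stand-in scan intensities
scan[10:14, 10:14, 10:14] = 70                # synthetic "bleed" region

bleed_mask = scan > 60                        # simple intensity cut-off
voxel_volume_ml = 0.001                       # assuming 1 mm^3 voxels
print(f"Estimated bleed volume: {bleed_mask.sum() * voxel_volume_ml:.3f} ml")
```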
At the moment, we're mapping current activities to understand what people are doing and what they are interested in. As mentioned, we'll be exploring the use of machine learning on brain scans. We're also looking at how we can improve trial efficiency by identifying problematic patterns of data quality. Another area we are looking at is the use of natural language processing. This involves feeding text directly into algorithms as input, as opposed to having a person manually read and code everything. For example, SMARThealth's algorithm uses clinical guidelines to develop treatment recommendations based on a patient's data. These guidelines are constantly updated and are made up of a lot of words. We'd like to be able to automatically extract the information contained in new clinical guidelines to update our risk assessment algorithm.
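Here is a deliberately simple Python sketch of that extraction idea: pulling a structured treatment rule out of guideline-style text with a pattern match. The guideline sentence and the regular expression are invented for illustration; production natural language processing would use much more robust methods than a single regex.

```python
# Toy guideline extraction: turn a guideline sentence into a structured
# rule (measure, threshold, unit) instead of hand-coding it.
import re

guideline_text = (
    "Adults with a systolic blood pressure of 140 mmHg or higher "
    "should be considered for antihypertensive treatment."
)

# Look for a named measurement followed by a numeric threshold and unit.
pattern = re.compile(
    r"(?P<measure>systolic blood pressure)\D*(?P<value>\d+)\s*(?P<unit>mmHg)",
    re.IGNORECASE,
)

match = pattern.search(guideline_text)
if match:
    rule = {
        "measure": match.group("measure").lower(),
        "threshold": int(match.group("value")),
        "unit": match.group("unit"),
    }
    print(rule)
    # {'measure': 'systolic blood pressure', 'threshold': 140, 'unit': 'mmHg'}
```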
We're also interested in using machine learning to identify clusters of multi-morbidities. For example, are there groups of people who tend to have diabetes together with another chronic condition? We'd like to use machine learning to understand how those morbidities interact and what their potential clinical outcomes are.
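A minimal clustering sketch of that idea follows, again with scikit-learn as an assumed library. The condition flags, the synthetic patients and the choice of three clusters are illustrative only; the cluster centres simply show which conditions tend to travel together.

```python
# Cluster patients by which chronic conditions co-occur (stand-in data).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Each row is a patient; columns are 0/1 flags for four conditions.
conditions = ["diabetes", "hypertension", "ckd", "depression"]
patients = rng.integers(0, 2, size=(200, len(conditions)))

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(patients)

# A centre value near 1 means most patients in that cluster have
# the corresponding condition.
for i, centre in enumerate(kmeans.cluster_centers_):
    profile = {c: round(v, 2) for c, v in zip(conditions, centre)}
    print(f"cluster {i}: {profile}")
```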
What would you like the program to achieve in the long-term?
I want us to be in a position where we are aware of the latest machine learning methods and are competent and comfortable applying them to a range of areas where they are relevant and can add value in improving health outcomes. For this, we'll need to add capacity by hiring experts, linking them to others in the Institute and integrating machine learning and visual analytics methods across our research portfolio.
I want us to get better at visualising the data we report on. We produce tables and listings, but I'd like to see us visualise our research data interactively, so that investigators can watch their data as a study progresses, using machine learning where it helps, and of course without compromising the integrity of the study or unblinding participants. We don't want to compromise privacy, so we'll need to tighten how we store and access our data. We'll also need to standardise our data further and create an online repository.
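One way to read that goal is a blinded interactive dashboard; the sketch below is a minimal assumption-laden example using plotly, with treatment groups masked as "Arm A" and "Arm B" so that following the chart does not unblind anyone. The data and the library choice are illustrative.

```python
# Blinded interactive monitoring sketch: cumulative enrolment per
# masked arm, with invented data.
import numpy as np
import pandas as pd
import plotly.express as px

rng = np.random.default_rng(3)
df = pd.DataFrame({
    "week": np.tile(np.arange(1, 13), 2),
    "arm": ["Arm A"] * 12 + ["Arm B"] * 12,   # masked labels, not drug names
    "patients_enrolled": np.concatenate([
        np.cumsum(rng.integers(3, 8, 12)),
        np.cumsum(rng.integers(3, 8, 12)),
    ]),
})

fig = px.line(df, x="week", y="patients_enrolled", color="arm",
              title="Cumulative enrolment (blinded)")
fig.show()  # opens an interactive chart in the browser or notebook
```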
Ultimately, I hope the Global Data Analytics Program leads to more efficient trials at the Institute, better targeted treatments through identifying the individuals most likely to respond, and new prediction algorithms that help prevent patients from developing life-threatening medical conditions.