By combining genomic data with data in national health registries, an artificial intelligence model can be developed that can be asked questions about potential future disease treatment. Such statistical and machine learning models are able to predict the occurrence of a disease.
Associate Professor Andrea Ganna from the Institute for Molecular Medicine Finland (FIMM) at the University of Helsinki is interested in combining genetic and statistical data.
“Healthcare can benefit from machine learning, which is constantly learning from the huge amount of data available to it. Questions can be posed to AI about potential hospital treatments in the future. AI can tell what a person’s life expectancy is, or how much prescription drugs will cost next year with certain life choices.”
Ganna uses large datasets to identify the demographic and genetic traits that underlie common and complex diseases. AI can make a risk calculation for each individual by modelling data from longitudinal tracking of diseases and medications along with genetic, family and demographic data.
In particular, Ganna uses FinRegistry data in his research. FinRegistry is a joint research project of the Finnish Institute for Health and Welfare (THL) and FIMM, led by Research Professor Markus Perola from THL. It is one of the world’s largest studies that makes secondary use of registry data.
“The dataset contains data from 7.2 million individuals, i.e. the entire population of Finland and many relatives who have already died. It contains a lot of different, wide-ranging information, including health information, information on family relationships, socio-economic information, and laboratory results and prescriptions. This is an enormous data set.”
The database includes data from 19 national registers, such as the Finnish Cancer Registry, the Drug Purchase Register and Kanta. Kanta is a register that brings together customer and patient data from healthcare and pharmacies. More than one billion pharmaceutical purchases alone have been registered in the collection so far. Each individual fact counts as one data point, and in total there are more than 6.5 billion data points in the dataset.
“I consider the project to be unique. The data is rich and varied,” Ganna says.
“Combining health information with social and economic information is extremely important to me. These are often considered to be separate from each other, but combining the data is vital for health. We need to consider socio-economic information to understand how ‘fair’ AI models are. We don’t want AI models to work worst in the most fragile sectors of our population.”
Once the data has been collected from different registers, the individual data is encrypted and stored in the sensitive data services of the Finnish ELIXIR node at CSC – IT Center for Science. Ganna and his research team analyse the data in this secure environment.
“We have worked with the CSC to make services more useful for researchers. We started with simple analyses and moved towards more complex models.”
There is a colossal amount of sensitive data in Ganna’s research.
“We are creating a data matrix for AI and machine learning models, but we are also very aware of the sensitive nature of the data. We cannot re-identify individuals and we use very advanced security measures to avoid unauthorized access.”
This information may be used for various purposes.
“We are gaining a better understanding of the different clusters of disease, and are able to make better predictions. We can even create a digital clock that describes ageing. It uses data from the whole population to give each person in Finland a kind of digital age, based on an indicative trajectory derived from health data.”
Ganna and his research team plan to integrate the registry data with the genomic data in the biobanks. This is an ambitious project that aims to identify emerging diseases in individuals early enough to prevent them from developing. In the future, the data could be used to identify at-risk individuals who could benefit from preventive drug treatment.
There is already enough data to make this possible, according to Ganna. Ganna cites the FinnGen research project, which has already produced genome data on half a million people in Finland deposited in the biobanks. The project involves investigating the genetic background of various diseases in the Finnish population. The next step is to determine how genes influence the progression of diseases.
“It would be possible to contact people at risk, as their information is in the biobanks. Of course, this assumes that the people in the biobanks have given their consent to be contacted.”
In Ganna’s view, the CSC’s sensitive data services should be further developed to support machine learning models in particular. So far, AI models have only been tested in research. This is because, under current Finnish legislation, it is not possible to automatically use registry data to re-contact people at risk.
“We can make these beautiful models, but we can’t warn people at risk,” Ganna says. However, he adds that if the models are simplified enough, they can be used in clinical care.
One example he cites is the respiratory syncytial virus (RSV) that Pekka Vartiainen from FIMM and Markus Perola from THL studied in the FinRegistry project. RSV is the most common virus causing respiratory infections in young children worldwide. The researchers created a simplified model that can be used in the clinical management of RSV. In Finland, doctors could now use registry data to identify who is at risk of contracting the virus and who could be treated in time.
Ganna believes that in the future, healthcare will benefit from AI models that understand health data.
“AI will support clinical decision making, by helping doctors to better summarize health trajectories of their patients. The future is bright.”
Ari Turunen
30.5.2024
More information:
Institute for Molecular Medicine Finland (FIMM)
FIMM is part of the HiLIFE Helsinki Institute of Life Science research centre.
https://www.helsinki.fi/en/hilife-helsinki-institute-life-science/units/fimm
CSC – IT Center for Science
is a non-profit, state-owned company administered by the Ministry of Education and Culture. CSC maintains and develops the state-owned, centralised IT infrastructure.
https://research.csc.fi/cloud-computing
ELIXIR
builds infrastructure in support of the biological sector. It brings together the leading organisations of 21 European countries and the EMBL European Molecular Biology Laboratory to form a common infrastructure for biological information. CSC – IT Center for Science is the Finnish centre within this infrastructure.