
Piia Bartos, a senior pharmaceutical researcher at the University of Eastern Finland’s School of Pharmacy, is interested in RNA, the proteins that bind RNA, and how this system can be influenced to prevent cancer growth. She uses large-scale simulations to study RNA and the function of the argonaute protein that binds to it.
Molecular dynamics simulations provide insights into how biomolecules interact with each other at the atomic level. Because atoms are in constant motion, the simulation repeatedly calculates the forces between them and uses these forces to determine the new positions, velocities and energies of the atoms. This can provide new information for drug design.
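The basic update loop of a molecular dynamics simulation can be sketched in a few lines. This is a minimal illustration, not the software used in Bartos's simulations: it assumes two atoms on a line joined by a simple harmonic spring (a stand-in for a real force field), uses reduced units, and uses the velocity Verlet integrator, a common choice in molecular dynamics. The function names are hypothetical.

```python
def spring_forces(x, k=1.0, r0=1.0):
    """Forces on two atoms on a line joined by a harmonic bond, U = 0.5 * k * (d - r0)**2."""
    stretch = (x[1] - x[0]) - r0        # d - r0, where d is the current bond length
    return [k * stretch, -k * stretch]  # equal and opposite forces on the two atoms

def total_energy(x, v, mass, k=1.0, r0=1.0):
    """Kinetic plus potential energy of the two-atom system."""
    kinetic = 0.5 * mass * sum(vi * vi for vi in v)
    potential = 0.5 * k * ((x[1] - x[0]) - r0) ** 2
    return kinetic + potential

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Advance positions x and velocities v by n_steps time steps of length dt."""
    f = force(x)
    for _ in range(n_steps):
        # Half-step velocity update from the current forces...
        v = [vi + 0.5 * dt * fi / mass for vi, fi in zip(v, f)]
        # ...full-step position update...
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        # ...then recompute forces at the new positions and finish the velocity update.
        f = force(x)
        v = [vi + 0.5 * dt * fi / mass for vi, fi in zip(v, f)]
    return x, v

# Usage: start with the bond stretched by 0.2 and both atoms at rest.
x0, v0 = [0.0, 1.2], [0.0, 0.0]
x1, v1 = velocity_verlet(x0, v0, spring_forces, mass=1.0, dt=0.01, n_steps=1000)
e0, e1 = total_energy(x0, v0, 1.0), total_energy(x1, v1, 1.0)
```

A small time step keeps the total energy nearly constant, which is one reason real simulations use steps of only a few femtoseconds. Production packages use far more elaborate many-body force fields, periodic boundaries and thermostats.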
Bartos has been studying RNA-binding proteins (RBPs), which may play a role in cancer treatment. RBPs have been found to play a role in cancer cells, particularly in drug responses and the development of drug resistance. More than 1,500 RBPs have been discovered so far. Changes in the function of these proteins can affect the level of cancer gene expression.
RNA interference (RNAi) is a biochemical mechanism whereby RNA causes the cleavage of messenger RNA in the cell, disrupting gene expression. The researchers who discovered RNAi, Andrew Fire and Craig Mello, were awarded the Nobel Prize in Physiology or Medicine in 2006. RNAi can be used to switch off the expression of proteins that promote cancer growth.
“We’re particularly interested in argonaute proteins, which play an important role in RNA-mediated gene silencing. The most important of these is Ago2,” says Bartos.
In cells, the argonaute 2 (Ago2) protein binds microRNA molecules. When RNA is bound to Ago2, the combination is called an RNA-Ago2 complex.
“As argonaute-2 is a protein that’s vital for cell function, it’s likely to affect all types of cancer. If it is removed from the cells, the cells will not survive. If its activity could be eliminated in cancer cells, those cells would not survive. This would prevent the growth and spread of cancer cells.”

The challenge is that two types of RNA molecules can be bound in the RNA-Ago2 complex. The first inhibits protein production, whereas the second increases it. In the latter case, cancer cell growth may increase.
“I simulated RNA on its own and also bound to the Ago2 protein. I have tried to clarify how Ago2 complexes differ structurally when they contain RNA that increases protein production and when they contain RNA that decreases it. We’ve just finished running the simulations and we’re now analysing the results.”
Simulations of molecular dynamics can be used to make a kind of video of the movements of Ago2-RNA complexes and to compare the differences between activating and silencing complexes.
The RNA sequence data used in the simulations was obtained from the A.I. Virtanen Institute for Molecular Sciences. Six RNA molecules were used in the simulations, three of which increased protein production and three of which decreased it. For each system, molecular dynamics simulations were run for about 50 microseconds (a microsecond is a millionth of a second). The simulations placed high demands on the computing resources of the Finnish ELIXIR Node, CSC – IT Center for Science.
“It’s a fairly big protein. Along with the RNA and the surrounding water, there are about 300,000 atoms, and we had to calculate the speed and position of all of them every four femtoseconds.”
A femtosecond is a millionth of a billionth of a second. Bartos is aiming to find out whether the shape of the complex changes, and whether a part of the protein moves differently when it has an increasing or decreasing RNA bound to it.
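Taking the figures quoted above at face value, a quick calculation shows why the simulations needed supercomputing resources. The sketch assumes that each system was run for the full 50 microseconds with a 4-femtosecond time step and about 300,000 atoms; the function and variable names are illustrative.

```python
FS_PER_MICROSECOND = 10**9  # 1 microsecond = 10^9 femtoseconds

def md_step_count(length_us, timestep_fs):
    """Number of integration steps needed to simulate length_us microseconds."""
    return length_us * FS_PER_MICROSECOND // timestep_fs

steps_per_system = md_step_count(50, 4)    # 12_500_000_000 steps
atom_updates = steps_per_system * 300_000  # 3_750_000_000_000_000 atom position/velocity updates
total_steps = 6 * steps_per_system         # six systems: 75_000_000_000 steps
```

Integer arithmetic is used so that the counts are exact rather than subject to floating-point rounding.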
“It’s likely that the change in the shape of the complex can indicate that the complex binds to different proteins.”
There must therefore be a difference in the structure or movement of the complexes that causes different effects that either increase or decrease gene expression.
By understanding the structural differences between RNA-protein complexes that reduce and increase gene expression, it would be possible to design and screen drugs that bind only to the desired complex. According to Bartos, such drugs would be a medical breakthrough, and would offer a new way to treat cancers where protein production is impaired.
“RNA interference-based drugs are a good alternative. These drugs could be more specific and better targeted to the cancer cell than a standard small-molecule cancer drug. With RNA interference, we could, if necessary, block the expression of any protein in a cancer that we wanted to block. So this would give highly selective drugs.”
According to Bartos, however, modelling the function of RNA is still a challenge. In simulations, force-field models work well for proteins, but not for RNA.
“The reason for this is that RNA is chemically and physically quite different from proteins.”
One example of the problem is the phosphate group that, together with the sugar ribose, forms the backbone of the RNA strand.
“The phosphate in RNA is electrically charged and is not very well modelled by these current force-field equations. So there’s clearly a lot of work to be done in developing the tools.”
Drug design has been making great strides on many levels. DeepMind’s artificial intelligence system AlphaFold can already predict a protein’s three-dimensional structure from its amino acid sequence. Trained on known protein structures, it has been used to predict the structures of almost all known proteins.
Sequencing can be used to identify mutations in cancer, and models can be used to study how mutations affect the action of anticancer drugs.
“For example, a mutation may prevent the cancer drug from binding to its target protein, in which case the patient is unlikely to benefit from the drug.”
As computing capacity increases, it will also become possible to simulate larger entities.
“It would be great to simulate a protein in a larger context, for instance at the cellular level. We could simulate how the protein interacts with other proteins, cell membranes and cell organelles.”
Ari Turunen
30.9.2024
Citation
Turunen, A., & Nyrönen, T. (2024). New drug targets from RNA-binding proteins. https://doi.org/10.5281/zenodo.14810576
More information:
Hanna Baltrukevich & Piia Bartos: RNA-protein complexes and force field polarizability. Frontiers in Chemistry, Vol. 11 (Sec. Theoretical and Computational Chemistry), 22 June 2023.
https://doi.org/10.3389/fchem.2023.1217506
Milla Kurki et al.: Structure of POPC Lipid Bilayers in OPLS3e Force Field. Journal of Chemical Information and Modeling, Vol. 62, Issue 24.
https://pubs.acs.org/doi/full/10.1021/acs.jcim.2c00395
University of Eastern Finland
CSC – IT Center for Science
is a non-profit, state-owned company administered by the Ministry of Education and Culture. CSC maintains and develops the state-owned, centralised IT infrastructure.
https://research.csc.fi/cloud-computing
ELIXIR
builds infrastructure in support of the biological sector. It brings together the leading organisations of 21 European countries and the EMBL European Molecular Biology Laboratory to form a common infrastructure for biological information. CSC – IT Center for Science is the Finnish centre within this infrastructure.

The University of Eastern Finland performed a virtual screening of 1.56 billion molecules against two drug target proteins. This was the world’s most extensive screening of its kind.
Most drugs available today have been designed so that the target molecules are the body’s own proteins. Once the structure of one member of a protein family has been determined, the structure of other proteins in the same family can be predicted through modelling. A successful drug can be developed, for example, by screening a large library to find a molecule with a three-dimensional structure enabling interaction with the target protein.
Professor Antti Poso’s research team were looking for molecules that would bind to two potential drug targets, the SurA chaperone and cyclin-G-associated kinase (GAK). The project tested the HASTEN algorithm developed for the screening, and created a new machine learning model.
“These target proteins, SurA and GAK, were already known to us from existing academic research projects. The results of the massive screenings can be used in other research. We not only validated a method but are also able to help various academic research projects,” says Poso.
Chaperones contribute to protein folding and regulate protein interaction. Kinases have a role in cellular signalling, among other things.
“The SurA chaperone is related to a collaborative project with the University of Tübingen, with the aim of developing new antibiotics. Kinases, on the other hand, are a large family of proteins. Most cancer drugs are kinase inhibitors. There are some 500 types of kinase, with cyclin-G-associated kinase, or GAK, being one of them. GAK’s potential lies in cancer drugs and the treatment of viral infections.”
Poso’s team is studying the interaction of drugs and proteins, and creating target protein models. The site where a drug binds to a protein, and through which it exerts its effect, can usually be identified in the target protein structure. The model can be used specifically in virtual screening, which involves searching large molecular databases for new ideas for drug development.
“The protein structure of the chaperone is very different from that of the kinase. So we are talking about two very different target proteins that were worth testing together.”

The structural difference between the two target proteins was a key factor, because the algorithm must work across all protein families.
“The two target proteins were used to test how the HASTEN algorithm developed by Tuomo Kalliokoski at Orion works in the CSC supercomputing environment. The algorithm scaled successfully.”
For comparison, the target protein screening was performed with both the HASTEN algorithm and the traditional docking method. In docking, a search algorithm fits each candidate molecule from the database into the target protein and calculates the interactions between them. The resulting score indicates how well the molecule binds to the protein.
Poso’s team screened 1.56 billion candidate molecules from the REAL database of Enamine, a large Ukrainian chemical company.
“First we took every molecule in the database, which are stored as two-dimensional drawings, and converted them into three-dimensional format. After that the software tried to fit each molecule inside GAK or SurA. A single molecule can be fitted in hundreds of thousands of alternative ways.”
Then the researchers tested how machine learning fared compared to docking. The HASTEN algorithm was used for machine learning.
“We first chose a million molecules at random to see how the docking worked. We then fed the results to AI. So what the machine did was learn to predict the result on the basis of a million molecules, meaning that when a molecule has a specific shape, it docks into a specific location.”
After this, all 1.56 billion molecules were fed into the AI, which predicted their docking results on the basis of the initial million molecules. The molecules with the best predictions were docked again, followed by another round of machine learning. After a few rounds, the AI was able to predict docking results with 90 per cent accuracy.
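The dock, train, predict and re-dock cycle described above can be sketched on a toy scale. HASTEN's actual model, molecular descriptors and scoring are not described in this article, so this sketch substitutes a k-nearest-neighbour regressor and a made-up `dock` function that stands in for an expensive docking run; all names and numbers are hypothetical.

```python
import random

def dock(mol):
    """Hypothetical stand-in for an expensive docking run.

    Each "molecule" is a 2-number descriptor; the score mimics a docking
    energy, where lower (more negative) means better predicted binding.
    """
    x, y = mol
    return -(3.0 * x + 2.0 * y) + 0.5 * x * y

def knn_predict(train, mol, k=5):
    """Predict a docking score as the mean score of the k most similar docked molecules."""
    nearest = sorted(
        train, key=lambda item: (item[0][0] - mol[0]) ** 2 + (item[0][1] - mol[1]) ** 2
    )[:k]
    return sum(score for _, score in nearest) / len(nearest)

def ml_boosted_screen(library, n_initial, n_per_round, n_rounds, seed=0):
    """Dock a random sample, then repeatedly train, predict and dock the best-predicted molecules."""
    rng = random.Random(seed)
    docked = {i: dock(library[i]) for i in rng.sample(range(len(library)), n_initial)}
    for _ in range(n_rounds):
        train = [(library[i], score) for i, score in docked.items()]
        # Cheap predictions for every molecule that has not been docked yet.
        predicted = {i: knn_predict(train, mol) for i, mol in enumerate(library) if i not in docked}
        # Spend expensive docking runs only on the best (lowest) predicted scores.
        for i in sorted(predicted, key=predicted.get)[:n_per_round]:
            docked[i] = dock(library[i])
    return docked

# Usage: a toy library of 1,000 molecules, of which only 200 are ever docked.
lib_rng = random.Random(1)
library = [(lib_rng.random(), lib_rng.random()) for _ in range(1000)]
docked = ml_boosted_screen(library, n_initial=50, n_per_round=50, n_rounds=3)
true_top = set(sorted(range(len(library)), key=lambda i: dock(library[i]))[:50])
recall = len(true_top & set(docked)) / len(true_top)
```

Here `recall` measures how many of the truly best-scoring molecules were found while docking only a fifth of the library; the saving grows with library size, because prediction is far cheaper than docking.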
“The machine that had been trained completed the screening much more quickly than would have been possible with the traditional docking method. While the calculation of docking took a couple of months even using powerful computers, with machine learning the learning process and prediction only took a few days.”
According to Poso, researchers can now routinely screen billions of molecules in the time it previously took to screen a million. And thanks to the machine learning model, billions of molecules can now be screened without a supercomputer.
“Obviously it follows that with supercomputers we can take even bigger databases and screen thousands of billions of molecules with this method.”
The next thing Poso’s team will be looking at is what is known as the vivid screening method.
“Instead of just predicting a single activity or docking result, we can simultaneously predict a number of different properties, for example predicting docking that could cause side effects while maintaining strong docking to the intended binding site.”
The research made use of the supercomputing resources, data storage and tool containerisation of the Finnish ELIXIR Node, CSC – IT Center for Science.
Ari Turunen
31.8.2024
Citation
Turunen, A., & Nyrönen, T. (2024). New machine learning method speeds up drug screening hundred-fold. https://doi.org/10.5281/zenodo.13691983
More information:
Toni Sivula, Laxman Yetukuri, Tuomo Kalliokoski, Heikki Käsnänen, Antti Poso & Ina Pöhner (2023): Machine Learning-Boosted Docking Enables the Efficient Structure-Based Virtual Screening of Giga-Scale Enumerated Chemical Libraries. J. Chem. Inf. Model. DOI: 10.1021/acs.jcim.3c01239. Available at: https://pubs.acs.org/doi/full/10.1021/acs.jcim.3c01239
HASTEN algorithm
https://github.com/TuomoKalliokoski/HASTEN
University of Eastern Finland
CSC – IT Center for Science
is a non-profit, state-owned company administered by the Ministry of Education and Culture. CSC maintains and develops the state-owned, centralised IT infrastructure.
https://research.csc.fi/cloud-computing
ELIXIR
builds infrastructure in support of the biological sector. It brings together the leading organisations of 21 European countries and the EMBL European Molecular Biology Laboratory to form a common infrastructure for biological information. CSC – IT Center for Science is the Finnish centre within this infrastructure.

The Bioinformatics Center at the University of Eastern Finland (UEF), led by Virpi Ahola, is developing new applications for analysing biomedical and multimodal data. These can be used to study cancers as well as metabolic, cardiovascular and neurodegenerative diseases.
Ahola has had a long career in bioinformatics. She was part of Professor Ilkka Hanski’s metapopulation biology research group, which sequenced the whole genome of the Glanville fritillary butterfly, the first reference genome assembled in Finland. At the Karolinska Institute in Hong Kong, Ahola analysed gene function in different diseases at the single-cell level, studying how stem cells can be used to develop new drugs and treatments. She now heads the UEF Bioinformatics Center.
The Bioinformatics Center integrates different types of omics data (genomics, proteomics, transcriptomics) with clinical data and, in the future, possibly also imaging data.
“In addition to the usual omics analyses, we carry out multimodal data analysis for different research groups. This entails combining the analysis of different types of data in order to provide more information than if they are analysed separately.”
How multimodal data is analysed depends on whether the different types of data originate from the same patients or from different patients.
Omics is a research method that aims to analyse all genetically determined variables of a research subject simultaneously. Genomics analyses genetic variation and the function of genes, proteomics focuses on proteins, and epigenetics on the regulation of gene function and the storage of heritable information without changes in the DNA sequence. Metabolomics, for its part, analyses changes in metabolism caused by disease, diet or medication.
“We are developing bioinformatics services in collaboration with biomedical experts. One focus at the University of Eastern Finland is on understanding the molecular basis of key chronic diseases and improving their prevention and treatment,” Ahola says.
Translational medicine applies basic research in clinical trials, and also uses patient samples and disease models to identify disease mechanisms and drug targets. The research approach is interdisciplinary, which not only provides a good starting point for research but also helps improve treatments for patients.
“What is delaying the era of translational medicine is that we simply don’t know enough. The idea behind combining several different data sources is to obtain more information. The integration is very much computational, and requires CSC – IT Center for Science’s resources and infrastructures like ELIXIR.”
One example Ahola gives is single-cell technologies.
In transcription, the genetic code in DNA is copied into RNA. This is the first phase of protein synthesis. Single-cell transcriptomics provides precise information about gene expression in an individual cell at a given moment.
“The use of single-cell transcriptomics is still expensive. Under the principles of open science, data must be shared when the research is published. This allows data to be reused, and different data sources to be combined.”
However, the challenge is that data is produced using different technologies.
“Different data sources may have different numbers of cells or different cell types. Which methods should be used to combine the data? If this could be solved, we could analyse cell development and specialisation more effectively.”
Ahola’s aim is to provide more assistance in the use of computational methods.
The University of Eastern Finland’s Bioinformatics Center provides researchers with computing capacity, helps them with data pre-processing and analysis, and assists in installing and using different computational methods and software.
“If there are no bioinformaticians in the same team or collaborative teams, researchers are expected to be proficient in computational methods and processing big data.”
Ahola admits that the requirements are tough, for example for postgraduate students. Fortunately, the University of Eastern Finland has taken up this challenge by offering Computational Biomedicine as a study orientation option.
“One example of the reuse of data is Finnish biobanks, which contain the genomes of over half a million Finns. It’s not a simple matter to analyse the biobank data, because the amount of data is insane.”
Ahola is referring to the FinnGen research project, which was launched in autumn 2017. Its main goal is to increase understanding of the causes of diseases and promote their diagnosis, prevention and the development of treatment methods. FinnGen uses samples collected by all Finnish biobanks. By June 2023, more than 553,000 samples had been collected for the FinnGen study. The first phase of the research project lasted six years. There are only a few research projects of this scale in the world.
The research projects can combine genomic data with data from national health registries. Indeed, Finland has exceptionally good resources to carry out genetic research covering the entire population.
Clinical data from longitudinal studies combined with genetic data offers many opportunities. But there must be a lot of data.
“Data collections are needed because no single researcher can collect data from 10,000 or 100,000 individuals. If the dataset is smaller, it may not provide reliable information for studying genetically complex diseases.”
There are many research projects using different data sources underway at the University of Eastern Finland. A project on Alzheimer’s disease at the University of Eastern Finland and Kuopio University Hospital will combine clinical data collected from patient visits with FinnGen data. In this way, researchers are aiming to understand the biological mechanisms leading to the onset of Alzheimer’s disease.
“FinnGen’s biobank is a unique resource that could be used much more in research,” Ahola says.
“Another example of research on Alzheimer’s disease is a project with Rappta Therapeutics and UEF professors Mikko Hiltunen and Annakaisa Haapasalo. This project uses transgenic cell lines to study the effect of different Alzheimer’s treatments on protein function.”
One interesting collaboration project is underway with Academy of Finland researcher Kirsi Ketola.
“The study investigates carboplatin treatment resistance mechanisms in prostate cancer. Carboplatin produces DNA cross-links, which lead to activation of a mechanism that repairs DNA and causes resistance, allowing cancer cells to divide again. The research uses single-cell techniques to measure both gene expression and chromatin changes at the single-cell level.”
Chromosomes are located in the nucleus, in the form of long chromatin strands.
According to Ahola, careful data integration and analysis could promote development of personalised treatment.

Ahola is a tireless advocate for the openness and reuse of data, and for the development of methods and infrastructures that facilitate and encourage this. She cites the European Genome-phenome Archive (EGA) as an example. This is a data archive that makes it possible to share and, with permission, access biomedical data that has already been published.
“The archive contains human genomic data, combined with clinical and other metadata. Since in principle it may be possible to identify a person by genome and phenotype, data sharing is strictly regulated.”
According to Ahola, the EGA allows for data sharing in the appropriate way. This makes it possible to reuse valuable biomedical research data, for instance for creating or testing new research hypotheses.
“Existing data can be approached from a different perspective – for example, patients can be selected using different criteria than in an already published study, or data can be used as part of a larger data set.”
Referring to Biocenter Finland, Ahola says that more should be done together. Biocenter Finland brings together seven biocentres from different Finnish universities. It should be possible to increase collaboration between biocentres, and internationally, for example through CSC, the Finnish ELIXIR Node.
“ELIXIR is an avenue for us to network and learn from the experiences of other bioinformatics core facilities, and to be part of discussions where research infrastructure issues are brought up and new initiatives are taken.”
Because new technologies produce large and complex data sets, research infrastructures should include data science experts, not just research equipment.
“To make effective use of data, the computing capacity offered by CSC, for example, is not enough. Data processing and reuse also requires staff with expertise in the field. As I see it, better resourcing and systematic collaboration between biocentres could substantially facilitate and improve the processing, integration and reuse of large omics data sets.”
Ari Turunen
1.9.2023
Citation
Turunen, A., & Nyrönen, T. (2024). Improving breast cancer treatment prognoses with liquid biopsy. https://doi.org/10.5281/zenodo.13691344
More information:
Bioinformatics Center, University of Eastern Finland
https://uefconnect.uef.fi/en/group/bioinformatics-center/
CSC – IT Center for Science
is a non-profit, state-owned company administered by the Ministry of Education and Culture. CSC maintains and develops the state-owned, centralised IT infrastructure.
https://research.csc.fi/cloud-computing
ELIXIR
builds infrastructure in support of the biological sector. It brings together the leading organisations of 21 European countries and the EMBL European Molecular Biology Laboratory to form a common infrastructure for biological information. CSC – IT Center for Science is the Finnish centre within this infrastructure.

In addition to gene variants, DNA also varies at the level of single base pairs. These variations cause differences between individuals, but they can also help to locate disease-causing genes. Such single nucleotide polymorphisms (SNPs) can act as markers for disease. An artificial intelligence model developed at the University of Eastern Finland searches for interacting SNPs associated with breast cancer.
The huge amount of genomic data now available has made it possible for researchers to calculate which gene variants occur among groups of people who have cancer. Hundreds or thousands of gene variants can have an impact on a single disease.
With statistical methods, researchers can estimate how a person’s gene variants increase disease risk. However, variation also occurs at single base pairs (nucleotides) in DNA; these variants are known as single nucleotide polymorphisms, or SNPs. A SNP arises when a single nucleotide (adenine, thymine, cytosine or guanine) at a particular position in the genome sequence differs between individuals. For example, cytosine (C) may be replaced by thymine (T) at a certain position in a stretch of DNA, which means that a cytosine-guanine base pair becomes a thymine-adenine base pair. Unlike gene mutations, SNPs are not necessarily located within genes. They can also be in non-coding regions of genes or in regions between genes. SNPs are common in the human genome: they occur roughly once in every 1,000 nucleotides on average, which means there are approximately 4 to 5 million SNPs in a person’s genome.
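The cytosine-to-thymine example can be made concrete with a toy comparison of two sequences that are already aligned. This is a simplification for illustration only: real SNP detection works on sequencing reads or genotyping arrays across many people, and the sequences and function name below are invented.

```python
# Watson-Crick pairing: A pairs with T, and C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def find_snps(reference, sample):
    """List single-base differences between two aligned DNA sequences of equal length.

    Returns (position, reference base pair, sample base pair) tuples; positions are 0-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (pos, f"{ref}-{COMPLEMENT[ref]}", f"{alt}-{COMPLEMENT[alt]}")
        for pos, (ref, alt) in enumerate(zip(reference, sample))
        if ref != alt
    ]

snps = find_snps("ATGCCGTA", "ATGTCGTA")
print(snps)  # [(3, 'C-G', 'T-A')]
```

Listing the complementary base shows why a C-to-T change on one strand turns a C-G pair into a T-A pair.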
SNPs can be useful when searching for genetic risk factors for cancer. In biomedical research, SNPs are used to compare regions of the genome between cohorts with and without a disease.
“When SNPs occur within a gene or in a regulatory region near a gene, they may play a direct role in disease by affecting the gene’s function. We have a novel machine learning approach to identify groups of interacting SNPs that contribute most to breast cancer risk,” says researcher Hamid Behravan from the University of Eastern Finland. He works at the Institute of Clinical Medicine in Kuopio.
“We have published several findings on identifying the genetic component of breast cancer risk. Identifying the breast cancer-associated SNPs that reliably distinguish disease cases from healthy controls may be particularly useful in improving breast cancer risk prediction and developing individual treatment strategies,” says Behravan.
Standard hypothesis testing methods have measured only the association between a single SNP and a disease. However, studies at the University of Eastern Finland have demonstrated that breast cancer risk can be predicted better when SNPs are examined as groups that interact with each other.
The idea of genome-wide association studies (GWAS) is to identify SNPs in the DNA that explain the genetic component of the observed phenotype in genotyped people.
“Genome-wide association studies measure the association between individual SNPs and a disease, but ignore possible correlations among SNPs,” says Behravan.
“To date, population-based genome-wide association studies often use polygenic risk scoring (PRS), which aggregates the effects of disease-associated risk alleles. However, PRS assumes that the disease-associated SNPs are independent of each other and that their risk effects are linear and additive. We have shown that, instead of evaluating the effect of single components (SNPs) one at a time, it is particularly useful for improving breast cancer risk prediction to study groups of interacting SNPs using machine learning.”
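The additive assumption behind a polygenic risk score can be written as PRS = Σ β_i g_i, where g_i is the number of risk alleles (0, 1 or 2) a person carries at SNP i and β_i is that SNP's estimated effect size. The sketch below compares this with a score that adds a pairwise interaction term. The SNP names, effect sizes and interaction weight are hypothetical, and the interaction term only illustrates what an additive score leaves out; it is not the Kuopio group's model.

```python
def polygenic_risk_score(genotypes, weights):
    """Additive PRS: sum of effect size x risk-allele count (0, 1 or 2) over the scored SNPs."""
    return sum(beta * genotypes.get(snp, 0) for snp, beta in weights.items())

def score_with_interactions(genotypes, weights, interactions):
    """Additive PRS plus hypothetical pairwise terms gamma * g_a * g_b for interacting SNP pairs."""
    pairwise = sum(
        gamma * genotypes.get(a, 0) * genotypes.get(b, 0)
        for (a, b), gamma in interactions.items()
    )
    return polygenic_risk_score(genotypes, weights) + pairwise

# Hypothetical SNP names and effect sizes (for example, log-odds per risk allele).
weights = {"snp_a": 0.20, "snp_b": 0.10, "snp_c": -0.05}
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
interactions = {("snp_a", "snp_b"): 0.15}

prs = polygenic_risk_score(person, weights)                        # 0.2*2 + 0.1*1 = 0.5
combined = score_with_interactions(person, weights, interactions)  # 0.5 + 0.15*2*1, about 0.8
```

In the additive score, snp_a contributes the same amount whatever the genotype at snp_b; only the interaction term lets the combination of the two carry extra risk.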

The machine learning method developed at the University of Eastern Finland has proven to be efficient.
“We found groups of interacting SNPs that have true biological meaning. A biological analysis of the identified SNPs reveals genes related to important breast cancer-related mechanisms, such as estrogen metabolism and apoptosis.”
Elevated endogenous estrogen levels are associated with increased postmenopausal breast cancer risk. There is also strong evidence that tumour growth is not just a result of uncontrolled proliferation but also of reduced apoptosis.
“So we found the genes behind the SNPs identified by our approach, built gene interaction maps from those genes, and then observed several separate networks related to breast cancer, such as estrogen metabolism and apoptosis networks. So not only did our system find groups of interacting SNPs with the highest potential for predicting breast cancer risk, but those SNPs were also behind a number of important biological entities in breast cancer. Therefore, ‘interacting SNPs’ refers both to SNPs selected together and to SNPs involved in cancer-related biological networks.”

The machine learning approach developed in Kuopio is based on a gradient tree boosting method followed by an adaptive iterative search algorithm. Boosting is the first module and searching the second module.
Boosting is a method of converting weak learners into strong learners. The algorithm begins by training a decision tree. Weak classifiers are then added sequentially, each correcting the errors made by the existing classifiers, to build a strong classifier.
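Adding weak learners one after another to correct earlier errors can be sketched as gradient boosting with one-split decision trees ("stumps") and squared-error loss. This is a simplified illustration, not the Kuopio implementation, which the article does not detail: it uses invented genotype data in which disease status depends on an interaction between SNPs 0 and 2, and it records which SNP each stump splits on, loosely mirroring how a first module can produce a list of candidate SNPs. Function names are hypothetical.

```python
from itertools import product

def fit_stump(X, residuals):
    """Find the single-SNP threshold split that best fits the residuals (least squares)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue  # a split must put samples on both sides
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lv, rv)
    _, j, t, lv, rv = best
    return j, (lambda row: lv if row[j] <= t else rv)

def gradient_boost(X, y, n_rounds=10, learning_rate=0.5):
    """Squared-loss gradient boosting with stumps; returns the model and the SNP used per round."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps, chosen = [], []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is simply the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, stump = fit_stump(X, residuals)
        stumps.append(stump)
        chosen.append(j)
        # Each weak learner nudges the predictions towards the remaining error.
        pred = [pi + learning_rate * stump(row) for pi, row in zip(pred, X)]

    def model(row):
        return base + learning_rate * sum(s(row) for s in stumps)

    return model, chosen

# Invented data: every genotype combination (0, 1 or 2 risk alleles) of four SNPs;
# "cases" carry at least one risk allele at SNP 0 and two at SNP 2.
X = [list(g) for g in product(range(3), repeat=4)]
y = [1.0 if g[0] >= 1 and g[2] == 2 else 0.0 for g in X]
model, chosen = gradient_boost(X, y)
baseline_mse = sum((yi - sum(y) / len(y)) ** 2 for yi in y) / len(y)
model_mse = sum((yi - model(row)) ** 2 for yi, row in zip(y, X)) / len(y)
```

Stumps split on one SNP at a time, so on their own they cannot represent the SNP 0 and SNP 2 interaction exactly; that is the kind of gap a second, interaction-seeking step is meant to address.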
“The first module evaluates the accuracy of the features, in this case the SNPs, in predicting breast cancer risk. It provides an initial list of candidate SNPs with breast cancer risk-predictive features.”
“The second module then uses the candidate SNPs in an adaptive iterative search to capture the interacting features. In the testing phase, the best interacting SNPs identified are used to predict breast cancer risk for an unknown individual with a machine learning classifier. The classifier was trained to distinguish breast cancer cases (positive samples) from healthy controls (negative samples).”
Since cancer is a multifactorial disease caused by lifestyle, genetic and environmental factors, analysing genetic variants alone may not be enough to create a comprehensive view of disease risk. According to Behravan, other sources of data are needed.
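The article does not spell out the adaptive iterative search, so the following sketch substitutes a simple greedy forward selection: starting from an empty set, it repeatedly adds the candidate SNP that most improves accuracy on a held-out validation split, and stops when no SNP helps. The classifier is a hypothetical majority-vote lookup table keyed on the genotype combination of the selected SNPs, which lets it capture interactions. All names and the toy data are assumptions.

```python
from collections import Counter, defaultdict
from itertools import product

def lookup_classifier(train_X, train_y, snps):
    """Hypothetical classifier: majority label for each genotype combination of the chosen SNPs."""
    snps = tuple(snps)  # freeze the SNP set so later changes to the caller's list do not leak in
    votes = defaultdict(Counter)
    for row, label in zip(train_X, train_y):
        votes[tuple(row[s] for s in snps)][label] += 1
    table = {key: counts.most_common(1)[0][0] for key, counts in votes.items()}
    default = Counter(train_y).most_common(1)[0][0]  # for combinations never seen in training
    return lambda row: table.get(tuple(row[s] for s in snps), default)

def accuracy(model, X, y):
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def greedy_interaction_search(candidates, train, validation):
    """Greedy forward selection: add the SNP that most improves validation accuracy, until none does."""
    selected = []
    best = accuracy(lookup_classifier(*train, selected), *validation)
    while True:
        scores = {
            snp: accuracy(lookup_classifier(*train, selected + [snp]), *validation)
            for snp in candidates if snp not in selected
        }
        if not scores:
            break
        snp, score = max(scores.items(), key=lambda item: item[1])
        if score <= best:
            break  # no remaining SNP improves the prediction
        selected.append(snp)
        best = score
    return selected, best

# Invented data: all genotype combinations of four SNPs; disease needs SNP 0 and SNP 2 together.
X = [list(g) for g in product(range(3), repeat=4)]
y = [int(g[0] >= 1 and g[2] == 2) for g in X]
train = ([X[i] for i in range(len(X)) if i % 2 == 0], [y[i] for i in range(len(X)) if i % 2 == 0])
validation = ([X[i] for i in range(len(X)) if i % 2 == 1], [y[i] for i in range(len(X)) if i % 2 == 1])
selected, score = greedy_interaction_search([0, 1, 2, 3], train, validation)
```

Because the validation split guides the selection, a final accuracy estimate would still need a separate test set of unseen individuals, as in the testing phase the quote describes.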
“We are developing integrative machine learning approaches to combine different sources of data, such as demographic data.”
Ari Turunen
18.5.2020
Citation
Ari Turunen, Hamid Behravan, & Tommi Nyrönen. (2020). Searching markers for breast cancer by machine learning. https://doi.org/10.5281/zenodo.8131311
More information:
School of Medicine, University of Eastern Finland
https://www.uef.fi/en/web/laake
CSC – IT Center for Science
CSC – The Finnish IT Center For Science is a non-profit, state-owned company administered by the Ministry of Education and Culture. CSC maintains and develops the state-owned, centralised IT infrastructure.
http://www.csc.fi
https://research.csc.fi/cloud-computing
ELIXIR
ELIXIR builds infrastructure in support of the biological sector. It brings together the leading organisations of 21 European countries and the EMBL European Molecular Biology Laboratory to form a common infrastructure for biological information. CSC – IT Center for Science is the Finnish centre within this infrastructure.
https://www.elixir-finland.org
http://www.elixir-europe.org

During metabolism, molecules are created and broken up, and some of these have an effect on health. Their concentrations are measured from blood, urine and tissue samples. Metabolomics enables the detection of biomarkers that can give an indication of a person’s lifestyle, diet, illnesses and the effects of medication and other xenobiotics.
A single measurement yields information about hundreds, possibly thousands of metabolic products (metabolites). The same measurement also reveals external compounds, such as medication, environmental toxins and stimulants.
“Metabolomics enables the comprehensive observation of metabolic phenomena. This gives us an extremely good idea of the body’s biochemical state,” says Professor Seppo Auriola, of the School of Pharmacy of the University of Eastern Finland. Auriola is also the head of the LC-MS Metabolomics Center in Kuopio, which is part of Biocenter Finland’s infrastructure network.
One analytical tool used in metabolomics consists of a combination of liquid chromatography and high-resolution mass spectrometry. Liquid chromatography-mass spectrometry (LC-MS) is used to screen and identify compounds in samples. Liquid chromatography separates compounds on the basis of their fat solubility, while a mass spectrometer is used to measure exact molecular weights. The term ‘molecular feature’ — meaning the signal generated by a compound during ionisation and measurement — is used in metabolomics.
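Identifying a molecular feature often starts by comparing its measured mass-to-charge ratio (m/z) with the theoretical masses of candidate compounds. The sketch below assumes singly protonated ions ([M+H]+), a 5 ppm tolerance and a tiny hand-made candidate list; real workflows also use retention time, isotope patterns and fragmentation spectra, and the function names are hypothetical.

```python
# Monoisotopic masses (in u) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.007276466812  # mass added when a molecule is protonated ([M+H]+)

def monoisotopic_mass(formula):
    """Neutral monoisotopic mass of a formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())

def ppm_error(measured, theoretical):
    """Mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6

def match_feature(measured_mz, candidates, tol_ppm=5.0):
    """Names of candidates whose [M+H]+ m/z lies within tol_ppm of the measured m/z."""
    return [
        name for name, formula in candidates.items()
        if abs(ppm_error(measured_mz, monoisotopic_mass(formula) + PROTON)) <= tol_ppm
    ]

candidates = {
    "glucose": {"C": 6, "H": 12, "O": 6},
    "fructose": {"C": 6, "H": 12, "O": 6},
    "caffeine": {"C": 8, "H": 10, "N": 4, "O": 2},
}
glucose_mass = monoisotopic_mass(candidates["glucose"])  # about 180.0634
hits = match_feature(181.0707, candidates)               # ['glucose', 'fructose']
```

Glucose and fructose share the same formula, so mass alone cannot tell them apart; this is where the chromatographic separation in LC-MS helps.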
“In metabolomics, we attempt to find the molecular features that differ statistically between the groups being studied, for example ‘ill versus healthy’. Metabolomics also involves trying to identify these molecular features as specific molecules by means of various spectroscopic techniques. Our lab uses mass spectrometry for this,” says Laboratory Manager Marko Lehtonen.
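The article does not describe which statistical tests the Kuopio laboratory uses. As a minimal sketch of the general idea, the code below compares each feature between an ‘ill’ and a ‘healthy’ group with an exact permutation test on the difference in means, reports a log2 fold change, and applies the Benjamini-Hochberg correction because many features are tested at once. The feature names and intensities are invented for illustration.

```python
from itertools import combinations
from math import log2
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means."""
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    # Try every way of relabelling the samples into two groups of the original sizes.
    for chosen in combinations(range(len(pooled)), len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        if abs(mean(a) - mean(b)) >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        total += 1
    return extreme / total


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def compare_features(features):
    """features: name -> (ill_intensities, healthy_intensities).

    Returns name -> (log2 fold change, p-value, q-value).
    """
    names = list(features)
    fold_changes = [log2(mean(features[n][0]) / mean(features[n][1])) for n in names]
    p_values = [permutation_p_value(*features[n]) for n in names]
    q_values = benjamini_hochberg(p_values)
    return {n: (fc, p, q) for n, fc, p, q in zip(names, fold_changes, p_values, q_values)}


# Hypothetical intensities for two features in four ill and four healthy samples.
features = {
    "feature_1": ([10, 11, 12, 13], [1, 2, 3, 4]),
    "feature_2": ([5, 6, 5, 6], [6, 5, 6, 5]),
}
for name, (fc, p, q) in compare_features(features).items():
    print(f"{name}: log2FC={fc:.2f} p={p:.3f} q={q:.3f}")
```

An exact permutation test avoids distributional assumptions but enumerates every relabelling, so it only suits small groups; larger studies would typically switch to random permutations or parametric tests.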
Metabolomic measurements can be divided into untargeted and targeted methods. The starting point with untargeted analysis is trying to find as many metabolites as possible from a sample. A targeted analysis, on the other hand, focuses on a limited group of known metabolites.
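The difference between the two approaches can be illustrated with a toy example. This is a sketch under stated assumptions, not the laboratory's software. A targeted analysis looks up a short list of expected m/z values within a mass tolerance (here an assumed 5 ppm), while an untargeted analysis keeps every feature above a noise threshold, whether or not it is known. The target value 289.2162 is the approximate m/z of protonated testosterone ([M+H]+, C19H28O2) computed from monoisotopic masses. The other m/z values and the `hypothetical_target` entry are invented.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def targeted_extract(features, targets, tol_ppm=5.0):
    """Targeted view: report only the listed compounds.

    features: list of (mz, intensity) pairs measured in one sample.
    targets: name -> theoretical m/z of the ion being tracked.
    Returns name -> highest matching intensity, or None if not detected.
    """
    result = {}
    for name, target_mz in targets.items():
        hits = [inten for mz, inten in features if abs(ppm_error(mz, target_mz)) <= tol_ppm]
        result[name] = max(hits) if hits else None
    return result


def untargeted_view(features, min_intensity=0.0):
    """Untargeted view: keep every feature above a noise threshold, known or not."""
    return [(mz, inten) for mz, inten in features if inten >= min_intensity]


sample = [(289.2164, 5.0e4), (180.0634, 2.0e5), (301.1410, 7.5e3)]
targets = {"testosterone [M+H]+": 289.2162, "hypothetical_target": 400.0000}
print(targeted_extract(sample, targets))  # the 180.0634 feature is never reported
print(len(untargeted_view(sample, min_intensity=1e4)))
```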
Untargeted measurements can provide a good basis for creating a hypothesis.
“The first screening reveals metabolic products that have changed, for example after the first exposure. Then we start thinking about the theory and try to understand why this occurred,” says Auriola, who focuses on analytical chemistry and measurement techniques used on samples.

As metabolomics measurement methods become increasingly efficient, more accurate measurement data will be obtained on the effects of people’s lifestyles and environment on their health. Diet is a key external factor affecting a person’s metabolism.
“Metabolomics is ideally suited for dietary studies. Analyses provide clear markers of what a person has been eating and how this affects their endogenous compounds,” says Auriola.
Endogenous substances comprise all compounds produced by the body, such as hormones and neurotransmitters. These include endocannabinoids, steroids and endorphins.
“We can examine whether a positive lifestyle change also affects metabolite levels. This would be an indication that the body is doing better. Metabolomics can also be used to detect disease biomarkers at an early stage, before diseases actually occur.”

Another important area suitable for metabolomics analysis is exogenous compounds – that is, compounds from outside the body – such as medication and environmental toxins. This involves looking for biomarkers to show how a medication is affecting the body.
Auriola thinks it is also important to ask why a certain substance affects us negatively. Metabolic products can also reveal biomarkers that indicate human susceptibility to a xenobiotic, or the effect of a xenobiotic on humans, such as the effect of pesticides on human health.
“We do not understand the mechanisms of all pesticides. As we develop more advanced methods, we will obtain a clearer picture of how humans are affected by exposure to certain substances. We can measure the level of environmental toxins and the corresponding level of endogenous metabolites in human populations.”
Studies by the University of Eastern Finland and Karolinska Institutet examined the effect of polychlorinated biphenyls (PCBs) on mouse offspring. It has long been known that these substances have the greatest effect in the early stages of development, and animal studies have revealed developmental disturbances in various organs. When the metabolomic profiles of the offspring were studied, changes were found in males but not in females. In males, the metabolite changes caused by PCB compounds affected the liver and the nervous system.
“We will be able to monitor changes in the following generation without knowing in advance what we should be looking for,” says Auriola.
“By means of LC-MS equipment and the untargeted metabolomics method, we can find changed molecules among the thousands of molecules we are measuring.”
Molecular features are identified using algorithms. A study by the University of Helsinki and the University of Eastern Finland analysed compounds sampled from the umbilical cords of newborns. Pre-eclampsia (a pregnancy disorder) is one of the most common causes of premature birth and of maternal death during childbirth. The precise causes of the condition are unknown, although it is known to increase the subsequent risk of cardiovascular disease in both mother and child. What remains unclear is how the altered metabolism of mothers with pre-eclampsia affects the metabolism of their newborns. Metabolites in the umbilical cord tissue of newborns were analysed with the LC-MS equipment in Kuopio, comparing pregnancies affected by pre-eclampsia with healthy controls. The study also made use of material from the Finnish Genetics of Pre-eclampsia Consortium, and all Finnish university hospitals contributed to assembling the FINNPEC cohort.
“Many different research projects use the services of our laboratory,” says Marko Lehtonen. For example, research samples related to diabetes and Alzheimer’s disease have been studied in the laboratory. According to Lehtonen, metabolomics will provide more information that can be used to study rare and hereditary diseases.
“Newborns are screened with targeted measurements. This is also an excellent example of an area where metabolomics can be very significant. It will save society money. Based on certain biomarkers found in the body, hereditary diseases among newborns can be identified,” says Lehtonen.
Not all metabolites can be measured using the current equipment.
“Some compounds are present in samples at such small concentrations that we also need targeted methods. As equipment technologies develop, untargeted methods may become efficient enough to reveal compounds that could not be detected earlier. This will ensure that we do not lose other information from a sample. Targeted methods only track specific compounds and are blind to all other data,” says Lehtonen, stressing that the untargeted method provides plenty of data which can be used to investigate new issues.
As equipment becomes more accurate and sensitive, we will be able to observe extremely small concentrations, down to picograms and nanograms per litre. One picogram is one trillionth of a gram, and one nanogram is one billionth of a gram.
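To give a sense of scale, the worked conversion below turns a mass concentration of 1 ng/L into a molar concentration using c = ρ/M. Testosterone, with an average molar mass of about 288.4 g/mol, is chosen only as an illustrative compound. The result is roughly 3.5 picomoles per litre, and 1 pg/L would be a thousand times less, about 3.5 femtomoles per litre.

```latex
1\,\mathrm{pg} = 10^{-12}\,\mathrm{g}, \qquad 1\,\mathrm{ng} = 10^{-9}\,\mathrm{g}

c = \frac{\rho}{M}
  = \frac{1 \times 10^{-9}\,\mathrm{g\,L^{-1}}}{288.4\,\mathrm{g\,mol^{-1}}}
  \approx 3.5 \times 10^{-12}\,\mathrm{mol\,L^{-1}}
  = 3.5\,\mathrm{pmol\,L^{-1}}
```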
“We can currently see thousands of compounds, but many important molecules remain below our observation horizon,” says Seppo Auriola.
“For example, more and more steroids will be identifiable in samples as measurement technology improves. This will enable us to study endogenous steroids and their metabolites.”
These include sex hormones, such as testosterone and progesterone, and corticosteroids (e.g. cortisone and cortisol).
“We are involved in a project studying how exercise and lifestyle choices affect steroids and other metabolic characteristics in children and young people. Other studies involve trying to find compounds that affect steroid metabolism selectively, and may therefore be used as medication.”
Metabolic products studied by mass spectrometry are first ionised, and the resulting ions are separated from each other on the basis of their mass-to-charge ratio (m/z).
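Here is a minimal sketch of what ‘mass-to-charge ratio’ means in practice. It assumes the molecule is ionised by gaining protons (an [M+H]+ or [M+2H]2+ ion, a common case in positive-mode electrospray), and it ignores other adducts and isotope peaks. Testosterone (C19H28O2), mentioned in this article, serves as the example. The function names are hypothetical, and the formula parser only handles simple formulas made of C, H, N and O.

```python
import re

# Monoisotopic masses (unified atomic mass units) of the most abundant isotopes.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}
PROTON_MASS = 1.007276466812


def monoisotopic_mass(formula: str) -> float:
    """Neutral monoisotopic mass from a simple formula such as 'C19H28O2'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC_MASS[element] * (int(count) if count else 1)
    return mass


def protonated_mz(formula: str, charge: int = 1) -> float:
    """m/z of an [M + zH]^z+ ion: (M + z * proton mass) / z."""
    return (monoisotopic_mass(formula) + charge * PROTON_MASS) / charge


print(f"{protonated_mz('C19H28O2'):.4f}")            # 289.2162 (singly protonated)
print(f"{protonated_mz('C19H28O2', charge=2):.4f}")  # 145.1117 (two protons, two charges)
```

Because the spectrometer measures m/z rather than mass, the same molecule carrying two charges appears at roughly half the m/z value.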
According to Lehtonen, identifying molecular features is the last stage of a metabolomics study: the aim is to establish unambiguously which metabolite differs statistically between two or more groups being studied.
Lehtonen would prefer a model in which laboratory and research data were used as a basis for machine learning.
“Although measured fragmentation spectra can be compared with those in mass spectral libraries, identification still involves a great deal of manual work. It would be ideal to have a learning algorithm that automatically searched for fragmentation spectra and compared them with those in the library. Such a model could reliably recognise compounds that had previously been identified in a laboratory. This would be of considerable help to research,” says Marko Lehtonen.
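The comparison step Lehtonen mentions is often scored with cosine similarity between a measured fragmentation spectrum and each library spectrum. The sketch below shows that conventional scoring, not a machine-learning model and not the laboratory's own tool. Peaks are paired greedily within an assumed m/z tolerance of 0.01, and the 0.7 score threshold is also an assumption. The library entries are toy spectra with invented peak values.

```python
from math import sqrt


def cosine_score(query, reference, tol=0.01):
    """Cosine similarity between two centroided spectra given as (m/z, intensity) lists.

    Peaks are paired greedily within an m/z tolerance; each reference peak is used once.
    """
    used = set()
    dot = 0.0
    for q_mz, q_int in query:
        best = None
        for j, (r_mz, _) in enumerate(reference):
            diff = abs(q_mz - r_mz)
            if j not in used and diff <= tol and (best is None or diff < abs(q_mz - reference[best][0])):
                best = j
        if best is not None:
            used.add(best)
            dot += q_int * reference[best][1]
    norm_q = sqrt(sum(i * i for _, i in query))
    norm_r = sqrt(sum(i * i for _, i in reference))
    return dot / (norm_q * norm_r) if norm_q and norm_r else 0.0


def search_library(query, library, tol=0.01, min_score=0.7):
    """Rank library entries by cosine score, keeping those above a threshold."""
    hits = [(name, cosine_score(query, spectrum, tol)) for name, spectrum in library.items()]
    return sorted((h for h in hits if h[1] >= min_score), key=lambda h: h[1], reverse=True)


# Toy fragmentation spectra; peak values are invented for illustration only.
library = {
    "compound_A": [(97.06, 40.0), (109.06, 100.0), (253.20, 20.0)],
    "compound_B": [(85.03, 100.0), (127.04, 60.0)],
}
measured = [(97.065, 35.0), (109.062, 100.0), (253.195, 25.0)]
print(search_library(measured, library))  # compound_A ranks first with a score close to 1
```

A learning algorithm of the kind Lehtonen describes would replace or refine this fixed scoring rule, for example by learning which peaks matter most for a given compound class.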

According to Seppo Auriola, we should make more use of measurement data. The problem lies in the availability and uniformity of data.
“ELIXIR has several processes underway to unify the use of various tools in metabolomics, in order to render them compatible. Measurement data should also be archived.”
According to Auriola, in addition to being used for scientific publications, most original measurement data should be made available to other researchers for further analysis.
“The second phase involves adding metadata, determining what kind of data should be available on the samples, how they have been measured and verified, and what kinds of groups have been studied. How will this data be conveyed along with the measurement data? The crucial issue is that data that took a lot of work to obtain could be used for later analyses and comparisons.”
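The questions Auriola raises (what the samples are, how they were measured and verified, and which groups were studied) map naturally onto a structured metadata record archived next to each measurement file. The sketch below is a hypothetical minimal schema, not an ELIXIR or repository format. Established community formats, such as the ISA-Tab format used by the MetaboLights repository, are considerably richer.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SampleMetadata:
    """Hypothetical metadata record stored alongside one raw measurement file."""
    sample_id: str
    sample_type: str      # e.g. "plasma", "urine", "umbilical cord tissue"
    study_group: str      # which group was studied, e.g. "case" or "control"
    method: str           # how the sample was measured, e.g. "untargeted LC-MS, positive mode"
    qc_passed: bool       # whether the measurement passed quality-control verification
    data_file: str        # raw data file this record describes
    extra: dict = field(default_factory=dict)  # free-form details not covered above


def to_json(records):
    """Serialise records so they can be archived with the measurement data."""
    return json.dumps([asdict(r) for r in records], indent=2)


def from_json(text):
    """Restore records for later re-analysis or comparison."""
    return [SampleMetadata(**item) for item in json.loads(text)]


records = [
    SampleMetadata("S001", "plasma", "case", "untargeted LC-MS, positive mode", True, "S001.raw"),
    SampleMetadata("S002", "plasma", "control", "untargeted LC-MS, positive mode", True, "S002.raw",
                   extra={"batch": 1}),
]
archived = to_json(records)
print(from_json(archived) == records)  # True: the round trip preserves every field
```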
“Another challenge involves the available tools: how to pick and identify compounds, and what software is required to calculate the results, identify molecules and compare their amounts in different samples. How should findings be reported? How are changes in metabolite levels obtained, how are they found on the metabolite map, where are the compounds located on metabolic pathways, and how do their concentrations change? How can this be described clearly, and how should the result be presented? A fair amount of work is required to unify all this. All the related data and tools are currently fragmented across different people’s software,” says Auriola.
Ari Turunen
8.4.2020
Citation
Ari Turunen, Seppo Auriola, Marko Lehtonen, & Tommi Nyrönen. (2020). Metabolomics measures and analyses metabolic changes caused by illness, diet or medication. https://doi.org/10.5281/zenodo.8131264
More information:
LC-MS Metabolomics Center
University of Eastern Finland
CSC – IT Center for Science