
Studying the human microbiome is key to a holistic understanding of our health

Together, microbes and their interactions with the host are called a microbiome. Microbiome composition is unique to each person. The microbiome helps the body’s defence system fight infections, for instance. If the microbiome is disturbed, the body may become more susceptible to diseases such as diabetes.

Leo Lahti, Associate Professor of Data Science at the University of Turku, is working with his research group and partners to develop machine learning models for screening microbial signatures in large-scale data collections.

“The microbiome is just one element in the development of diseases, but it is an element that we haven’t been able to study as extensively before, because the effective data collection methods we now have became available only very recently,” says Lahti.


Lahti’s research group has characterized microbes in different habitats and ecosystems together with experimental researchers.

“Scientists are starting to have a pretty good idea of what kinds of species and groups of bacteria can be found in different environments. We have learned quite a lot about their function, tasks, and role in metabolism and the chemical compounds they produce.”

All bacteria and single-celled archaea are microbes. Microbes also include algae, protozoa, yeasts and moulds. According to Lahti, microbial research has advanced rapidly since the cost of sequencing technologies began to fall. The DNA of microbial samples is sequenced, and the species of microbes in the sample can be identified if they have been characterized before.

“The sample can come from the environment, parts of the human body, or just about anywhere. We study bits of DNA and try to put the puzzle together. That way we can determine the bacterial composition of the given sample. We can even reconstruct completely new bacterial genomes and discover previously unknown species,” says Lahti.
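To make the puzzle analogy concrete, here is a toy Python sketch of greedy overlap assembly: it repeatedly merges the two fragments with the longest suffix-prefix overlap until no overlaps remain. This is a simplified illustration under our own assumptions, not the method Lahti’s group uses; real metagenome assemblers typically rely on de Bruijn graphs and must also cope with sequencing errors and repeated regions. The fragments and the minimum overlap of three bases are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembler: merge the best-overlapping pair until none overlap."""
    frags = list(fragments)
    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        # Compare every ordered pair of fragments to find the longest overlap.
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: leave the pieces as separate contigs
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Hypothetical short DNA fragments ("reads") from one sample.
reads = ["ATGGCGT", "GCGTACC", "TACCGGA"]
print(greedy_assemble(reads))
```

The greedy strategy is chosen here only because it is short and easy to follow; it can produce wrong joins when a genome contains repeats, which is one reason real assemblers use graph-based methods.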

Microbes of the human body predict disease


The most diverse microbial ecosystem in our bodies thrives in the gut. According to current knowledge, a typical adult carries around 1 to 2 kg of bacteria, and bacterial cells slightly outnumber human cells. The human microbiome has many levels, which form diverse ecosystems. The composition of a person’s microbiome is affected by their genome and living environment. Habits such as diet and an outdoorsy lifestyle have been shown to affect the composition of a person’s microbiome. A person’s microbes can be used to deduce whether the person is a vegetarian or an omnivore, for example. Lahti’s research group has also been involved in studying the microbial compositions of different population groups.

“Identifying the species in the human microbiome in general was a huge task. The composition can vary geographically, that is, according to where people live. It also varies according to whether they belong to an indigenous population, whether they live in the city or the countryside, and their standard of living.”

Studying the composition helps with understanding how microbes are connected to a person’s health.

“In this context, linking a person’s current state of health with their future health status takes centre stage. Is it possible to infer something about a person’s current or even future state of health by looking at their microbiome? And if it is possible, can the person’s health be affected by modifying the microbial composition, and what kinds of risks or ethical questions are related to this? Computational and machine learning techniques play a key role in extracting information from the complex data sets that are now being generated.”

In Finland, stool samples were collected in connection with the Finnish Institute for Health and Welfare’s (THL) FINRISK population survey in 2002. Many years of follow-up data from these individuals now make it possible to study the links between microbiome composition and long-term changes in health status. Lahti says the extensive Finnish population cohort is unique at a global level.

“The data is very valuable; carrying out a study like this would be hard in many countries because similar comprehensive population register data is often not available. We now have a huge number of samples and associated health information that we can use to study the link between microbial composition and population health.”

According to Lahti, some microbial analyses can be used in diagnostics. They could be used to identify specific diseases or microbes that increase the risk of certain cancers. For example, Helicobacter pylori found in the stomach may increase the risk of stomach cancer.

“The gut harbours certain groups of bacteria that are statistically linked to the risk of developing a disease later on. We have recently discovered that they can indicate an increased risk of mortality, liver disease, and type 2 diabetes, for instance. We do not yet understand the causal relationships behind these observations, but we can see the signals years before a person gets sick.”

E. coli bacteria, magnified 10,000 times. Photo: Agricultural Research Service, U.S. Department of Agriculture.

Research results have been published on mortality rates, liver diseases, and type 2 diabetes.

“These are significant disease groups that are studied frequently anyway. Despite the long research traditions associated with them, microbiome studies have brought a new perspective to understanding these diseases. Microbes play a part in our metabolism. The compounds produced by microbes in the body can have a significant role in these diseases and in the immune system.”

“Once we learn more about what these microbes do and which microbes are found in our bodies, we will have a better chance of understanding the mechanisms that affect the development of diseases. This can help with developing new ways to curb the effects of diseases or to prevent them from developing in the first place.”

According to Lahti, there is currently huge medical interest in microbiomes, because lifestyle changes and many common diseases have been found to be linked to changes in the microbial balance. Antimicrobial resistance is another growing health problem. It refers to the increased ability of bacteria to withstand antibiotics, and some projections suggest it could become one of the leading causes of death in the coming decades.


Machine learning models

Lahti’s research group extracts information from big data and merges information from different sources. Datasets keep growing in size, and they need to be structured and organized to make them understandable. Such analyses involve many computational steps. First, the data must be pre-processed and the DNA fragments pieced together to determine which species they come from and in what proportions those species occur in different samples. After this, the connections between the microbial composition and the living environment or state of health of the host organism can be studied in more detail.
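The step of working out in what proportions species occur can be sketched as turning per-read species assignments into relative abundances, so that samples sequenced to different depths can be compared. This minimal Python sketch assumes that the reads have already been assigned to taxa (for example by matching them against a reference database); the sample names, genus names and read counts are hypothetical.

```python
from collections import Counter


def relative_abundance(assignments: list[str]) -> dict[str, float]:
    """Convert per-read taxon assignments into proportions that sum to 1."""
    counts = Counter(assignments)
    total = sum(counts.values())
    if total == 0:
        return {}  # an empty sample has no composition to report
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical reads from two samples, already assigned to bacterial genera.
samples = {
    "sample_A": ["Bacteroides"] * 6 + ["Prevotella"] * 3 + ["Escherichia"] * 1,
    "sample_B": ["Bacteroides"] * 2 + ["Prevotella"] * 8,
}

for name, reads in samples.items():
    profile = relative_abundance(reads)
    print(name, {taxon: round(p, 2) for taxon, p in sorted(profile.items())})
```

Proportions rather than raw counts are used because the number of reads per sample depends on sequencing depth; profiles like these are the kind of input that later statistical or machine learning steps could work with.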

“Data can be complicated. It can be hierarchical and have temporal or spatial structure. So, we need new computational methods. For example, machine learning methods are useful because they reduce the need for human intervention, which means we can automate a significant part of the processing and transfer it to machines.”

According to Lahti, methods that can assist people in making quantitative conclusions play a big part in biomedical research.

“Data is collected in databases. And when we analyse new samples, we want to combine the sample data with the data already in the databases. The newly observed data must be interpreted in the context of the previously collected data and accumulated knowledge.”


Allas is CSC’s data management system that research groups can also use to share data.

According to Lahti, when studying species of microbes, it is important to understand how they work together as an ecosystem and interact with the human body. Sequencing the genomes of microbial communities and integrating the resulting data can require massive computing and storage resources.

“We need the resources provided by CSC to obtain comprehensible information from the large-scale sequencing data that we can then analyse statistically. We increasingly use these services as a platform for cooperation. We can build workflows with other research groups and make the data available through CSC, and the data analysis platform is also in one place, on the CSC servers. It is also important that the bioinformatics data resources provided by ELIXIR can be accessed via CSC. We are also increasingly using these services to provide training in computational research methods.”

Ari Turunen



University of Turku


CSC – IT Center for Science is a non-profit, state-owned company administered by the Ministry of Education and Culture. CSC maintains and develops the state-owned, centralised IT infrastructure.





ELIXIR builds infrastructure in support of the biological sector. It brings together the leading organisations of 21 European countries and the European Molecular Biology Laboratory (EMBL) to form a common infrastructure for biological information. CSC – IT Center for Science is the Finnish centre within this infrastructure.