
Efficient processing and sharing of data improving disease diagnosis and treatment

Next-generation methods for analysing genes and RNA molecules make analyses faster and easier. The resulting data can also be stored securely and shared with research teams through the user interface of CSC's Allas service.


Next-generation sequencing (NGS) methods are used to study variations in the human genome and changes in gene expression. The billions of sequence fragments produced by NGS methods can be analysed in a single run.

The new methods allow us to study numerous genes and targets from different samples simultaneously. This means that we can quickly analyse individual cells, such as cancer cells. We can also analyse cell-free DNA from blood plasma, which shows quickly and reliably whether the selected treatments have been effective and, in particular, whether any metastases remain.

The platforms used by the Institute for Molecular Medicine Finland (FIMM) and CSC rely on a range of algorithms to analyse data produced by sequencing methods (exomes, genomes and transcriptomes). One of the most important is the Broad Institute's Genome Analysis Toolkit (GATK), which is used to look for gene variants, that is, to identify changes in the DNA or RNA sequence of the cells studied. GATK has become the de facto bioinformatics standard in the scientific community. GATK analyses can also be run on the very fast Dynamic Read Analysis for GENomics (Dragen) platform, which CSC, the Finnish ELIXIR node, maintains in collaboration with FIMM. Dragen performs the computing-intensive primary analysis of the sequence data, after which researchers carry out the downstream bioinformatics analysis. This is where CSC's storage capacity pays off: the analysed data are too large to fit on any conventional computer, so they are shared with users directly through the Allas service. Cooperation between CSC and FIMM is crucial for completing the analyses quickly.

“When we have at our disposal high-capacity sequencing platforms, algorithms and computing power, we get quick results. Today we can analyse one genome in a single day, as opposed to several weeks using older systems,” says Pekka Ellonen.

Mr Ellonen is Head of Laboratory at the Institute for Molecular Medicine Finland (FIMM). His unit uses modern methods to provide the research community with genomics (DNA) and transcriptomics (RNA) analyses, and it receives samples from various research projects.

“We agree with the researchers on the most suitable methods and tailor the optimal toolkit for testing their hypotheses. These methods may include exome sequencing, genome sequencing, sequencing of various RNA molecules (the transcriptome) and gene expression analysis,” says Ellonen.

These methods can determine the genes in a tissue sample (genomics) or identify all the genes expressed (transcriptomics) and the proteins (proteomics) present in the tissue. Sequencing the exome, that is, the regions that code for proteins, can help in the study of hereditary diseases, congenital developmental disorders and cancer. Gene expression is precisely regulated in cells, and any changes may lead to illness. Research can focus, for example, on the differences between cancerous and healthy tissue.

High-throughput sequencing equipment produces 1–20 billion sequence fragments, depending on the type of run. The NovaSeq6000 offers four run types of different capacity. The lowest-capacity run can sequence a couple of dozen exomes; the exome is composed of all the exons within the genome and corresponds to about one per cent of the entire genome. The highest-capacity run can analyse 24 genomes at a time. Whole genome sequencing (WGS) covers the entire genome, 3.1 billion bases. Analysing a single human genome produces some 1.2 billion short sequences, which algorithms then combine to reconstruct the genome being analysed. To obtain a reliable answer, each base is read several times; the number of times is called the read depth. When looking for changes typical of cancer, for example, the read depth may be 500–1000. Because sequencing the whole genome that deeply would require an enormous amount of sequence data, such analyses often focus on a smaller target, such as the exome.
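The arithmetic behind read depth can be sketched with the usual coverage estimate (Lander-Waterman): mean depth C = N × L / G, where N is the number of reads, L the read length and G the size of the region sequenced. The minimal Python sketch below assumes a read length of 150 bases, which the article does not state, and ignores duplicate and unmappable reads, so its numbers are rough illustrations only.

```python
# Rough read-depth arithmetic; a sketch, not FIMM's or CSC's actual pipeline.

GENOME_SIZE = 3_100_000_000        # bases in the human genome (from the article)
EXOME_SIZE = GENOME_SIZE // 100    # exome is about 1 % of the genome (from the article)
READ_LENGTH = 150                  # ASSUMPTION: typical short-read length, not stated in the article


def mean_depth(n_reads: int, read_length: int, target_size: int) -> float:
    """Average number of times each base is read: C = N * L / G."""
    return n_reads * read_length / target_size


def bases_needed(depth: int, target_size: int) -> int:
    """Total number of sequenced bases needed to reach a given mean depth."""
    return depth * target_size


# Depth obtained from about 1.2 billion reads spread over the whole genome
print(f"WGS mean depth: {mean_depth(1_200_000_000, READ_LENGTH, GENOME_SIZE):.0f}x")

# Sequence needed for 500x depth, as in cancer studies, over the genome vs the exome
for name, size in (("genome", GENOME_SIZE), ("exome", EXOME_SIZE)):
    print(f"500x over the {name}: {bases_needed(500, size) / 1e9:,.1f} Gb")
```

Under these assumptions the script prints a mean depth of about 58x for the whole genome, and shows that 500x over the genome needs about 1,550 Gb of sequence versus about 15.5 Gb over the exome, a hundredfold difference that illustrates why deep sequencing is usually targeted.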


Analytics of a single cell


Next-generation sequencing methods enable the study of complex biological systems. According to Ellonen, by far the greatest development in bioinformatics in recent years has been the analysis of single cells. Single cells are analysed in collaboration between the Single-Cell Analytics (SCA) unit of the Institute for Molecular Medicine Finland and the Sequencing Unit.

Each cell contains all of the individual's genes, but certain genes are expressed only in certain cells and often only under certain conditions. Gene expression and protein production in cells vary at different stages of development and as a result of illness, which changes how cells and tissues function. Despite its name, single-cell analytics does not actually mean analysing just one cell.

“Now we are able to study, for example, cancer cells as individual targets. But we cannot reach a reliable result merely by determining the base sequence or gene expression of one cell; we must study samples of thousands or tens of thousands of cells,” says Ellonen.

Single-cell RNA sequencing (scRNA-seq) can reveal regulatory interactions between genes, cell lineages, differences between cells and the context of each cell within its environment.

Single-cell sequencing can also uncover different and even previously unknown cell types, as well as gene expression data on how the cells function. Single-cell DNA sequencing, on the other hand, provides information about mutations occurring in small cell populations among normal cells. Single-cell resolution reveals genetic differences within tumours, which is helpful in their treatment.

“The number of living cells in the sample is verified in the laboratory, after which each cell is separated into its own droplet, so that the DNA or RNA molecules of each cell can be labelled with molecule- and cell-specific DNA barcodes. The molecule-specific, cell-specific and, eventually, sample-specific DNA barcodes make it possible both to identify the molecules in each cell and to sequence cost-efficiently,” says Pirkko Mattila, head of the Single-Cell Analytics (SCA) unit of the Institute for Molecular Medicine Finland.

“One sequencing run profiles thousands of cells at a time from multiple samples. Analysing thousands or even hundreds of thousands of cells in this way gives single-cell resolution, enabling us to study the properties of individual cells.”
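The role of the three barcode levels can be illustrated with a small sketch: reads are grouped by sample and cell barcode, and reads that also share a molecule barcode (often called a unique molecular identifier, UMI) are counted only once, because they are copies of the same original molecule. The read layout and the names `Read` and `count_molecules` are hypothetical, and the sketch assumes the barcodes have already been extracted and error-corrected, which real pipelines must also do.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Read(NamedTuple):
    """One sequenced fragment after its barcodes have been parsed (hypothetical layout)."""
    sample_bc: str  # sample-specific barcode: which sample the read came from
    cell_bc: str    # cell-specific barcode: which droplet (cell) it came from
    umi: str        # molecule-specific barcode (unique molecular identifier)
    gene: str       # gene the read was aligned to


def count_molecules(reads: Iterable[Read]) -> Dict[str, Dict[str, Dict[str, int]]]:
    """Return per-sample, per-cell gene counts, counting each molecule (UMI) once."""
    umis = defaultdict(set)
    for r in reads:
        # Reads with identical sample, cell, gene and UMI are duplicates of one molecule
        umis[(r.sample_bc, r.cell_bc, r.gene)].add(r.umi)

    counts: Dict[str, Dict[str, Dict[str, int]]] = {}
    for (sample, cell, gene), umi_set in umis.items():
        counts.setdefault(sample, {}).setdefault(cell, {})[gene] = len(umi_set)
    return counts


reads = [
    Read("S1", "AAAC", "UMI1", "TP53"),
    Read("S1", "AAAC", "UMI1", "TP53"),  # PCR duplicate of the read above
    Read("S1", "AAAC", "UMI2", "TP53"),
    Read("S1", "GGTT", "UMI1", "MYC"),
    Read("S2", "AAAC", "UMI1", "TP53"),  # same cell barcode, but a different sample
]
print(count_molecules(reads))
# {'S1': {'AAAC': {'TP53': 2}, 'GGTT': {'MYC': 1}}, 'S2': {'AAAC': {'TP53': 1}}}
```

The sample barcode is what allows several samples to share one sequencing run, as in the quote above: the same cell barcode in two samples still yields two separate cells.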



Liquid biopsy


Liquid biopsy refers to taking a liquid sample containing cells or parts of cells from a body fluid such as blood. It is a promising tool for monitoring cancer treatment without invasive surgical operations.

“We create sequencing libraries from genomic regions relevant to various cancers,” says Pekka Ellonen.

Liquid biopsy can also be used to detect cancer at an early stage. A blood sample provides information on tumour cells circulating in the blood or on the DNA fragments they have released into the bloodstream.

“Tumours are often in places that are difficult to reach, so removing them or taking a sample requires surgery. When tumours grow uncontrollably, more cells die than normal. Dying cancer cells release DNA fragments into the bloodstream. These fragments are collected for sequencing from the cell-free fraction of a blood sample, that is, plasma or serum. Analysing the sequencing results can show whether the bloodstream contains DNA fragments carrying changes typical of cancer,” says Ellonen.

Cell-free DNA (cfDNA) refers to DNA circulating in the bloodstream outside blood cells. These fragments enter the circulation through either apoptosis (programmed cell death) or necrosis. Normally, they are cleared by macrophages, but the overproduction of cells in cancer increases the amount of cfDNA in the bloodstream. Healthy cells also release cell-free DNA into the bloodstream, so only part of the cfDNA in cancer patients originates from the tumour. This circulating cfDNA, and especially its tumour-derived fraction (circulating tumour DNA, ctDNA), is a promising subject for research projects aiming at individualised cancer treatment. In addition to blood, cell-free DNA can be analysed in urine, spinal fluid and saliva samples. The Sequencing Unit of the Institute for Molecular Medicine Finland (FIMM) uses sequencing to look for traces of such DNA.
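Because only part of the cfDNA comes from the tumour, a cancer-related mutation may appear in only a small fraction of the reads, which is one reason liquid biopsy benefits from high read depth. The sketch below uses a simple binomial model that ignores sequencing errors and assumes each read independently carries the mutation with probability equal to the variant allele fraction; the 1 % fraction and the threshold of three supporting reads are illustrative assumptions, not values from the article.

```python
from math import comb


def detection_probability(depth: int, vaf: float, min_reads: int) -> float:
    """P(at least `min_reads` of `depth` reads carry the variant).

    Binomial model: each read carries the variant independently with
    probability `vaf` (variant allele fraction); sequencing errors are ignored.
    """
    p_fewer = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_reads)
    )
    return 1.0 - p_fewer


for depth in (30, 100, 500, 1000):
    p = detection_probability(depth, vaf=0.01, min_reads=3)
    print(f"depth {depth:>4}: P(detect a 1 % variant) = {p:.3f}")
```

Under this model a 1 % variant is rarely seen at 30x but is detected in more than 99 % of cases at 1000x. Real ctDNA analyses must also distinguish true variants from sequencing errors, which this sketch does not attempt.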


Ellonen says that liquid biopsy is widely used and features in many new research projects. It can be used not only in basic research but also to plan treatment and to monitor treatment effects or cancer recurrence. Taking repeated blood samples over time helps doctors understand what molecular changes have taken place in the body.

“New genetic markers may be identified and, in the best case, an accurate treatment can be selected on the basis of the observed mutations. Alternatively, you may already know what you are looking for: you are monitoring for residual signs of the disease in the body, in other words, whether the surgery removed the cancer completely.”

Pekka Ellonen is enthusiastic about the user interface of CSC's Allas storage service, which enables laboratories and research institutions to share pre-processed sequencing results and molecular data with researchers, research teams and consortia. Allas provides 12 petabytes of storage space. The data are securely available over the web, and they can be processed from anywhere using standard programming interfaces.

“Data produced with public money should be shared with the wider scientific community in a timely manner, appropriately pseudonymised, of course. The user interface makes it possible to share large datasets, such as useful genomic data from cohort studies.”
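The article does not say how data are pseudonymised before sharing. One common approach, shown here as a minimal sketch, replaces each sample identifier with a keyed hash (HMAC-SHA256): the same identifier always yields the same pseudonym, so records can still be linked, but without the secret key the pseudonyms cannot be recomputed from candidate identifiers. The identifiers, file names, key and 16-character truncation are all hypothetical choices.

```python
import hashlib
import hmac


def pseudonymise(sample_id: str, secret_key: bytes, length: int = 16) -> str:
    """Replace a sample ID with a stable pseudonym derived with HMAC-SHA256."""
    digest = hmac.new(secret_key, sample_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]  # truncated hex digest; shorter pseudonyms raise collision risk


# Hypothetical key; it would be kept separately from the shared data.
KEY = b"example-secret-key"

# Hypothetical records: (sample ID, result file) pairs to be shared
records = [("SAMPLE-0001", "variants_a.vcf.gz"), ("SAMPLE-0002", "variants_b.vcf.gz")]
shared = [(pseudonymise(sid, KEY), path) for sid, path in records]
for pseudonym, path in shared:
    print(pseudonym, path)
```

Keeping the key with the data owner means that only they can map a pseudonym back to a sample, for example by re-hashing their own list of identifiers.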


Ari Turunen




More information:





CSC – IT Center for Science

CSC – The Finnish IT Center For Science is a non-profit, state-owned company administered by the Ministry of Education and Culture. CSC maintains and develops the state-owned, centralised IT infrastructure.


ELIXIR builds infrastructure in support of the biological sector. It brings together the leading organisations of 21 European countries and the European Molecular Biology Laboratory (EMBL) to form a common infrastructure for biological information. CSC – IT Center for Science is the Finnish centre within this infrastructure.