Reusable, accurately described and high-quality data – tools created by the research community for agile data management
20.10.2022
Proper data management enables high-quality research. Data management is now guided by the FAIR principles, which have been put in place to ensure that data is findable, accessible, interoperable (capable of being integrated with other data) and reusable. Under these principles, the ELIXIR infrastructure offers useful data management tools that support researchers at various stages of the process.
“Good scientific practice involves making sure that data is well documented and remains usable throughout the research process, and in such a way that results can be verified later. It is important that researchers and information systems are able to find and access compatible and reusable research outputs. To ensure this, the FAIR principles were set out by a consortium of scientists and organisations in 2016,” explains CSC’s data management specialist Minna Ahokas.
“With instructions and tools provided by ELIXIR, it is easier for researchers to make their data findable, accessible, interoperable and reusable.”
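As a concrete illustration of the four FAIR elements, a dataset description can be made machine-readable. The sketch below is hypothetical: the field names are invented for illustration and do not follow any official metadata schema.

```python
# A minimal, hypothetical machine-readable dataset description covering
# the four FAIR elements. Field names are illustrative only.

dataset = {
    "identifier": "doi:10.0000/example-dataset",             # Findable: persistent identifier
    "access_url": "https://repository.example/records/123",  # Accessible: where to get it
    "format": "text/csv",                                    # Interoperable: open, standard format
    "licence": "CC-BY-4.0",                                  # Reusable: clear licence for reuse
    "description": "Example measurement series with documented columns",
}

REQUIRED = {"identifier", "access_url", "format", "licence"}

def missing_fair_fields(record: dict) -> set:
    """Return which of the minimal FAIR-related fields are absent."""
    return REQUIRED - record.keys()

print(missing_fair_fields(dataset))  # -> set()
```

A check like this could flag incomplete records early, before data is deposited in a repository.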
The RDMkit website, created in cooperation with the ELIXIR nodes of the member countries, aims to support and harmonise data management practices in Europe.
RDMkit includes instructions and tips concerning the entire life cycle of data – from data management planning and data analyses right up to publication and reuse.
“RDMkit has been implemented in a way that anyone dealing with data is able to access the tools. It offers not only instructions but also links to services that researchers and any support personnel may need at various stages of data management.”
Finland’s ELIXIR node, CSC, is one of the parties producing content and maintaining the toolkit.
Ahokas stresses that the site was designed transparently from the start, in collaboration with researchers and data management experts. Anyone belonging to the ELIXIR infrastructure can participate in the development. Everything has been documented on GitHub, a portal designed for software development projects.
“Data can be viewed in RDMkit throughout its life cycle. There are instructions for data collection, description and publication.”
RDMkit was developed in the ELIXIR-CONVERGE project, the aim of which is to help harmonise life science data management across Europe. There was a clear demand for unifying data management, as research projects are as a rule international, with data being transferred across national borders.
“RDMkit is the first major international attempt to unify data management practices and instructions to enable reusable data that is also sufficient in quantity and quality and described in a uniform way. Data management entails the planning of data collection, processing and description: how and where it is stored and how version management is handled. Whether some data should be stored for the long term also needs to be considered. And decisions must also be made about what data can be deleted.”
Ahokas emphasises the importance of offering researchers services that help them comply with good data management practices.
“We are trying to avoid such situations as researchers being presented with some new lists of data management requirements every time they apply for funding, without being offered the services needed to ensure that they can comply with these. If we demand that research project data management follows the FAIR principles, then we must offer sufficient support and services to produce FAIR data.”
CSC, Finnish research organisations and universities have created a national data support network. It supports cooperation between CSC and organisations, and provides a forum for open discussion, questions and peer support.
At Aalto University, for example, each discipline is assigned a data agent – a data management expert with research experience who collaborates with researchers to manage their data.
At the time RDMkit was launched, data management came under a new kind of pressure owing to the COVID-19 pandemic.
“When RDMkit was almost ready, the world was hit by the pandemic. It was then that we realised in the ELIXIR-CONVERGE project that data related to COVID and its requirements also had to be taken into account. That is why instructions were added in RDMkit specifically related to the processing of COVID-19 data, and the COVID-19 Data Portal was set up.”
RDMkit and ELIXIR’s data management instructions have also been adopted as part of the data management of the EU’s Horizon Europe funding instruments. The RDMkit toolkit is recommended for use in the biosciences and has also attracted interest worldwide. There are a considerable number of US users, and that country’s primary federal agency for conducting and supporting medical research, the National Institutes of Health, is interested in collaborating with the ELIXIR infrastructure.
RDMkit is a general collection of data management instructions with links to research data management tools such as IceBear.
“IceBear was originally designed for data management in crystallography and structural biology,” says Lari Lehtiö, Professor of Structural Biology at the Faculty of Biochemistry and Molecular Medicine.
Lehtiö is also the head of the Oulu unit of Instruct, a research infrastructure for structural biology. The structural biology unit of Biocenter Oulu designed IceBear, a data management application for structural biology, largely through the efforts of Professor Rik Wierenga and developer Ed Daniel. The application has also been developed in the EOSC-Life network coordinated by ELIXIR; Instruct is part of this network. With the support of the EOSC-Life project, IceBear was transferred to the cPouta cloud service maintained by CSC.
Biocenter Oulu crystallises proteins and other macromolecules. The amino acid chain of a protein folds into a three-dimensional structure that is unique to that protein. As there is a huge number of ways in which the folding can occur, researchers need to study protein structures experimentally in laboratory conditions, by crystallisation. The three-dimensional structure of a protein can be determined on the basis of how X-ray radiation scatters from the protein crystal. From the scattering data, a mathematical transformation can be used to calculate the protein’s electron density map, indicating the locations of atoms in the protein. These days, structural research also makes use of cryogenic electron microscopy, in which a frozen sample of proteins is bombarded with electrons and millions of individual 2D images of the proteins are subsequently combined into a 3D structure.
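The mathematical transformation mentioned above is, at its core, an inverse Fourier transform: the complex structure factors measured from diffraction are transformed back into an electron density map. The toy one-dimensional sketch below, using NumPy, illustrates only the principle; real crystallography works in 3D and must first solve the phase problem, which this example sidesteps by starting from a known density.

```python
import numpy as np

# Toy 1D illustration: electron density is the inverse Fourier
# transform of the (complex) structure factors.

# A made-up 1D "electron density": two Gaussian "atoms" in a unit cell.
x = np.linspace(0.0, 1.0, 256, endpoint=False)
density = np.exp(-((x - 0.3) ** 2) / 0.001) + np.exp(-((x - 0.7) ** 2) / 0.001)

# "Diffraction": the Fourier transform gives complex structure factors F(h).
structure_factors = np.fft.fft(density)

# A detector records only the amplitudes |F(h)|; the phases are lost.
# Recovering them is the famous "phase problem" of crystallography.
amplitudes = np.abs(structure_factors)

# Once phases are known, the inverse transform recovers the density map.
recovered = np.fft.ifft(structure_factors).real
print(np.allclose(recovered, density))  # -> True
```

The example shows why amplitudes alone are not enough: without the phases of the structure factors, the density map cannot be reconstructed directly.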
Automatic imaging equipment is used in the crystallisation of proteins. Proteins are crystallised in various solutions, with crystals being formed under certain conditions.
“A protein is crystallised in a droplet, which is then followed by imaging. Plates may contain up to 300 droplets, and there can be several hundred plates. Pictures of these are taken every day, accumulating a lot of data. Crystallisation is usually carried out by robots,” says Lehtiö.
The crystal samples are picked manually under the microscope and placed in liquid nitrogen tanks. With the IceBear software, the record-keeping for the samples and their data can now be automated.
“Often samples are sent to another infrastructure, to other synchrotrons [cyclic particle accelerators] in Europe. Thanks to IceBear, we can find out what eventually happened to the sample elsewhere. Metadata is transferred between the databases used by European synchrotrons and IceBear. Samples carry a fair amount of metadata, such as which protein the sample contains, how it was crystallised and what the conditions were during crystallisation.”
IceBear does away with manual logs. Data can be transferred without filling in forms, and each sample’s barcode links it securely to its records.
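The barcode-based linking described above amounts to a key-value lookup: each physical sample carries a barcode that keys its metadata record and event history. The sketch below is hypothetical and only in the spirit of what a tool like IceBear tracks; the field names and functions are invented for illustration.

```python
# Hypothetical sketch of barcode-keyed sample tracking; field names
# and functions are invented for illustration, not IceBear's API.

samples: dict[str, dict] = {}

def register_sample(barcode: str, protein: str, conditions: str) -> None:
    """Record a crystallised sample under its barcode."""
    samples[barcode] = {
        "protein": protein,
        "crystallisation_conditions": conditions,
        "history": ["registered at home lab"],
    }

def log_event(barcode: str, event: str) -> None:
    """Append what happened to the sample, e.g. at a synchrotron."""
    samples[barcode]["history"].append(event)

register_sample("XTAL-0001", "Example protein", "0.1 M buffer, 20 % PEG")
log_event("XTAL-0001", "shipped to synchrotron")
log_event("XTAL-0001", "diffraction data collected")

print(samples["XTAL-0001"]["history"][-1])  # -> diffraction data collected
```

Because the barcode travels with the physical sample, any facility that scans it can append to the same history, which is what makes the sample's fate traceable across institutions.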
“You only need to do it once. The value of this application is that researchers’ time is saved even years from now,” says Lehtiö.