Proper data management enables high-quality research. Data management is now guided by the FAIR principles, which have been put in place to ensure that data is findable, accessible, interoperable (capable of being integrated with other data) and reusable. Under these principles, the ELIXIR infrastructure offers useful data management tools that support researchers at various stages of the process.
“Good scientific practice involves making sure that data is well documented and remains usable throughout the research process, and in such a way that results can be verified later. It is important that researchers and information systems are able to find and access compatible and reusable research outputs. To ensure this, the FAIR principles were set out by a consortium of scientists and organisations in 2016,” explains CSC’s data management specialist Minna Ahokas.
“With instructions and tools provided by ELIXIR, it is easier for researchers to make their data findable, accessible, interoperable and reusable.”
The RDMkit website, created in cooperation with the ELIXIR nodes of the member countries, aims to support and harmonise data management practices in Europe.
RDMkit includes instructions and tips concerning the entire life cycle of data – from data management planning and data analyses right up to publication and reuse.
“RDMkit has been implemented in a way that anyone dealing with data is able to access the tools. It offers not only instructions but also links to services that researchers and any support personnel may need at various stages of data management.”
Finland’s ELIXIR node, CSC, is one of the parties producing content and maintaining the toolkit.
Ahokas stresses that, right from the start, the site was designed transparently and in collaboration with researchers and data management experts. Anyone belonging to the ELIXIR infrastructure can participate in the development. Everything has been documented on GitHub, a platform designed for software development projects.
“Data can be viewed in RDMkit throughout its life cycle. There are instructions for data collection, description and publication.”
RDMkit was developed in the ELIXIR-CONVERGE project, the aim of which is to help harmonise life science data management across Europe. There was a clear demand for unifying data management, as research projects are as a rule international, with data being transferred across national borders.
“RDMkit is the first major international attempt to unify data management practices and instructions to enable reusable data that is also sufficient in quantity and quality and described in a uniform way. Data management entails the planning of data collection, processing and description: how and where it is stored and how version management is handled. Whether some data should be stored for the long term also needs to be considered. And decisions must also be made about what data can be deleted.”
Ahokas emphasises the importance of offering researchers services that help them comply with good data management practices.
“We are trying to avoid such situations as researchers being presented with some new lists of data management requirements every time they apply for funding, without being offered the services needed to ensure that they can comply with these. If we demand that research project data management follows the FAIR principles, then we must offer sufficient support and services to produce FAIR data.”
CSC, Finnish research organisations and universities have created a national data support network. It supports cooperation between CSC and organisations, and provides a forum for open discussion, questions and peer support.
At Aalto University, for example, each discipline is assigned a data agent – that is, a data management expert with experience in research. The data agents collaborate with researchers to manage data.
At the time RDMkit was launched, data management came under a new kind of pressure owing to the COVID-19 pandemic.
“When RDMkit was almost ready, the world was hit by the pandemic. It was then that we realised in the ELIXIR-CONVERGE project that data related to COVID and its requirements also had to be taken into account. That is why instructions were added in RDMkit specifically related to the processing of COVID-19 data, and the COVID-19 Data Portal was set up.”
RDMkit and ELIXIR’s data management instructions have also been adopted as part of the data management of the EU’s Horizon Europe financial instruments. The RDMkit toolkit is recommended for use in the biosciences, and has also attracted interest worldwide. There are a considerable number of US users, and that country’s primary federal agency for conducting and supporting medical research, the National Institutes of Health, is interested in collaborating with the ELIXIR infrastructure.
RDMkit is a general collection of data management instructions with links to research data management tools such as IceBear.
“IceBear was originally designed for data management in crystallography and structural biology,” says Lari Lehtiö, Professor of Structural Biology at the Faculty of Biochemistry and Molecular Medicine.
Lehtiö is also the head of the Oulu unit of Instruct, a research infrastructure for structural biology. The structural biology unit of Biocenter Oulu, especially through the efforts of Professor Rik Wierenga and developer Ed Daniel, designed IceBear, a data management application for structural biology. The application has also been developed in the EOSC-Life network coordinated by ELIXIR. Instruct is part of this network. With the support of the EOSC-Life project, IceBear was transferred to the cPouta cloud service maintained by CSC.
Biocenter Oulu crystallises proteins and other macromolecules. The amino acid chain of a protein folds into a 3D structure that is unique to each protein. As there is a huge number of ways in which the folding can occur, researchers need to study protein structures experimentally in laboratory conditions, by crystallising them. The three-dimensional structure of a protein can be determined on the basis of how X-ray radiation scatters from the protein crystal. Using the scattering data, a mathematical transformation (a Fourier transform) can be used to calculate the protein’s electron density map, indicating the location of atoms in the protein. These days structural research also makes use of cryogenic electron microscopy. This involves a frozen sample made of proteins being bombarded with electrons, with millions of individual 2D images of the proteins subsequently being combined into a 3D structure.
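The step from scattering data to an electron density map can be illustrated with a toy calculation. The sketch below is a simplified one-dimensional example and not the software used at Biocenter Oulu: it assumes two point-like atoms at made-up fractional positions, and it assumes that both the amplitude and the phase of each scattered wave are known, whereas a real experiment measures only intensities and the phases have to be recovered separately. The function names and values are hypothetical.

```python
import cmath
import math


def structure_factors(atoms, h_max):
    """Compute F(h) = sum_j Z_j * exp(2*pi*i*h*x_j) for h in [-h_max, h_max].

    atoms is a list of (fractional position, electron count) pairs.
    """
    return {
        h: sum(z * cmath.exp(2j * math.pi * h * x) for x, z in atoms)
        for h in range(-h_max, h_max + 1)
    }


def electron_density(factors, n_points):
    """Fourier synthesis: rho(x) = sum_h F(h) * exp(-2*pi*i*h*x).

    Returns unscaled density values on an evenly spaced grid over one cell.
    """
    grid = [i / n_points for i in range(n_points)]
    return [
        sum(f * cmath.exp(-2j * math.pi * h * x) for h, f in factors.items()).real
        for x in grid
    ]


# Hypothetical toy cell: a lighter atom at x = 0.25 and a heavier one at x = 0.70.
atoms = [(0.25, 6.0), (0.70, 8.0)]
factors = structure_factors(atoms, h_max=20)
density = electron_density(factors, n_points=100)

# The highest density value should sit at the heavier atom's position.
peak_index = max(range(len(density)), key=density.__getitem__)
print("strongest peak at x =", peak_index / 100)
```

The density map peaks where the atoms were placed, which is the sense in which the transformation indicates atom locations. Real structures require three dimensions, measured data and a solution to the phase problem.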
Automatic imaging equipment is used in the crystallisation of proteins. Proteins are crystallised in various solutions, with crystals being formed under certain conditions.
“A protein is crystallised in a droplet that is followed by imaging. Plates may contain up to 300 droplets, and the plates can number several hundred. With pictures of these being taken every day, a lot of data accumulates. Crystallisation is usually carried out by robots,” says Lehtiö.
The crystal samples are picked manually under the microscope and placed in liquid nitrogen tanks. With the IceBear software, it is now possible to automate the recording of the samples and the data associated with them.
“Often samples are sent to another infrastructure, to other synchrotrons [cyclic particle accelerators] in Europe. Thanks to IceBear, we can find out what eventually happened to the sample elsewhere. Metadata is transferred between the databases used by European synchrotrons and IceBear. Samples contain a fair amount of metadata, such as which protein the sample contains, how it was crystallised and what the conditions were during crystallisation.”
IceBear does away with manual logs. Data can be transferred without the use of forms, and the links are created securely in the barcodes for each of the samples.
“You only need to do it once. The value of this application is that researchers’ time is saved even years from now,” says Lehtiö.
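The kind of sample metadata described above, linked to a barcode and passed between systems, could be represented roughly as follows. This is a minimal sketch and not IceBear's actual data model or transfer format: the class name, field names, barcode and values are all hypothetical, and JSON is assumed only as a convenient exchange format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CrystalSample:
    """Hypothetical record for one crystal sample, keyed by its barcode."""
    barcode: str
    protein: str
    crystallisation_method: str
    conditions: dict = field(default_factory=dict)


def to_transfer_record(sample):
    """Serialise the sample metadata to a JSON string for exchange."""
    return json.dumps(asdict(sample), sort_keys=True)


def from_transfer_record(text):
    """Rebuild the sample metadata from a JSON string."""
    return CrystalSample(**json.loads(text))


# Illustrative values only; none of these come from a real sample.
sample = CrystalSample(
    barcode="HYPO-0001",
    protein="example protein",
    crystallisation_method="vapour diffusion",
    conditions={"temperature_c": 4, "ph": 7.5},
)
record = to_transfer_record(sample)
restored = from_transfer_record(record)
print(restored == sample)
```

Entering the metadata once and carrying it along with the barcode is what removes the need for manual logs and forms at each step.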
CSC – IT Center for Science
is a non-profit, state-owned company administered by the Ministry of Education and Culture. CSC maintains and develops the state-owned, centralised IT infrastructure.
ELIXIR
builds infrastructure in support of the biological sector. It brings together the leading organisations of 21 European countries and the EMBL European Molecular Biology Laboratory to form a common infrastructure for biological information. CSC – IT Center for Science is the Finnish centre within this infrastructure.