New drug molecules through determining the structure of proteins

The Bioinformatics Unit of BioCity Turku focuses on the analysis of gene and protein data. The data analyses are useful in understanding various disease mechanisms. Cancer diseases and type 1 diabetes in adults, in particular, have been studied in the unit. The goal of the unit is to improve the diagnostics, treatment and predictability of complex diseases by combining computational, experimental and clinical research.

Bioinformatics methods are used to analyse the three-dimensional structures of proteins. This makes it possible to figure out what kinds of partially developed drugs, typically small molecules, are likely to affect the protein. By utilising this information, researchers are able to understand the normal functioning of the cell and how the protein function should be affected. The end result may be a new drug molecule that affects the target protein as desired.

“An encounter between two molecules always results in interaction. Compatible form and chemistry greatly enhance this interaction. If the encounter is strong, it can change the molecule’s potential to affect a third molecule. A signal is thus transmitted in a chain with encounters between different molecules”, says Åbo Akademi University researcher Jukka Lehtonen, who specialises in information technology in the bio-industry.

Lehtonen emphasises, however, that the molecular pairs transmitting the message are not perfectly accurate, so it is not a straightforward messaging chain. Rather, we can talk about a network of molecular interactions.

“The so-called normal functioning of cells is a delicate state of equilibrium. Medication is used to try to maintain this normal state. With diabetes, for example, the insulin function of cells has been disrupted, so medication and diet are used to replace the reduced interactions.”

“Medication is also used to try and curb signal chains that function in a harmful manner.”

In the design of drug molecules, it is important that the chain of events functions in the desired manner in all molecules. If, for example, the third molecule in the signal chain activates excessively, the drug may not have the desired effects.

“The drug is effective and there are few side effects if the structures of the binding site between the drug molecule and the protein are sufficiently unique and compatible”, Lehtonen says.

“However, there are many proteins of the same type in the human body and even the most imprecise interactions can change the administered drug molecules chemically.”

Hence, there are two parts to drug design: designing the optimal molecules for the target protein and finding compounds that, when travelling through the body, change into drug molecules without side effects.

Structural model of the protein


The three-dimensional structure of a protein can be determined through X-ray crystallography. The electrons in a regular protein crystal scatter the X-rays, and the resulting diffraction pattern can be used to calculate an electron density map. The structural model is generated by fitting the atoms of the protein into the electron density using computational algorithms and computer graphics.

“The crystallisation of a protein is a difficult phase. Finding the right crystallisation conditions is challenging. Some proteins do not crystallise as a whole”, says Lehtonen.

However, the number of protein structures has increased tremendously. In 1994, 1,000 structures were determined; now the number is already 100,000. The protein structures that have already been resolved are available in the PDB database.

“A significantly higher number of proteins exist and, based on other research findings, there are several potential drug targets whose structure has not yet been determined.”

If the structures of the target protein’s relatives are known, an attempt can be made to prepare a homology model.

“Relatives usually resemble each other. A theoretical model of the target’s structure can be drawn up based on a known relative. The model will inevitably resemble the original”, Lehtonen says, but points out that the model is not a result, but rather a tool.
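
How closely a target resembles a known relative is often summarised as the percentage of identical residues in an alignment of the two sequences. The following is a minimal sketch of that calculation, not a description of the unit's own workflow. The function name `percent_identity` and both example sequences are hypothetical, and the alignment is assumed to have been produced beforehand by a separate alignment tool.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over the aligned, gap-free positions."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # positions with a gap in either sequence are not compared
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical, already aligned sequences (one-letter amino acid codes).
target = "MKT-AYIAK"
template = "MKTQAYLAK"
identity = percent_identity(target, template)
print(f"{identity:.1f} % identical residues")
```

A high percentage suggests that the relative is a reasonable starting point for a model, but it says nothing about which parts of the resulting model are reliable.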

The structural model is used to explain the experimental data collected on the functioning of the protein and to predict what may happen in an abnormal situation. The model can be used to predict, for example, what kinds of interactions different small molecules can have with the protein.

“However, the model must be assessed critically. Not all of its parts are equally reliable. A structural model may depict the binding site of the drug molecule credibly even if it is otherwise uncertain.”

Lehtonen emphasises that modelling requires cooperating with research groups conducting experiments.

“Based on the model, experiments are suggested that tell more about the research subject while also revealing whether the model is reliable. The modeller must decide, based on the data, whether the model can be used. The model is corrected and refined using the experimental data obtained. The cycle continues until the target is known well”, Lehtonen says.

Binding site of the drug


Structure-based drug design utilises information on the structure of the protein’s binding site and on known molecules that bind to the protein, called ligands. Drug molecules are often designed to resemble a ligand. At best, researchers have a defined protein structure that includes a ligand at their disposal. A protein can also be selectively mutated, in which case it can be deduced, based on the changes in binding intensity, which amino acid residues are involved in the binding. The binding site is usually a cavity in the protein structure. The cavities in the structural model can also be outlined computationally, but identifying the authentic binding site is not automatic.

“The mode of action of the ligand, i.e. the normal function of the protein, is in itself a valuable research result. If related structures, a group of ligands that bind to them and the differences in binding intensity are known, the most significant atomic level differences can be identified through structural analysis. This will reveal what is important in the structure of the ligand.”

The potential drug molecule should, therefore, have similar parts. If there is sufficient experimental data available on the binding site in the target protein, virtual screenings performed with databases and powerful computers can be used to quickly and reliably identify potential drug candidates from a large number of molecules. This also helps to minimise the possible side effects of the drug.

“Virtual molecule libraries can be sifted through using the created search criteria, that is, by performing a computer search that excludes all of the completely unsuitable molecules. The remaining compounds are subjected to more specific modelling so that the group of compounds to be experimentally tested is reduced to a reasonable number.”
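
The first pass of such a screen can be pictured as a simple filter over a library. The sketch below is only an illustration: the `Molecule` record, the three candidate entries and the threshold values are all hypothetical, and a real screen would derive its search criteria from what is known about the binding site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    name: str
    weight: float           # molecular weight in daltons
    h_bond_donors: int
    h_bond_acceptors: int


def passes_criteria(mol, max_weight=500.0, max_donors=5, max_acceptors=10):
    """True if the molecule is not ruled out by the (illustrative) search criteria."""
    return (mol.weight <= max_weight
            and mol.h_bond_donors <= max_donors
            and mol.h_bond_acceptors <= max_acceptors)


def screen(library, **criteria):
    """Discard completely unsuitable molecules and keep the rest for closer modelling."""
    return [mol for mol in library if passes_criteria(mol, **criteria)]


# Hypothetical library entries, invented for this example.
library = [
    Molecule("candidate_a", 320.4, 2, 5),
    Molecule("candidate_b", 812.9, 7, 14),
    Molecule("candidate_c", 455.0, 1, 8),
]
shortlist = screen(library)
print([mol.name for mol in shortlist])
```

The filter is deliberately cheap so that it can be run over a very large library; the more expensive modelling is then reserved for the shortlist.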

Search algorithm calculates the layout of the protein and another molecule


Modelling is used to find the molecules that are likely to react correctly with the protein, and the accuracy is tested through laboratory results. This provides an answer to what the potential drug candidates are and why they work and the others do not.

“If two molecular structures are placed side by side virtually, it can be asked how strong their interaction is. The strength of forces is affected by distances between atoms and the presence of other molecules, i.e. water. Physics and chemistry have produced the observation data and theories to assess forces. When the molecules move or are transformed, the calculated forces also change. The molecules can therefore be laid out in countless ways.”
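
The dependence on distance and on surrounding water can be illustrated with two textbook terms: a Lennard-Jones term for contact between atoms and a Coulomb term for charges. This is a rough sketch under stated assumptions rather than the force field of any particular program. The parameter values are illustrative, and water is represented only crudely by a dielectric constant of about 80.

```python
import math

COULOMB_CONSTANT = 332.06  # kcal * angstrom / (mol * e^2)


def pair_energy(distance, epsilon=0.2, sigma=3.5, q1=0.0, q2=0.0, dielectric=80.0):
    """Energy (kcal/mol) of one atom pair; distance and sigma in angstroms.

    epsilon, sigma and dielectric are illustrative values, not fitted parameters.
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    ratio6 = (sigma / distance) ** 6
    lennard_jones = 4.0 * epsilon * (ratio6 * ratio6 - ratio6)
    coulomb = COULOMB_CONSTANT * q1 * q2 / (dielectric * distance)
    return lennard_jones + coulomb


def interaction_energy(atoms_a, atoms_b, **params):
    """Sum the pair energy over every atom pair; atoms are (x, y, z) tuples."""
    return sum(pair_energy(math.dist(a, b), **params)
               for a in atoms_a for b in atoms_b)


# The same pair of opposite charges, 4 angstroms apart, in vacuum and in water.
in_vacuum = pair_energy(4.0, q1=1.0, q2=-1.0, dielectric=1.0)
in_water = pair_energy(4.0, q1=1.0, q2=-1.0, dielectric=80.0)
print(in_vacuum, in_water)
```

Moving either molecule changes the distances, so the summed energy has to be recalculated for every layout that is considered.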

Docking is a search algorithm that calculates the force between the protein and another molecule.

“Each docking algorithm uses a different strategy for group selection. The goal is to find the optimal layout which hopefully describes how the structures actually interact. The search is quite closely limited to the assumed binding site and the permissible transformations of the molecules are small. Otherwise the search space is too large, meaning that the amount of calculation increases disproportionally.”
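
One very simple search strategy is to try random placements inside a small box around the assumed binding site and keep the best-scoring one. The sketch below does only that. It is an assumption-laden toy, not the strategy of any real docking program: the molecules are rigid, only translations are sampled (no rotations or changes of shape), and the scoring function, the coordinates and the names `score`, `translate` and `dock` are all hypothetical.

```python
import math
import random


def score(protein, ligand, epsilon=0.2, sigma=3.5):
    """Sum of Lennard-Jones pair energies; a lower value means a better fit."""
    total = 0.0
    for p_atom in protein:
        for l_atom in ligand:
            r = max(math.dist(p_atom, l_atom), 0.5)  # clamp to avoid division by zero
            ratio6 = (sigma / r) ** 6
            total += 4.0 * epsilon * (ratio6 * ratio6 - ratio6)
    return total


def translate(atoms, shift):
    """Move every atom by the same (dx, dy, dz) shift."""
    return [tuple(c + s for c, s in zip(atom, shift)) for atom in atoms]


def dock(protein, ligand, site_centre, box=2.0, trials=2000, seed=0):
    """Random search over translations within +/- box of the assumed binding site."""
    rng = random.Random(seed)
    best_pose = translate(ligand, site_centre)
    best_score = score(protein, best_pose)
    for _ in range(trials):
        shift = tuple(c + rng.uniform(-box, box) for c in site_centre)
        pose = translate(ligand, shift)
        pose_score = score(protein, pose)
        if pose_score < best_score:
            best_pose, best_score = pose, pose_score
    return best_pose, best_score


# Hypothetical coordinates in angstroms; the ligand is defined around the origin.
protein = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
ligand = [(0.0, 0.0, 0.0)]
pose, energy = dock(protein, ligand, site_centre=(2.0, 3.0, 0.0))
print(pose, round(energy, 3))
```

Widening the box or allowing rotations and flexible molecules multiplies the number of layouts to be scored, which is the growth in the search space that the quote refers to.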

In bioinformatics, docking is used to determine which ligand binds the most strongly. Once there is a model for the binding site, binding style and binding strength of each ligand, a proposal can be prepared on what the new drug molecules should look like in order for them to bind to the desired target protein. There are several different computing technologies available for docking. Molecular dynamics simulation, for example, permits the free movement of molecular pairs, and the computation may take weeks. That is why efficient computing resources are required. Molecular dynamics is a computationally heavy method for docking, but the reward is a more accurate understanding of the dynamic interaction between molecules. Molecular dynamics simulations are used for a more detailed modelling of the interactions and for evaluating the interaction and stability between the protein and a partially developed drug.
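
The computational cost comes from advancing the system in very small time steps. The following sketch shows the idea for the simplest possible case, two atoms moving along the line that joins them, using the widely used velocity Verlet scheme. It is an illustration under stated assumptions only: the units are reduced (dimensionless), `mass` stands for the reduced mass of the pair, and the function names and parameter values are hypothetical.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones potential energy at separation r (reduced units)."""
    ratio6 = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio6 * ratio6 - ratio6)


def lj_force(r, epsilon=1.0, sigma=1.0):
    """Force along the separation, -dU/dr; positive values push the atoms apart."""
    ratio6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * ratio6 * ratio6 - ratio6) / r


def simulate(r0, v0=0.0, mass=1.0, dt=0.001, steps=5000):
    """Velocity Verlet integration of the separation between two atoms."""
    r, v = r0, v0
    force = lj_force(r)
    trajectory = [r]
    for _ in range(steps):
        v_half = v + 0.5 * dt * force / mass   # half-step velocity update
        r = r + dt * v_half                    # full-step position update
        force = lj_force(r)                    # force at the new position
        v = v_half + 0.5 * dt * force / mass   # second half-step velocity update
        trajectory.append(r)
    return trajectory, v


trajectory, v_end = simulate(r0=1.5)
e_start = lj_energy(1.5)                              # the pair starts at rest
e_end = 0.5 * v_end ** 2 + lj_energy(trajectory[-1])  # kinetic + potential energy
print(round(e_start, 4), round(e_end, 4))
```

Comparing the total energy at the start and at the end is a basic sanity check on the step size. A real simulation repeats the same loop for every atom of the protein, the ligand and the surrounding water over a far larger number of steps.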

“The biggest mistake in modelling is to blindly believe the answers provided by the software. What is essential is the ability to evaluate the results critically and to use modelling for problems for which it is suited”, Lehtonen emphasises.

Cloud service creates a transparent but secure resource


The Turku Bioinformatics Unit uses the cloud computing resource ePouta of the Finnish ELIXIR node in its research. It creates a transparent, local resource whose level of information security is very high. The user does not see that the computing takes place in the cloud and the data does not need to be transferred from one disk drive to another, especially via the public network. The higher information security level of ePouta is essential for research materials involving corporate secrets, for example.

“Thanks to ePouta, we have more computing capacity in the local network, which has suited us very well. In practice, our computing capacity has doubled. At the national level, CSC’s cloud is the most affordable way to create local computing resources.”

“Since CSC is responsible for the computing resources and cloud service, it is possible to build an environment where the researcher feels comfortable at the client end. Maintaining software packages that CSC does not have is also easier this way.”

Ari Turunen



Further information:

Biocenter Finland

Biocenter Finland (BF) is a distributed national research infrastructure of five biocenters in six Finnish universities.

BioCity Turku

BioCity Turku is an umbrella organization supporting and coordinating life science and molecular medicine related research in the University of Turku and Åbo Akademi University.

Turku Centre for Biotechnology

Turku Centre for Biotechnology is a joint department of the University of Turku and Åbo Akademi University, providing high-end technologies and expertise to academic and industrial researchers.

CSC – IT Center for Science

CSC – The Finnish IT Center For Science is a non-profit, state-owned company administered by the Ministry of Education and Culture. CSC maintains and develops the state-owned, centralised IT infrastructure.


ELIXIR

ELIXIR builds infrastructure in support of the biological sector. It brings together the leading organisations of 17 European countries and the European Molecular Biology Laboratory (EMBL) to form a common infrastructure for biological information. CSC – IT Center for Science is the Finnish centre within this infrastructure.