Development of bioinformatics tools for large-scale analysis of arrays of tandem repeats in proteins.


Dr Andrey KAJAVA

Directeur de Recherche (Research Director), CNRS
Team: Bioinformatique structurale et modélisation moléculaire (Structural Bioinformatics and Molecular Modelling)
Centre de Recherches de Biochimie Macromoléculaire (CRBM)
UMR 5237 CNRS, Université Montpellier 1 and 2
1919 Route de Mende, 34293 Montpellier Cedex 5, France


Fax: +33 4 34 35 95 99
Phone: +33 4 34 35 95 38 (lab)
E-mail: andrey.kajava@crbm.cnrs.fr






We use methods of bioinformatics and theoretical structural biology to understand the principles of protein structure and biomolecular interactions. This knowledge is applied to the prediction of protein structures and functions, to drug design, and to the de novo design of proteins with deliberately chosen functions.

We are currently intensifying our efforts to develop bioinformatics tools that facilitate large-scale decoding of the sequence-structure-function relationships of proteins. These tools are particularly pertinent because they will allow us to meet the challenges posed by the dramatic growth of genomic data. Topics of particular interest to us are: (i) proteins with tandem repeats, (ii) amyloids and prions, (iii) design of molecules for biomedical and biotechnological applications, and (iv) 3D-structure-based approaches to vaccine development.


Project 1:

Development of bioinformatics tools for large-scale analysis of arrays of tandem repeats in proteins.


The dramatic growth of genomic data presents new challenges for scientists: making sense of millions of protein sequences requires systematic approaches and information about their 3D structures, as well as about their evolutionary and functional relationships. Today, the growth of sequencing data significantly exceeds the growth of our capacity to analyze it. The immediate priority must therefore be the development of tools for large-scale, efficient, rapid, and easy-to-use analysis of whole genomes and of multiple genomes. Over the last two decades, the main efforts of bioinformaticians were devoted to proteins with aperiodic sequences that fold into globular 3D structures. However, a large portion of proteins (approximately every third human protein) also contains periodic sequences: arrays of repeats that are directly adjacent to each other. Numerous studies have demonstrated the fundamental functional importance of such tandem repeats and their involvement in diseases. Yet conventional approaches to proteome annotation, developed for globular domains, have had limited success when applied to regions with tandem repeats.
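To make "periodic sequence" concrete: for a candidate period p, one can count how often a residue is identical to the residue p positions downstream. The following is a minimal sketch assuming a plain self-match statistic (a crude autocorrelation); it is not one of the published repeat detectors and not the project's method, and the function names and toy sequence are hypothetical.

```python
from __future__ import annotations


def periodicity_profile(seq: str, max_period: int) -> dict[int, float]:
    """For each candidate period p, the fraction of positions i with seq[i] == seq[i + p]."""
    profile: dict[int, float] = {}
    for p in range(1, min(max_period, len(seq) - 1) + 1):
        pairs = len(seq) - p
        matches = sum(seq[i] == seq[i + p] for i in range(pairs))
        profile[p] = matches / pairs
    return profile


def best_period(seq: str, max_period: int, min_period: int = 2) -> int:
    """Shortest period with the highest self-match fraction.

    Period 1 (single-residue runs) is excluded by default.
    """
    profile = periodicity_profile(seq, max_period)
    candidates = {p: f for p, f in profile.items() if p >= min_period}
    # Keys are in ascending order and max() returns the first maximum,
    # so ties (e.g. exact multiples of the true period) resolve to the shortest one.
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    # Hypothetical 7-residue unit repeated five times, with one L -> I substitution.
    seq = "PEVKLAG" * 2 + "PEVKIAG" + "PEVKLAG" * 2
    print(best_period(seq, max_period=10))             # 7
    print(round(periodicity_profile(seq, 10)[7], 3))   # 0.929 (26 of 28 pairs match)
```

A substitution only lowers the score slightly, but a single insertion or deletion shifts the phase of every downstream copy and collapses this statistic, which is one reason why imperfect repeats need more sophisticated, alignment-aware detectors.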

The main objective of this project is to fill this gap by developing new computational tools for bioinformatics analysis of protein tandem repeats.

Protein tandem repeats are frequently imperfect: they contain mutations (substitutions, insertions and deletions) accumulated during evolution, and some of them cannot be easily identified. To address this problem, several algorithms and programs have been developed over the last few years (for a review, see Kajava, 2011). Depending on the size and character of the repeats, some of them perform better than others, but no single approach covers the whole range of repeats. We plan to select the most accurate and fastest of them so that, together, they cover the complete spectrum of tandem repeats. They will be used to create a meta-server for the detection of tandem repeats. Because each of these programs uses its own measure of the significance of the matches it finds, selecting and implementing a single, unified measure will be one of the necessary steps. The meta-server will also be optimized for large-scale analysis of proteomes.
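Because each detector reports significance on its own scale, a meta-server needs one common yardstick before it can compare or merge their calls. The sketch below shows one possible choice, under stated assumptions: every detector is assumed (hypothetically) to return a region and a period; each region is rescored with the same self-match statistic at its reported period, given an empirical p-value against composition-preserving shuffles of that region, filtered at a threshold, and overlapping calls are resolved by keeping the most significant one. The `Prediction` record, the tool names, and the `alpha` and `n_shuffles` defaults are illustrative assumptions, not the project's chosen measure or any real tool's output format.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Prediction:
    """A repeat region reported by one (hypothetical) detector."""
    tool: str    # name of the detector that made the call
    start: int   # 0-based start, inclusive
    end: int     # 0-based end, exclusive
    period: int  # repeat length reported by the detector


def self_match_fraction(segment: str, period: int) -> float:
    """Common score: fraction of positions i with segment[i] == segment[i + period]."""
    pairs = len(segment) - period
    if pairs <= 0:
        return 0.0
    return sum(segment[i] == segment[i + period] for i in range(pairs)) / pairs


def empirical_p_value(segment: str, period: int, n_shuffles: int,
                      rng: random.Random) -> float:
    """Share of composition-preserving shuffles scoring at least as high as the real segment."""
    observed = self_match_fraction(segment, period)
    residues = list(segment)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        if self_match_fraction("".join(residues), period) >= observed:
            hits += 1
    # The +1 terms count the observed segment itself, so p is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


def consolidate(seq: str, predictions: list[Prediction], alpha: float = 0.01,
                n_shuffles: int = 999, seed: int = 0) -> list[tuple[float, Prediction]]:
    """Rescore every call on one scale, drop non-significant ones, resolve overlaps."""
    rng = random.Random(seed)
    scored = []
    for pred in predictions:
        p = empirical_p_value(seq[pred.start:pred.end], pred.period, n_shuffles, rng)
        if p <= alpha:
            scored.append((p, pred))
    # Most significant first; ties are broken by start position.
    scored.sort(key=lambda item: (item[0], item[1].start))
    kept: list[tuple[float, Prediction]] = []
    for p, pred in scored:
        # Keep a call only if it does not overlap any call already kept.
        if all(pred.end <= other.start or pred.start >= other.end for _, other in kept):
            kept.append((p, pred))
    return sorted(kept, key=lambda item: item[1].start)


if __name__ == "__main__":
    # Toy protein: unrelated flanks around five copies of a hypothetical 7-residue unit.
    seq = "MSTQWYHNDRC" + "PEVKLAG" * 5 + "WYHNDRCQTSM"
    calls = [
        Prediction("toolA", 11, 46, 7),  # the repeat array
        Prediction("toolB", 0, 11, 3),   # a spurious call in the flank
        Prediction("toolC", 14, 46, 7),  # overlaps toolA's call
    ]
    for p, pred in consolidate(seq, calls):
        print(pred.tool, pred.start, pred.end, pred.period, p)
```

Shuffling preserves amino-acid composition but destroys residue order, so a low p-value indicates that the periodicity itself, not merely a biased composition, is unusual. A production-grade measure would also have to tolerate indels and correct for testing many regions across a whole proteome.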


Kajava, A. V. (2011). Tandem repeats in proteins: from sequence to structure. J Struct Biol. PMID: 21884799



Topic 2:
