IMGT current projects in the field of Artificial Intelligence

Recognition and prediction of the immunoglobulins and T cell receptors at the genomic level

The Human Genome Project, which remains the world's largest collaborative genetics project, was launched in 1990 and declared complete in 2003. Since then, methodological and technological advances have made it possible to sequence the whole genome of a species in much less time and in much greater detail than the first version of the human genome. Today, the genomes of many species are fully sequenced and accessible through public databases. The adaptive immune response, which appeared 450 million years ago, is found in all jawed vertebrates, that is, in species ranging from fish to humans. The identification and annotation of the immunoglobulin (IG) and T cell receptor (TR) genes, which are known for their high similarity to one another, is a tedious, time-consuming and error-prone process. The current state of the art relies on sequence alignments to identify and annotate the IG and TR genes.

In this project, carried out in collaboration with ATOS, Deep Learning approaches are being explored to determine whether the identification and annotation of the immunoglobulin and T cell receptor genes can be achieved more efficiently than with traditional sequence alignment based methodologies. Both established and recent Deep Learning architectures are being used to address these immunogenetic questions, such as Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), attention mechanisms, Transformers and BERT. Beyond addressing fundamental questions in molecular evolution and comparative genomics, the complete set of IG and TR genes of a genome provides the groundwork needed for fields such as personalized medicine and livestock improvement.
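As an illustration of how a genomic sequence can be turned into input for architectures such as a CNN or an RNN, the following minimal sketch one-hot encodes a DNA string and cuts it into fixed-size windows. It is not the project's actual pipeline: the function names, the window size and the handling of ambiguous bases are all assumptions made for this example.

```python
# Hypothetical helpers for illustration only; not part of any IMGT tool.
BASES = "ACGT"


def one_hot_encode(sequence: str) -> list[list[int]]:
    """Encode a DNA sequence as one row of four 0/1 values per nucleotide.

    Symbols outside A, C, G, T (for example the ambiguity code N)
    are encoded as an all-zero row. This is an assumed convention.
    """
    return [
        [1 if base == known else 0 for known in BASES]
        for base in sequence.upper()
    ]


def sliding_windows(sequence: str, size: int, step: int) -> list[str]:
    """Split a long sequence into overlapping windows of a fixed size."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    return [
        sequence[start:start + size]
        for start in range(0, len(sequence) - size + 1, step)
    ]


if __name__ == "__main__":
    # Arbitrary toy sequence, not a real IG or TR gene.
    toy = "ACGTNACG"
    for window in sliding_windows(toy, size=4, step=2):
        print(window, one_hot_encode(window))
```

Each window becomes a matrix with one row per nucleotide and four columns, which is a common input shape for sequence models. A real model would then be trained to label windows, for instance as containing an IG or TR gene or not.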

A knowledge base in immunogenetics for discovering new knowledge

Ontologies are major technological components in the context of Open and Big Data. In the life sciences, the vocabulary is abundant and the definitions are sometimes subjective, which makes them difficult to formalize in a computer-readable form. The foundations of IMGT-ONTOLOGY were published in 1999, and a first implementation in the OWL language was made available in 2010 and published on the BioPortal of the National Center for Biomedical Ontology (NCBO) (IMGT-ONTOLOGY). Knowledge graphs (KG) are ontological models that describe the entities of interest in a domain and the relationships between them. Today, these graphs are key to obtaining a structured Web with findable, accessible, interoperable and reusable data and knowledge, a set of principles known as FAIR.

In the field of immunogenetics, antibody engineering for therapeutic purposes is a booming branch that requires knowledge to be structured in the form of knowledge graphs. Indeed, the structure of artificially modified monoclonal antibodies is very diverse and departs greatly from the classic pattern of natural antibodies. The main objective of the project, which is carried out in collaboration with the FADO team, is to offer tools that help experts extract information and knowledge from structured and unstructured data, and thus support the generation and validation of scientific hypotheses in the field of therapeutic antibodies.

The project will draw on IMGT/mAb-DB, which integrates, among other things, definitions, clinical trials, targeted pathologies, and descriptions of sequences and formats. This database is the result of time-consuming manual work that requires searching through various unstructured and heterogeneous resources. Therapeutic antibodies are currently among the most promising new drugs for the treatment of serious diseases, such as several forms of cancer, as well as for personalized medicine approaches.
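To make the idea of a knowledge graph concrete, the sketch below stores a few facts as subject, predicate, object triples and answers a simple pattern query. Everything in it is a placeholder: the `ex:` identifiers, the predicates and the facts are invented for illustration and are not taken from IMGT-ONTOLOGY or IMGT/mAb-DB.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """One edge of the graph: subject --predicate--> obj."""
    subject: str
    predicate: str
    obj: str


# Invented placeholder facts; they do not describe real antibodies.
GRAPH = [
    Triple("ex:antibody_A", "ex:hasFormat", "ex:format_1"),
    Triple("ex:antibody_A", "ex:targets", "ex:antigen_X"),
    Triple("ex:antibody_B", "ex:targets", "ex:antigen_X"),
    Triple("ex:antibody_B", "ex:studiedIn", "ex:trial_1"),
]


def match(
    graph: list[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> list[Triple]:
    """Return the triples that agree with every field that is not None."""
    return [
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


if __name__ == "__main__":
    # Which entities target the placeholder antigen ex:antigen_X?
    for triple in match(GRAPH, predicate="ex:targets", obj="ex:antigen_X"):
        print(triple.subject)
```

A production knowledge graph would normally rely on an RDF store and a query language such as SPARQL rather than a Python list, but the triple pattern shown here is the same underlying idea.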

IMGT approach for the prediction and analysis of epitope and paratope interaction

The computational prediction of antigen peptide sequences that elicit a T-cell immune response has a broad and significant impact on vaccine design.

Major histocompatibility complex (MHC) molecules are expressed on the cell surface, where they present peptides to T cells, which gives them a key role in the development of T-cell immune responses. MHC molecules come in two main classes: MHC Class I (MHC-I) and MHC Class II (MHC-II). The binding and interaction between MHC molecules and intracellular peptides derived from a variety of proteolytic mechanisms play a crucial role in the subsequent T-cell recognition of target cells and in the specificity of the immune response. Machine learning based computational scanning for peptides that bind to a specific MHC molecule can speed up the development of MHC-peptide based vaccines, and various methods are therefore being actively developed. We are currently exploring a machine learning based method to accurately predict peptide binding to both MHC-I and MHC-II.
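The scanning step described above can be sketched as follows: every fixed-length peptide of an antigen sequence is enumerated and passed to a scoring function, and the candidates are ranked by score. This is only an outline under stated assumptions. The default length of 9 residues is an assumed choice (MHC-I ligands are typically 8 to 11 residues long), and `toy_scorer` is a meaningless stand-in for a trained binding predictor, not the method being developed.

```python
from typing import Callable


def candidate_peptides(protein: str, length: int = 9) -> list[tuple[int, str]]:
    """Return (start_index, peptide) for every window of the given length."""
    if length <= 0:
        raise ValueError("length must be positive")
    return [
        (start, protein[start:start + length])
        for start in range(len(protein) - length + 1)
    ]


def rank_peptides(
    protein: str,
    scorer: Callable[[str], float],
    length: int = 9,
) -> list[tuple[float, int, str]]:
    """Score every candidate peptide and sort from highest to lowest score.

    Peptides with equal scores keep their original order in the protein.
    """
    scored = [
        (scorer(peptide), start, peptide)
        for start, peptide in candidate_peptides(protein, length)
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)


def toy_scorer(peptide: str) -> float:
    """Placeholder score (count of leucine residues); NOT a binding model."""
    return float(peptide.count("L"))


if __name__ == "__main__":
    # Arbitrary string of amino-acid letters, not a real antigen.
    toy_protein = "ACDEFGHIKLMN"
    for score, start, peptide in rank_peptides(toy_protein, toy_scorer):
        print(start, peptide, score)
```

In practice the scorer would be a model trained on measured peptide-MHC binding data for a given MHC allele, and MHC-II prediction would need longer windows than the 9 residues assumed here.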
