Immunogenetics entities are highly interconnected, from the IMGT® sequence databases to the monoclonal antibodies database, which creates a need to integrate immunogenetics data. To meet this need, we built the IMGT Knowledge Graph (IMGT-KG), the first knowledge graph in the immunogenetics field. It bridges the gap between the nucleotide and protein sequences of the IMGT® databases and opens the way to effective queries and integrative immuno-omics analyses. IMGT-KG acquires data from IMGT®, then represents, describes and structures immunogenetics entities and their interrelationships in a knowledge graph using semantic web standards and technologies.
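To make the idea of entities and interrelationships stored as a graph more concrete, here is a minimal Python sketch of a knowledge graph as a set of (subject, predicate, object) triples, with a two-hop query that follows links from a nucleotide sequence to a protein and then to an antibody. It is not IMGT-KG's data model and uses no RDF library: the `ex:` identifiers and the properties `ex:isTranslatedInto` and `ex:isPartOf` are hypothetical, and `None` plays the role of a variable in a triple pattern.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Minimal in-memory set of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        # None acts as a wildcard, like a variable in a triple pattern
        for t in self.triples:
            if (s is None or t[0] == s) and (p is None or t[1] == p) \
                    and (o is None or t[2] == o):
                yield t


# Hypothetical identifiers and properties (ex:), not IMGT-KG's vocabulary
kg = TripleStore()
kg.add("ex:nucleotideSeq1", "rdf:type", "ex:NucleotideSequence")
kg.add("ex:proteinSeq1", "rdf:type", "ex:ProteinSequence")
kg.add("ex:nucleotideSeq1", "ex:isTranslatedInto", "ex:proteinSeq1")
kg.add("ex:proteinSeq1", "ex:isPartOf", "ex:antibody1")
kg.add("ex:antibody1", "rdf:type", "ex:MonoclonalAntibody")


def antibodies_for(store: TripleStore, nucleotide: str) -> Set[str]:
    """Follow nucleotide -> protein -> antibody links (a two-hop join)."""
    result = set()
    for _, _, protein in store.match(nucleotide, "ex:isTranslatedInto"):
        for _, _, ab in store.match(protein, "ex:isPartOf"):
            # Keep only objects typed as monoclonal antibodies
            if (ab, "rdf:type", "ex:MonoclonalAntibody") in store.triples:
                result.add(ab)
    return result


print(antibodies_for(kg, "ex:nucleotideSeq1"))  # {'ex:antibody1'}
```

A semantic web implementation would typically store such triples in RDF and express the same two-hop join as a SPARQL graph pattern; the Python loop only mimics the shape of that query.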
IMGT-KG is built on top of an extended version of IMGT-ONTOLOGY. Following semantic web good practices, we prioritize the reuse of existing terms in our knowledge graph. Many of these terms come from ontologies of the Open Biological and Biomedical Ontology (OBO) Foundry.
We make openly and freely available IMGT-KG powered by
Knowledge graphs are emerging as one of the most popular means for data federation, transformation, integration and sharing, promising to improve data visibility and reusability. Immunogenetics is the branch of life sciences that studies the genetics of the immune system. The complexity and connected nature of immunogenetics data make knowledge graphs a natural choice for representing and describing immunogenetics entities and their relations, which would enable a wide range of applications; yet little effort has so far been directed towards building and using such knowledge graphs. In this work, we present the IMGT Knowledge Graph (IMGT-KG), the first FAIR knowledge graph of its kind in immunogenetics. IMGT-KG acquires and integrates data from different immunogenetics databases, thereby creating links between them. As a result, IMGT-KG provides access to 79 670 110 triples with 10 430 268 entities, 673 concepts and 173 properties. IMGT-KG reuses many existing terms from domain ontologies and vocabularies, provides external links to other resources of the same domain, and defines a set of rules that apply Allen's Interval Algebra to guide inference on nucleotide sequence positions. Such inference allows, for example, reasoning about positions on genomic sequences. IMGT-KG bridges the gap between genomic and protein sequences and opens the way to effective queries and integrative immuno-omics analyses. We make IMGT-KG openly and freely available, with detailed documentation and a Web interface for access and exploration.
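The rule-based inference on sequence positions rests on Allen's Interval Algebra, in which any two intervals stand in exactly one of 13 basic relations (before, meets, overlaps, starts, during, finishes, equals, and their inverses). The Python sketch below classifies two sequence regions by comparing their endpoints. It illustrates the algebra only and is not IMGT-KG's actual rule set: it assumes both regions lie on the same sequence and are half-open intervals [start, end), so that adjacent regions "meet" when one ends where the next begins (with 1-based inclusive coordinates, adjacency would instead be end + 1 == start). The coordinates in the usage lines are hypothetical.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A sequence region as a half-open interval [start, end), start < end."""
    start: int
    end: int


def allen_relation(x: Interval, y: Interval) -> str:
    """Return which of Allen's 13 basic relations holds from x to y."""
    if x.start >= x.end or y.start >= y.end:
        raise ValueError("intervals must satisfy start < end")
    # Disjoint or touching regions
    if x.end < y.start:
        return "before"
    if y.end < x.start:
        return "after"
    if x.end == y.start:
        return "meets"
    if y.end == x.start:
        return "met-by"
    # From here on the regions share at least one position
    if x.start == y.start and x.end == y.end:
        return "equals"
    if x.start == y.start:
        return "starts" if x.end < y.end else "started-by"
    if x.end == y.end:
        return "finishes" if x.start > y.start else "finished-by"
    if y.start < x.start and x.end < y.end:
        return "during"
    if x.start < y.start and y.end < x.end:
        return "contains"
    # Remaining case: partial overlap with distinct starts and ends
    return "overlaps" if x.start < y.start else "overlapped-by"


# Hypothetical coordinates, for illustration only
gene = Interval(0, 1000)
exon = Interval(100, 400)
intron = Interval(400, 700)
print(allen_relation(exon, gene))    # during
print(allen_relation(exon, intron))  # meets
print(allen_relation(intron, exon))  # met-by
```

Because each relation is derived from endpoints, composed facts follow directly: if one region is `during` a second and the second is `during` a third, the first is also `during` the third, which is the kind of composition an inference rule can encode.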
Sanou, G., Giudicelli, V., Abdollahi, N., Kossida, S., Todorov, K., Duroux, P. (2022).