Introduction
IMGT/GENE-DB is part of IMGT®, the international ImMunoGeneTics information system®,
the high-quality integrated knowledge resource specializing in the immunoglobulins (IG) or antibodies, T cell receptors (TR),
and major histocompatibility (MH) of human and other vertebrate species, proteins of the immunoglobulin superfamily (IgSF)
and MH superfamily (MhSF), related proteins of the immune systems (RPI) of vertebrates and invertebrates,
therapeutic monoclonal antibodies (mAb), and fusion proteins for immune applications (FPIA), created in 1989 by Marie-Paule
Lefranc (LIGM, Université Montpellier 2, CNRS).
IMGT/GENE-DB is the IMGT genome database for IG and TR genes from human, mouse and other vertebrates, on the Web since February 2003.
IMGT/GENE-DB provides a full characterization of the genes and of their alleles: IMGT gene name and definition, chromosomal localization, number of alleles,
and for each allele, the IMGT allele functionality, and the IMGT reference sequences and other sequences from the literature.
IMGT/GENE-DB allele reference sequences are available in FASTA format
(nucleotide and amino acid sequences with IMGT gaps according to the IMGT unique numbering, or without gaps).
IMGT/GENE-DB includes links to the IMGT Repertoire standardized resources (Chromosomal localization, Locus representation,
Tables of alleles, Alignments of alleles, IMGT Protein displays, IMGT Colliers de Perles, etc.) and to the IMGT/LIGM-DB,
IMGT/3Dstructure-DB and IMGT/2Dstructure-DB databases.
IMGT/GENE-DB is the official repository of all of the IG and TR genes
and alleles approved by the World Health Organization (WHO)/International Union of Immunological Societies (IUIS) Nomenclature Subcommittee
for IG and TR (Lefranc 2007, 2008a). Reciprocal links exist between IMGT/GENE-DB and the HUGO Gene Nomenclature Committee (HGNC)
database, and NCBI Gene at the National Center for Biotechnology Information (NCBI).
IMGT/GENE-DB Query page
The IMGT/GENE-DB Query page shows, on the top right, the status of the database
(current date, number of genes, number of alleles and number of species).
Search according to the concepts of IMGT-ONTOLOGY
Searches in IMGT/GENE-DB are performed according to the concepts of
IDENTIFICATION,
LOCALIZATION
and CLASSIFICATION of
IMGT-ONTOLOGY.
IDENTIFICATION
- Species:
- only species for which genes have been entered in IMGT/GENE-DB are available.
- MolecularComponent:
- IG or TR.
- GeneType:
- allows the selection on one of the gene types.
- Functionality:
- allows the selection on IG and TR functionality.
- Clone name:
- enter a clone name or the first letters of a clone name. Clone names are those of the "Reference sequences" and "Sequences from the literature"
columns in Genes tables.
LOCALIZATION
- Locus:
- allows the selection on IG and TR loci (includes main loci and chromosomal orphon sets).
- Main Locus:
- allows the selection on a main IG or TR locus.
- Chromosomal orphon set:
- allows the selection of IG or TR genes which are located outside the main loci, in a chromosomal orphon set.
As a Main Locus may contain RPI genes, selecting 'IG' or 'TR' in 'Molecular component' before submitting the request restricts the results to IG or TR genes.
CLASSIFICATION
- IMGT group:
- allows the selection on IMGT groups.
- IMGT subgroup:
- allows the selection on IMGT subgroups.
- IMGT gene:
- enter an IMGT gene name, for example IGHV1-2 (List of human genes
according to the IMGT nomenclature).
Note that the search is case sensitive and that UPPERcase is the rule.
You can also enter only the first letters of the IMGT gene name: for example the selection of IGHV will list
in the next page all genes which have an IMGT gene name beginning with IGHV.
You can consult the Correspondence between nomenclatures.
- Selection of genes which have been found:
- allows the selection of genes which have been found rearranged and/or
transcribed, for at least one allele.
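The IMGT gene search described above behaves as a case-sensitive match on the first letters of the gene name. A minimal Python sketch of that behaviour follows. The function name and the sample list are hypothetical and serve only as an illustration; the actual search runs on the IMGT/GENE-DB Query page.

```python
def search_gene_names(gene_names, query):
    """Return the names that begin with `query`, compared case-sensitively."""
    return [name for name in gene_names if name.startswith(query)]


# Illustrative sample only, not an extract of the database.
sample = ["IGHV1-2", "IGHV1-3", "IGHD1-1", "TRAV8-3", "TRAV8-4"]

# An uppercase prefix matches every name that starts with it.
print(search_gene_names(sample, "IGHV"))

# A lowercase query matches nothing, because the comparison is case sensitive.
print(search_gene_names(sample, "ighv"))
```

This is why entering IGHV lists all genes whose name begins with IGHV, whereas a lowercase entry returns no gene.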
LOCALIZATION IN GENOME ASSEMBLIES
- Species:
- only species for which gene localization in genome assemblies is managed in IMGT/GENE-DB are available.
- Locus:
- only loci for which gene localization in genome assemblies is managed in IMGT/GENE-DB are available.
- Assembly:
- allows the selection of the assembly.
- Assembly unit:
- allows the selection of the assembly unit, for example "Primary Assembly".
- Designation:
- allows the selection of the designation, for example "Full chromosome 14" (for Homo sapiens).
IMGT/GENE-DB direct links
Provides a set of direct links to query IMGT/GENE-DB according to an IMGT gene name,
an IMGT group or to get the links to IMGT/GENE-DB and generalist genomic databases.
RESULTS OF YOUR SEARCH
Depending on the number of resulting genes, you will see:
- for 0 resulting genes: the message "There are no genes in IMGT/GENE-DB according to your criteria"
- for one or more resulting gene(s): List of resulting genes
List of resulting genes
At the top of the page, the selected criteria are indicated with the number of resulting genes and the number of resulting alleles.
The list of resulting genes is a table with the following columns:
First column: select
Allows you to select genes and then choose a display (see Choose your display).
In the example, Homo sapiens TRAV8-3 and TRAV8-4 have been selected.
IMGT gene names
Provides the gene names in the IMGT gene nomenclature (List of human genes).
Functionality
Provides the IMGT gene functionality according to the IMGT definition.
F: Functional
ORF: Open Reading Frame
P: Pseudogene
The functionality may be shown between parentheses or between brackets; the corresponding rules are documented by IMGT.
When more than one functionality is indicated for a gene (for example
F, [F]), the gene has several alleles with distinct functionalities.
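When the Functionality column is processed outside the web interface, it can help to separate the functionality code from its notation. The following Python sketch assumes the cell is a comma-separated string such as "F, [F]". The function name and the notation labels ("plain", "parentheses", "brackets") are hypothetical, and the sketch deliberately does not interpret what parentheses or brackets mean, since those rules are defined by IMGT and not reproduced here.

```python
import re

LABELS = {"F": "Functional", "ORF": "Open Reading Frame", "P": "Pseudogene"}
NOTATIONS = {"": "plain", "()": "parentheses", "[]": "brackets"}

# One item: an optional opening delimiter, a code, an optional closing delimiter.
_ITEM = re.compile(r"^(?P<open>[(\[]?)(?P<code>ORF|F|P)(?P<close>[)\]]?)$")


def parse_functionality(cell):
    """Split a Functionality cell into (code, full name, notation) tuples."""
    result = []
    for item in cell.split(","):
        match = _ITEM.match(item.strip())
        if match is None:
            raise ValueError(f"unrecognised functionality: {item!r}")
        delimiters = match.group("open") + match.group("close")
        if delimiters not in NOTATIONS:
            raise ValueError(f"mismatched delimiters in {item!r}")
        code = match.group("code")
        result.append((code, LABELS[code], NOTATIONS[delimiters]))
    return result


# The example from the text: two alleles with distinct functionalities.
print(parse_functionality("F, [F]"))
```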
IMGT gene definition
Provides the gene definition according to the
IMGT gene nomenclature.
Number of alleles
Provides, per gene, the number of
alleles currently identified in IMGT.
Chromosomal localization
Provides, for the allele *01, the IMGT/LIGM-DB accession number(s)
of the corresponding reference sequence(s).
Molecular component
Provides the molecular component for the gene.
Choose your display
Three choices of display are provided:
"Complete IMGT/GENE-DB entries" is selected by default. It displays the detailed results
for the selected genes
(see
IMGT/GENE-DB DETAILED RESULTS).
"IMGT/GENE-DB reference sequences in FASTA format" for the selected genes corresponds to :
- F+ORF+all P nucleotide sequences for coding region(s) or exon(s)
- F+ORF+in-frame P nucleotide sequences for coding region(s) or exon(s)
- F+ORF+in-frame P nucleotide sequences with IMGT gaps for V and C genes for coding region(s) or exon(s)
- F+ORF+in-frame P amino acid sequences for coding region(s) or exon(s)
- F+ORF+in-frame P amino acid sequences with IMGT gaps for coding region(s) or exon(s)
The FASTA header of IMGT/GENE-DB reference sequences in FASTA format is standardized. See
FASTA format of IMGT/GENE-DB reference sequences.
"IMGT label extraction from IMGT/LIGM-DB reference sequences"
extracts, from the IMGT/LIGM-DB reference sequences and for each allele of the selected gene(s),
the sequences corresponding to
one or several IMGT labels and/or artificially spliced exons.
The list of IMGT/LIGM-DB labels is available here.
IMGT/GENE-DB DETAILED RESULTS
The IMGT/GENE-DB DETAILED RESULTS page provides the IMGT/GENE-DB entry(ies).
The top of this page lists the gene(s) you have selected.
You can click on each of them to view the corresponding IMGT/GENE-DB entry.
Content of an IMGT/GENE-DB entry:
IMGT gene name and definition
Provides the IMGT gene name (species and symbol in the
IMGT gene nomenclature) and the IMGT definition (full name) of the gene.
Chromosomal localization
Provides the name of the locus (main locus or chromosomal orphon set), the chromosome number and the cytogenetic localization on the chromosome
when known.
Localizations in genome assemblies
Provides the localizations of the gene and IMGT labels in the genome assemblies, if managed in IMGT/GENE-DB:
- Name of the assembly
- Assembly unit
- Designation
- Accession number in the assembly
- IMGT allele name if identified and validated by IMGT biocurators
- IMGT functionality of the allele if identified
- IMGT labels
- Positions of the gene and IMGT labels in the assembly; the link retrieves the corresponding FASTA sequence.
- Orientation of the gene and IMGT label in the assembly.
Number of alleles
Provides the number of
alleles which have been currently identified in IMGT.
IMGT reference alleles
Provides a table listing all identified alleles. For each allele, the table indicates:
- its functionality
- the names of the exons (for constant and conventional genes)
- the R column (for variable, diversity and joining genes) (if defined): it indicates whether the allele has been found (+) or not been found (-) rearranged (R).
- the T and Pr columns (if defined): they indicate whether the gene sequences have been found (+) or not been found (-) transcribed (T) and/or
translated into protein (Pr)
- the IMGT/LIGM-DB reference sequence with:
- the subspecies (if relevant and if defined)
- the strain or breed or isolate (if relevant and if defined)
- the clone name (if defined)
- the accession number
- the secondary accession numbers (if defined)
- the molecule type (DNA or cDNA)
- the specificity of cDNA sequences is indicated in the last column on the right when known.
Below the IMGT reference alleles table, a second table provides links to display the IMGT/GENE-DB reference sequences in FASTA format.
IMGT/GENE-DB reference sequences in FASTA format
The IMGT/GENE-DB reference sequences in FASTA format are provided according to the gene type.
The FASTA header is standardized according to
FASTA format of IMGT/GENE-DB reference sequences.
- For V genes
V-REGION
- F+ORF+all P: provides the nucleotide sequences of V-REGION for functional, ORF and all pseudogene alleles of the gene(s).
- F+ORF+in-frame P: provides the nucleotide and amino acid sequences of V-REGION for functional, ORF and in-frame pseudogene alleles of the gene(s).
The nucleotide sequences and the amino acid sequences are provided with IMGT gaps according to the IMGT unique numbering (IMGT Scientific chart).
L-PART1+V-EXON
- F+ORF+all P: provides the nucleotide sequences of the artificially spliced L-PART1 and V-EXON for functional, ORF and all pseudogene alleles of the gene(s).
- F+ORF+in-frame P: provides the amino acid sequences of the artificially spliced L-PART1 and V-EXON for functional, ORF and in-frame pseudogene alleles of the gene(s).
- For D or J genes
- F+ORF+all P: provides the nucleotide sequences of D-REGION or J-REGION for functional, ORF and all pseudogene alleles of the D or J gene(s) respectively.
- F+ORF+in-frame P: provides the amino acid sequences of D-REGION or J-REGION for functional,
ORF and in-frame pseudogene alleles of the D or J gene(s) respectively.
Note that the J-REGION differs by one nucleotide in 3' between cDNA and gDNA.
In FASTA format, this nucleotide is restored if the reference sequence is from cDNA.
- For C genes and conventional genes
Individual constant exon(s)
- F+ORF+in-frame P: provides the nucleotide sequences of individual constant exon(s) for functional, ORF and in-frame pseudogene alleles of the C gene(s).
- F+ORF+in-frame P with IMGT gaps: provides the nucleotide and amino acid sequences with gaps of individual constant exon(s) for functional, ORF and in-frame pseudogene alleles of the C gene(s).
Gaps are according to the IMGT unique numbering (IMGT Scientific chart).
Note that:
- For exons of C-GENE or GENE with splicing frame 1 or 2, a nucleotide is added in 5' of these exons to obtain a complete first codon.
In field 6 of the FASTA header, the added nucleotide is indicated, followed by a comma, before the start position.
The number of nucleotides added in 5' is indicated in field 9 of the FASTA header (see FASTA format of IMGT/GENE-DB reference sequences).
- For exons of C-GENE or GENE with splicing frame 1 or 2, a nucleotide is deleted in 3' of these exons to obtain a complete last codon.
In field 6 of the FASTA header, the end position is decreased by the number of nucleotides deleted in 3'.
The number of nucleotides removed in 3' is indicated in field 10 of the FASTA header (see FASTA format of IMGT/GENE-DB reference sequences).
IMGT gaps
Gaps of the IMGT/GENE-DB reference sequences with IMGT gaps are shown for the positions unoccupied based on the IMGT unique numbering 'for C-DOMAIN'
(see 'Range of strand, turn and loop lengths in C-DOMAIN and C-LIKE-DOMAIN'
https://www.imgt.org/IMGTScientificChart/Numbering/IMGTIGVCsuperfamily.html).
In particular, they include the following additional positions for C-DOMAIN:
1.8-1.1 (A-STRAND)
15.1-15.3 (AB-TURN)
45.1-45.7 (CD-STRAND)
84.1-84.7, 85.7-85.1 (DE-TURN)
96.1-96.2 (EF-TURN).
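Downstream tools often need the ungapped sequence, or need to count how many positions of a gapped sequence are unoccupied. The following Python sketch shows one way to do both. It assumes that an IMGT gap is written as a dot ('.') in the gapped FASTA sequences, which is not stated on this page; the function names and the sample string are hypothetical.

```python
GAP = "."  # assumption: one dot per unoccupied position of the IMGT unique numbering


def gap_summary(gapped_sequence):
    """Return (residues, gaps, total) for a sequence with IMGT gaps."""
    residues = sum(1 for character in gapped_sequence if character != GAP)
    gaps = len(gapped_sequence) - residues
    return residues, gaps, len(gapped_sequence)


def remove_gaps(gapped_sequence):
    """Return the sequence without gap characters."""
    return gapped_sequence.replace(GAP, "")


# Made-up fragment for illustration, not a real reference sequence.
fragment = "CAG...GTG..C"
print(gap_summary(fragment))
print(remove_gaps(fragment))
```

The three values returned by gap_summary follow the same layout as field 13 of the FASTA header (nt or AA + IMGT gaps = total), so they can be compared with that field as a consistency check.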
Artificially spliced exon(s)
- F+ORF+in-frame P: provides the nucleotide and amino acid sequences of the artificially spliced exons for functional, ORF and in-frame pseudogene alleles of the C gene(s).
Note that the sequences include one nucleotide from the upstream donor exon,
added in 5' to obtain a complete first codon.
Provides, for a given reference allele, the other sequences from the literature corresponding to that allele. For each allele of the gene, the
IMGT/LIGM-DB reference sequence is indicated with the clone name (if known), the accession number and the molecule type (DNA or cDNA).
The specificity of cDNA sequences is indicated in the last column on the right when known.
IMGT Repertoire links
Provides additional IMGT Web resources concerning the gene in relation with its locus and group available in
IMGT Repertoire.
Annotated IMGT/LIGM-DB cDNA sequences
Provides:
- the number of annotated IMGT/LIGM-DB cDNA sequences for the selected gene.
- a link to a table of annotated IMGT/LIGM-DB cDNA sequences with the
accession number, the IMGT allele name, the sequence length, the sequence functionality, the sequence definition and the specificity.
Annotated IMGT/LIGM-DB rearranged genomic DNA sequences
Provides:
- the number of annotated IMGT/LIGM-DB rearranged genomic DNA sequences for the selected gene.
- a link to a table of annotated IMGT/LIGM-DB rearranged genomic DNA sequences with the
accession number, the IMGT allele name, the sequence length, the sequence functionality, the sequence definition and the specificity.
Annotated IMGT/3Dstructure-DB structures
Provides:
- the number of annotated IMGT/3Dstructure-DB structures for the selected gene.
- a link to a table of annotated IMGT/3Dstructure-DB structures with the
PDB code, the IMGT allele name, the IMGT protein name, the IMGT receptor type, the IMGT receptor description, the species, the chain ID.
External links
Provides external links concerning the gene to other nomenclature, genome and sequence databases.
IMGT label extraction from IMGT/LIGM-DB reference sequences
"IMGT label extraction from IMGT/LIGM-DB reference sequences" is one of the three
choices of Choose your display in RESULTS OF YOUR SEARCH.
It provides, for each allele of the selected gene(s), in FASTA format, the nucleotide sequences or the amino acid sequences
corresponding to the selected label(s) extracted from the IMGT/LIGM-DB reference sequences.
Nucleotide sequences are provided for F+ORF+all P alleles.
Amino acid sequences are provided for F+ORF+in-frame P alleles.
Three examples are displayed below:
- Example of extraction of the FR3-IMGT label and the L-PART1+V-EXON artificially spliced label in nucleotides
- Example of extraction of the L-PART1 label in nucleotides with extension of 5 nucleotides
in 5' and 30 nucleotides in 3'
- Example of extraction of the L-PART1+V-EXON artificially spliced label in amino acids
Note that the FASTA header is standardized according to FASTA format of IMGT/GENE-DB reference sequences.
In addition, in case of extension with nucleotides in 5' and/or in 3', the added nucleotides in 5' and in 3' are indicated in the field 6 of the FASTA header
(see example)
Example of extraction of the FR3-IMGT label and the L-PART1+V-EXON artificially spliced label in nucleotides
Example of extraction of the L-PART1 label in nucleotides with extension of 5 nucleotides
in 5' and 30 nucleotides in 3' (see Choose your display)
Note that the number of added nucleotides in 5' and in 3' are indicated in the field 6 of the FASTA header.
Example of extraction of the L-PART1+V-EXON artificially spliced label in amino acids
IMGT/GENE-DB LOCALIZATION IN GENOME ASSEMBLIES
The genomic localizations of IMGT genes are provided according to the selection: Species, Locus, Assembly, Assembly unit and Designation.
- At the top of the page, the species and locus are indicated with the chromosomal localization and the orientation of the locus on the chromosome.
The number of localized genes in the assembly is then indicated, with the corresponding number of labels in parentheses.
- A link displays the list of genes of the locus that are not localized in the selected assembly, if any.
The table comprises one line per localized gene, including:
- IMGT information regarding the gene in the locus:
- the IMGT gene name
- the IMGT gene order in the locus
- the orientation of the gene in the locus.
- the IMGT allele name and its functionality, if identified and validated by IMGT Biocurators, except for Mus musculus (mouse) locus.
Note that for Mus musculus (mouse) locus, the information provided is for IMGT allele *01.
- For the identified alleles, the IMGT/LIGM-DB accession numbers of the reference sequences.
- For the identified alleles, IMGT labels and positions in the reference sequences. Positions are provided for:
- L-V-GENE-UNIT and V-REGION for V genes
- D-GENE-UNIT and D-REGION for D genes
- J-GENE-UNIT and J-REGION for J genes
- C-GENE-UNIT and C exons, C domain and/or C-REGION for C genes
- HGNC gene ID (for Mus musculus (mouse): MGI gene ID; for Danio rerio (zebrafish): ZNC gene ID).
- NCBI information and IMGT label positions:
- NCBI gene ID
- NCBI accession number
- IMGT labels positions in NCBI accession number except for Mus musculus (mouse) locus.
Note that for Mus musculus (mouse) locus, positions are those provided by NCBI. For V genes, positions correspond to L-PART1+V-INTRON+V-EXON.
FASTA format of IMGT/GENE-DB reference sequences
The FASTA header of IMGT/GENE-DB reference sequences is standardized. It contains 15 fields separated by '|':
1. IMGT/LIGM-DB accession number(s)
2. IMGT gene and allele name
3. species
4. IMGT allele functionality
5. exon(s), region name(s), or extracted label(s)
6. start and end positions in the IMGT/LIGM-DB accession number(s)
7. number of nucleotides in the IMGT/LIGM-DB accession number(s)
8. codon start, or 'NR' (not relevant) for non coding labels
9. +n: number of nucleotides (nt) added in 5' compared to the corresponding label extracted from IMGT/LIGM-DB
10. +n or -n: number of nucleotides (nt) added or removed in 3' compared to the corresponding label extracted from IMGT/LIGM-DB
11. +n, -n, and/or nS: number of added, deleted, and/or substituted nucleotides to correct sequencing errors, or 'not corrected' if sequencing errors have not been corrected
12. number of amino acids (AA): this field indicates that the sequence is in amino acids
13. number of characters in the sequence: nt (or AA)+IMGT gaps=total
14. partial (if it is)
15. reverse complementary (if it is)
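The 15-field layout above lends itself to simple parsing. The following Python sketch splits a header on '|' and names the fields. The field names are hypothetical labels chosen for this sketch, the example header is made up to match the layout (placeholder accession and positions), and the handling of a trailing '|' is an assumption about how headers may be terminated.

```python
# Hypothetical names, in the order of the 15 fields listed above.
FIELD_NAMES = [
    "accession", "gene_and_allele", "species", "functionality", "label",
    "positions", "nucleotide_count", "codon_start", "added_5prime",
    "changed_3prime", "corrections", "amino_acid_count", "character_count",
    "partial", "reverse_complementary",
]


def parse_header(line):
    """Split one FASTA header line into a dict keyed by FIELD_NAMES."""
    if not line.startswith(">"):
        raise ValueError("a FASTA header must start with '>'")
    parts = [part.strip() for part in line[1:].strip().split("|")]
    # Assumption: a trailing '|' produces one extra, empty item.
    if len(parts) == len(FIELD_NAMES) + 1 and parts[-1] == "":
        parts.pop()
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, found {len(parts)}")
    return dict(zip(FIELD_NAMES, parts))


def parse_character_count(field):
    """Parse field 13, written as 'nt (or AA)+IMGT gaps=total'."""
    left, total = field.split("=")
    residues, gaps = left.split("+")
    return int(residues), int(gaps), int(total)


# Made-up header for illustration only.
EXAMPLE = ">X00000|IGHV1-2*01|Homo sapiens|F|V-REGION|101..396|296 nt|1| | | | |296+24=320| | |"

fields = parse_header(EXAMPLE)
print(fields["gene_and_allele"], fields["label"], fields["positions"])
print(parse_character_count(fields["character_count"]))
```

Fields that do not apply to a given sequence come back as empty strings, so a caller can test them directly, for example `if fields["partial"]:`.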
Note that field 6 may be modified if:
- a nucleotide has been added in IMGT/GENE-DB reference sequence in 5' of a label, to obtain a complete first codon (for example for C-GENE exons if splicing frame 1 or 2):
the added nucleotide is indicated followed by a comma before the start position.
See for example the reference sequences of the Homo sapiens IGHA1 gene.
Note the number of added nucleotides in 5' is indicated in field 9.
- a nucleotide has been deleted in 3' of a label, to obtain a complete last codon (for example for C-GENE exons if splicing frame 1 or 2):
the end position is decreased by the number of deleted nucleotides in 3'.
See for example the reference sequences of the Homo sapiens IGHA1 gene.
Note the number of removed nucleotides in 3' is indicated in field 10.
- a nucleotide has been added in 3' of a label, to obtain the complete genomic sequence (for example for J-REGION reference sequence from cDNA):
the end position is followed by a comma and the added nucleotides in 3'.
See for example the reference sequences of the Homo sapiens TRAJ47 gene.
Note the number of added nucleotides in 3' is indicated in field 10.
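The three cases above change the shape of field 6, so a parser has to accept an optional nucleotide before the start position and an optional nucleotide after the end position. The Python sketch below handles a single position range. It assumes that start and end are written as 'start..end', which this page does not state explicitly; the function name and the sample values are hypothetical, and fields that combine several ranges are out of scope.

```python
import re

# Optional "nt," prefix, then start..end, then optional ",nt" suffix.
_POSITIONS = re.compile(
    r"^(?:(?P<added5>[A-Za-z]+),)?"
    r"(?P<start>\d+)\.\.(?P<end>\d+)"
    r"(?:,(?P<added3>[A-Za-z]+))?$"
)


def parse_positions(field):
    """Parse field 6 into start, end and any nucleotides added in 5' or 3'."""
    match = _POSITIONS.match(field.strip())
    if match is None:
        raise ValueError(f"unsupported positions field: {field!r}")
    return {
        "added_5prime": match.group("added5") or "",
        "start": int(match.group("start")),
        "end": int(match.group("end")),
        "added_3prime": match.group("added3") or "",
    }


# Made-up values for illustration only.
print(parse_positions("101..396"))    # plain range
print(parse_positions("g,149..453"))  # nucleotide added in 5'
print(parse_positions("10..71,g"))    # nucleotide added in 3'
```

The lengths of the added nucleotides can then be compared with fields 9 and 10, which carry the corresponding counts.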
Four examples are displayed below:
- Nucleotide sequences with IMGT gaps
- Amino acid sequences with IMGT gaps
- Nucleotide sequences (without gaps)
- Amino acid sequences (without gaps)
- Nucleotide sequences with IMGT gaps:
- Amino acid sequences with IMGT gaps:
- Nucleotide sequences:
- Amino acid sequences:
IMGT/GENE-DB reference sequences and gene orientation
An IMGT/GENE-DB reference sequence for a given IG or TR gene is provided in the
5' > 3' DNA strand orientation corresponding to the 'sense', 'plus' or 'coding strand'
of that gene (DNA strand orientation).
The orientation (direct or opposite) of an IG or TR gene in a given IMGT locus is given in Locus gene order (Genomic orientation):
IMGT Repertoire (IG and TR) > 1. Locus and genes > 3. Locus descriptions > Locus gene order
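Because reference sequences are always given on the sense strand, a gene that lies on the opposite strand of a source sequence has to be reverse complemented to be compared with its reference. A minimal Python sketch of that operation follows; the function name is hypothetical and only the bases a, c, g, t and n (in either case) are handled.

```python
# Translation table for complementing unambiguous bases and n.
_COMPLEMENT = str.maketrans("acgtnACGTN", "tgcanTGCAN")


def reverse_complement(sequence):
    """Return the reverse complement of a nucleotide sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


# Made-up fragment for illustration only.
print(reverse_complement("atgc"))
```

Applying the function twice returns the original sequence, which is a quick way to check that the input contains only supported characters in the expected places.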
Created: 31/01/2003
Last updated: 12/09/2019