IMGT annotation rules answer practical questions raised during the
annotation process and may be useful to IMGT/LIGM-DB users.
Annotation level in IMGT/LIGM-DB
Annotation level is indicated on the ID line at the top of the flat file.
To retrieve sequences according to the annotation level, go to the
"Catalogue" module of IMGT/LIGM-DB.
Gene and allele names in IMGT/LIGM-DB
- Standardized IMGT gene and allele names are shown in the
IMGT annotations and in the IMGT flat files under the qualifiers "gene" and "allele", respectively.
These qualifiers are assigned to the entities (V-GENE, D-GENE, J-GENE and C-GENE)
and to the cores (V-REGION, D-REGION, J-REGION and C-REGION).
- Note that the gene names shown in the definitions and under the qualifier "gene_alias"
in the IMGT flat files are designations retrieved from the EMBL flat files or from the literature.
- A change in the IMGT gene or allele names is indicated by a note with a date
(both in red) in the Gene tables and related documents of the IMGT Repertoire.
Limits of labels
Germline sequences
- V-REGION
The 3' end of V-REGION includes, if present, the 1 (or 2) nucleotide(s) between the
most 3' codon and the V-HEPTAMER.
- J-REGION
The 5' end of J-REGION includes, if present, the 1 (or 2) nucleotide(s) between the
most 5' codon and the J-HEPTAMER.
Rearranged sequences not yet annotated with IMGT/JunctionAnalysis
- IGK and IGL
The V-REGION is limited in 3' by the last germline codon (even if mutated).
The J-REGION is limited in 5' by the first germline codon (even if mutated).
N-REGION is only added if there are additional nucleotides compared to the
length (in codon numbers) of the germline V-REGION and J-REGION.
- IGH
The V-REGION is limited in 3' by the last germline codon (even if mutated)
The J-REGION is limited in 5' by the first codon which is found to be identical to the germline sequence.
The sequence between V-REGION and J-REGION is designated as N-AND-D-REGION.
- TRA and TRG
The V-REGION is limited in 3' by the last germline codon (even if there are substitutions due to the N-diversity).
The J-REGION is limited in 5' by the first codon which is found to be identical to the germline sequence.
The sequence between V-REGION and J-REGION is designated as N-REGION.
- TRB and TRD
The V-REGION is limited in 3' by the last germline codon (even if there are substitutions due to the N-diversity).
The J-REGION is limited in 5' by the first codon which is found to be identical to the germline sequence.
The sequence between V-REGION and J-REGION is designated as N-AND-D-REGION.
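The labelling of the sequence between V-REGION and J-REGION can be summarized as a lookup by locus. The following is a minimal sketch in Python, not an IMGT tool: the function name `junction_label` and the parameter `extra_nucleotides` are hypothetical and introduced only for illustration.

```python
# Label of the sequence between V-REGION and J-REGION in rearranged sequences
# not yet annotated with IMGT/JunctionAnalysis, as listed in the rules above.
JUNCTION_LABEL = {
    "IGK": "N-REGION",
    "IGL": "N-REGION",
    "IGH": "N-AND-D-REGION",
    "TRA": "N-REGION",
    "TRG": "N-REGION",
    "TRB": "N-AND-D-REGION",
    "TRD": "N-AND-D-REGION",
}


def junction_label(locus, extra_nucleotides=0):
    """Return the label for the V-J intervening sequence, or None.

    `extra_nucleotides` (hypothetical parameter) is the number of nucleotides
    found in addition to the germline V-REGION and J-REGION lengths. It is
    only consulted for IGK and IGL, where N-REGION is added only if such
    additional nucleotides exist.
    """
    locus = locus.upper()
    label = JUNCTION_LABEL[locus]
    if locus in ("IGK", "IGL") and extra_nucleotides <= 0:
        return None
    return label
```

For example, `junction_label("IGH")` gives "N-AND-D-REGION", whereas `junction_label("IGK")` gives `None` when no additional nucleotides are reported.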
C-REGION in genomic DNA and cDNAs
- C-REGION in genomic DNA does not include the J-REGION last nucleotide which participates in the
first codon. The translation tool starts at the third nucleotide of the C-REGION
(codon_start 3).
The J-REGION nucleotide and the first amino acid resulting from the splicing are shown between parentheses
in Alignments of alleles (IMGT Repertoire).
- C-REGION in cDNA includes the J-REGION last nucleotide.
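The effect of codon_start can be illustrated with a short translation sketch. It assumes the standard genetic code; the function `translate` and the short example sequence are illustrative only and are taken neither from IMGT/LIGM-DB nor from the IMGT translation tool.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence, codon_start=1):
    """Translate complete codons, starting at the 1-based position codon_start."""
    sequence = sequence.upper()[codon_start - 1:]
    return "".join(
        CODON_TABLE[sequence[i:i + 3]] for i in range(0, len(sequence) - 2, 3)
    )


# Illustrative sequence only (hypothetical, not a database entry).
j_last_nucleotide = "G"
c_region_gdna = "CCTCCACCAAGGGC"                    # J-REGION last nucleotide excluded
c_region_cdna = j_last_nucleotide + c_region_gdna   # J-REGION last nucleotide included

gdna_translation = translate(c_region_gdna, codon_start=3)
cdna_translation = translate(c_region_cdna, codon_start=1)
```

In this sketch the gDNA translation is "STKG" and the cDNA translation is "ASTKG": the first two nucleotides of the genomic C-REGION belong to the codon completed by the J-REGION nucleotide, so they are skipped in gDNA and yield one additional amino acid in cDNA.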
CH3 (or CH4) exon and domain, and CHS
- CH3 exon (for IGHG) or CH4 exon (for IGHM and IGHE) includes CHS.
- CH3 domain (for IGHG) or CH4 domain (for IGHM and IGHE) does not include CHS.
Hinge
- CH2 exon (for IGHA) includes the hinge H (annotations in IMGT/LIGM-DB).
- CH2 domain (for IGHA) does not include the hinge H (annotations in IMGT/3Dstructure-DB).
Functionality
Functionality is shown between:
- parentheses, (F) and (P), when the accession number refers to a gene for which the functionality
needs to be confirmed by a gDNA sequence that, for V, D, J, should be germline. For example, (F) or (P) are assigned
to a C-REGION in cDNA or to a V-, D- or J-REGION in rearranged
gDNA, for which the corresponding gDNA C-REGION or germline V-, D- or J-REGION has not yet been isolated.
- brackets, [F] and [P], when the accession number refers to a gene for which the functionality
needs to be confirmed by a complete sequence. For example, [F] or [P] are assigned to gDNA of V-, D- or J-REGION,
not known as being germline or rearranged.
In IMGT Gene tables:
- functionality between parentheses ( ) corresponds to: c, #c or #g after the accession number,
- functionality between brackets [ ] corresponds to: ° after the accession number.
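The correspondence between the symbol after the accession number and the way functionality is displayed can be sketched as follows. The function `display_functionality` is hypothetical; showing the functionality without parentheses or brackets when no such symbol is present is an assumption made for this example.

```python
# Symbols after the accession number in IMGT Gene tables, per the rules above.
PARENTHESES_SYMBOLS = {"c", "#c", "#g"}
BRACKETS_SYMBOLS = {"°"}


def display_functionality(functionality, symbol=""):
    """Wrap a functionality (e.g. "F" or "P") according to the accession symbol."""
    if symbol in PARENTHESES_SYMBOLS:
        return f"({functionality})"
    if symbol in BRACKETS_SYMBOLS:
        return f"[{functionality}]"
    # Assumption: no wrapping when none of the symbols above applies.
    return functionality
```

For example, `display_functionality("F", "#c")` gives "(F)" and `display_functionality("P", "°")` gives "[P]".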
Partial sequences
- An entity label may be assigned to partial sequences if there is no ambiguity
regarding this assignment.
- The V-REGION (core) label is always indicated, even if only one of its parts
(FR1-IMGT, CDR1-IMGT, ...) is present.
- Germline V-EXON, V-REGION, CDR3-IMGT, J-REGION are described as "complete"
("not partial") when the number of expected codons is present. That means that the
absence or presence of 1 (or 2) nucleotide(s) in 3' of the most downstream codon of
the V-EXON, V-REGION, CDR3-IMGT, or in 5' of the first codon of J-REGION is not taken
into account.
- The compound labels (such as D-J-REGION, V-EXON...) are only indicated if all
of their components (even partial) are present.
- The keyword "constant region" is not assigned to sequences which contain a partial
C-REGION of only 18 nucleotides (or fewer). However, the C-REGION (core) label is always indicated.
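Two of the rules above are simple numerical checks and can be sketched in code. Both function names are hypothetical. The sketch assumes that lengths are counted in nucleotides over the labelled region, and that the keyword is assigned whenever the C-REGION is longer than 18 nucleotides (the rule itself only states when it is not assigned).

```python
def is_complete(length_nt, expected_codons):
    """Return True when at least the expected number of codons is present.

    Integer division discards the 1 or 2 nucleotides that may lie beyond the
    last complete codon, so their absence or presence does not change the result.
    """
    return length_nt // 3 >= expected_codons


def gets_constant_region_keyword(c_region_length_nt):
    """Return False for a partial C-REGION of 18 nucleotides or fewer.

    Assumption: a longer C-REGION receives the keyword.
    """
    return c_region_length_nt > 18
```

With these definitions, a region expected to contain 98 codons is described as complete at 294, 295 or 296 nucleotides, and as partial at 293 nucleotides.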
Insertion in a sequence alignment
In case of an unusual insertion in a sequence alignment or display (Alignments of alleles,
Protein displays, etc.), the 5' part of the sequence with the insertion is preferentially
moved to the left to allow the insertion (if not possible, the 3' part of the sequence is
moved to the right to allow the insertion). In both cases, the sequences with no insertion
have a blank at that position.
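The last point, that sequences with no insertion have a blank at that position, can be sketched as follows. The function `open_insertion` and its parameters are hypothetical, and the sketch does not model the choice between moving the 5' part to the left or the 3' part to the right; it only shows the padding of the other sequences.

```python
def open_insertion(sequences, with_insertion, position, length, blank=" "):
    """Pad the sequences that lack an insertion so that all rows stay aligned.

    sequences      -- dict mapping a sequence name to its displayed string
    with_insertion -- set of names whose sequence carries the insertion
    position       -- 0-based column at which the insertion starts
    length         -- number of inserted positions
    """
    padded = {}
    for name, seq in sequences.items():
        if name in with_insertion:
            padded[name] = seq
        else:
            padded[name] = seq[:position] + blank * length + seq[position:]
    return padded


# Illustrative rows (hypothetical): "allele2" has three extra nucleotides.
rows = {"allele1": "ACGACG", "allele2": "ACGTTTACG"}
aligned = open_insertion(rows, {"allele2"}, position=3, length=3)
```

After the call, both rows have the same length and "allele1" shows three blanks where "allele2" carries the insertion.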
Symbols for description of genes and alleles in IMGT Repertoire
#: rearranged
#c: rearranged cDNA
#g: rearranged genomic DNA
c: cDNA sequence
°: genomic DNA, but not known as being germline or rearranged.
~: genomic DNA rearranged in a cassette (for example in the IGH locus of chondrichthyes)
~VD,DJ: VD,DJ genomic DNA rearranged in the cassette.
~VD: VD genomic DNA rearranged in the cassette.
~VDJ: VDJ genomic DNA rearranged in the cassette.
(m): transcript of a membrane chain.
(s): transcript of a secreted chain.
(st): sterile transcript.
MAP: Mapped reference sequences: "mapped" refers to sequences which have been obtained from clones (phages, cosmids, YACs...)
either by subcloning or PCR, and does not apply to sequences obtained directly from genomic DNA.
Other symbols
- (nd): not defined.
- x-mer: indicates the length of a peptide linker or that of a chain motif, where 'x' is the number of amino acids.
Rules for description of genes and alleles in IMGT Repertoire
In Alignments of alleles, when several alleles are shown, the nucleotide mutations and
amino acid changes for a given codon are indicated in red letters.
Nucleotides and amino acids at the 3' end of V-REGION, 5' end of J-REGION, or on either
end of D-REGION, which may belong to the N-REGION, are shown in italics and in black.
This representation also applies to sequencing errors.
Translation
- The translation of J-REGION (and C-REGION for cDNA) of unproductive rearranged
cDNA or gDNA is arbitrarily shown in the germline reading frame. In the annotations,
this is indicated by the qualifier "germline_frame".
Bibliographical references format
In IMGT Repertoire, the format for bibliographical references is the following:
- For an article
[1] Lefranc, M.-P. et al., Dev. Comp. Immunol., 27, 55-77 (2003)
PMID: 12477501,
LIGM: 268.
- For a book
[4] Lefranc, M.-P. and Lefranc, G.,
The Immunoglobulin FactsBook,
Academic Press, 458 pages (2001)
ISBN: 012441351X.
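As an illustration of the article format, the following sketch assembles a reference from its fields. The function `format_article` and its parameter names are hypothetical; the layout simply reproduces the article example above.

```python
def format_article(index, authors, journal, volume, pages, year, pmid, ligm):
    """Return an article reference laid out as in the example above."""
    return "\n".join([
        f"[{index}] {authors}, {journal}, {volume}, {pages} ({year})",
        f"PMID: {pmid},",
        f"LIGM: {ligm}.",
    ])


reference = format_article(
    1, "Lefranc, M.-P. et al.", "Dev. Comp. Immunol.", 27, "55-77", 2003,
    pmid=12477501, ligm=268,
)
```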