1. INTRODUCTION
2. IMGT/LIGM-DB ANNOTATION LEVEL
3. STRUCTURE OF AN ENTRY
4. LINE STRUCTURE
5. INDEX FILE FORMAT
The database is composed of sequence entries. Each entry corresponds to a single contiguous sequence as contributed or reported in the literature. In many cases, entries have been assembled from several papers reporting overlapping sequence regions. Conversely, a single paper often provides data for several entries, as when homologous sequences from different organisms are compared.
Annotation level   Definition
----------------   ---------------------------------------------------
keyword level      entries to which standardized keywords are assigned
by annotators      sequences annotated by IMGT experts
automatic          automatically annotated with IMGT tools
ID - identification             (begins each entry; 1 per entry)
AC - accession number           (>=1 per entry)
DT - date                       (2 per entry)
DE - description                (>=1 per entry)
KW - keyword                    (>=1 per entry)
OS - organism species           (>=1 per entry)
OC - organism classification    (>=1 per entry)
RN - reference number           (>=1 per entry)
RC - reference comment          (>=0 per entry)
RP - reference positions        (>=1 per entry)
RX - reference cross-reference  (>=0 per entry)
RA - reference author(s)        (>=1 per entry)
RT - reference title            (>=1 per entry)
RL - reference location         (>=1 per entry)
DR - database cross-reference   (>=0 per entry)
FH - feature table header       (>=1 per entry)
FT - feature table data         (>=0 per entry)
CC - comments or notes          (>=0 per entry)
XX - spacer line                (many per entry)
SQ - sequence header            (1 per entry)
bb - (blanks) sequence data     (>=1 per entry)
// - termination line           (ends each entry; 1 per entry)
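The line types listed above can be read mechanically. The following is a minimal sketch in Python, not an official IMGT tool. It assumes that the line code occupies the first two characters of every line, that sequence data lines begin with blanks, and that "//" closes an entry. The function names `split_entries` and `group_by_code` and the shortened `SAMPLE` entry (with a 12-base sequence) are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def split_entries(text):
    """Yield the lines of one entry at a time; '//' ends each entry."""
    lines = []
    for line in text.splitlines():
        if line.startswith("//"):
            if lines:
                yield lines
            lines = []
        elif line.strip():
            lines.append(line)


def group_by_code(entry_lines):
    """Map each two-character line code to the list of its line contents."""
    grouped = defaultdict(list)
    for line in entry_lines:
        code, content = line[:2], line[2:].strip()
        if code == "XX":      # spacer lines carry no data
            continue
        if code == "  ":      # blank code: sequence data ("bb" in the list)
            code = "bb"
        grouped[code].append(content)
    return dict(grouped)


# Hypothetical, shortened entry used only to exercise the functions.
SAMPLE = """\
ID   MMTCRGBV1  IMGT/LIGM annotation : by annotators; RNA; ROD; 12 BP.
XX
AC   Z48588;
XX
DT   12-FEB-1996 (Rel. 3, arrived in LIGM-DB )
DT   04-JAN-2000 (Rel. 12, Last updated, Version 4)
XX
SQ   Sequence 12 BP; 6 A; 3 C; 2 G; 1 T; 0 other;
     cagctaaagc aa                                                        12
//
"""

if __name__ == "__main__":
    for entry in split_entries(SAMPLE):
        for code, contents in group_by_code(entry).items():
            print(code, contents)
```

Grouping by line code keeps repeated line types (two DT lines, several DE or KW lines) together, so the counts given in the list above can be checked with `len()`.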
A sample IMGT/LIGM-DB entry is shown below:
ID   MMTCRGBV1  IMGT/LIGM annotation : by annotators; RNA; ROD; 290 BP.
XX
AC   Z48588;
XX
DT   12-FEB-1996 (Rel. 3, arrived in LIGM-DB )
DT   04-JAN-2000 (Rel. 12, Last updated, Version 4)
XX
DE   M.musculus mRNA for T-cell receptor gammaB-V1 segment. ;
DE   RNA; rearranged configuration; TcR-Gamma; regular; functionality
DE   productive; group TRGV; subgroup GV1.
XX
KW   antigen receptor; immunoglobulin superfamily; TcR; TcR gamma-delta;
KW   TcR-Gamma; variable; IMGT reference sequence; t cell receptor.
XX
OS   Mus musculus (house mouse)
OC   Eukaryota; Metazoa; Chordata; Vertebrata; Mammalia; Eutheria; Rodentia;
OC   Sciurognathi; Muridae; Murinae; Mus.
XX
RN   [1]
RP   1-290
RX   MEDLINE; 96134008.
RA   Roger T.T., Morisset J., Seman M.;
RT   "Conservation of Tcrg-V5 and limited allelic sequence polymorphism of
RT   the other Tcrg-V genes used by mouse tissue-specific gd-T lymphocytes.";
RL   Immunogenetics 43:165-166(1996).
XX
RN   [2]
RP   1-290
RA   Roger T.T.;
RT   ;
RL   Submitted (03-MAR-1995) to the EMBL/GenBank/DDBJ databases.
RL   Thierry T.R. Roger, Lab. d'immunodifferenciation, Pr Seman, Universite
RL   Denis Diderot, 2, place Jussieu, 75251 Paris cedex 05, France
XX
DR   MGD; MGI:98631; Tcrg-V1
DR   EMBL; Z48588.
XX
FH   Key             Location/Qualifiers
FH
FT   V-REGION        1..290>
FT                   /partial
FT                   /chromosome="13"
FT                   /cell_type="purified skin T cells"
FT                   /strain="L-I (Biozzi mice)"
FT                   /clone_lib="library M13mp19"
FT                   /clone="3.4"
FT                   /allele="TRGV1*08"
FT                   /gene="TRGV1"
FT                   /haplotype="Tcr-gB"
FT                   /tissue_type="skin"
FT                   /CDR_length="[8.6.X]"
FT                   /translation="QLKQTEVSVTRETDESAQISCIASLPDFGNTEIHWYRQKAK
FT                   QFEYLIYVQTNYNQRPLGGKHKKIEASKDFQTSTSTLKINYLKKEDEATYYCAVW
FT                   "
FT   FR1-IMGT        <1..72
FT                   /partial
FT                   /AA_IMGT="3 to 26"
FT                   /translation="QLKQTEVSVTRETDESAQISCIAS"
FT   1st-CYS         61..63
FT   CDR1-IMGT       73..96
FT                   /AA_IMGT="27 to 34"
FT                   /translation="LPDFGNTE"
FT   FR2-IMGT        97..144
FT                   /AA_IMGT="39 to 55, AA 49 missing"
FT                   /translation="IHWYRQKAKQFEYLIY"
FT   CONSERVED-TRP   103..105
FT   CDR2-IMGT       145..162
FT                   /AA_IMGT="56 to 61"
FT                   /translation="VQTNYN"
FT   FR3-IMGT        163..279
FT                   /AA_IMGT="66 to 104"
FT                   /translation="QRPLGGKHKKIEASKDFQTSTSTLKINYLKKEDEATYYC"
FT   2nd-CYS         277..279
FT   CDR3-IMGT       280..290>
FT                   /partial
FT                   /translation="AVW"
FT   JUNCTION        277..290>
FT                   /partial
FT                   /translation="CAVW"
XX
SQ   Sequence 290 BP; 114 A; 62 C; 52 G; 62 T; 0 other;
     cagctaaagc aaactgaagt atccgtcacc agagagacag atgagagtgc gcaaatatcc        60
     tgtatagctt ctcttccaga cttcggcaac acagaaatac actggtaccg gcaaaaagca       120
     aaacagtttg agtatctaat atatgtccaa acaaactaca atcaacgacc cttaggaggg       180
     aagcacaaaa aaattgaagc aagtaaagat tttcaaactt ctacctcaac cttgaaaata       240
     aattacttga agaaagaaga tgaagccacc tactactgtg cagtctggat                  290
//

Some entries do not correspond to the current version of the sequence in EMBL. Users should note that, although the accession number on the AC line is the same as in EMBL, the data in the IMGT/LIGM-DB entry differ. The differences appear in the ID, AC, DT, DE, KW and FT lines; all other information is derived from EMBL. Consequently, only the structure of these line types is described here.
4.3 The DT (DaTe) line
There are two DT (DaTe) lines, formatted as follows:
DT   DD-MON-YYYY (Rel. #, Created)
DT   DD-MON-YYYY (Rel. #, Last updated, Version #)

The first DT line indicates the date the entry was created in IMGT/LIGM-DB. The second DT line indicates the date of the last revision of the annotation in IMGT/LIGM-DB by an IMGT curator. The version number corresponds to the number of times the entry was validated by IMGT curators.
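The DT format can be read with a regular expression. The sketch below is illustrative only; the function name `parse_dt` and the returned dictionary keys are hypothetical. Because the sample entry uses the wording "arrived in LIGM-DB" where the format shows "Created", the text after the release number is kept as a free-text note instead of being matched literally. The month abbreviations are assumed to be the usual three-letter English ones in upper case.

```python
import re
from datetime import date

# Assumed month abbreviations (upper case, as in the sample entry).
MONTHS = {name: number for number, name in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}

DT_RE = re.compile(
    r"^DT\s+(\d{2})-([A-Z]{3})-(\d{4})\s+\(Rel\.\s*(\d+),\s*(.*?)\s*\)\s*$")
VERSION_RE = re.compile(r"Version\s+(\d+)")


def parse_dt(line):
    """Return date, release, note and version (if any) from one DT line."""
    match = DT_RE.match(line)
    if match is None:
        raise ValueError(f"not a DT line: {line!r}")
    day, month, year, release, note = match.groups()
    version = VERSION_RE.search(note)
    return {
        "date": date(int(year), MONTHS[month], int(day)),
        "release": int(release),
        "note": note,
        "version": int(version.group(1)) if version else None,
    }


if __name__ == "__main__":
    print(parse_dt("DT   12-FEB-1996 (Rel. 3, arrived in LIGM-DB )"))
    print(parse_dt("DT   04-JAN-2000 (Rel. 12, Last updated, Version 4)"))
```

Only the second DT line carries a version number, so `version` is `None` for the creation line.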
4.4 The DE (DEscription) lines
The description assigned by IMGT/LIGM-DB includes the following information about each entry:
species:
eg:
Homo sapiens
Gorilla gorilla
loci, genes or chains:
Name | Description |
---|---|
Ig | refers to Immunoglobulin loci, genes, or chains |
Ig-Heavy | refers to Immunoglobulin Heavy loci, genes, or chains |
Ig-Heavy-Alpha | refers to Immunoglobulin Heavy Alpha genes or chains |
Ig-Heavy-Alpha-1 | refers to Immunoglobulin Heavy Alpha-1 genes or chains |
Ig-Heavy-Alpha-2 | refers to Immunoglobulin Heavy Alpha-2 genes or chains |
Ig-Heavy-Delta | refers to Immunoglobulin Heavy Delta genes or chains |
Ig-Heavy-Epsilon | refers to Immunoglobulin Heavy Epsilon genes or chains |
Ig-Heavy-Gamma | refers to Immunoglobulin Heavy Gamma genes or chains |
Ig-Heavy-Gamma-1 | refers to Immunoglobulin Heavy Gamma-1 genes or chains |
Ig-Heavy-Gamma-2 | refers to Immunoglobulin Heavy Gamma-2 genes or chains |
Ig-Heavy-Gamma-2-a | refers to Immunoglobulin Heavy Gamma-2-a genes or chains |
Ig-Heavy-Gamma-2-b | refers to Immunoglobulin Heavy Gamma-2-b genes or chains |
Ig-Heavy-Gamma-2-c | refers to Immunoglobulin Heavy Gamma-2-c genes or chains |
Ig-Heavy-Gamma-3 | refers to Immunoglobulin Heavy Gamma-3 genes or chains |
Ig-Heavy-Gamma-4 | refers to Immunoglobulin Heavy Gamma-4 genes or chains |
Ig-Heavy-Khi | refers to Immunoglobulin Heavy Khi genes or chains (Skate, Xenopus) |
Ig-Heavy-Mu | refers to Immunoglobulin Heavy Mu genes or chains |
Ig-Heavy-Nu | refers to Immunoglobulin Heavy Nu genes or chains, also designated as NAR (Nurse Shark) |
Ig-Heavy-Omega | refers to Immunoglobulin Heavy Omega genes or chains (Shark) |
Ig-Heavy-Upsilon | refers to Immunoglobulin Heavy Upsilon genes or chains |
Ig-Light | refers to Immunoglobulin Light loci, genes, or chains |
Ig-Light-Iota | refers to Immunoglobulin Light Iota loci, genes, or chains |
Ig-Light-Kappa | refers to Immunoglobulin Light Kappa loci, genes, or chains |
Ig-Light-Lambda | refers to Immunoglobulin Light Lambda loci, genes, or chains |
Ig-Surrogate | refers to Immunoglobulin pseudo-light genes or chains of the pre-B cell receptor |
Ig-Surrogate-Lambda-5 | refers to Immunoglobulin Surrogate Lambda-5 gene or chain of the pre-B cell receptor |
Ig-Surrogate-Lambda-like | refers to Immunoglobulin Surrogate Lambda-like gene or chain of the pre-B cell receptor |
Ig-Surrogate-VpreB | refers to Immunoglobulin Surrogate VpreB gene or chain of the pre-B cell receptor |
Ig-Surrogate-VpreB-1 | refers to Immunoglobulin Surrogate VpreB-1 gene or chain of the pre-B cell receptor |
Ig-Surrogate-VpreB-2 | refers to Immunoglobulin Surrogate VpreB-2 gene or chain of the pre-B cell receptor |
TcR | refers to T cell Receptor loci, genes, or chains |
TcR-Alpha | refers to T cell Receptor Alpha loci, genes, or chains |
TcR-Beta | refers to T cell Receptor Beta loci, genes, or chains |
TcR-Beta-1 | refers to T cell Receptor Beta-1 genes or chains |
TcR-Beta-2 | refers to T cell Receptor Beta-2 genes or chains |
TcR-Delta | refers to T cell Receptor Delta loci, genes, or chains |
TcR-Gamma | refers to T cell Receptor Gamma loci, genes, or chains |
TcR-Gamma-1 | refers to T cell Receptor Gamma-1 genes or chains |
TcR-Gamma-2 | refers to T cell Receptor Gamma-2 genes or chains |
TcR-Gamma-3 | refers to T cell Receptor Gamma-3 genes or chains |
TcR-Gamma-4 | refers to T cell Receptor Gamma-4 genes or chains |
TcR-Gamma-5 | refers to T cell Receptor Gamma-5 genes or chains |
TcR-PreT-Alpha | refers to T cell Receptor PreT Alpha genes or chain of the pre-T cell receptor |
configuration:
Name | Description |
---|---|
germline | for sequences related to Ig or TcR variable gene, diversity segment, and joining segment |
rearranged | for sequences related to Ig or TcR variable gene, diversity segment, and joining segment |
unknown | for sequences related to Ig or TcR variable gene, diversity segment, and joining segment |
undefined | for sequences related to Ig or TcR constant gene only |

functionality:
The definition of functionality for a germline entity (V-GENE, C-GENE, J-SEGMENT and D-SEGMENT) is based on sequence analysis.

FUNCTIONAL
A germline entity (V-GENE, C-GENE, J-SEGMENT or D-SEGMENT) is functional if the coding region has an open reading frame without stop codon, and if there is no described defect in the splicing sites, recombination signals and/or regulatory elements.

ORF (Open Reading Frame)
A germline entity (V-GENE, C-GENE, J-SEGMENT or D-SEGMENT) is qualified as ORF (Open Reading Frame) if the coding region has an open reading frame, but:
- alterations have been described in the splicing sites, recombination signals and/or regulatory elements,
- and/or changes of conserved amino acids have been suggested by the authors to lead to incorrect folding,
- and/or the germline entity is an ORPHON.
A germline J-SEGMENT with an open reading frame and no described defect, but preceding a C-GENE which is a pseudogene, is qualified as ORF.

PSEUDOGENE
A pseudogene germline entity (V-GENE, C-GENE, J-SEGMENT or D-SEGMENT) is characterized by the presence of stop codon(s) and/or frameshift mutation(s). A V-GENE is considered a pseudogene if these defects occur in the L-PART1 and/or V-EXON, or if there is a mutation in the L-PART1 INIT-CODON atg.

VESTIGIAL (or relics)
Defines germline sequences which cannot be assigned to a given subgroup because they are too divergent from the other pseudogenes and have too many stop codons and frameshifts.

PRODUCTIVE
A rearranged (genomic or cDNA) entity is productive if the Ig or TcR sequence has an open reading frame, with no stop codon and no defect described in the initiation codon, splicing sites and/or regulatory elements, and an in-frame JUNCTION.

UNPRODUCTIVE
A rearranged (genomic or cDNA) entity is unproductive if the Ig or TcR sequence is characterized by an out-of-frame JUNCTION and/or the presence of stop codon(s) and/or frameshift mutation(s), and/or a defect described in the splicing sites and/or the regulatory element(s), and/or unusual features (TRANSLOCATED, GENE FUSION...).

structure and localisation:
Name | Description |
---|---|
chimeric | defines an in vitro or in vivo fusion gene between Ig and/or TcR genes. [2 sources] |
engineered | defines an Ig or TcR gene modified by deliberate mutagenesis in vitro. [1 source] |
gene-fusion | defines an in vitro gene fusion between two or more different genes (at least one of them being Ig or TcR). [1 source (Ig or TcR) + X] |
humanized | defines a natural or synthetic human Ig or TcR gene modified in vitro with non-human recognition site sequences. [1 source (murine, rabbit...) + 1 source (human)] |
orphon | defines an Ig or TcR gene found in vivo on a different locus from the main locus (either on the same chromosome or on another chromosome), without hallmarks of RNA processing. |
processed | defines an Ig or TcR gene found in vivo on a different locus from the main locus (either on the same chromosome or on another chromosome), with hallmarks of RNA processing (spliced regions). |
regular | defines an Ig or TcR gene with no special characteristics regarding its in vivo localisation and with no in vitro modifications. |
scFv | defines two immunoglobulin (or by extension T cell receptor) V-DOMAINs covalently linked by a short linker peptide in vitro. [1 or 2 sources] |
transgene | defines an Ig or TcR gene artificially introduced into a multicellular organism (mouse, plant...). |
translocated | defines a fused gene resulting from a translocation (in vivo), at least one of the involved loci being an Ig or TcR locus. [1 source (Ig or TcR) + X] |
transposed | defines an Ig or TcR transgene permanently inserted in a chromosome. |
unusual | defines an Ig or TcR gene with unexpected feature(s) (for instance, insertion of unknown sequences, unexpected rearrangements by inversion...). |
group:
eg: IGHV IGKV IGLV IGL1V IGL2V TRAV TRBV TRDV TRGV
specificity:
eg: anti-F(ab')2 anti-Fc anti-HIV anti-HIV_1 anti-HLA anti-HLA-DQ3 anti-HLA-DR anti-CD19 anti-CD29 anti-CD4 anti-CD8
The format of a DE line is:
DE   EMBL description (free text)
DE   species; receptor and chain; nucleic acid type; functionality; structure;
DE   chain; subgroup; specificity;

4.5 The KW (KeyWord) lines
4.6 The FT (Feature Table) lines
The FT (Feature Table) lines provide a mechanism for the annotation
of the sequence data. Regions or sites in the sequence which are of interest
are listed in the table. In general, the features in the feature table
represent signals or other characteristics reported in the cited references.
In some cases, ambiguities or features noted in the course of data preparation
have been included. The feature table is subject to expansion or change
as more becomes known about a given sequence (see ftable.doc for a
more complete description).
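As an illustration of how the feature table can be processed, the sketch below groups FT lines into features with their location and qualifiers. It is a minimal example under stated assumptions, not the parser used by IMGT: a line whose content begins with "/" is taken to be a qualifier of the preceding feature, a quoted value that is not closed on its line is assumed to continue on the following FT lines, and "<" or ">" in a location is read, following the usual EMBL feature table convention, as a region extending beyond the reported sequence. The names `parse_feature_table`, `parse_location`, `left_open` and `right_open` are hypothetical.

```python
import re

LOCATION_RE = re.compile(r"^(<?)(\d+)\.\.(\d+)(>?)$")


def parse_location(location):
    """Split a simple 'start..end' location into numbers and partial flags."""
    match = LOCATION_RE.match(location)
    if match is None:
        raise ValueError(f"unsupported location: {location!r}")
    lt, start, end, gt = match.groups()
    return {"start": int(start), "end": int(end),
            "left_open": lt == "<", "right_open": gt == ">"}


def parse_feature_table(ft_lines):
    """Group FT lines into features, each with a dict of qualifiers."""
    features = []
    pending = None                  # qualifier whose quoted value is still open
    for line in ft_lines:
        body = line[2:].strip()     # drop the 'FT' line code
        if pending is not None:     # continuation of a multi-line value
            quals = features[-1]["qualifiers"]
            quals[pending] += body
            if body.endswith('"'):
                quals[pending] = quals[pending].strip('"')
                pending = None
        elif body.startswith("/"):  # qualifier of the current feature
            name, _, value = body[1:].partition("=")
            quals = features[-1]["qualifiers"]
            closed = len(value) > 1 and value.endswith('"')
            if value.startswith('"') and not closed:
                quals[name] = value
                pending = name
            else:
                quals[name] = value.strip('"') if value else True
        else:                       # new feature: key, then location
            key, location = body.split(None, 1)
            features.append({"key": key, "location": location.strip(),
                             "qualifiers": {}})
    return features


# Lines taken from the sample entry (column spacing is an assumption).
EXAMPLE_FT = [
    'FT   FR1-IMGT        <1..72',
    'FT                   /partial',
    'FT                   /AA_IMGT="3 to 26"',
    'FT                   /translation="QLKQTEVSVTRETDESAQISCIAS"',
    'FT   JUNCTION        277..290>',
    'FT                   /partial',
    'FT                   /translation="CAVW"',
]

if __name__ == "__main__":
    for feature in parse_feature_table(EXAMPLE_FT):
        print(feature["key"], parse_location(feature["location"]),
              feature["qualifiers"])
```

Qualifiers without a value, such as /partial, are stored as `True`, so they can be tested directly.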
Columns   Description
-------   ---------------------------
14-25     entry name (left-justified)
27-29     division code
31-40     primary accession number
44-55     entry name (left-justified)
57-59     division code
61-70     primary accession number
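Because the fields sit in fixed columns, an index line can be read by slicing. The sketch below is illustrative; `field`, `parse_index_line` and `parse_index` are hypothetical names. It assumes that the key (for example a species name or a keyword) stands on its own line starting in column 1 and that the lines listing entries leave columns 1-13 blank; this is inferred from the column layout above and is not stated explicitly.

```python
def field(line, first, last):
    """Return the text in 1-based inclusive columns first..last, stripped."""
    return line[first - 1:last].strip()


# (entry name, division code, accession number) column ranges, per the table.
RECORD_COLUMNS = (
    ((14, 25), (27, 29), (31, 40)),
    ((44, 55), (57, 59), (61, 70)),
)


def parse_index_line(line):
    """Return up to two (entry name, division, accession) records."""
    records = []
    for name_cols, division_cols, accession_cols in RECORD_COLUMNS:
        name = field(line, *name_cols)
        if name:
            records.append((name,
                            field(line, *division_cols),
                            field(line, *accession_cols)))
    return records


def parse_index(lines):
    """Map each key line to the records listed beneath it."""
    index = {}
    current = None
    for line in lines:
        if not line.strip():
            continue
        if line[:13].strip():       # assumed: key lines start in column 1
            current = line.strip()
            index.setdefault(current, [])
        else:
            index.setdefault(current, []).extend(parse_index_line(line))
    return index


# Example line built with explicit field widths to avoid miscounted blanks.
EXAMPLE_LINE = (f"{'':13}{'CCIGHVB':<12} VRT {'M12769':<10}"
                f"   {'CCIG01':<12} VRT {'V00146':<10}")

if __name__ == "__main__":
    print(parse_index(["Caiman crocodylus", EXAMPLE_LINE]))
```

Slicing by column, not splitting on blanks, keeps the parser correct when a field is empty or when a key such as "Bos taurus (cattle)" itself contains blanks.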
1        10        20        30        40        50        60        70        80
+--------+---------+---------+---------+---------+---------+---------+---------+
Artificial gene
             AGGCHIA      SYN K03553
Bos javanicus
             BOVTCRVB6    MAM L18950
Bos taurus (cattle)
             BTIGG1HC     MAM X16701
Caiman crocodylus
             CCIGHVB      VRT M12769       CCIG01       VRT V00146
+--------+---------+---------+---------+---------+---------+---------+---------+
1        10        20        30        40        50        60        70        80
+--------+---------+---------+---------+---------+---------+---------+---------+
1        10        20        30        40        50        60        70        80
IgD
             HSIGCB9      PRI K01311
IgG
             MMIGK21      ROD D14728       MMIGGVL      ROD X81463
IgM
             HSIGCB6      PRI J00259       HSIGCB3      PRI K01307
             HSIGCB5      PRI K01309       HSIGHZD      PRI L29120
             HSIGHZG      PRI L29153       HSIGHZH      PRI L29154
             HSIGLZG      PRI L29156       MMIGHDG      ROD M11699
             MMIGHDH      ROD M11700       MMIGHDJ      ROD M11702
             MMIGHDM      ROD M11705       MMIGHDP      ROD M11708
+--------+---------+---------+---------+---------+---------+---------+---------+
1        10        20        30        40        50        60        70        80
1        10        20        30        40        50        60        70        80
+--------+---------+---------+---------+---------+---------+---------+---------+
D01059 HSIGLYM1 PRI
D12725 MMD12725 ROD
D12727 MMD12727 ROD
D12729 MMD12729 ROD
D12733 MMD12733 ROD
D12735 MMD12735 ROD
+--------+---------+---------+---------+---------+---------+---------+---------+
1        10        20        30        40        50        60        70        80
Columns   Field Name        Description
-------   ---------------   -------------------------------------------
01-10     entry name        left-justified
12-12     entry status      + = new at this release
                            * = updated at this release
                            blank = unchanged from previous release
14-14     data class        IMGT/LIGM annotation level 1 to 3
16-18     molecule type     DNA or RNA
20-22     division          three-letter division code
24-29     sequence length   right-justified
31-80     description       left-justified

If an entry's description cannot fit into columns 31-80, it is continued onto one or more additional lines. Continuation lines contain description text (left-justified) in columns 31-80; columns 01-30 are blank. An excerpt from the short directory index file is given below (the ruler is presented for your convenience; it does not appear in the index file):
1        10        20        30        40        50        60        70        80
+--------+---------+---------+---------+---------+---------+---------+---------+
AGGCHIA    + 2 RNA SYN    403 Mouse/human Ig active chimeric kappa-chain mRNA
                              (V-J5:mouse/C: human).
                              rearranged; Ig-Light-Kappa; VKappa; KV6;
BOVTCRVB6  * 3 DNA MAM    415 Bos javanicus T cell receptor gene V-region, exons
                              1 (3' end) and 2 (5' end).
                              TCR; functional;
BTIGG1HC   * 2 DNA MAM   1830 Bovine Ig germline heavy chain gamma-1-chain gene
                              C-region, 3' end
                              germline; Ig-Heavy; functional;
+--------+---------+---------+---------+---------+---------+---------+---------+
1        10        20        30        40        50        60        70        80
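The short directory layout, including its continuation lines, can be read as sketched below. This is a minimal illustration based only on the column table for the short directory index; the function name `parse_short_directory` and the dictionary keys are hypothetical, and the example lines are built with explicit field widths so that the columns match the table.

```python
STATUS = {"+": "new", "*": "updated"}   # blank means unchanged


def parse_short_directory(lines):
    """Return one dict per entry, joining continuation lines."""
    records = []
    for line in lines:
        if not line.strip():
            continue
        description = line[30:80].strip()            # columns 31-80
        if not line[:30].strip():                    # columns 01-30 blank
            records[-1]["description"] += " " + description
            continue
        records.append({
            "entry_name": line[0:10].strip(),        # columns 01-10
            "status": STATUS.get(line[11:12], "unchanged"),  # column 12
            "annotation_level": int(line[13:14]),    # column 14
            "molecule_type": line[15:18].strip(),    # columns 16-18
            "division": line[19:22].strip(),         # columns 20-22
            "length": int(line[23:29]),              # columns 24-29
            "description": description,
        })
    return records


# Example lines; the line breaks within the description are an assumption.
EXAMPLE_LINES = [
    f"{'AGGCHIA':<10} + 2 RNA SYN {403:>6} "
    "Mouse/human Ig active chimeric kappa-chain mRNA",
    f"{'':30}(V-J5:mouse/C: human).",
    f"{'BTIGG1HC':<10} * 2 DNA MAM {1830:>6} "
    "Bovine Ig germline heavy chain gamma-1-chain gene",
    f"{'':30}C-region, 3' end",
]

if __name__ == "__main__":
    for record in parse_short_directory(EXAMPLE_LINES):
        print(record)
```

A continuation line is recognised solely by blank columns 01-30, as described above, so it never starts a new record.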
This manual and the database it accompanies may be copied and redistributed freely, without advance permission, provided that this statement is reproduced with each copy.
Last modified: February 2004
For any other use please contact Marie-Paule Lefranc lefranc@ligm.igh.cnrs.fr.
IMGT initiator and coordinator: Marie-Paule Lefranc
(lefranc@ligm.igh.cnrs.fr)
Bioinformatics manager: Véronique Giudicelli (giudi@ligm.igh.cnrs.fr)
Computer manager: Denys Chaume (Denys.Chaume@igh.cnrs.fr)
Interface design: Chantal Ginestoux (chantal@ligm.igh.cnrs.fr)
© Copyright 1995-2004 IMGT, the international ImMunoGeneTics database