1. INTRODUCTION
2. FORMAT EXAMPLE
3. FEATURE KEYS
4. FEATURE LOCATION
5. FEATURE QUALIFIERS
The feature table contains information about genes and gene products, as well as regions of biological significance reported in a sequence. It contains information on regions of the sequence that code for proteins and RNA molecules. It also lists differences between separate reports of the same sequence and provides cross-references to other data collections, as described in more detail below.
The first two lines of the feature table in IMGT/LIGM-DB entries are feature header (FH) lines, specific to the EMBL flatfile format. The first one includes the column headers 'Key' and 'Location/Qualifier'. The second one is an empty spacer line.
Each feature consists of a feature key and a location (see below for details). If the location does not fit on the same line as the key, a continuation line may follow. If further information about the sequence is required, one or more additional lines containing feature qualifiers may follow.
Features appear on FT lines. The linetype code FT appears in columns 1-2 and columns 3-5 are blank. The feature key begins in column 6 and may be no more than 15 characters in length. The location begins in column 26. Feature qualifiers begin on subsequent FT lines at column 26. Location, qualifier, and continuation lines may extend from column 26 to 80. Each qualifier is added on a new line.
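The column rules above can be illustrated with a small formatter. This is a minimal Python sketch, not part of any IMGT or EMBL tooling: the function name `format_feature` and the constants are hypothetical, and it assumes that the location and every qualifier already fit within columns 26-80 (it does not wrap long values onto continuation lines).

```python
KEY_COLUMN = 6         # feature key starts here (1-based column)
DATA_COLUMN = 26       # location and qualifiers start here (1-based column)
MAX_KEY_LENGTH = 15    # limit stated for the feature key
MAX_LINE_LENGTH = 80   # lines may extend to column 80


def format_feature(key, location, qualifiers=()):
    """Return the FT lines for one feature: a descriptor line, then one line per qualifier."""
    if not key or len(key) > MAX_KEY_LENGTH:
        raise ValueError(f"feature key must be 1-{MAX_KEY_LENGTH} characters: {key!r}")

    # "FT" in columns 1-2, columns 3-5 blank, key padded so the location lands in column 26.
    descriptor = "FT".ljust(KEY_COLUMN - 1) + key.ljust(DATA_COLUMN - KEY_COLUMN) + location
    lines = [descriptor]

    for qualifier in qualifiers:
        # Qualifier lines carry no key, so columns 3-25 stay blank.
        lines.append("FT".ljust(DATA_COLUMN - 1) + qualifier)

    for line in lines:
        if len(line) > MAX_LINE_LENGTH:
            raise ValueError(f"line exceeds column {MAX_LINE_LENGTH}: {line!r}")
    return lines


if __name__ == "__main__":
    for line in format_feature("V-GENE", "1..222", ['/cell_type="B cell"', "/partial"]):
        print(line)
```

Rejecting over-long lines instead of wrapping them keeps the sketch short; a fuller writer would split the value and emit continuation lines starting at column 26.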
An example of the feature table format is:
----+----+----+----+----+----+----+----+----+----+----+----+----+----+
        10        20        30        40        50        60        70
     Key                 Location/Qualifiers
     L-PART1             1..28
     V-GENE              1..222
                         /cell_type="B cell"
                         /note="NCBI gi: 483900"
                         /partial
                         /product="immunoglobulin kappa chain, V-region
                         (SPK.4)"
                         /tissue_type="Graves' thyroid"
----+----+----+----+----+----+----+----+----+----+----+----+----+----+
        10        20        30        40        50        60        70
Thus, there are 4 types of feature table lines:
Line type            Content                   #/entry      #/feature
---------            -------                   -------      ---------
Header               Column titles             1            N/A
Feature descriptor   Key and location          1 to many    1
Feature qualifiers   Qualifiers and values     N/A          0 to many
Continuation lines   Feature descriptor or     0 to many    0 to many
                     qualifier continuation
The position of the data items within the feature descriptor line is as
follows:
column position    data item
---------------    ---------
1-5                blank (may hold the line type code, i.e. FT, to improve readability)
6-24               feature key
25                 blank
26-80              location
Data on qualifier and continuation lines begin in column 26; the first 25 columns are blank apart from any line type code. On a qualifier line, the first character is a '/' followed by the qualifier name. The qualifiers used here are the same as the EMBL qualifiers, with one exception: the AA_number qualifier.
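The line types and column positions described above are enough to sketch a reader for a feature table. The Python below is an illustrative sketch under stated assumptions, not an official parser: the names `Feature` and `parse_feature_table` are hypothetical, lines whose code in columns 1-2 is anything other than FT or blank (FH, for example) are skipped, a line is treated as a continuation when the previous qualifier still has an unclosed double quote, qualifier continuations are joined with a single space, and location continuations are joined with no space.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    key: str
    location: str
    qualifiers: list = field(default_factory=list)  # raw "/name=value" strings


def parse_feature_table(lines):
    """Group FT lines into features using the fixed column positions."""
    features = []
    for raw in lines:
        line = raw.rstrip("\n")
        code = line[:2]
        if code.strip() and code != "FT":
            continue                      # FH header lines and other line types
        key = line[5:24].strip()          # columns 6-24
        data = line[25:80].strip()        # columns 26-80
        if not key and not data:
            continue                      # spacer line
        if key:
            features.append(Feature(key, data))   # feature descriptor line
            continue
        if not features:
            continue                      # stray data before any feature key
        feature = features[-1]
        # An odd number of quotes means the last qualifier value is still open
        # (escaped quotes are doubled, so they do not change the parity).
        value_open = bool(feature.qualifiers) and feature.qualifiers[-1].count('"') % 2 == 1
        if data.startswith("/") and not value_open:
            feature.qualifiers.append(data)       # new qualifier line
        elif feature.qualifiers:
            feature.qualifiers[-1] += " " + data  # qualifier continuation
        else:
            feature.location += data              # location continuation
    return features


BLANK = " " * 23  # columns 3-25 on qualifier and continuation lines
sample = [
    "FH   " + "Key".ljust(20) + "Location/Qualifiers",
    "FH",
    "FT   " + "L-PART1".ljust(20) + "1..28",
    "FT   " + "V-GENE".ljust(20) + "1..222",
    "FT" + BLANK + '/cell_type="B cell"',
    "FT" + BLANK + "/partial",
    "FT" + BLANK + '/product="immunoglobulin kappa chain, V-region',
    "FT" + BLANK + '(SPK.4)"',
]

if __name__ == "__main__":
    for feature in parse_feature_table(sample):
        print(feature.key, feature.location, feature.qualifiers)
```

The sample lines are built with `ljust` and an explicit padding string so that the column alignment is visible in the code and does not depend on hand-counted spaces.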
The sections below provide a brief introduction to the new feature table format.
The first item on an FT line is the feature key. It starts in column 6 and can continue to column 24. The list of valid feature keys is shown below:
The second item on the FT line designates the location of the feature in the sequence. The location begins at column 26. Several conventions are used to indicate sequence location.
Base numbers in locations refer to the numbering in the entry, which is not necessarily the same as the numbering scheme used in the original report. The first base in the presented sequence is numbered base 1. Sequences are presented in the 5' to 3' direction.
A location can be one of the following:
o A single base.
o A contiguous span of bases.
A contiguous span of bases is indicated by the number of the first and last bases in the range separated by two periods (e.g., 23..79). Starting and ending positions can be indicated by base number.
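The two location forms listed above (a single base, or a span written as first..last) can be read with a few lines of Python. This is a minimal sketch covering only those two forms; the function names `parse_location` and `extract` are hypothetical, and the check that the first base does not exceed the last is an assumption rather than a stated rule.

```python
import re

# Either "N" (single base) or "N..M" (contiguous span), using 1-based base numbers.
_LOCATION = re.compile(r"([0-9]+)(?:\.\.([0-9]+))?")


def parse_location(text):
    """Return (first, last) as 1-based inclusive base numbers."""
    match = _LOCATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unsupported location: {text!r}")
    first = int(match.group(1))
    last = int(match.group(2)) if match.group(2) else first
    if first < 1 or last < first:
        raise ValueError(f"invalid base range: {text!r}")
    return first, last


def extract(sequence, location):
    """Return the bases covered by a location from a 5' to 3' sequence string."""
    first, last = parse_location(location)
    if last > len(sequence):
        raise ValueError("location extends past the end of the sequence")
    # Base 1 is the first character, so shift to 0-based slicing.
    return sequence[first - 1:last]
```

Because the first base of the presented sequence is numbered 1, the only conversion needed for Python slicing is subtracting one from the start position.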
Qualifiers provide additional information about features. They take the form of a slash (/) followed by a qualifier name and, if applicable, an equal sign (=) and a qualifier value. Feature qualifiers begin at column 26.
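The slash, name, equal sign, value form described above can be split mechanically. The following Python sketch uses a hypothetical helper name, `split_qualifier`, and returns the value exactly as written (quotes included); qualifiers that take no value, such as /partial, yield `None`.

```python
def split_qualifier(text):
    """Split '/name=value' into (name, value); value is None when there is no '='."""
    text = text.strip()
    if not text.startswith("/"):
        raise ValueError(f"a qualifier must start with '/': {text!r}")
    # Split on the first '=' only, so '=' characters inside the value are preserved.
    name, separator, value = text[1:].partition("=")
    if not name:
        raise ValueError(f"missing qualifier name: {text!r}")
    return name, (value if separator else None)
```

For instance, `split_qualifier('/cell_type="B cell"')` gives the name `cell_type` with the quoted value left intact for later interpretation.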
Qualifiers convey many types of information. Their values can, therefore, take
several forms:
o Free text.
o Controlled vocabulary or enumerated values.
o Citations or reference numbers.
o Sequences.
o Feature labels.
Text qualifier values are enclosed in double quotation marks. The text can consist of any printable characters (ASCII values 32-126 decimal). If the text string includes double quotation marks, each double quotation mark must be escaped by placing a double quotation mark in front of it (e.g., /note="This is an example of ""escaped"" quotation marks").
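The quoting rule above (printable ASCII only, embedded double quotes doubled) can be sketched as a pair of helper functions. The names `quote_text` and `unquote_text` are hypothetical, and the sketch assumes the value has already been joined into a single string if it was spread over continuation lines.

```python
def quote_text(text):
    """Wrap free text in double quotes, doubling any embedded double quotes."""
    for char in text:
        if not 32 <= ord(char) <= 126:
            raise ValueError(f"non-printable or non-ASCII character: {char!r}")
    return '"' + text.replace('"', '""') + '"'


def unquote_text(value):
    """Reverse quote_text: strip the outer quotes and collapse doubled quotes."""
    if len(value) < 2 or value[0] != '"' or value[-1] != '"':
        raise ValueError("text value must be enclosed in double quotation marks")
    inner = value[1:-1]
    # After removing every doubled quote, any quote left over was not escaped.
    if '"' in inner.replace('""', ""):
        raise ValueError("unescaped double quotation mark inside value")
    return inner.replace('""', '"')
```

Applied to the example from the text, `quote_text('This is an example of "escaped" quotation marks')` produces the value shown in the /note qualifier above, and `unquote_text` recovers the original string.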
Citation or reference numbers for an entry are enclosed in square brackets ([]) to distinguish them from other numbers.
A literal sequence of bases (e.g., "atgcatt") is enclosed in quotation marks. Literal sequences are distinguished from free text by context. Qualifiers that take free text as their values do not take literal sequences, and vice versa.
The '/label=' qualifier takes a feature label as its value. Although feature labels are optional, they allow unambiguous references to features. The feature label identifies a feature within an entry; when combined with the accession number and the name of the data bank from which it came, it is a unique tag for that feature.
The following is a list of valid feature qualifiers:
| Qualifier | Description |
| allele | Name of the allele for a given gene |
| allotype | Polymorphic extracellular marker detected by serological methods and present in different individuals of the same species |
| AA_IMGT | Amino acid numbering in the sequence according to IMGT |
| AA_number | Amino acid numbering in the sequence |
| cell_line | Cell line from which the sequence was obtained |
| cell_type | Cell type from which the sequence was obtained |
| chromosome | Chromosome (e.g. Chromosome number) from which the sequence was obtained |
| citation | Reference to a citation listed in the entry reference field |
| clone | Clone from which the sequence was obtained |
| clone_lib | Clone library from which the sequence was obtained |
| codon_start | Indicates the offset at which the first complete codon of a coding feature can be found, relative to the first base of that feature |
| cons_splice | Differentiates between intron splice sites that conform to the 5'-GT ... AG-3' splice site consensus and those that do not |
| country | Country of origin for DNA sample, intended for epidemiological or population studies |
| CDR_length | Number of Amino Acids in CDR1-IMGT, CDR2-IMGT, CDR3-IMGT, separated by dots, and shown in brackets. X is used for partial or absent CDR |
| db_xref | Database cross-reference: pointer to related information in another database |
| dev_stage | If the sequence was obtained from an organism in a specific developmental stage, it is specified with this qualifier |
| evidence | Value indicating the nature of supporting evidence, distinguishing between experimentally determined and theoretically derived data |
| function | Function attributed to a sequence |
| gdb_xref | Genome Databank unique ID cross reference qualifier |
| gene | Symbol of the gene corresponding to a sequence region |
| gene_alias | Other name for the gene used in the literature |
| germline | Denotes that the sequence is from immunoglobulin or T cell receptor unrearranged DNA or RNA |
| germline_frame | Translation arbitrarily shown in the germline reading frame, for J-REGION (and C-REGION in cDNA) of unproductive (genomic or cDNA) rearranged sequences |
| haplotype | Haplotype of the organism from which the sequence was obtained |
| insertion_seq | Insertion sequence element from which the sequence was obtained |
| in_frame | No frameshift in the JUNCTION |
| isolate | Individual isolate from which the sequence was obtained |
| isolation_source | Describes the physical, environmental and/or local geographical source of the biological sample from which the sequence was derived |
| IMGT_BAC_clone | Name of the BAC clone from which the sequence is derived |
| IMGT_cell_line | Name of the cell line from which the sequence is derived |
| IMGT_cosmid_clone | Name of the cosmid clone from which the sequence is derived |
| IMGT_MAC_clone | Name of the MAC clone from which the sequence is derived |
| IMGT_note | Comment added by the LIGM curators to the IMGT feature |
| IMGT_phage_clone | Name of the phage clone from which the sequence is derived |
| IMGT_plasmid_clone | Name of the plasmid clone from which the sequence is derived |
| IMGT_YAC_clone | Name of the YAC clone from which the sequence is derived |
| label | A label used to permanently identify a feature |
| lab_host | Laboratory host used to propagate the organism from which the sequence was obtained |
| map | Genomic map position of feature |
| nomgen | Name of the gene corresponding to a sequence region |
| note | Any comment or additional information |
| number | A number to indicate the order of genetic elements (e.g., exons or introns) in the 5' to 3' direction |
| organism | The scientific name of the organism that provided the sequenced genetic material |
| out_of_frame | Frameshift in the JUNCTION |
| partial | Differentiates between complete regions and partial ones |
| product | Name of a product encoded by the sequence |
| protein_id | Protein Identifier, issued by International collaborators. This qualifier consists of a stable ID portion (3+5 format with 3 position letters and 5 numbers) plus a version number after the decimal point. |
| pseudo | Indicates that this feature is a non-functional version of the element named by the feature key |
| putative_limit | Refers to uncertain limit(s) of a subregion |
| PCR_conditions | Description of reaction conditions and components for PCR |
| rearranged | Denotes that the sequence is from immunoglobulin or T cell receptor rearranged DNA or RNA |
| replace | Indicates that the sequence identified by a feature's intervals is replaced by the sequence shown in "text" |
| rpt_family | Type of repeated sequence; Alu or Kpn, for example |
| rpt_type | Organization of repeated sequence |
| rpt_unit | Identity of repeat unit that constitutes a repeat_region |
| sequenced_mol | Molecule from which the sequence was obtained |
| sex | Sex of organism from which the sequence was obtained |
| specificity | Specificity of an immunoglobulin or T cell receptor chain |
| specific_host | Natural host from which the sequence was obtained |
| specimen_voucher | An identifier of the individual or collection of the source organism and the place where it is currently stored, usually an institution |
| standard_name | Accepted standard name for this feature |
| strain | Strain from which the sequence was obtained |
| sub_clone | Sub-clone from which the sequence was obtained |
| sub_species | Sub-species name of organism from which the sequence was obtained |
| sub_strain | Sub-strain from which the sequence was obtained |
| tissue_lib | Tissue library from which the sequence was obtained |
| tissue_type | Tissue type from which sequence was obtained |
| transgenic | Identifies the source feature of the organism which was the recipient of transgenic DNA |
| translation | Automatically generated one-letter abbreviated amino acid sequence of the coding regions |
| transl_except | Translational exception: single codon the translation of which does not conform to genetic code defined by Organism and /codon |
| transposon | Transposable element from which the sequence was obtained |
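Two of the qualifier values in the table above have formats regular enough to check mechanically: CDR_length (three lengths separated by dots, in brackets, with X for a partial or absent CDR) and protein_id (three letters, five digits, then a version number after the decimal point). The Python sketch below is illustrative only. The function names are hypothetical, the input is assumed to be the bare value with any surrounding quotation marks already removed, and the example identifier used in the usage note is made up.

```python
import re

# Stable ID (3 letters + 5 digits), a decimal point, then the version number.
_PROTEIN_ID = re.compile(r"([A-Za-z]{3}[0-9]{5})\.([0-9]+)")


def parse_cdr_length(value):
    """Return (CDR1, CDR2, CDR3) lengths; None stands for an 'X' (partial or absent CDR)."""
    text = value.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"CDR_length must be shown in brackets: {value!r}")
    parts = text[1:-1].split(".")
    if len(parts) != 3:
        raise ValueError(f"expected three dot-separated lengths: {value!r}")
    lengths = []
    for part in parts:
        if part == "X":
            lengths.append(None)
        elif part.isascii() and part.isdigit():
            lengths.append(int(part))
        else:
            raise ValueError(f"invalid CDR length {part!r} in {value!r}")
    return tuple(lengths)


def parse_protein_id(value):
    """Return (stable_id, version) for a value in the 3+5 format plus version."""
    match = _PROTEIN_ID.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not in the 3+5 format with a version: {value!r}")
    return match.group(1), int(match.group(2))
```

With these helpers, `parse_cdr_length("[8.X.13]")` reports the second CDR as `None`, and `parse_protein_id("CAA12345.1")` (a made-up identifier) separates the stable ID from its version.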
This manual and the database it accompanies may be copied and redistributed freely, without advance permission, provided that this statement is reproduced with each copy.
Last modified: July 2004
Software material and data coming from the IMGT server may be used for academic research only, provided that it is referred to IMGT and cited as "IMGT, the international ImMunoGeneTics database http://imgt.cines.fr:8104 (Initiator and coordinator: Marie-Paule Lefranc, Montpellier, France)." References to cite: Lefranc, M.-P. et al., Nucleic Acids Research, 27, 209-212 (1999); Ruiz, M. et al., Nucleic Acids Research, 28, 219-221 (2000); Lefranc, M.-P., Nucleic Acids Research, 29, 207-209 (2001); Lefranc, M.-P., Nucleic Acids Research, 31, 307-310 (2003). For any other use please contact Marie-Paule Lefranc (lefranc@ligm.igh.cnrs.fr).
IMGT initiator and coordinator: Marie-Paule Lefranc (lefranc@ligm.igh.cnrs.fr)
Bioinformatics manager: Véronique Giudicelli (giudi@ligm.igh.cnrs.fr)
Computer manager: Denys Chaume (Denys.Chaume@igh.cnrs.fr)
Interface design: Chantal Ginestoux (chantal@ligm.igh.cnrs.fr)
© Copyright 1995-2004 IMGT, the international ImMunoGeneTics database