IMGT/JunctionAnalysis [1], developed in Montpellier, is an integrated tool for the analysis of
immunoglobulin (IG) and T cell receptor (TR) JUNCTION nucleotide sequences.
IMGT/JunctionAnalysis analyses up to 5000 junctions in a single search,
provided that the IMGT V-GENE and ALLELE and J-GENE and ALLELE names are identified
[1-4].
The tool:
- analyses junction nucleotide sequences of rearranged IG
(IGH, IGK or IGL) and TR (TRA, TRB, TRG or TRD) genes,
- identifies accurately D-GENE and ALLELE in the IGH, TRB and TRD junctions,
- delimits precisely the different regions of the junctions:
3'V-REGION,
D-REGION,
5'J-REGION,
as well as
N-REGION (N, N1, N2)
and
P-REGION (P).
- determines the number of mutations in the 3'V-REGION, D-REGION and 5'J-REGION of the IG JUNCTIONs.
- calculates the "gc" content of the N-REGION of the IG and TR JUNCTION.
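As a rough illustration of the last point, the "gc" content of an N-REGION can be sketched as the ratio of g+c nucleotides to the total number of N-REGION nucleotides. The function name `n_region_gc_ratio` and the choice to return `None` for an empty N-REGION are assumptions made for this example, not part of the tool.

```python
from typing import Optional


def n_region_gc_ratio(n_region: str) -> Optional[float]:
    """Return (g + c) / total nucleotides for an N-REGION.

    Returns None when the N-REGION is empty (assumption: the ratio is
    undefined in that case).
    """
    seq = n_region.lower()
    if not seq:
        return None
    gc_count = sum(1 for nt in seq if nt in "gc")
    return gc_count / len(seq)
```

For example, an N-REGION `ggca` contains three g or c nucleotides out of four.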
The input information is entered on the IMGT/JunctionAnalysis Welcome page.
- Selection:
- Species (drop-down list).
- Locus (radio buttons): IGH, IGK, IGL, TRA, TRB, TRG, TRD
The analysis of IG and TR junctions of mice and humans can be
performed exhaustively. Analysis of junctions of other species (rat, rabbit, trout) is becoming progressively available
as genomic sequences are annotated in IMGT.
- JUNCTION nucleotide sequences:
- JUNCTION nucleotide sequences can be entered either directly
in the reserved box, by typing or by "copy/paste", or by giving the path to a local file
(click on 'Browse' or 'Parcourir', or type its full path in the reserved box).
- The required format is the FASTA format. Each JUNCTION nucleotide
sequence must be preceded by the following information:
- identifier ("input"), with a 10 character maximum length.
This identifier can be a sequence name, an accession number,
a clone name, etc.
Note that the identifier must be unique in a given set of JUNCTION.
- the name of the V-GENE and ALLELE according to the
IMGT gene name nomenclature.
- the name of the J-GENE and ALLELE according to the
IMGT gene name nomenclature.
- IMGT/JunctionAnalysis accepts up to 5000 junctions in a single search. All sequences must
be entered in the same format, starting a new line for each sequence:
>Input1, V-GENE and ALLELE name, J-GENE and ALLELE name
nucleotide sequence (in uppercase or lowercase)
>Input2, V-GENE and ALLELE name, J-GENE and ALLELE name
nucleotide sequence (in uppercase or lowercase)
For instance,
>M62724, IGHV7-4-1*02, IGHJ4*02
tgtgcgagagaagatagcaatggctacaaaatatttgactactgg
>Z47269, IGHV1-69*06, IGHJ5*02
tgtgcgagagggggggctaaggtcgaatttttggagtggtttcatgggtactggttcgacccctgg
- If the V-GENE ALLELE or J-GENE ALLELE is unknown, the JunctionAnalysis
tool accepts a '?' character instead of the allele number (ex: IGHV1-2*?)
and will run the search against the allele *01 by default.
- If there are several proposed V-GENE and/or J-GENE,
the different V-GENE and ALLELE names and/or J-GENE and ALLELE
names have to be separated by the '/' character
(ex: IGHV1-2*01/IGHV1-3*?/IGHV1-18*02, IGHJ1*01/IGHJ2*01).
The IMGT/JunctionAnalysis tool will run the search against the first
V-GENE and ALLELE and J-GENE and ALLELE listed.
Note that:
- JUNCTION nucleotide sequences must start with the V-REGION 2nd-CYS codon and end with J-REGION J-PHE or J-TRP codon (positions
104 and 118, respectively, in the IMGT unique numbering for V-DOMAIN [5]).
- "V-GENE and ALLELE" and "J-GENE and ALLELE" are those obtained
by querying IMGT/V-QUEST [6].
If several alleles give the same score, select the most probable one.
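The input rules above can be sketched as a small pre-submission check. This is a minimal sketch and not the tool's own parser: the names `JunctionRecord`, `resolve_allele`, `parse_junctions` and `has_default_anchors` are hypothetical, and it assumes each sequence sits on a single line after its header. The anchor check uses the standard codons for cysteine (tgt, tgc), tryptophan (tgg) and phenylalanine (ttt, ttc).

```python
from typing import List, NamedTuple

MAX_JUNCTIONS = 5000   # limit per search stated in this documentation
MAX_ID_LENGTH = 10     # identifier length limit stated in this documentation
START_CODONS = {"tgt", "tgc"}        # cysteine (2nd-CYS, position 104)
END_CODONS = {"tgg", "ttt", "ttc"}   # tryptophan or phenylalanine (position 118)


class JunctionRecord(NamedTuple):
    identifier: str
    v_allele: str
    j_allele: str
    sequence: str


def resolve_allele(field: str) -> str:
    """Keep the first of several '/'-separated names and map '*?' to '*01'."""
    first = field.split("/")[0].strip()
    if first.endswith("*?"):
        first = first[:-1] + "01"
    return first


def has_default_anchors(sequence: str) -> bool:
    """True if the junction starts with a Cys codon and ends with Trp or Phe."""
    seq = sequence.lower()
    return len(seq) >= 6 and seq[:3] in START_CODONS and seq[-3:] in END_CODONS


def parse_junctions(text: str) -> List[JunctionRecord]:
    """Parse '>id, V allele, J allele' headers, each followed by one sequence line."""
    records: List[JunctionRecord] = []
    seen = set()
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                raise ValueError(f"no sequence after header {header!r}")
            header = line[1:]
            continue
        if header is None:
            raise ValueError(f"sequence without header: {line!r}")
        fields = [f.strip() for f in header.split(",")]
        if len(fields) != 3:
            raise ValueError(f"expected 'id, V, J' in header {header!r}")
        identifier, v_field, j_field = fields
        if len(identifier) > MAX_ID_LENGTH:
            raise ValueError(f"identifier too long: {identifier!r}")
        if identifier in seen:
            raise ValueError(f"duplicate identifier: {identifier!r}")
        seen.add(identifier)
        records.append(JunctionRecord(identifier, resolve_allele(v_field),
                                      resolve_allele(j_field), line.lower()))
        header = None
    if header is not None:
        raise ValueError(f"no sequence after header {header!r}")
    if len(records) > MAX_JUNCTIONS:
        raise ValueError("more than 5000 junctions in a single search")
    return records
```

Running `parse_junctions` on the two-sequence example above yields two records whose sequences both pass `has_default_anchors`.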
- Example of IMGT/JunctionAnalysis results
Selecting the option 'Example of IMGT/JunctionAnalysis results' displays an example of the results
provided by IMGT/JunctionAnalysis.
Sequences used in the 'Example of IMGT/JunctionAnalysis results':
>Z70256,IGHV2-26*01,IGHJ4*02
tgtgtacgtgttgtgcagcgcctggtacccaaatatcactttgaccactgg
>Z70257,IGHV3-7*02,IGHJ2*01
tgtgcgagggatggcagctcttatgcccgcccctactggtacttcgatctctgg
>Z70606,IGHV4-31*03,IGHJ3*01
tgtgcgagagcgactacgcactatgcttttgatgtctgg
>Z70608,IGHV4-39*02,IGHJ3*02
tgtgccagagtaacgatttttggagtggttattccccgggggaatgcttttgatatctgg
>Z70610,IGHV4-34*09,IGHJ3*02
tgtgcgagagtcgggagcgatttttggagtggttattcccgacatgatgcttttgatatctgg
>Z70611,IGHV4-59*01,IGHJ5*02
tgtgcgagacatggtaactataatgccggcgttgactggttcgacccctgg
>Z70613,IGHV4-59*01,IGHJ4*02
tgtgcgagagcagcagctggtacctccctctttgactactgg
>Z70614,IGHV4-59*01,IGHJ4*02
tgtgcgagacactataattcggggacttatcccctcgactactgg
>Z70615,IGHV4-59*01,IGHJ2*01
tgtgcgagagggctggtaaagagggtttcggaatactggtacttcgatctctgg
>Z70616,IGHV4-34*01,IGHJ5*02
tgtgcgagagcgggtttgggttcccactggttcgacccctgg
>Z70620,IGHV4-30-4*01,IGHJ3*02
tgtgcgagagaccggggcgggatggttcgggatgcttttgatatctgg
>Z70621,IGHV4-39*01,IGHJ4*02
tgtgcgagacaccacgatttatggttcggggagtttgacccccttgactactgg
>Z70622,IGHV4-39*06,IGHJ4*03
tgtgcgagagattgccccgctcctgccaaaatgtattactatggttcggggatatgtacgtttgactactgg
- Display Results
- "List of all eligible D-GENE."
This option displays all D genes that match a junction so that their scores can be compared. It is displayed by
default when only one junction is analyzed in the run, but can be disabled.
- "Colored IMGT AA classes and histogram."
The IMGT AA classes and histogram are displayed by
default with colors of the AA according to the 11 IMGT physicochemical
AA classes (Pommié et al. 2004) (IMGT Aide-mémoire > Amino acids, https://www.imgt.org), but can be disabled.
- "Output order" in "CDR3-IMGT length decreasing order" or in "CDR3-IMGT
length increasing order."
The results in "JUNCTION alignments with translation and IMGT AA classes" may be displayed in "Same order as
input" (default), in "CDR3-IMGT length decreasing order," or "CDR3-IMGT length increasing order".
- Advanced Parameters
- 5' and 3' ends of the JUNCTION:
- Default: the JUNCTION nucleotide sequences must start in 5' with a cysteine (tgt or tgc) codon and must end in 3' with a
tryptophan (tgg) or phenylalanine (ttt or ttc) codon.
- The JUNCTION nucleotide sequences may start in 5' and/or may end in 3' with any codon.
- Nb of D-GENE (for IGH, TRB and TRD JUNCTION):
- Default values are 1 for IGH, 1 for TRB and 3 for TRD.
- You may modify it from 0 to 3.
- Number of accepted mutations in 3'V-REGION, D-REGION, and 5'J-REGION:
- Delimitation of 3'V-REGION, D-REGION and 5'J-REGION:
-
Default: ('m' indicates a mutation and '-' indicates an identical nucleotide)
3'V-REGION delimitation:
- IGH locus: the patterns 'm', 'm-' and 'mm--' are trimmed from the 3'V-REGION and 3' end of the D-REGION
- IGL and IGK locus: the patterns 'm', and '-m' are trimmed from the 5'J-REGION and 5' end of the D-REGION
- TR loci: the patterns 'm', 'm-', 'm--', 'm---' and 'm----' are trimmed from the 3'V-REGION
D-REGION delimitation:
- The patterns 'm', '-m' and '--mm' are trimmed from the 5'end of the D-REGION
- The patterns 'm', 'm-' and 'mm--' are trimmed from the 3' end of the D-REGION
5'J-REGION delimitation:
- IGH locus: the patterns 'm', '-m' and '--mm' are trimmed from the 5'J-REGION
- IGL and IGK locus: the patterns 'm', and '-m' are trimmed from the 5'J-REGION and 5' end of the D-REGION
- TR loci: the patterns 'm', '-m', '--m', '---m' and '----m' are trimmed from the 5'J-REGION
by comparison with the germline sequences of the corresponding alleles.
-
Stop trimming with the first encountered identical nucleotide
- D-GENE choice (if several have the same score):
- The less mutated one,
- The longest one,
- The one more upstream in the locus.
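The D-GENE choice among candidates with the same score can be sketched as a tie-break over the three criteria listed above. This is an illustrative sketch only: the class `DCandidate`, its fields, the candidate labels and the assumption that a higher score is better are all hypothetical, and the tool's actual scoring is not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class DCandidate:
    name: str        # hypothetical label for a candidate D-GENE and ALLELE
    score: int       # assumption: higher is better
    mutations: int   # number of mutations in the matched D-REGION
    length: int      # number of matched D-REGION nucleotides
    locus_rank: int  # assumption: 0 is the most upstream D gene in the locus


# One sort key per criterion; the smallest key wins the tie.
TIE_BREAKERS: Dict[str, Callable[[DCandidate], int]] = {
    "less_mutated": lambda c: c.mutations,
    "longest": lambda c: -c.length,
    "most_upstream": lambda c: c.locus_rank,
}


def choose_d(candidates: List[DCandidate], criterion: str) -> DCandidate:
    """Among the best-scoring candidates, pick one using the chosen criterion."""
    best_score = max(c.score for c in candidates)
    tied = [c for c in candidates if c.score == best_score]
    return min(tied, key=TIE_BREAKERS[criterion])
```

The criterion is passed explicitly because this sketch makes no claim about which of the three options is preselected in the tool.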
The IMGT/JunctionAnalysis Results page comprises:
The information provided in the IMGT/JunctionAnalysis Search page
is reported in 3 columns (blue):
- Input
- V name: IMGT V-GENE and ALLELE name
- J name: IMGT J-GENE and ALLELE name
Results from the IMGT/JunctionAnalysis tool are displayed in the other columns:
- D name: IMGT D-GENE and ALLELE name for IGH and TRB loci
(In the case of the TRD locus, the names of the 3 TRDD genes are
displayed above their respective sequences)
- Vmut: Number of mutations in the "input" 3'V-REGION identified
by the IMGT/JunctionAnalysis tool, by comparison to
the corresponding germline allele sequence.
- Dmut: Number of mutations in the D-REGION sequence
identified by the IMGT/JunctionAnalysis tool, by comparison
to the corresponding germline allele sequence.
- Jmut: Number of mutations in the "input" 5'J-REGION identified
by the IMGT/JunctionAnalysis tool, compared to the
corresponding germline allele sequence.
- Ngc: Ratio of the number of g+c nucleotides to the
total number of N region nucleotides.
-
JUNCTION decryption provides region lengths (in nt) of the N regions {N}, {N1} to {N4} (braces),
and of the (3'V), (D) and (5'J) regions (parentheses), with 3' and 5' indicating the number of trimmed (-)
or palindromic (+) nucleotides (mutually exclusive) at the 3' and 5' end of these regions: (3'V)3'{N}[5'(D)3'{N}]5'(5'J).
The JUNCTION decryption labels for the junction with no D, 1 D, 2 D or 3 D are shown in Table below.
Table. IMGT JUNCTION decryption labels
Number of identified D genes | JUNCTION decryption (3'V)3'{N}[5'(D)3'{N}]5'(5'J) (a) | IG loci (b) | TR loci (b)
No D | (3'V)3'{N}5'(5'J) | IGK, IGL, IGI | TRA, TRG
1 D | (3'V)3'{N1}5'(D)3'{N2}5'(5'J) | IGH | TRB, (TRD)
2 D | (3'V)3'{N1}5'(D1)3'{N2}5'(D2)3'{N3}5'(5'J) | (IGH) | (TRB), (TRD)
3 D | (3'V)3'{N1}5'(D1)3'{N2}5'(D2)3'{N3}5'(D3)3'{N4}5'(5'J) | (IGH) | TRD
a Square brackets correspond to JUNCTION decryption labels which vary depending on the number of identified D genes (detailed in the table for 1 D, 2 D or 3 D; in these cases, {N} becomes {N1}).
b Default value for search number of D-GENE is 1 for IGH, 1 for TRB and 3 for TRD. Optional values from 0 to 3 may be chosen, in Advanced parameters, for IGH (0, 2 or 3), TRB (2) or TRD (1 or 2) (locus shown between parentheses).
Legend:
(3'V): Number of nucleotides of the "input" 3'V-REGION.
(5'J): Number of nucleotides of the "input" 5'J-REGION.
(D): Number of nucleotides of the D-REGION sequence identified (if one D).
(D1): Number of nucleotides of the D1-REGION sequence identified (first D-REGION if 2 or 3 D).
(D2): Number of nucleotides of the D2-REGION sequence identified (second D-REGION if 2 or 3 D).
(D3): Number of nucleotides of the D3-REGION sequence identified (third D-REGION if 3 D).
{N}: Number of nucleotides of the N-REGION sequence identified (if no D).
{N1}: Number of nucleotides of the N1-REGION sequence identified (first N-REGION if 1, 2 or 3 D).
{N2}: Number of nucleotides of the N2-REGION sequence identified (second N-REGION if 1, 2 or 3 D).
{N3}: Number of nucleotides of the N3-REGION sequence identified (third N-REGION if 2 or 3 D).
{N4}: Number of nucleotides of the N4-REGION sequence identified (fourth N-REGION if 3 D).
3': Number of nucleotides trimmed (0, minus sign) or palindromic P (plus sign) (mutually exclusive) at the 3' end
of the corresponding region ("input" 3'V-REGION or D-REGION sequence identified (D-, D1-, D2- or D3-REGION)).
5': Number of nucleotides trimmed (0, minus sign) or palindromic P (plus sign) (mutually exclusive) at the 5' end
of the corresponding region ("input" 5'J-REGION or D-REGION sequence identified (D-, D1-, D2- or D3-REGION)).
Click here for IMGT JUNCTION decryption values for (3'V)3'{N}[5'(D)3'{N}]5'(5'J).
Examples of JUNCTION without D (TRG) and with one D identified (TRB) are available for
Homo sapiens TRG and TRB, respectively.
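The label patterns in the table above follow a regular scheme, which can be sketched as a small generator. The function name `decryption_label` is hypothetical; the sketch only reproduces the label templates for 0 to 3 D genes and does not compute any region lengths.

```python
def decryption_label(n_d: int) -> str:
    """Build the JUNCTION decryption label for a given number of identified D genes."""
    if not 0 <= n_d <= 3:
        raise ValueError("the number of D genes must be between 0 and 3")
    if n_d == 0:
        # Without a D gene there is a single, unnumbered N region.
        return "(3'V)3'{N}5'(5'J)"
    # A single D is labelled (D); several are labelled (D1), (D2), (D3).
    d_names = ["D"] if n_d == 1 else [f"D{i}" for i in range(1, n_d + 1)]
    parts = ["(3'V)3'"]
    for i, d_name in enumerate(d_names, start=1):
        parts.append(f"{{N{i}}}5'({d_name})3'")
    parts.append(f"{{N{n_d + 1}}}5'(5'J)")
    return "".join(parts)


for n in range(4):
    print(n, decryption_label(n))
```

The loop prints one label per row of the table, from "No D" to "3 D".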
JUNCTION alignments with translation and IMGT AA classes:
- Each JUNCTION nucleotide sequence is translated into an amino acid sequence.
In the case of frameshifts, gaps indicated by one or two dots
are inserted to maintain the J-REGION reading frame and to facilitate
sequence comparison.
- Codons and amino acids are numbered according to the
IMGT unique numbering for V-DOMAIN.
- The numbering is made according to the longest JUNCTION
obtained in the results.
- Amino acids are colored according to the eleven
IMGT amino acid chemical characteristics classes [7].
- Underlined amino acids are mutated amino acids. Click on a mutated amino acid to see the original germline
amino acid in the small box below the sentence "Click on mutated (underlined) amino acid to see the original one". Note that
JavaScript must be enabled in your browser.
Note that:
- Gaps inserted in JUNCTIONs may split a D-REGION or a J-REGION, since the gaps
are localized at the top of the CDR3-IMGT loops and depend
on the CDR3-IMGT lengths and not on the sequence alignment.
- '*' indicates a STOP-CODON.
- '#' indicates a frameshift in the junction, with an artificially restored
reading frame to show where the anchor 118 (J-PHE or J-TRP, depending on the locus) would normally be.
- '+' and '-' at the end of the line indicate an "in-frame" and an "out-of-frame" JUNCTION, respectively.
- The CDR3-IMGT length, Molecular mass and pI values are indicated as "NR" (not relevant) in case of "out-of-frame" JUNCTION.
- The link to the "PhysicoChemical Descriptor" tool is indicated as "NR" (not relevant) in case of "out-of-frame" JUNCTION and/or stop codon.
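The translation and frame conventions above can be sketched as follows. This is a simplified illustration under stated assumptions: a junction is treated as in-frame when its length is a multiple of three, no gaps are inserted for out-of-frame junctions (unlike the tool), and the CDR3-IMGT length is taken as the junction length in codons minus the two anchors at positions 104 and 118. The function names are hypothetical.

```python
from typing import Optional

BASES = "tcag"
# Standard genetic code, ordered by first, second and third base over "tcag".
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def is_in_frame(junction: str) -> bool:
    """Assumption: in-frame means the nucleotide length is a multiple of three."""
    return len(junction) % 3 == 0


def translate(junction: str) -> str:
    """Translate complete codons; '*' marks a STOP-CODON, 'X' an unknown codon."""
    seq = junction.lower()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


def cdr3_imgt_length(junction: str) -> Optional[int]:
    """CDR3-IMGT length in amino acids, or None (reported as NR) if out-of-frame."""
    if not is_in_frame(junction):
        return None
    return len(junction) // 3 - 2
```

With the first input example above (M62724), `translate` gives a 15 amino acid JUNCTION and `cdr3_imgt_length` gives 13.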
The option "Colored IMGT AA classes and histogram" of 'Advanced parameters' controls whether the JUNCTION alignments with translation are displayed with or without the IMGT AA classes and histogram.
List of junctions with no results
The list of junctions with no results is displayed at the bottom of the results page.
It comprises the sequence identifiers and related comments.
Note that: V gene and allele sequences that are partial in 3' or identified from rearranged sequences are not included
in the IMGT reference directory of IMGT/JunctionAnalysis because the 3' end of the V gene and allele cannot be delimited correctly in the junctions.
- Compared with IMGT/V-QUEST, IMGT/JunctionAnalysis is by far the more accurate tool for D-GENE and ALLELE name identification
and delimitation. However, IMGT/V-QUEST has the advantage of proposing several solutions, which can be
useful in some cases.
- The way IMGT/V-QUEST and IMGT/JunctionAnalysis identify the D-GENE is not identical,
therefore the scores can be compared for a given tool, but score differences may be observed
between the tools.
- For two D-GENE and ALLELE with an identical score in the IMGT/V-QUEST results, IMGT/JunctionAnalysis,
in the default configuration, selects the solution that gives the smallest N regions; in other words, it
prefers a longer D (accepting nucleotide differences) to a shorter D (without nucleotide differences).
- IMGT/JunctionAnalysis has been used for IMGT standardization for statistical analyses of T cell receptor
TRAV-TRAJ junctions [8] and for recovering probabilities for nucleotide trimming processes for T cell receptor
TRA and TRG V-J [9]. The IMGT/JunctionAnalysis protocol has been published in Cold Spring Harb Protoc (2011) [10].
Authors:
The first version of IMGT/JunctionAnalysis tool was developed by Mehdi Yousfi,
student in the Licence d'Informatique,
Université Montpellier II,
during a stay in the
Laboratoire d'ImmunoGénétique Moléculaire,
IGH, CNRS, Montpellier, France.
IMGT/JunctionAnalysis in its present version has been developed by Denys Chaume,
Véronique Giudicelli and Patrice Duroux.
References:
[1] Yousfi Monod, M. et al., Bioinformatics, 20, I379-I385 (2004). PMID:15262823. LIGM:289
[2] Lefranc, M.-P., Methods Mol. Biol., 248, 27-49 (2004). PMID:14970490. LIGM:277
[3] Lefranc, M.-P., Current Protocols in Immunology, pp. A.1W.1-A.1W.15 (2006). PMID:18432961. LIGM:311
[4] Giudicelli, V. and Lefranc, M.-P., Nova Science, pp. 77-105 (2005). LIGM:297
[5] Lefranc, M.-P. et al., Dev. Comp. Immunol., 27, 55-77 (2003). PMID:12477501 (with permission from Elsevier). LIGM:268
[6] Giudicelli, V. et al., Nucl. Acids Res., 32, W435-440 (2004). PMID:15215425. LIGM:287
[7] Pommié, C. et al., J. Mol. Recognit., 17, 17-32 (2004). PMID:14872534. LIGM:284
[8] Bleakley, K. et al., In Silico Biol., 6, 573-588 (2006), Epub 2006, 6, 0051. PMID:17518765. LIGM:319
[9] Bleakley, K. et al., BMC Bioinformatics, 9, 408 (2008). PMID:18831754. LIGM:345
[10] Giudicelli, V. and Lefranc, M.-P., Cold Spring Harb Protoc., 2011(6), 716-725 (2011). pii: pdb.prot5634. doi: 10.1101/pdb.prot5634. PMID:21632777. IMGT booklet (with generous provision from Cold Spring Harbor (CSH) Protocols). LIGM:389
Created: 31/08/2001
Last updated: 30/04/2020