Genome sequencing projects
In genome sequencing projects, a contig is, in the best case, built from DNA clones that all derive from the same strain (for example, the contig for the TRA/TRD loci, built from Mus musculus strain 129/SvJ).
The mouse Celera sequence is a combination of five strains: 129S1/SvImJ, 129X1/SvJ, A/J, DBA/2J and C57BL/6J.
Celera sequenced four strains and incorporated the C57BL/6J from the public database.
Their computer algorithms created a "consensus" sequence from the analysis of the "track"
sequence runs of all these sequences, and there is no way to know which specific strain sequence
is used at a specific locus.
Three strains (A/J, DBA/2J, and 129X1/SvJ) each cover roughly 30% of the mouse
genome, and 129S1/SvImJ represents less than 1%. Each sequence is
a single run as it comes out of the ABI sequencer.
(Information provided by Lucio Castilla on the mgi-list of 15 Jan 2004)
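The loss of strain information described above can be illustrated with a toy example. The sketch below is not Celera's algorithm; it assumes pre-aligned reads of equal length and uses a simple column-wise majority vote, and the function name `consensus` and the short sequences are hypothetical. It only shows that the resulting consensus string carries no record of which strain contributed each base.

```python
from collections import Counter


def consensus(reads):
    """Column-wise majority vote over pre-aligned, equal-length reads.

    `reads` maps a strain name to its aligned sequence (hypothetical data).
    Ties are resolved in favour of the base seen first in the column.
    """
    if not reads:
        return ""
    sequences = list(reads.values())
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("reads must be aligned to the same length")
    bases = []
    for column in zip(*sequences):
        # Keep only the most frequent base; the strain label is discarded here.
        base, _count = Counter(column).most_common(1)[0]
        bases.append(base)
    return "".join(bases)


# Invented six-base reads, one per strain, for illustration only.
reads = {
    "A/J": "ACGTAC",
    "DBA/2J": "ACGTTC",
    "129X1/SvJ": "ACCTAC",
    "C57BL/6J": "ACGTAC",
}
result = consensus(reads)
print(result)
```

In this toy case the consensus differs from the DBA/2J and 129X1/SvJ reads at one position each, and nothing in the output string indicates which strain was followed at any position.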
RPCI-22 BACs (and RPCI-21 PACs) are derived
from the 129S6/SvEvTac strain, which matches the most commonly used stem
cells. The RP22 BAC library was initially prepared to support the mouse
genome mapping and sequencing. However, a change in direction led to the
use of the RP23 and RP24 as the main resources for genome assembly and
these two libraries were prepared from the C57BL/6J strain. Many
additional BAC libraries from other mouse strains have been created in my lab
(see: http://bacpac.chori.org/libraries.php?disp=c).
However, so far only the C57BL/6J BACs have been mapped (by clone fingerprinting and
by BAC-end sequencing).
Two other strains are being mapped and annotated by BAC-end
sequencing: the NOD mouse strain (from the CHORI-29 BAC library created
by Michael Nefedov); see the announcement on the NIAID web site:
http://www.niddk.nih.gov/fund/diabetesspecialfunds/idd.pdf.
(Information provided by Pieter J. de Jong on the mgi-list of 30 Sept 2004)
In the case of the human genome sequencing project, contigs were built from DNA from different individuals (and from both maternally and paternally inherited chromosomes). Contigs therefore do not take into account allelic polymorphisms or haplotypes (alleles in linkage disequilibrium).
- To address this issue in part, the MHC haplotype project aims to sequence the MHC locus from cell lines that are homozygous for given MHC haplotypes.
- For precise information on polymorphisms, refer to the data provided in the different sections of the IMGT Repertoire, in particular Gene tables, Alignments of alleles and Colliers de Perles.
Provided that there has been no DNA exchange (absence of chimerism), a clone (BAC, YAC) can be used for haplotype analysis of the DNA fragment that it contains.
The largest bacterial genome to date (March 2005): Rhodococcus sp. RHA1.