Analysis and Classification of Conserved DNA Elements Through Whole Genome Multiple Alignment

Rapid and inexpensive high-throughput sequencing is making more and more complete genome sequences available. Analyzing these genomes presents formidable challenges: even simple pairwise comparisons are hard, because we lack good models of genome structure and evolution. Current approaches rely on identifying so-called syntenic blocks (genome fragments that present identical or highly similar collections of markers in most of the genomes under study). Identifying such blocks is the first step in comparative studies, yet its effect on final results has not been well studied, nor has any formal, biologically meaningful definition of syntenic blocks been proposed. (In practice, a syntenic block is simply the output of whichever synteny tool is used, and there are many -- FISH, Cinteny, ADHoRe, GRIMM- and DRIMM-Synteny, to name a few.)

Syntenic blocks are in many ways analogous to genes -- indeed, the markers used in constructing them are often genes (as in OrthoCluster). Like genes, they can exist in multiple copies, in which case we could define analogs of orthology and paralogy. However, whereas genes are typically studied at the sequence level, syntenic blocks are too large for that level of detail -- it is their arrangement within the entire genome that is the main object of study. Thus the definition and construction of syntenic blocks involve both large-scale and small-scale evolutionary models.

We propose a new, principled, and well-structured definition of syntenic blocks, based on homology statements at the nucleotide level. Sequence alignment is also based on such statements, but in the case of syntenic blocks we expect only a very sparse sampling of them (a few markers), and we must deal with rearrangements, duplications, and losses in addition to mutations and indels. We focus on a fundamental, theoretical definition rather than on an implementation, because such a definition can provide a basis for comparing the many operational definitions used in existing tools.

Keywords: Syntenic Block, Whole Genome Alignment, Comparative Genomics



Cristina Ghiurcuta
EPF Lausanne
Laboratory of Computational Biology and Bioinformatics
Bâtiment INJ 231, Station 14
CH - 1015 Lausanne

Phone: +41 21 693 13 92