Chromosomal DNA was prepared as described previously [1].

Genome sequencing and assembly
The genome of strain DAL-1 was sequenced using a combination of the Illumina and 454 (GS20) sequencing platforms. Pyrosequencing reads (506,607 raw reads with a total length of 51,283,327 bp) showing sequence similarity to the Nichols genome sequence [1] were assembled using the Newbler assembler (version …) into 235 contigs (45× genome coverage). The Newbler contigs were then assembled against the reference Nichols genome [6] using Lasergene software (DNASTAR, Madison, WI, USA). This reduced the number of contigs to 52, separated by 52 gaps with a total length of 19,545 bp. Gaps between contigs were closed by Sanger sequencing. Altogether, 43 individual PCR products, including 5 XL-PCR products, were sequenced.
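Because the T. pallidum chromosome is circular, ordering n contigs against the reference leaves n gaps: one between each pair of neighbouring contigs, including the gap that spans the origin. This is why 52 contigs were separated by 52 gaps. The sketch below illustrates that bookkeeping under simplifying assumptions. It is not the authors' pipeline. The function name `circular_gaps` is hypothetical, contigs are treated as non-overlapping half-open intervals on the reference, and contigs spanning the origin are not handled.

```python
# Hypothetical sketch (not the authors' pipeline): find the gaps left between
# contigs that have been placed on a circular reference chromosome.

def circular_gaps(contigs, genome_length):
    """Return (gap_start, gap_length) for each gap between ordered contigs.

    contigs: non-overlapping half-open (start, end) intervals on the reference,
             with 0 <= start < end <= genome_length; contigs spanning the
             origin are not handled in this simplified version.
    """
    ordered = sorted(contigs)
    gaps = []
    for i, (_, end) in enumerate(ordered):
        if i + 1 < len(ordered):
            next_start = ordered[i + 1][0]
        else:
            # Last contig: the chromosome is circular, so this gap runs
            # through the origin to the start of the first contig.
            next_start = ordered[0][0] + genome_length
        gap_length = next_start - end
        if gap_length > 0:  # abutting contigs leave no gap
            gaps.append((end % genome_length, gap_length))
    return gaps


# Toy example: 3 contigs on a 100-bp circular "genome" leave 3 gaps.
gaps = circular_gaps([(0, 10), (20, 30), (50, 90)], genome_length=100)
print(len(gaps), sum(length for _, length in gaps))  # 3 40
```

On a linear molecule, the same n contigs would leave only n − 1 internal gaps. The extra wrap-around gap exists only because the chromosome is circular.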

The PCR products were sequenced using the amplification primers and, when required, internal primers. In addition, 4 libraries of XL-PCR products were prepared and sequenced. The resulting complete genome sequence of strain DAL-1 was considered a draft sequence. To improve sequence accuracy, additional Illumina sequencing was performed, and the complete DAL-1 genome sequence was compiled from these data. A total of 2,881,557 raw Illumina reads (total length of 103,736,052 bp) were assembled into 303 contigs (91× average coverage) using the Velvet 0.6.05 assembler [40]. Of these 303 contigs, 295 showed sequence similarity to the T. pallidum Nichols genome, leaving 46,148 bp of the T. pallidum DAL-1 genome unsequenced by the Illumina method.
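The coverage figures can be roughly checked from the read totals, since average coverage = total sequenced bases / genome length. This is a sketch under stated assumptions. No DAL-1 genome length is given in this section, so the hypothetical constant `GENOME_LENGTH` is set to the published size of the Nichols chromosome (about 1.14 Mb). The sketch also uses all raw 454 reads, whereas the 45× figure in the text refers to reads similar to the Nichols genome, so the results are approximations only. The same totals give the mean read length of each platform.

```python
# Rough check of average coverage: C = total_bases / genome_length.
# GENOME_LENGTH is an assumption (approximate size of the T. pallidum Nichols
# chromosome), not a value stated for DAL-1 in this section.
GENOME_LENGTH = 1_138_006  # bp, assumed


def mean_read_length(total_bases, n_reads):
    """Average read length in bp."""
    return total_bases / n_reads


def average_coverage(total_bases, genome_length=GENOME_LENGTH):
    """Average fold-coverage if all bases map to the genome."""
    return total_bases / genome_length


# (number of raw reads, total read length in bp), as reported in the text
runs = {
    "454 GS20": (506_607, 51_283_327),
    "Illumina": (2_881_557, 103_736_052),
}

for platform, (n_reads, total_bases) in runs.items():
    print(f"{platform}: mean read {mean_read_length(total_bases, n_reads):.1f} bp, "
          f"~{average_coverage(total_bases):.0f}x coverage")
# 454 GS20: mean read 101.2 bp, ~45x coverage
# Illumina: mean read 36.0 bp, ~91x coverage
```

The Illumina total divides exactly into 36-bp reads. Both coverage estimates land close to the 45× and 91× values reported above.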

Each DAL-1 region that was not sequenced by Illumina and that differed from the Nichols genome was resequenced using the Sanger method. In addition, all other discrepancies between the complete DAL-1 genome sequence and the Nichols genome sequence were resolved by Sanger sequencing of both the DAL-1 and Nichols strains. Altogether, 15 errors were identified in the 1,093-kb region resequenced by Illumina, indicating that the draft DAL-1 genome sequence contained about 1 error per 73 kbp. Therefore, the final, corrected strain DAL-1 genome sequence has an error rate of less than 10⁻⁵.

Genome annotation
The strain DAL-1 genome was annotated using gene coordinates taken from the Nichols [1], SS14 [2] and Samoa D [4] genomes; these coordinates were adapted and recalculated for the DAL-1 sequence. Genes identified in the DAL-1 genome were denoted with the prefix TPADAL followed by a four-digit gene number. Newly predicted genes were identified using the GeneMark and Glimmer programs. In most cases, the original locus tag numbers of annotated genes were preserved in the DAL-1 orthologs. Newly predicted genes in the DAL-1 genome were named after the preceding gene with a letter suffix (e.g. TPADAL_0950a).
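The naming rule for newly predicted genes can be illustrated with a small helper. This is a hypothetical sketch, not the authors' annotation tooling. The regular expression and the functions `parse_tag` and `tag_for_new_gene` are illustrative. The sketch assumes that tags have the form TPADAL_ plus four digits, as in the example above. It also assumes that a new gene placed after an already-suffixed gene simply takes the next unused letter.

```python
import re
import string

# Hypothetical pattern for DAL-1 locus tags: "TPADAL_", four digits,
# and an optional lowercase letter suffix for newly predicted genes.
TAG_RE = re.compile(r"^TPADAL_(\d{4})([a-z]?)$")


def parse_tag(tag):
    """Split a locus tag into (gene number, letter suffix)."""
    m = TAG_RE.match(tag)
    if m is None:
        raise ValueError(f"not a DAL-1 locus tag: {tag!r}")
    return int(m.group(1)), m.group(2)


def tag_for_new_gene(preceding_tag, existing_tags):
    """Name a newly predicted gene after its preceding gene plus a free letter."""
    number, _ = parse_tag(preceding_tag)
    # Suffixes already used for this gene number ("" is the original gene).
    used = {suffix for n, suffix in map(parse_tag, existing_tags) if n == number}
    for letter in string.ascii_lowercase:
        if letter not in used:
            return f"TPADAL_{number:04d}{letter}"
    raise ValueError("no free letter suffix left for this gene number")


print(tag_for_new_gene("TPADAL_0950", ["TPADAL_0949", "TPADAL_0950", "TPADAL_0951"]))
# TPADAL_0950a
```

Keeping the numeric part unchanged means that existing orthologs retain their Nichols-derived numbers. Inserted genes never force a renumbering of their neighbours.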

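The draft error rate quoted in the sequencing section follows from dividing the number of errors found by the length of the Illumina-resequenced region, taking 1,093 kb as 1,093,000 bp:

```latex
\[
\frac{15\ \text{errors}}{1{,}093{,}000\ \text{bp}}
  \approx 1.37 \times 10^{-5}\ \text{errors per bp}
  \approx \frac{1\ \text{error}}{72.9\ \text{kbp}}
\]
```

This draft rate lies slightly above 10⁻⁵. The lower rate stated for the final sequence therefore rests on the identified errors having been corrected.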
