Coverage
The coverage of reads mapped to a reference genome was assessed using the genomeCoverageBed function of BEDTools (https://github.com/arq5x/bedtools2).

Plasmid analysis
A query sequence of 9299 bases, positions 3036 to 12334 of the Lens plasmid pLPL (Accession: NC_006366), was used to search BLAST databases using blastall (blastn program) from NCBI.

Overview of genome similarity
BRIG (BLAST Ring Image Generator) was used to produce an image illustrating the similarity between the Corby genome and one sequence from each of the BAPS clusters (except for Clusters 1 and 2, where two sequences were included, one from each clade on the phylogenetic tree produced from SBT data). Similarity was determined using BLASTn.

Gene content analysis
A novel method was used to cluster the genes from all the genomes in the study. This method, which we have termed CoreAccess, is reported in full in a paper currently in preparation. Briefly, the protein sequences of all genes from the genomes were used as input for the program cd-hit [49]. These genes were either those already annotated in the sequence files of the GenBank genomes or those predicted using Glimmer3 [50] trained on the Corby sequence genes. The proteins were clustered with cd-hit using a hierarchical approach: clustering first at a high percentage cut-off, then lowering the cut-off stepwise and clustering the clusters from the previous step. The final cut-off was 80%. This hierarchical approach overcomes errors that can arise in single-step clustering, as described on the cd-hit website (cd-hit.org). The hypothesis underlying this methodology is that the clusters contain homologous proteins from the different genomes and as such represent groups of proteins with the same or similar function. In order to search the clusters and find, for example, genes shared by all the genomes, the information about the clusters in the cd-hit output was collated into a sqlite3 database using tools within the CoreAccess suite.

Phylogenetic tree construction
Maximum likelihood phylogenetic trees were produced from multiple FASTA files by the MEGA software package [51] using the Tamura-Nei model, testing the phylogeny with 500 bootstrap replicates. To construct a tree from the gene content analysis, the database generated by CoreAccess was queried using SQL so that the presence or absence of a protein representative from each strain in every cluster was recorded, producing a PHYLIP-compatible discrete-state (binary 0/1) character matrix. The seqboot program from the PHYLIP package [52] was used to create 100 bootstrap replicates using the Discrete Morphology data type and Non-interleaved as parameters.
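The conversion of cluster membership into a binary character matrix can be illustrated with a minimal Python sketch. The CoreAccess database schema is not described here, so the table name cluster_members and its columns (cluster_id, strain, protein_id) are hypothetical, as are the strain names and the helper functions. The sketch only shows the general idea of querying a sqlite3 database with SQL and writing a non-interleaved PHYLIP-style 0/1 matrix; it is not the CoreAccess implementation.

```python
import sqlite3


def demo_connection():
    """Build a tiny in-memory database with a HYPOTHETICAL schema.

    The real CoreAccess schema is not given in the text; this table is
    an assumption made purely for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cluster_members "
        "(cluster_id INTEGER, strain TEXT, protein_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO cluster_members VALUES (?, ?, ?)",
        [
            (1, "strainA", "a1"), (1, "strainB", "b1"), (1, "strainC", "c1"),
            (2, "strainA", "a2"), (2, "strainA", "a3"),  # paralogues count once
            (3, "strainB", "b2"), (3, "strainC", "c2"),
        ],
    )
    return conn


def presence_absence_matrix(conn):
    """Return (strains, clusters, rows); rows maps strain -> '0101...' string."""
    cur = conn.cursor()
    strains = [r[0] for r in cur.execute(
        "SELECT DISTINCT strain FROM cluster_members ORDER BY strain")]
    clusters = [r[0] for r in cur.execute(
        "SELECT DISTINCT cluster_id FROM cluster_members ORDER BY cluster_id")]
    # Every (strain, cluster) pair that has at least one protein representative.
    present = set(cur.execute(
        "SELECT DISTINCT strain, cluster_id FROM cluster_members"))
    rows = {
        s: "".join("1" if (s, c) in present else "0" for c in clusters)
        for s in strains
    }
    return strains, clusters, rows


def to_phylip(strains, clusters, rows):
    """Format the matrix as non-interleaved PHYLIP text.

    Names are truncated/padded to 10 characters, as strict PHYLIP expects;
    longer strain names would need to be made unique beforehand.
    """
    lines = [f"{len(strains)} {len(clusters)}"]
    for s in strains:
        lines.append(f"{s[:10]:<10}{rows[s]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    conn = demo_connection()
    print(to_phylip(*presence_absence_matrix(conn)), end="")
```

A file written this way could then be passed to seqboot with the discrete morphology data type selected. Recording one character per cluster, regardless of how many paralogues a strain contributes, is what keeps the matrix strictly binary.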
