Database for Single Exon Coding Sequences in Mammalian Genomes

There are four ways to access SinEx data: three from the home page and a bulk download from the Download section.

(1) Browsing the database content through the phylogenetic schema:

Figure 1. Browsable phylogenetic schema.

Clicking any of the eleven vertebrate boxes in Fig. 1 displays the KOG functional categories of the single exon genes (SEGs) in that genome. On this output page (Fig. 2), clicking an entry in the “Number” column retrieves the corresponding predicted protein sequences in FASTA format.

Figure 2. Screenshot of the output when the user selects “Human” in the phylogenetic schema.

Fig. 3 shows the protein sequences retrieved by selecting Human, KOG category “I” (Lipid transport and metabolism) (black arrow, Fig. 2). By clicking the respective icon, a selected protein sequence can be searched with BlastP against the NCBI non-redundant (nr) database or the in-house SinEx database of SEGs, or with profile hidden Markov models (HMMs) using HMMER.

Figure 3. Screenshot of the sequence output and ancillary search tools for KOG category “I” (Lipid transport and metabolism).
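Each entry in the “Number” column of Fig. 2 is presumably the count of SEGs assigned to that KOG category. A minimal Python sketch of such a tally, assuming each SEG is represented as a hypothetical (protein_id, KOG letters) pair; this is not SinEx code, and the partial letter-to-name table is only illustrative (category “I” is the one used in this tutorial):

```python
from collections import Counter

# Partial, illustrative KOG letter table; "I" is the category used in this tutorial.
KOG_NAMES = {
    "I": "Lipid transport and metabolism",
    "R": "General function prediction only",
    "S": "Function unknown",
    "T": "Signal transduction mechanisms",
}


def count_by_kog(records):
    """Tally SEGs per KOG letter.

    `records` is an iterable of (protein_id, kog_letters) pairs (hypothetical
    format). A protein assigned to several categories, e.g. "IT", is counted
    once under each letter.
    """
    counts = Counter()
    for _protein_id, letters in records:
        for letter in letters:
            counts[letter] += 1
    return counts


def summary_rows(counts):
    """Return (letter, description, number) rows sorted by KOG letter."""
    return [
        (letter, KOG_NAMES.get(letter, "(not in this partial table)"), number)
        for letter, number in sorted(counts.items())
    ]


# Hypothetical example records, not real SinEx identifiers.
example = [("HYP_1", "I"), ("HYP_2", "I"), ("HYP_3", "R"), ("HYP_4", "IT")]
for row in summary_rows(count_by_kog(example)):
    print(row)
```

Counting a multi-letter assignment under every letter is a design choice of this sketch; SinEx may assign or count proteins differently.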

(2) Using a protein sequence in FASTA format as a query against a protein database.

Figure 4. In-house database available for searching.

Searching against SinEx returns SEG sequences from the mammalian genomes in the database that are similar to the query.
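Because the query must be in FASTA format, it can help to normalise a pasted sequence first. A minimal Python sketch, assuming a single protein query; the accepted residue alphabet and the 60-residue line width are common FASTA conventions rather than documented SinEx requirements, and `to_fasta` is a hypothetical helper:

```python
# 20 standard residues, U, O, the ambiguity codes B, Z and X, and "*" for a stop;
# assumed acceptable here, not a documented SinEx rule.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYUOBZX*")


def to_fasta(header, sequence, width=60):
    """Return `sequence` as one FASTA record wrapped at `width` residues.

    Whitespace (spaces, line breaks) is removed and residues are upper-cased;
    an empty sequence or any character outside VALID_RESIDUES raises ValueError.
    """
    residues = "".join(sequence.split()).upper()
    if not residues:
        raise ValueError("empty sequence")
    invalid = set(residues) - VALID_RESIDUES
    if invalid:
        raise ValueError("invalid residue(s): " + "".join(sorted(invalid)))
    lines = [residues[i:i + width] for i in range(0, len(residues), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


print(to_fasta("my_query", "mkt ayiak\nqr"), end="")
```

The resulting text (a “>” header line followed by the wrapped sequence) can then be pasted into the search box.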

(3) Advanced search.

Select one or more of the available genomes, then choose either “Search by protein name or protein ID” or “KOG category” (Fig. 5). This returns the predicted SEGs from the in-house SinEx database. Searches by protein name are not case-sensitive, but the name must be spelled correctly.

Figure 5. Screenshot of an advanced search against the SinEx database.
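The matching rule for name searches (case is ignored, but a misspelled name finds nothing) can be illustrated with a small Python sketch. The record fields, the substring match on names and the exact match on IDs are assumptions made for illustration; SinEx's actual matching logic is not described in this tutorial:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SEGRecord:
    """Hypothetical SEG record; field names are illustrative only."""
    protein_id: str
    name: str
    genome: str
    kog: str  # one or more KOG letters, e.g. "I"


def advanced_search(records, genomes, query=None, kog=None):
    """Keep records from the selected genomes, then filter by name/ID or KOG letter.

    Name matching ignores case (str.casefold) but not spelling: the query must
    occur as written inside the protein name, or equal the protein ID.
    """
    hits = [r for r in records if r.genome in genomes]
    if query is not None:
        q = query.casefold()
        hits = [r for r in hits
                if q in r.name.casefold() or q == r.protein_id.casefold()]
    if kog is not None:
        hits = [r for r in hits if kog.upper() in r.kog]
    return hits


# Hypothetical records, not real SinEx entries.
records = [
    SEGRecord("HYP_0001", "Lipase member X", "Human", "I"),
    SEGRecord("HYP_0002", "Histone H2A", "Mouse", "B"),
    SEGRecord("HYP_0003", "Lipase member Y", "Mouse", "I"),
]
print([r.protein_id for r in advanced_search(records, {"Human", "Mouse"}, query="LIPASE")])
print([r.protein_id for r in advanced_search(records, {"Human", "Mouse"}, query="lipsae")])
```

In this sketch “LIPASE” and “lipase” behave identically, while the misspelling “lipsae” returns an empty list.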

(4) Download.

Nucleotide and protein sequences of SEGs, and protein sequences of multi-exon genes (MEGs), from the ten mammalian genomes included in SinEx DB can be downloaded in FASTA format. For more information on the mammalian genome builds and the current version of this database, see the “README” file in the Download section.
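Downloaded FASTA files can be read with a few lines of standard-library Python. A minimal sketch that assumes only the generic FASTA layout (a “>” header line followed by sequence lines); the header fields SinEx actually uses are described in its README, so headers are returned unparsed:

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a text stream in FASTA format.

    The header is the text after ">" (left unparsed) and sequence lines are
    joined. Blank lines, and any lines before the first header, are skipped.
    """
    header, chunks = None, []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


demo = io.StringIO(">seq1 example\nMKTAYI\nAKQR\n>seq2\nMSSH\n")
for header, seq in read_fasta(demo):
    print(header, len(seq))
```

To read a downloaded file instead, pass an open text handle, e.g. `with open("human_SEG_proteins.fa") as fh:` (a hypothetical file name), then iterate over `read_fasta(fh)`.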