Help

GeneReporter - a literature and homology reporter

This website uses Web Services from the EBI and NCBI. We would like to thank the service providers for the opportunity to share their facilities.

Frequently Asked Questions

  1. My job has been running for ages. Why?
     Because GeneReporter relies on external web services, the processing time of a job depends strongly on the response time of these services. In general they are reasonably fast, but exceptions may happen. Look at the overview page to preview your results before the job has completely finished. Look at the 'Status' line to see which service is currently running.

  2. I only find a few sequences that are distantly related to my input. How can I improve my hits?
     Select UniProt instead of SwissProt for the BLAST search. This may return many sequences with few annotations, but it will definitely show more homologous sequences. You may also try PSI-BLAST to find more distantly related sequences.

  3. I previewed my results, the job is still running and the overview page does not proceed. What's wrong?
     Probably nothing. Reload the overview page with the Reload button of your browser.

  4. I still could not solve my problem. How can I contact you?
     For any questions or problems with GeneReporter, please do not hesitate to contact Anne Bartsch.

Introduction

GeneReporter is a tool to create a report on a DNA or protein sequence. It is a WebService-based application for homology-based document retrieval and sequence analysis. It combines a BLAST search in UniProtKB with a subsequent PubMed search, making use of the names and synonyms of genes that were identified as homologous sequences. Thus, GeneReporter detects literature about proteins that are related to the query protein but might have other names. In addition, the sequence can be analysed by detection of signature matches in the InterPro database. Signal peptide and transmembrane regions are predicted by the PrediSi and Phobius WebServices.

Input is a DNA or protein sequence of interest. A maximum of 10 sequences in FASTA format is accepted in one query. The maximum sequence length is 5000 amino acids or 15000 nucleotides. Paste your sequence(s) into the input field and don't forget to adjust the type of sequence (DNA or protein).
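
The limits above can be checked before submitting a query. The following is a minimal sketch in Python, not GeneReporter's own validation code; the function name `check_query` and the type labels "protein" and "dna" are assumptions made for illustration.

```python
# Limits taken from the text above; everything else is illustrative.
MAX_SEQUENCES = 10
MAX_LENGTH = {"protein": 5000, "dna": 15000}


def check_query(sequences, seq_type):
    """Return a list of problems; the list is empty if the query is within limits."""
    if seq_type not in MAX_LENGTH:
        raise ValueError(f"unknown sequence type: {seq_type!r}")
    problems = []
    if len(sequences) > MAX_SEQUENCES:
        problems.append(
            f"{len(sequences)} sequences submitted, at most {MAX_SEQUENCES} allowed"
        )
    limit = MAX_LENGTH[seq_type]
    for number, seq in enumerate(sequences, start=1):
        if len(seq) > limit:
            problems.append(
                f"sequence {number} has {len(seq)} residues, limit is {limit}"
            )
    return problems
```

An empty list means the query respects both the sequence count and the length limit for the chosen sequence type.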

As a result you get a table of homologs, the related literature and the sequence annotation. This 'report' can be downloaded as a Microsoft Excel or CSV text file.

FASTA format

The FASTA format is a text-based sequence format with the following properties: The first line is the "header" of the sequence. It starts with a ">" symbol, followed by a description of the sequence with a maximum of 80 characters. The following lines contain the amino acid or nucleotide sequence in one-letter code.
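
To illustrate the format, here is a small Python sketch that splits FASTA text into (header, sequence) pairs. It is not the parser used by GeneReporter; the function name `parse_fasta` and the example sequences are made up for illustration.

```python
def parse_fasta(text):
    """Split FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header line")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up example input with two sequences.
example = """>seq1 example protein
MKTAYIAKQR
QISFVKSHFS
>seq2 second example
MSDNE
"""
records = parse_fasta(example)
```

Sequence lines that belong to the same header are joined, so a sequence wrapped over several lines is returned as one string.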

Homology-based document retrieval

GeneReporter searches for homologous sequences and extracts the gene names and synonyms of these homologs. These names and synonyms are then used for a literature search in the PubMed database, so that literature about the homologous sequences is retrieved as well.

BLAST searches

The search for homologous proteins is done by the Basic Local Alignment Search Tool (BLAST), which finds regions of local similarity between sequences. The BLAST search is executed either in the complete UniProtKB or in the Swiss-Prot database. You can select up to 3 different BLAST services and choose the desired database. In addition, you can adjust the parameters E-value (expected threshold value for statistical significance), minimal bit score of the alignment and the maximum number of passes (iterations) for the PSI-BLAST algorithm.
To match reverse complement or out-of-frame nucleotide sequences, select NCBI-BLAST with the blastx program option.

  • NCBI-BLAST is a standard protein BLAST that was developed at the National Center for Biotechnology Information.
  • WU-BLAST is a modified NCBI-BLAST that was developed at Washington University. It can find similar sequences more quickly, with minimal loss of sensitivity.
  • PSI-BLAST is a Position-Specific Iterative BLAST for more sensitive protein-protein similarity searches. PSI-BLAST automatically constructs a profile from the first set of BLAST alignments and uses position-specific scoring matrices derived during the search. It refers to a feature of BLAST2 and is used to detect distant evolutionary relationships.
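
The effect of the E-value and bit score parameters can be pictured as a filter on the list of hits. The sketch below is an assumption about how such a filter might look, not GeneReporter's implementation; the `Hit` class, the function `filter_hits` and the accessions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLAST hit, reduced to the values needed for filtering (hypothetical)."""
    accession: str
    e_value: float
    bit_score: float


def filter_hits(hits, max_e_value=10.0, min_bit_score=0.0):
    """Keep hits that pass both thresholds, most significant (lowest E-value) first."""
    kept = [
        hit for hit in hits
        if hit.e_value <= max_e_value and hit.bit_score >= min_bit_score
    ]
    return sorted(kept, key=lambda hit: hit.e_value)


# Made-up hits for illustration.
hits = [
    Hit("HYPO_A", 1e-50, 200.0),
    Hit("HYPO_B", 0.5, 35.0),
    Hit("HYPO_C", 1e-5, 60.0),
]
strong = filter_hits(hits, max_e_value=1e-3, min_bit_score=50.0)
```

A lower E-value threshold or a higher minimal bit score keeps fewer, more reliable homologs, which in turn reduces the number of gene names passed on to the literature search.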

PubMed search

PubMed is a free search engine that provides access to the MEDLINE database of citations. It is developed and maintained by the National Center for Biotechnology Information (NCBI) at the U.S. National Library of Medicine (NLM), located at the National Institutes of Health (NIH).

GeneReporter uses the names of homologous sequences for a search in PubMed. Additional query terms can be linked with 'AND' or 'NOT'. With the option "organism specific search", the name of the organism of the homologous sequence is added to the query. Furthermore, the years of publication can be limited, and the number of references displayed can be raised to a maximum of 50 references per PubMed query.
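
As an illustration of how these options could combine into one query string, here is a hedged Python sketch. The exact query that GeneReporter sends to PubMed is not documented here, so the function `build_pubmed_query`, the order of the terms and the use of the PubMed publication date tag `[dp]` are assumptions.

```python
def build_pubmed_query(names, and_terms=(), not_terms=(), organism=None, years=None):
    """Combine gene names and optional filters into a Boolean query string."""
    if not names:
        raise ValueError("at least one gene name or synonym is required")

    def quote(term):
        # Multi-word terms are quoted so that they are searched as a phrase.
        return f'"{term}"' if " " in term else term

    # Names and synonyms are alternatives, so they are joined with OR.
    query = "(" + " OR ".join(quote(name) for name in names) + ")"
    if organism:
        query += f" AND {quote(organism)}"
    for term in and_terms:
        query += f" AND {quote(term)}"
    for term in not_terms:
        query += f" NOT {quote(term)}"
    if years:
        start, end = years
        query += f" AND {start}:{end}[dp]"
    return query


# Made-up example: two names for the same gene product plus all optional filters.
query = build_pubmed_query(
    ["hemA", "glutamyl-tRNA reductase"],
    and_terms=["heme"],
    not_terms=["review"],
    organism="Escherichia coli",
    years=(2000, 2010),
)
```

With these inputs the sketch produces `(hemA OR "glutamyl-tRNA reductase") AND "Escherichia coli" AND heme NOT review AND 2000:2010[dp]`.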

Analysis of the protein sequences

For the analysis of the protein sequence you can use InterProScan to detect a signature of a given protein family, domain or functional site. Phobius and PrediSi are both programs to predict transmembrane topology and signal peptides from the amino acid sequence of a protein.

InterProScan

InterProScan integrates applications to search for protein families, domains, regions and sites. The applications use different methodologies and a varying degree of biological information on well-characterised proteins to derive protein signatures.

When using InterProScan, you can choose from the following applications:

BlastProDom

BlastProDom performs a BLAST search against ProDom, a protein domain family database.

FPrintScan

FPrintScan searches for the closest matching protein fingerprints in PRINTS, a compendium of protein fingerprints.

Gene3D

Gene3D is a program for studying proteins and their component domains. The database contains descriptions of protein families and domain architectures in complete genomes. Gene3D takes the HMMs derived from CATH families and scans them against various sequence databases.

HMMPanther

HMMPanther searches against PANTHER, a database that classifies genes by their functions, using published scientific experimental evidence and evolutionary relationships to predict function even in the absence of direct experimental evidence.

HMMPfam

HMMPfam searches against the Pfam database, a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs).

HMMPIR

HMMPIR searches against the PIR SuperFamily (PIRSF), a classification system based on evolutionary relationship of whole proteins.

HMMSmart

HMMSmart searches against SMART, a Simple Modular Architecture Research Tool. It allows the identification and annotation of genetically mobile domains and the analysis of domain architectures.

HMMTigr

HMMTigr searches against TIGRFAMs, a collection of protein families, featuring curated multiple sequence alignments, hidden Markov models (HMMs) and annotation, which provides a tool for identifying functionally related proteins based on sequence homology.

ProfileScan

ProfileScan is an application of generalized profiles. It uses a very sensitive method for the discovery of distant sequence relationships.

PatternScan

PatternScan is a search against PROSITE, which consists of documentation entries describing protein domains, families and functional sites as well as associated patterns and profiles to identify them.

SignalPHMM

SignalPHMM predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms (Gram-positive prokaryotes, Gram-negative prokaryotes, and eukaryotes).

SuperFamily

SuperFamily searches for structural and functional protein annotations for all completely sequenced organisms stored in the Superfamily database.

TMHMM

TMHMM predicts transmembrane helices in proteins.

Phobius

Phobius predicts transmembrane topology and signal peptides from the amino acid sequence of a protein.

PrediSi

PrediSi (PREDIction of SIgnal peptides) is a tool for predicting signal peptide sequences and their cleavage positions in bacterial and eukaryotic proteins. The calculation is performed in real time with high accuracy and uses a position weight matrix approach, which is improved by a frequency correction that takes the amino acid bias present in proteins into consideration.
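
The general idea of a position weight matrix with a background frequency correction can be sketched as follows. This is a toy illustration of the technique only: it does not reproduce PrediSi's matrices, parameters or scoring scheme, and the four-letter alphabet, the background frequencies and the function names are assumptions.

```python
import math


def build_pwm(aligned, background, pseudocount=1.0):
    """Build a log-odds matrix from equal-length sequences.

    Each observed frequency is divided by the background frequency of the
    residue, so residues that are common everywhere are not over-rewarded.
    """
    alphabet = sorted(background)
    pwm = []
    for pos in range(len(aligned[0])):
        # Pseudocounts are distributed according to the background frequencies.
        counts = {aa: pseudocount * background[aa] for aa in alphabet}
        for seq in aligned:
            counts[seq[pos]] += 1
        total = sum(counts.values())
        pwm.append(
            {aa: math.log2((counts[aa] / total) / background[aa]) for aa in alphabet}
        )
    return pwm


def best_window(sequence, pwm):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    scores = [
        (sum(pwm[i][sequence[start + i]] for i in range(width)), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(scores)


# Toy data: a four-letter alphabet with an uneven background composition.
background = {"A": 0.4, "L": 0.3, "S": 0.2, "K": 0.1}
pwm = build_pwm(["ASA", "ALA", "ASA"], background)
score, start = best_window("KLLASAK", pwm)
```

In this toy example the window starting at index 3 ("ASA") receives the highest score, because it matches the residues that are enriched in the training sequences relative to the background.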

To use PrediSi, you have to specify the organism group of the input sequence (Gram-positive prokaryotes, Gram-negative prokaryotes or eukaryotes).