Functional analysis of Prokaryotes using Gene Set Enrichment Analysis on Transcriptome (RNA-Seq) or Proteome data.
     Gene Set Enrichment Analysis (GSEA) is a method to identify classes of genes or proteins that are over-represented in a set of genes or proteins, respectively. Two data sets are needed to perform a GSEA: 1) a list/set of genes or proteins derived from a transcriptomics or proteomics analysis, and 2) a database of functional classes.
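     As an illustration of what "over-represented" means, the sketch below computes a one-sided hypergeometric p-value for a single functional class. This is a common textbook test for over-representation; the page does not state which statistical test GSEA-Pro itself uses, so the function name, its parameters and the numbers in the example are assumptions made for illustration only.

```python
from math import comb


def hypergeom_enrichment_pvalue(genome_size, class_size, list_size, hits):
    """Probability of seeing at least `hits` class members in a list of
    `list_size` genes drawn without replacement from `genome_size` genes,
    of which `class_size` belong to the functional class."""
    upper = min(class_size, list_size)
    total = comb(genome_size, list_size)
    tail = sum(
        comb(class_size, k) * comb(genome_size - class_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Made-up numbers: a 4000-gene genome, a class with 40 members,
# and a list of 100 genes of which 8 fall in that class.
p = hypergeom_enrichment_pvalue(genome_size=4000, class_size=40,
                                list_size=100, hits=8)
print(f"p = {p:.3g}")
```

     A small p-value means the class occurs in the list more often than expected by chance. When many classes are tested, the p-values are usually adjusted for multiple testing.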
     For all complete bacterial genomes of RefSeq and Genbank (>20,000) we (re-)constructed nine functional classes: GO, InterPro, KEGG, COG, PFAM, SMART, Superfamily, KEYWORDS and OPERONS. GSEA-Pro supports four types of input: 1) a single list of locus-tags, 2) a single list with values, 3) multiple experiments and 4) clusters. Use this PowerPoint TUTORIAL to understand the four input options and how to interpret the results. Some examples can be selected below.
Both RefSeq and Genbank annotations are supported. Optionally, use Genome2D to convert between Genbank (old_locus-tags) and RefSeq locus-tags.

Select Genome: Start typing parts of the genome name until the selection list contains fewer than 15 genomes. If only one genome remains in the list, it is selected automatically. For example, to find Bacillus subtilis 168, type two keywords such as 168 subt and select your genome from the list.
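The keyword search described above can be pictured as a case-insensitive "all keywords must match" filter. The sketch below is only an assumption about how such a filter could work, not the site's actual implementation; the function names and the short genome list are illustrative.

```python
MAX_LISTED = 15  # the page shows the selection list once fewer than 15 genomes match


def match_genomes(query, genome_names):
    """Keep the genome names that contain every keyword of the query."""
    keywords = query.lower().split()
    return [name for name in genome_names
            if all(keyword in name.lower() for keyword in keywords)]


def selection_state(matches):
    """Mimic the described behaviour: auto-select a single remaining match."""
    if len(matches) == 1:
        return "selected"
    if 0 < len(matches) < MAX_LISTED:
        return "choose from list"
    return "keep typing"


# Illustrative genome names only; the real list holds many more entries.
genomes = [
    "Bacillus subtilis subsp. subtilis str. 168",
    "Bacillus subtilis BSn5",
    "Escherichia coli str. K-12 substr. MG1655",
]
matches = match_genomes("168 subt", genomes)
print(matches, selection_state(matches))
```

Keyword order does not matter in this sketch, so "subt 168" selects the same genome as "168 subt".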
RefSeq: Genbank:

Paste Tab delimited Data Table (such as Excel) below
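To check a table before pasting it, a tab-delimited export can be parsed locally. The sketch below assumes a layout the page does not spell out: one header row, locus-tags in the first column, and one numeric column per experiment. The function name, the locus-tags and the values are made up for illustration.

```python
import csv
import io


def parse_table(text):
    """Return (header, rows), where rows maps locus-tag -> list of floats."""
    reader = csv.reader(io.StringIO(text.strip("\r\n"), newline=""),
                        delimiter="\t")
    header = next(reader)
    rows = {}
    for fields in reader:
        # Skip empty lines and lines without a locus-tag.
        if not fields or not fields[0].strip():
            continue
        rows[fields[0].strip()] = [float(v) for v in fields[1:] if v.strip()]
    return header, rows


# Hypothetical two-experiment table, as copied from a spreadsheet.
pasted = ("locus_tag\texp1\texp2\n"
          "BSU_00010\t2.5\t-1.2\n"
          "BSU_00020\t-0.3\t3.1\n")
header, rows = parse_table(pasted)
print(header)
print(rows)
```

A non-numeric cell raises a ValueError in this sketch, which makes stray text in a value column easy to spot before submitting the data.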

Analyse data as:

Load Example Data:

Auto detect cutoff values

User defined cutoff:
< values >

The results below are divided into 3 sections: 1) an Overview Table, 2) a Main Table and 3) Interactive Bargraphs.

Bookmark your results here
Start a new session

GSEA-Pro Overview table
     The nine classes and the number of genes found in overrepresented classes of each experiment/cluster.

Main Table
     Each class table shows the over-represented classIDs and their significance in each experiment/cluster. The values given are: 1) a rating value (1-5), which is a binned value based on (TopHits/ClassSize) * -log2(adj-pvalue), 2) the first number within the brackets, which is the number of genes/proteins having the classID, and 3) the calculated p-value.
The light to dark blue coloring represents low to high rating/importance, respectively.
The columns LC and HM contain links to a graphical representation of the original data of the genes/proteins.
     Line Chart (LC): shows the original data as a scattered line
     Heat Map (HM): represents the original data as a heat map
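
The rating value described for the Main Table can be worked through as follows. The raw score uses the formula given above, (TopHits/ClassSize) * -log2(adj-pvalue). The page does not state the bin boundaries used to turn that score into a 1-5 rating, so the edges below are purely hypothetical, as are the function names and the reading of TopHits as the number of genes/proteins from the input that carry the classID.

```python
from bisect import bisect_right
from math import log2


def raw_score(top_hits, class_size, adj_pvalue):
    """(TopHits/ClassSize) * -log2(adj-pvalue), as given in the text."""
    return (top_hits / class_size) * -log2(adj_pvalue)


# Hypothetical bin edges; the real boundaries are not documented on the page.
HYPOTHETICAL_EDGES = (1.0, 2.0, 4.0, 8.0)


def rating(score, edges=HYPOTHETICAL_EDGES):
    """Map a raw score to a rating from 1 (low) to 5 (high)."""
    return bisect_right(edges, score) + 1


# Made-up example: 8 of the 16 class members are in the list,
# with an adjusted p-value of 2**-10.
score = raw_score(top_hits=8, class_size=16, adj_pvalue=2 ** -10)
print(score, rating(score))
```

The formula rewards classes that are both well covered (a high TopHits/ClassSize fraction) and statistically significant (a small adjusted p-value).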

Default sorting is on CLASS and Rate of the first experiment/cluster. Hold the SHIFT-key to sort on multiple columns.

Interactive Bargraphs
     The 3 interactive bargraphs show the number of over-represented functional classes per experiment, the classIDs per experiment, and the genes/proteins per classID per experiment in Bargraphs 1, 2 and 3, respectively.
Bargraph 1. Overrepresented Functional Classes in all Experiments/Clusters
Bargraph 3. One ClassID in all Experiments/Clusters
Bargraph 2. All ClassIDs of a Class in one Experiment/Cluster
GeneSetTable. All genes of the selected ClassIDs

Description of all COG categories

D: Cell cycle control, cell division, chromosome partitioning
M: Cell wall/membrane/envelope biogenesis
N: Cell motility
O: Post-translational modification, protein turnover, and chaperones
T: Signal transduction mechanisms
U: Intracellular trafficking, secretion, and vesicular transport
V: Defense mechanisms
W: Extracellular structures
Y: Nuclear structure
A: RNA processing and modification
B: Chromatin structure and dynamics
J: Translation, ribosomal structure and biogenesis
K: Transcription
L: Replication, recombination and repair
C: Energy production and conversion
E: Amino acid transport and metabolism
F: Nucleotide transport and metabolism
G: Carbohydrate transport and metabolism
H: Coenzyme transport and metabolism
I: Lipid transport and metabolism
P: Inorganic ion transport and metabolism
Q: Secondary metabolites biosynthesis, transport, and catabolism
R: General function prediction only
S: Function unknown