Functional analysis of Prokaryotes using Gene Set Enrichment Analysis on Transcriptome (RNA-Seq) or Proteome data.
     Gene Set Enrichment Analysis (GSEA) is a method to identify functional classes in sets of genes or proteins. The GSEA-Pro platform can analyse and uncover biological functions in: i) results derived from differential gene/protein expression analysis, ii) multiple experiments such as time series, iii) clusters derived from e.g. k-means clustering, and iv) modules found in gene networks.

     For all complete bacterial genomes of RefSeq and GenBank (>20,000) we (re-)constructed nine functional classes: GO, InterPro, KEGG, COG, PFAM, SMART, Superfamily, KEYWORDS and OPERONS. GSEA-Pro supports four types of input: i) a single list of locus-tags, ii) a single list with values, iii) multiple experiments and iv) modules or clusters. Use this PowerPoint TUTORIAL to understand the four input options and how to interpret the results.

To annotate and classify your own genomes, use the FACoP webserver.
Genome2D supports conversion between GenBank (old_locus-tags) and RefSeq locus-tags.
To convert protein IDs to locus-tags, use the ID mapping tool.
[anne.de.jong .. rug.nl]

Select Genome: Start typing parts of the genome name until the selection list contains fewer than 15 genomes. If only one genome remains in the list, it is selected automatically. For example, to find Bacillus subtilis 168, type two keywords such as 168 subt and select your genome from the list.
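
The keyword search described above can be pictured with a minimal sketch. How GSEA-Pro matches keywords internally is not stated on this page, so the logic here (every keyword must occur somewhere in the genome name, ignoring case) is an assumption, and `match_genomes` and the genome names are illustrative only.

```python
def match_genomes(query, genome_names):
    """Return the genome names that contain every whitespace-separated keyword.

    Assumption: matching is a case-insensitive substring test per keyword.
    """
    keywords = query.lower().split()
    return [
        name
        for name in genome_names
        if all(keyword in name.lower() for keyword in keywords)
    ]


# Illustrative genome names, not taken from the GSEA-Pro database
genomes = [
    "Bacillus subtilis subsp. subtilis str. 168",
    "Bacillus subtilis BSn5",
    "Lactococcus lactis MG1363",
]

print(match_genomes("168 subt", genomes))
```

With these three names, the two keywords `168` and `subt` leave a single candidate, which is the situation in which the page selects the genome automatically.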
RefSeq: GenBank: FACoP sessionID:

Paste a tab-delimited data table (for example, copied from Excel) below
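
If the data live in a script rather than a spreadsheet, a tab-delimited table can be produced as in the sketch below. The column names (`locus_tag`, `log2_ratio`), the values and the helper `to_tab_delimited` are hypothetical, and the exact header GSEA-Pro expects is not specified on this page, so check the tutorial before pasting.

```python
import csv
import io


def to_tab_delimited(rows, header):
    """Serialise a header and rows as tab-delimited text, one record per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical example: locus-tags with one value per gene
rows = [("BSU00010", 2.1), ("BSU00020", -1.4)]
table = to_tab_delimited(rows, ("locus_tag", "log2_ratio"))
print(table)
```

The resulting text has one locus-tag per line followed by its value, which corresponds to the "single list with values" input type; additional value columns would give a multiple-experiment table.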

Analyse data as:



Load Example Data:




Auto detect cutoff values

User defined cutoff:
< values >

GSEA-Pro Overview table

The nine classes and the number of genes found in over-represented classes of each experiment/cluster.




Main Table

Over-represented class IDs and their significance in each experiment/cluster.
The values given per experiment are: 1) Score [0-9], 2) Hits / Class Size, 3) p-value.
Light to dark blue coloring represents low to high score, based on (Hits/ClassSize) * -log2(p-value).
Columns LC and HM contain links to a graphical representation of the original data of the genes/proteins. Note: these are only shown in multiple-experiment analyses.
LC (Line Chart) shows the original data as a scattered line chart and HM (Heat Map) shows the original data as a heat map.
Show: Score, Hits/Class Size, p-value, Gene set
Hold the SHIFT-key to sort on multiple columns
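
The quantity behind the blue coloring, (Hits/ClassSize) * -log2(p-value), can be worked through with a short sketch. The function name `enrichment_score` and the example numbers are hypothetical, and how GSEA-Pro maps this raw value onto the displayed 0-9 score is not described on this page, so only the raw value is computed.

```python
import math


def enrichment_score(hits, class_size, p_value):
    """Raw coloring value: (hits / class_size) * -log2(p_value)."""
    if class_size <= 0:
        raise ValueError("class_size must be positive")
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in the interval (0, 1]")
    return (hits / class_size) * -math.log2(p_value)


# Hypothetical class: 8 of its 16 genes are hits, with p-value 2**-10
print(enrichment_score(8, 16, 2 ** -10))  # 5.0
```

A class therefore gets a high value only when a large fraction of its members are hits and the p-value is small; either factor alone is not enough.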

Select the Class for Interactive Bar Graph


Description of all COG categories

CELLULAR PROCESSES AND SIGNALING
D  Cell cycle control, cell division, chromosome partitioning
M  Cell wall/membrane/envelope biogenesis
N  Cell motility
O  Post-translational modification, protein turnover, and chaperones
T  Signal transduction mechanisms
U  Intracellular trafficking, secretion, and vesicular transport
V  Defense mechanisms
W  Extracellular structures
Y  Nuclear structure
Z  Cytoskeleton

INFORMATION STORAGE AND PROCESSING
A  RNA processing and modification
B  Chromatin structure and dynamics
J  Translation, ribosomal structure and biogenesis
K  Transcription
L  Replication, recombination and repair

METABOLISM
C  Energy production and conversion
E  Amino acid transport and metabolism
F  Nucleotide transport and metabolism
G  Carbohydrate transport and metabolism
H  Coenzyme transport and metabolism
I  Lipid transport and metabolism
P  Inorganic ion transport and metabolism
Q  Secondary metabolites biosynthesis, transport, and catabolism

POORLY CHARACTERIZED
R  General function prediction only
S  Function unknown