Functional analysis of Prokaryotes using Gene Set Enrichment Analysis on Transcriptome (RNA-Seq) or Proteome data.
     Gene Set Enrichment Analysis (GSEA) is a method to identify classes of genes or proteins that are over-represented in a set of genes or proteins, respectively. Two data sets are needed to perform a GSEA: 1) a list/set of genes or proteins derived from a transcriptomics or proteomics analysis, and 2) a database of functional classes.
     For all complete bacterial genomes of RefSeq and Genbank (>20,000) we (re-)constructed nine functional classes: GO, InterPro, KEGG, COG, PFAM, SMART, Superfamily, KEYWORDS and OPERONS. GSEA-Pro supports four types of input: 1) a single list of locus-tags, 2) a single list with values, 3) multiple experiments and 4) clusters. Use the PowerPoint TUTORIAL to understand the four input options and how to interpret the results. Some examples can be selected below.
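To make the idea of over-representation concrete, here is a minimal Python sketch. It assumes a one-sided hypergeometric test, which is a common choice for this kind of analysis; this page does not state which statistic GSEA-Pro itself uses. The function names, class IDs and locus-tags (`hypergeom_pvalue`, `enrich`, `classA`, `g1`, ...) are hypothetical and serve only as illustration.

```python
from math import comb


def hypergeom_pvalue(hits, class_size, list_size, genome_size):
    """Probability of seeing at least `hits` class members when
    `list_size` genes are drawn at random from `genome_size` genes,
    of which `class_size` belong to the class (upper tail)."""
    upper = min(class_size, list_size)
    total = comb(genome_size, list_size)
    tail = sum(
        comb(class_size, k) * comb(genome_size - class_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


def enrich(gene_list, classes, genome_size):
    """Return (class_id, hits, class_size, p_value) for every class
    that shares at least one gene with gene_list, sorted by p-value."""
    selected = set(gene_list)
    results = []
    for class_id, members in classes.items():
        hits = len(selected & members)
        if hits == 0:
            continue
        p_value = hypergeom_pvalue(hits, len(members), len(selected), genome_size)
        results.append((class_id, hits, len(members), p_value))
    return sorted(results, key=lambda row: row[3])


# Toy data: a 10-gene genome with two made-up functional classes.
toy_classes = {
    "classA": {"g1", "g2", "g3", "g4"},
    "classB": {"g5", "g6"},
}
for row in enrich(["g1", "g2", "g7"], toy_classes, genome_size=10):
    print(row)
```

In the toy data, two of the three listed genes fall in `classA`, so only that class is reported; `classB` has no hits and is skipped.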
Both RefSeq and Genbank annotations are supported. Optionally, use Genome2D to convert between Genbank (old_locus-tags) and RefSeq locus-tags.

To annotate and classify your own genomes, you can use the FACoP webserver (or the stand-alone version).
[anne.de.jong .. rug.nl]

Select Genome: Start typing parts of the genome name until the selection list contains fewer than 15 genomes. If only one genome remains in the list, it is selected automatically. For example, to find Bacillus subtilis 168, type two keywords such as "168 subt" and select your genome from the list.
RefSeq: Genbank: FACoP sessionID:
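The keyword search described above can be pictured with a small Python sketch. It assumes that every typed keyword must occur somewhere in the genome name, ignoring case; the actual matching rules of the webserver are not documented here. The helper names and the short genome list are illustrative only.

```python
def match_genomes(query, genome_names):
    """Keep the genome names that contain every whitespace-separated
    keyword of the query (case-insensitive substring match)."""
    keywords = query.lower().split()
    return [
        name for name in genome_names
        if all(keyword in name.lower() for keyword in keywords)
    ]


def select_genome(query, genome_names):
    """Return the genome name if exactly one matches, else None."""
    matches = match_genomes(query, genome_names)
    return matches[0] if len(matches) == 1 else None


# Illustrative list; the real server holds >20,000 genomes.
genomes = [
    "Bacillus subtilis subsp. subtilis str. 168",
    "Bacillus subtilis BSn5",
    "Escherichia coli str. K-12 substr. MG1655",
]
print(match_genomes("168 subt", genomes))
print(select_genome("168 subt", genomes))
```

With this list, the query `168 subt` leaves a single candidate, which mirrors the automatic selection described above.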

Paste a tab-delimited data table (e.g. copied from Excel) below

Analyse data as:



Load Example Data:




Auto detect cutoff values

User defined cutoff:
< values >

GSEA-Pro Overview table. The nine classes and the number of genes found in over-represented classes of each experiment/cluster.




Main Table. Over-represented classIDs and their significance in each experiment/cluster
The values given per experiment: 1) Score [0-9], 2) Hits / Class Size, 3) p-value.
Light to dark blue coloring represents low to high score, respectively [based on (Hits/ClassSize) * -log2(p-value)].
Columns LC and HM contain links to a graphical representation of the original data of the genes/proteins. Note: these are only shown in multiple-experiment analyses.
LC (Line Chart) shows the original data as a scattered line chart and HM (Heat Map) shows a heat map of the original data.
Show;  Score   Hits/Class Size   p-value   Gene set
Hold the SHIFT-key to sort on multiple columns
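The quantity behind the coloring, (Hits/ClassSize) * -log2(p-value), can be computed as in the Python sketch below. Only that formula comes from this page; how the raw value is mapped onto the displayed 0-9 score is not described here, so the sketch stops at the raw value. The function name `raw_score` is hypothetical.

```python
from math import log2


def raw_score(hits, class_size, p_value):
    """Raw value of (Hits/ClassSize) * -log2(p-value).

    The rescaling to the 0-9 score shown in the table is not
    documented on this page and is deliberately left out."""
    if class_size <= 0:
        raise ValueError("class_size must be positive")
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in (0, 1]")
    return (hits / class_size) * -log2(p_value)


# Half of a class is hit (5 of 10) with p = 0.25:
# 0.5 * -log2(0.25) = 0.5 * 2 = 1.0
print(raw_score(5, 10, 0.25))
```

The formula rewards classes that are both largely covered by the input (high Hits/ClassSize) and statistically unlikely to be covered by chance (small p-value).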

Interactive Bar Graph(s)
GeneSetTable

Description of all COG categories

CELLULAR PROCESSES AND SIGNALING
D  Cell cycle control, cell division, chromosome partitioning
M  Cell wall/membrane/envelope biogenesis
N  Cell motility
O  Post-translational modification, protein turnover, and chaperones
T  Signal transduction mechanisms
U  Intracellular trafficking, secretion, and vesicular transport
V  Defense mechanisms
W  Extracellular structures
Y  Nuclear structure
Z  Cytoskeleton

INFORMATION STORAGE AND PROCESSING
A  RNA processing and modification
B  Chromatin structure and dynamics
J  Translation, ribosomal structure and biogenesis
K  Transcription
L  Replication, recombination and repair

METABOLISM
C  Energy production and conversion
E  Amino acid transport and metabolism
F  Nucleotide transport and metabolism
G  Carbohydrate transport and metabolism
H  Coenzyme transport and metabolism
I  Lipid transport and metabolism
P  Inorganic ion transport and metabolism
Q  Secondary metabolites biosynthesis, transport, and catabolism

POORLY CHARACTERIZED
R  General function prediction only
S  Function unknown