Although microarray statistical analysis software plays an important role, extracting lists of differentially expressed genes is only a first step, and is often far from providing real insight into the biological system under study and the molecular mechanisms underlying it. The next logical step is to combine these genes with the functional annotation available in various databases. This highlights genes that are both statistically significant and biologically relevant, and that characterize or distinguish the biological system under study. Functional analysis usually includes pathway analysis, which uncovers genes with a certain expression profile that share the same pathway; the search for common regulatory elements among groups of genes; and gene functional analysis based on biological databases or ontologies. The Gene Ontology (GO) database provides such functional annotation in hierarchical form, which makes it a valuable tool for the meta-analysis of microarray experiments. In addition, the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database is a well-structured and constantly enriched library of molecular networks. It has been widely used as a reference for the biological interpretation of large-scale datasets generated by sequencing and other high-throughput experimental technologies.
StRAnGER is a web-based application that performs functional analysis of high-throughput genomic datasets. It starts from a list of significant genes derived from statistical and empirical thresholds. It then uses the GO database and the KEGG pathway database, together with established statistical methods, either to relate the identified significant genes to important nodes of the GO tree structure or to map them to over-represented metabolic pathways. In this way, cellular actions are treated as conceptual entities that are mapped as nodes onto a hierarchical organizational schema such as the GO tree structure, where all functional annotations stem from the root nodes of molecular function, cellular component and biological process. The aim of the application is to suggest whole molecular pathways, or parts of them, as interesting targets for further biological research. These pathways incorporate a number of significantly differentially expressed genes from the list, which makes them more reliable targets than isolated genes, whose measurements are more susceptible to systematic or random errors.
Regarding GO analysis, StRAnGER exploits an essential property of the GO tree structure, and consequently of the population of GO terms (GOTs) derived from each significant gene list. Many genes that are hierarchically lower (descendants) in the context of several biological functions appear as 'leaves' of the GO tree. However, they are connected through the same tree structure to hierarchically higher biological entities, and therefore inherit those GOTs too. The main goal of StRAnGER is to single out, among all the GOTs associated with the significant gene list, those associated with nodes higher in the GO hierarchy. Such nodes consequently encompass a number of genes that act on a specific biochemical pathway. StRAnGER then ranks these GOTs by statistical significance, according to the p-value derived from a suitable over-representation test. As a result, the effect of noise in high-throughput genomic experiments is substantially mitigated, which enables the targeting of specific biological objects for further investigation.
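The inheritance property described above can be illustrated with a minimal Python sketch. A gene annotated to a descendant term is implicitly annotated to all of that term's ancestors. The toy hierarchy and gene annotations below are hypothetical, not real GO data, and the function names are illustrative only. The sketch lets a term have several parents, since GO is in fact a directed acyclic graph rather than a strict tree.

```python
from collections import defaultdict

# Hypothetical toy hierarchy: child term -> list of parent terms (not real GO data).
PARENTS = {
    "T_leaf_a": ["T_mid"],
    "T_leaf_b": ["T_mid"],
    "T_mid": ["T_root"],
    "T_root": [],
}


def ancestors(term, parents):
    """Return every ancestor of `term` (excluding the term itself)."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def propagate(annotations, parents):
    """Map each term to the genes annotated to it or to any of its descendants."""
    genes_per_term = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            for inherited in {term} | ancestors(term, parents):
                genes_per_term[inherited].add(gene)
    return dict(genes_per_term)


# Hypothetical annotations: g1 and g2 sit on leaves, g3 directly on the middle term.
annotations = {"g1": ["T_leaf_a"], "g2": ["T_leaf_b"], "g3": ["T_mid"]}
counts = {term: len(genes) for term, genes in propagate(annotations, PARENTS).items()}
print(sorted(counts.items()))
# [('T_leaf_a', 1), ('T_leaf_b', 1), ('T_mid', 3), ('T_root', 3)]
```

The higher terms gather genes from all of their descendants. This is why terms near the top of the hierarchy can accumulate enough genes to be tested for over-representation.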
Data import
The first page of the application is used to import the data for the subsequent analysis. The following sections describe the procedure and the capabilities of the application.
Gene list
StRAnGER's basic input is a tab-delimited text file containing only unique gene identifiers. These identifiers correspond either to the microarray platform used (e.g. the Affymetrix HG-U133A array) or to the public database chosen as the source of the genomic background (e.g. Ensembl). An additional column with a p-value for each gene is optional, as are further data columns (e.g. expression values for each gene); these can be appended to the output. Supported alternative gene identifiers are HUGO gene symbols, GenBank accessions and Entrez IDs. When a file is uploaded, a wizard allows the user to specify the columns containing the information required for the subsequent analysis.
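The expected input format can be made concrete with a minimal Python sketch that reads such a file. It assumes the file has no header row, the gene identifiers are in the first column and an optional p-value is in the second. These column positions and the name `read_gene_list` are assumptions made for illustration; in StRAnGER itself the columns are chosen through the wizard.

```python
import csv
import io


def read_gene_list(handle, id_col=0, pval_col=None):
    """Read a tab-delimited gene list and return {gene_id: p_value or None}.

    Raises ValueError on a duplicate identifier, since the list must contain unique IDs.
    """
    genes = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[id_col].strip():
            continue  # skip blank lines
        gene_id = row[id_col].strip()
        if gene_id in genes:
            raise ValueError(f"duplicate gene identifier: {gene_id}")
        genes[gene_id] = float(row[pval_col]) if pval_col is not None else None
    return genes


# Hypothetical two-column input: HUGO symbol <TAB> p-value.
example = io.StringIO("TP53\t0.001\nBRCA1\t0.02\nEGFR\t0.004\n")
print(read_gene_list(example, id_col=0, pval_col=1))
# {'TP53': 0.001, 'BRCA1': 0.02, 'EGFR': 0.004}
```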
Background
StRAnGER offers a variety of sources and organisms for generating the background dataset, including Bioconductor array annotation packages and Ensembl genes for various organisms. Users can also upload their own annotation file, in tab-delimited text format, containing the minimum information required for StRAnGER analysis. In this case, a wizard allows the user to specify the columns containing the information required for the subsequent analysis; otherwise, StRAnGER continues automatically.
Gene list
Use the GeneID and p-value (optional) lists to select the corresponding columns of the gene list file uploaded for analysis. The GeneID column is essential and must correspond to the identifier type selected on the previous page (ProbeID, Gene Symbol, Entrez ID or GenBank accession). If it does not, StRAnGER will not run! The same applies if the gene list was pasted instead of uploaded as a file. Additionally, StRAnGER allows any other column contained in the uploaded file to be appended to the output by checking the corresponding checkboxes.
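StRAnGER will not run if the identifier type of the gene list does not match that of the background, so it can help to check the overlap before uploading. The helper below is a hypothetical sketch, not part of StRAnGER. It reports the fraction of the list's IDs found in the background, together with the IDs that are missing.

```python
def id_overlap(gene_ids, background_ids):
    """Return (fraction of gene_ids present in background_ids, list of missing IDs)."""
    genes = list(dict.fromkeys(gene_ids))  # de-duplicate while keeping the input order
    background = set(background_ids)
    missing = [gene for gene in genes if gene not in background]
    found = len(genes) - len(missing)
    return (found / len(genes) if genes else 0.0), missing


# Entrez IDs (7157 = TP53, 672 = BRCA1) checked against a symbol-based background.
print(id_overlap(["7157", "672"], ["TP53", "BRCA1", "EGFR"]))
# (0.0, ['7157', '672'])
```

An overlap close to zero, as in this example, usually indicates that a different ID type was selected from the one actually supplied.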
Background list
For an uploaded background, the user should select the columns corresponding to the elements on the left: GeneID (which must be of the same type as the GeneID of the gene list!), Gene Name (Symbol), Gene Description and the column with the ontological terms (GO or KEGG). In a user-uploaded background file, the column with GO terms should contain the terms in GO format, i.e. GO:XXXXXXX. It does not matter if the column also contains other elements among the terms (e.g. descriptions), as StRAnGER parses only the GO terms. The same applies to KEGG pathways, which should be in the form YYYYY, where each Y is a digit. KEGG pathway IDs should NOT be prefixed with the organism acronym (e.g. 00640 → correct, mmu00640 → wrong).
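The format rules above can be sketched with regular expressions. The sketch keeps only tokens of the form GO:XXXXXXX from a free-text cell, and accepts a KEGG pathway ID only as five bare digits. This illustrates the stated rules under those assumptions; it is not StRAnGER's actual parser, and the function names are hypothetical.

```python
import re

# GO identifiers: "GO:" followed by exactly seven digits.
GO_PATTERN = re.compile(r"GO:\d{7}(?!\d)")
# KEGG pathway IDs as required here: five digits, no organism prefix.
KEGG_PATTERN = re.compile(r"\d{5}")


def parse_go_terms(cell):
    """Return the GO IDs found in an annotation cell, ignoring any surrounding text."""
    return GO_PATTERN.findall(cell)


def split_kegg_ids(ids):
    """Separate well-formed KEGG pathway IDs from malformed ones such as 'mmu00640'."""
    valid, invalid = [], []
    for pathway_id in ids:
        if KEGG_PATTERN.fullmatch(pathway_id.strip()):
            valid.append(pathway_id)
        else:
            invalid.append(pathway_id)
    return valid, invalid


print(parse_go_terms("GO:0006915 // apoptotic process /// GO:0005634 // nucleus"))
# ['GO:0006915', 'GO:0005634']
print(split_kegg_ids(["00640", "mmu00640"]))
# (['00640'], ['mmu00640'])
```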
Selecting the analysis parameters
Analysis parameters
The third page of the application (or the second, if the user analyzes stored experimental results) allows the user to specify the statistical analysis parameters of StRAnGER. It also provides options for the type and the graphical representation of the output results.
Over-representation test
StRAnGER currently supports three statistical tests for the identification of enriched ontological terms, given a list of selected genes and the appropriate background.
- The hypergeometric test, where the probability for a term Ti to be over-represented is given by the formula:
- The Fisher’s exact test where the probability for a term Ti to be over-represented is given by the formula (directly related to the hypergeometric distribution):
- The χ² (chi-square) test, where the χ² statistic for a term Ti is first calculated as follows: let
Then the χ² statistic is given by
And the probability for Ti to be over-represented (its p-value) is given by (assuming w = χ² and one degree of freedom for the χ² cumulative distribution function)
$$p = 1 - \frac{\gamma\left(\frac{1}{2}, \frac{w}{2}\right)}{\Gamma\left(\frac{1}{2}\right)}$$
where γ denotes the lower incomplete Gamma function and Γ denotes the Gamma function.
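The formulas for the first two tests are not reproduced in this text. For reference, the Python sketch below shows the standard textbook forms under an assumed notation that is not taken from the guide: N genes in the background, M of them annotated to Ti, n genes in the selected list, and k of those annotated to Ti. The one-sided ("greater") Fisher's exact test gives the same value as the hypergeometric upper tail. For the χ² variant, the sketch assumes Pearson's statistic on the 2×2 table without continuity correction. For one degree of freedom, γ(1/2, w/2)/Γ(1/2) = erf(√(w/2)), so the upper tail equals erfc(√(w/2)). StRAnGER's implementation may differ in detail.

```python
from math import comb, erfc, sqrt


def hypergeom_pvalue(k, n, M, N):
    """Upper tail P(X >= k), X ~ Hypergeometric(N background genes, M annotated, n selected)."""
    return sum(
        comb(M, x) * comb(N - M, n - x) for x in range(k, min(n, M) + 1)
    ) / comb(N, n)


def fisher_greater_pvalue(k, n, M, N):
    """One-sided Fisher's exact test on the 2x2 table: identical to the hypergeometric tail."""
    return hypergeom_pvalue(k, n, M, N)


def chi2_pvalue(k, n, M, N):
    """Pearson chi-square on the 2x2 table, 1 degree of freedom, no continuity correction.

    Assumes all four margins are non-zero. The statistic is symmetric, so it also
    reacts to under-representation.
    """
    a, b = k, n - k                # selected genes: annotated / not annotated to the term
    c, d = M - k, N - n - (M - k)  # other background genes: annotated / not annotated
    w = N * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # 1 - gamma(1/2, w/2) / Gamma(1/2) equals erfc(sqrt(w/2)) for one degree of freedom
    return erfc(sqrt(w / 2))


# Toy numbers: N=20 background genes, M=5 annotated, n=4 selected, k=3 of them annotated.
print(hypergeom_pvalue(3, 4, 5, 20))  # 155/4845, about 0.032
print(chi2_pvalue(3, 4, 5, 20))
```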
p-value cutoff
The p-value statistical threshold for the detection of over-represented ontological terms.
Number of iterations
The number of bootstrap iterations that StRAnGER performs in order to derive a robust cutoff for the distribution of enrichment elements.
Cutoff percentage (%)
The percentile threshold of the enrichment elements distribution that defines the acceptable cutoff for significant terms.
Bootstrap
The distribution that the application should bootstrap. Possible options are “Terms”, for bootstrapping the ontological terms, or “Elements”, for bootstrapping the enrichment elements distribution (the default StRAnGER algorithm).
Run analysis on
Possible options are GO terms and KEGG pathways.
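The guide does not spell out the bootstrap procedure, so the following Python sketch is only a generic, hypothetical illustration of the idea behind the Number of iterations and Cutoff percentage parameters. It resamples a distribution with replacement a given number of times, takes the chosen percentile of each resample, and averages the results into a more stable cutoff. The per-term gene counts and the function names are assumptions, and the sketch should not be read as StRAnGER's actual algorithm.

```python
import math
import random
import statistics


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def bootstrap_cutoff(values, iterations=1000, pct=95, seed=0):
    """Mean of the pct-th percentile over `iterations` resamples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same cutoff
    estimates = [
        percentile(rng.choices(values, k=len(values)), pct)
        for _ in range(iterations)
    ]
    return statistics.mean(estimates)


# Hypothetical numbers of list genes annotated to each candidate term.
genes_per_term = [1, 1, 2, 2, 2, 3, 3, 4, 5, 8, 12, 20]
print(bootstrap_cutoff(genes_per_term, iterations=500, pct=90))
```

Averaging over many resamples makes the cutoff less sensitive to a few extreme values than a single percentile taken on the original data.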
Output options
This section describes the output types of StRAnGER as well as its graphical output.
Output
Either “No visualization”, to skip the graphical output (for Gene Ontology analysis), or PDF, PNG or SVG, to produce the graphical output as a PDF document or as a PNG/SVG image.
Node shape
The shape of the nodes of the output GO tree.
Node outline color
The outline color of each node in the output GO tree. It can be one of the colors in the list.
Node fill color
The fill color of each node in the output GO tree. It can be one of the colors in the list.
Top scoring terms in tree
The number of top-scoring (most significant) GO terms to include in the output tree. If a large number is combined with a high ancestor level, the tree may become too complex to display clearly.
Ancestor level
The number of levels up the GO hierarchy to which each output GO term is connected. If a high level is combined with a large number of top-scoring terms, the tree may become too complex to display clearly.
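The Python sketch below illustrates why combining many top-scoring terms with a high ancestor level inflates the tree. For each selected term, it collects the edges to its ancestors up to a given number of levels. The parent map is made up for illustration and, as in GO, it lets a term have more than one parent. The function name is hypothetical and does not describe how StRAnGER draws its trees.

```python
# Hypothetical toy hierarchy: child term -> parent terms (not real GO data).
TOY_PARENTS = {
    "T1": ["A"], "T2": ["A", "B"], "T3": ["C"],
    "A": ["R"], "B": ["R"], "C": ["R"], "R": [],
}


def tree_edges(top_terms, parents, ancestor_level):
    """Return the (child, parent) edges reachable within `ancestor_level` steps up."""
    edges = set()
    frontier = set(top_terms)
    for _ in range(ancestor_level):
        next_frontier = set()
        for term in frontier:
            for parent in parents.get(term, []):
                edges.add((term, parent))
                next_frontier.add(parent)
        frontier = next_frontier
    return edges


top = ["T1", "T2", "T3"]
for level in (1, 2):
    edges = tree_edges(top, TOY_PARENTS, level)
    nodes = set(top) | {node for edge in edges for node in edge}
    print(level, len(nodes), len(edges))
# 1 6 4
# 2 7 7
```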
Output
This option determines the desired output format of the analysis results. Possible choices are:
- “All”, for a complete output incorporating the significant terms with their statistics, the genes found below each term and any additional information regarding the genes;
- “Only stats”, for an output containing only the significant terms with some summary statistics;
- “Only terms”, for a single-column output containing only the significant terms.

In addition, the results can be displayed in HTML format. In the case of GO-based analysis, GO terms are linked to AmiGO and the genes under these GO terms are linked to the GeneCards database. In the case of KEGG-based analysis, the significant pathways are linked to the KEGG PATHWAY database and the genes mapped to these pathways are linked to the GeneCards database. Genes mapped on the significant pathways can also be colored, but this can take substantial time because the KEGG web service is slow. In this case, it is suggested to receive the results by e-mail instead of displaying them directly. Two ways of retrieving the results are provided: the user can either download the results directly upon completion, or provide an e-mail address to which the results are sent.
Using the CERVis tool
The CERVis tool combines two or more StRAnGER outputs (for GO analysis only) into a single graph. This allows the user, for example, to combine or compare the significant GO terms of an experimental design consisting of two doses of a specific drug. The user should select the graphical outputs and the number of files that will be supplied. The next page prompts the user to upload the files.
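The kind of comparison that CERVis supports can be illustrated, independently of the tool itself, with set operations on the significant terms of two runs. The term labels below are hypothetical placeholders, and `compare_runs` is an illustrative name rather than part of CERVis, which works on StRAnGER's graphical outputs and produces a combined graph.

```python
def compare_runs(terms_a, terms_b):
    """Split the significant terms of two analyses into shared and run-specific sets."""
    a, b = set(terms_a), set(terms_b)
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }


# Hypothetical significant terms from two runs (e.g. low and high dose).
low_dose = ["T1", "T2", "T3"]
high_dose = ["T2", "T3", "T4"]
print(compare_runs(low_dose, high_dose))
# {'shared': ['T2', 'T3'], 'only_a': ['T1'], 'only_b': ['T4']}
```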
A small tutorial
This section presents a short tutorial on the most basic usage of StRAnGER. For any question regarding further usage, difficulties or suggestions, please contact Panagiotis Moulos (firstname.lastname@example.org). First, download the test dataset using the "Download a test dataset" link on the Application page. Unzip the contents of the zip file and upload the files “GeneList.txt” and “BackgroundList.txt” using the corresponding fields of StRAnGER's interface. Optionally, you can name your project (e.g. “Test”). Then, hit “Next”. Do NOT upload the test_dataset.zip file directly!
- When you upload or paste a list of genes or probes on the first page of the application and use one of StRAnGER's database platforms as background, make sure to select the correct platform. Otherwise, the IDs you supply will not match those of the background and the execution will fail.
- When you upload or paste a list of genes or probes on the first page of the application and use one of StRAnGER's database platforms as background instead of your own background file, make sure to specify the correct gene ID type (e.g. if the pasted IDs are HUGO gene symbols and the selected ID type is GenBank or ProbeID, the execution will fail).
- When uploading your own background, check its format carefully (see the example available through the "Download a test dataset" link on the Application page). A file with only a column of Gene IDs will NOT work, as it contains no corresponding GO terms. If you are not sure about your background, use one of StRAnGER's databases or a generic one such as Ensembl for the corresponding organism. Also make sure that the Gene ID type you specify as primary on the second page of the application matches the IDs in your files; if it does not, the execution will fail.
- Do not paste or upload a very small list of genes. ANY enrichment analysis program requires a gene list large enough to be representative of the underlying functions/pathways. Execution with such small lists will most probably fail, as no over-represented GO terms/KEGG pathways will be found. If you are only looking for the GO terms of 2-3 genes, use other tools such as AmiGO.
- StRAnGER works only with gene IDs, not with other input types such as sequence (FASTA) files.
- Files to be uploaded MUST be tab-delimited; space-, comma- or otherwise-delimited files are not accepted. We will soon add a more flexible uploader where users may specify the delimiter.
- Genes to be pasted should be in column format (one per line), not in a row.
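A single-column list of identifiers can be prepared for pasting or upload by converting any commas, semicolons, spaces or tabs, whether the IDs are laid out in rows or columns, into one unique ID per line. The helper below is a hypothetical Python sketch, not part of StRAnGER. It only handles a single list of IDs; it does not handle files that also carry p-value or expression columns.

```python
import re


def to_upload_format(text):
    """Return the IDs in `text` as one unique identifier per line (first occurrence kept)."""
    ids = [token for token in re.split(r"[,;\s]+", text) if token]
    unique = list(dict.fromkeys(ids))  # drop duplicates while keeping the original order
    return "\n".join(unique) + "\n"


print(to_upload_format("TP53, BRCA1 EGFR\nTP53;MYC"), end="")
# TP53
# BRCA1
# EGFR
# MYC
```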