|
Introduction
|
|
Survival analyses based on the
Kaplan-Meier estimate have been pervasively used to support or validate the
relevance of biological mechanisms in cancer research. Recently, with the
advent of high-throughput gene-expression
technologies, this kind of analysis has been applied to tumour transcriptomics data.
In a 'bottom-up' approach, gene-expression profiles that are associated
with a deregulated pathway hypothetically involved in cancer progression
are first identified and then correlated with a survival
effect, which either statistically supports the hypothesis or requires
its rejection.
In this work we propose a 'top-down'
approach, in which the clinical outcome (survival) is the starting point
that guides the identification of deregulated biological mechanisms in
cancer by a non-hypothesis-driven iterative survival analysis.
The method, named SURCOMED, was
implemented as a web-based tool, which is publicly available at
http://surcomed.vital-it.ch.
|
SURCOMED
flow chart
|
Figure 1. SURCOMED flow chart. SURCOMED takes as
input tumour transcriptomics
data and the clinical information from the corresponding patients, in
particular, the survival time. The output consists of
biological processes, molecular mechanisms or pathways up- or
down-regulated between groups of patients with long and short survival
time. These groups of patients are defined by sets of marker genes
identified by iterative survival analysis using an evolutionary algorithm.
The iterative survival analysis can be described in 3 steps: 1) Generate
combinations of marker genes. At the first iteration, the combinations are
totally random; in subsequent iterations, the generation of new combinations
is based on the probability distribution of survival marker genes within
the best combinations from the previous iteration; 2) Evaluate combinations
of marker genes. This evaluation is based on the difference in
restricted mean survival time between the pro- and anti-survival groups;
and 3) Select the best combinations of marker genes. Once the iterative
survival analysis finishes, the resulting optimized combinations of marker
genes are used to split the population of patients into pro- and
anti-survival groups. A gene set enrichment analysis (GSEA) is subsequently
applied in order to identify molecular mechanisms, biological processes or
pathways whose constituent genes exhibit concordant differences
between pro- and anti-survival groups. This allows the identification of
deregulated mechanisms between pro- and anti-survival patients.
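The evaluation in step 2 can be illustrated with a minimal sketch of the restricted mean survival time (RMST), computed as the area under the Kaplan-Meier curve up to a truncation time. This is not SURCOMED's own code: the function names, the truncation time `tau`, and the encoding of events (1 = death observed, 0 = censored) are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return the Kaplan-Meier curve as (event time, survival probability) steps."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    # Walk through the distinct observed times in increasing order.
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def restricted_mean_survival_time(times: Sequence[float], events: Sequence[int], tau: float) -> float:
    """Area under the Kaplan-Meier step function between 0 and tau."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    # Last (possibly partial) step up to the truncation time.
    area += prev_s * (tau - prev_t)
    return area


def rmst_difference(pro_times, pro_events, anti_times, anti_events, tau: float) -> float:
    """Score of a split: RMST of the pro-survival group minus that of the anti-survival group."""
    return (restricted_mean_survival_time(pro_times, pro_events, tau)
            - restricted_mean_survival_time(anti_times, anti_events, tau))
```

A larger positive difference means that the two groups defined by a combination of marker genes are better separated in survival terms.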
|
|
|
Quick
start
|
|
|
|
Figure 1. Select the search space. The search space is the
list of genes eligible for the combinations. Genes can be copied and
pasted into the entry box or loaded from a file. For a genome-wide search,
select the radio button labelled "genome". Each gene
symbol should be specified on a separate line.
|
|
Figure 2. Add constraints. These constraints refer to gene
states that will be avoided or included by the algorithm in every
combination. Constraints can be manually added individually through the
interface or loaded as a set from a file. In the latter case, the state
can be either "high" or "low", and the syntax should
be as follows: gene_name::state
Example:
TNFSF9::low
IL2::high
CER1::low
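Before uploading a constraints file, it can be checked locally against the `gene_name::state` syntax. The following is a minimal sketch; the function name `parse_constraints` is hypothetical, and the choices to skip blank lines and to reject any state other than exactly "high" or "low" are assumptions, not documented SURCOMED behaviour.

```python
from typing import Dict, Iterable

VALID_STATES = {"high", "low"}


def parse_constraints(lines: Iterable[str]) -> Dict[str, str]:
    """Parse lines of the form gene_name::state into a {gene: state} mapping."""
    constraints: Dict[str, str] = {}
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        gene, separator, state = line.partition("::")
        gene, state = gene.strip(), state.strip()
        if not separator or not gene or state not in VALID_STATES:
            raise ValueError(f"line {number}: expected gene_name::state, got {raw!r}")
        constraints[gene] = state
    return constraints


example = ["TNFSF9::low", "IL2::high", "CER1::low"]
print(parse_constraints(example))
```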
|
|
Figure 3. Configure optimization. There are 5 optimization
parameters that can be changed, namely, population size, selection
number, elite size, iteration number, and combination size.
Population size. This parameter refers to the number of
combinations that are going to be evaluated at each
iteration of the algorithm. Large populations help to prevent the
optimization from becoming trapped in a local optimum, but they take
longer to evaluate.
Selection number.
This parameter refers to the number of combinations that are going
to be selected from the population after the evaluation of these
combinations in survival terms. We suggest using
a selection number of half the population size.
Elite size. This parameter refers to the number of the best
historical combinations across iterations that will be directly
transferred from one generation to the next to prevent loss of
combinations with the best score. The elite size should be smaller than
the selection number. We suggest an elite size no higher than half of
the selection number in order to leave some freedom to the
optimization process.
Iteration number. This parameter refers to the number of
rounds of combinations that are going to be generated, evaluated, and
selected by the algorithm. We suggest starting
with some exploratory runs of 3-5 iterations, varying the population
size, before performing an actual analysis with 20 or more iterations. The
greater the number of iterations, the longer the algorithm takes
to finish the search.
Combination size. This parameter refers to the number of gene
states included in each combination. On the website, this number is
limited to a maximum value of 5.
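How the five parameters interact can be seen in a small sketch of an evolutionary loop of this kind. It is an illustration under stated assumptions, not the SURCOMED implementation: the function names, the default values, the frequency-based resampling with a small smoothing constant (0.1), and the caller-supplied `score` function (which in SURCOMED would be the survival-based evaluation) are all hypothetical.

```python
import random
from collections import Counter
from typing import Callable, FrozenSet, List, Sequence, Tuple

GeneState = Tuple[str, str]            # e.g. ("IL2", "high")
Combination = FrozenSet[GeneState]


def sample_combination(states: Sequence[GeneState], weights: Sequence[float],
                       size: int, rng: random.Random) -> Combination:
    """Draw `size` gene states for distinct genes, favouring high weights."""
    chosen = {}
    while len(chosen) < size:
        gene, state = rng.choices(states, weights=weights, k=1)[0]
        chosen.setdefault(gene, state)  # keep at most one state per gene
    return frozenset(chosen.items())


def optimize(genes: Sequence[str], score: Callable[[Combination], float],
             population_size: int = 20, selection_number: int = 10,
             elite_size: int = 5, iterations: int = 5,
             combination_size: int = 3, seed: int = 0) -> List[Combination]:
    """Return the elite combinations, best first, after the last iteration."""
    if not 0 < elite_size < selection_number <= population_size:
        raise ValueError("need 0 < elite_size < selection_number <= population_size")
    unique_genes = list(dict.fromkeys(genes))
    if not 0 < combination_size <= len(unique_genes):
        raise ValueError("combination_size must be between 1 and the number of genes")
    rng = random.Random(seed)
    states = [(g, s) for g in unique_genes for s in ("high", "low")]
    weights = [1.0] * len(states)      # first iteration: purely random sampling
    elite: List[Combination] = []
    for _ in range(iterations):
        candidates = [sample_combination(states, weights, combination_size, rng)
                      for _ in range(population_size)]
        # The elite re-enters the population so the best scores are never lost.
        population = list(dict.fromkeys(candidates + elite))
        ranked = sorted(population, key=score, reverse=True)
        selected = ranked[:selection_number]
        elite = ranked[:elite_size]
        # Assumption: resample gene states in proportion to their frequency
        # among the selected combinations, smoothed so no state is excluded.
        counts = Counter(gs for combo in selected for gs in combo)
        weights = [counts[gs] + 0.1 for gs in states]
    return elite
```

The defaults follow the suggestions above: the selection number is half the population size and the elite size is half the selection number.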
|
|
Figure 4. Run the search. Once the optimization parameters
are set up, the user only has to click on the 'Search' button and wait.
We strongly recommend entering an email address in the dedicated
box, so that the search runs on our server
and the user receives a notification, with a link to the results, when
the search is done. The user can then leave the website and
follow the link later on to access the results.
|
|
Figure 5. Identify the underlying biological mechanisms. Once
the search is complete, the best combinations are displayed on the
website with their corresponding survival curves. Just below the
survival curves, there is a link to perform the Gene Set Enrichment
Analysis (GSEA), or, alternatively, to generate and download the
required files in order to run the GSEA locally with personalized
settings.
|
|
Figure 6. GSEA
results main page. Once the GSEA is done, the results can be explored
online in the browser or downloaded (see Figure 5).
The main result is the list of up- and down-regulated biological mechanisms
associated with predefined gene sets. Links from the main page lead to
complementary information concerning the GSEA, such as enrichment plots
and statistical scores (p-values and normalized
enrichment scores (NES)).
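To help read the enrichment plots, the running-sum statistic behind an enrichment score can be sketched as follows. This is a simplified illustration of the commonly described weighted running sum, not the code executed by SURCOMED or by the GSEA software; the function name, the weighting exponent `p`, and the input layout (genes already ranked by a differential-expression metric) are assumptions. P-values and NES, which GSEA derives from permutations, are not computed here.

```python
from typing import Sequence, Set


def enrichment_score(ranked_genes: Sequence[str], metric: Sequence[float],
                     gene_set: Set[str], p: float = 1.0) -> float:
    """Maximum deviation from zero of the running sum over the ranked gene list."""
    in_set = [gene in gene_set for gene in ranked_genes]
    hit_weights = [abs(m) ** p if hit else 0.0 for m, hit in zip(metric, in_set)]
    total_hit = sum(hit_weights)
    n_miss = in_set.count(False)
    if total_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, of the ranked genes")
    running = 0.0
    best = 0.0
    for weight, hit in zip(hit_weights, in_set):
        # Step up at genes in the set (weighted by the metric), down otherwise.
        running += weight / total_hit if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best
```

A positive score indicates that the gene set is concentrated at the top of the ranking (up-regulated), and a negative score that it is concentrated at the bottom (down-regulated).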
|
SURCOMED
download
Download
|