SURvival COmbined effect driven cancer MEchanism Discovery

 

 

 

 

 

 

 

Introduction

 

 

Survival analyses based on the Kaplan-Meier estimate have been widely used to support or validate the relevance of biological mechanisms in cancer research. With the advent of high-throughput gene expression technologies, this kind of analysis has been applied to tumour transcriptomics data. In a 'bottom-up' approach, gene-expression profiles associated with a deregulated pathway hypothetically involved in cancer progression are first identified and then correlated with a survival effect, which either statistically supports the hypothesis or requires its rejection.

 

In this work we propose a 'top-down' approach, in which the clinical outcome (survival) is the starting point that guides the identification of deregulated biological mechanisms in cancer by a non-hypothesis-driven iterative survival analysis.

 

The method, named SURCOMED, was implemented as a web-based tool, which is publicly available at http://surcomed.vital-it.ch.

 

Download

 

Links:

SIB

Vital-IT

 

SURCOMED flow chart

 

 


 

 

Figure 1. SURCOMED flow chart. SURCOMED takes as input tumour transcriptomics data and the clinical information of the corresponding patients, in particular the survival time. The output consists of biological processes, molecular mechanisms or pathways that are up- or down-regulated between groups of patients with long and short survival times. These groups are defined by sets of marker genes identified by iterative survival analysis using an evolutionary algorithm. The iterative survival analysis has three steps: 1) Generate combinations of marker genes. In the first iteration the combinations are fully random; in subsequent iterations, new combinations are generated from the probability distribution of survival marker genes within the best combinations of the previous iteration; 2) Evaluate combinations of marker genes. The evaluation is based on the difference in restricted mean survival time between the pro- and anti-survival groups; and 3) Select the best combinations of marker genes. Once the iterative survival analysis finishes, the resulting optimized combinations of marker genes are used to split the patient population into pro- and anti-survival groups. A gene set enrichment analysis (GSEA) is then applied to identify molecular mechanisms, biological processes or pathways whose constituent genes exhibit concordant differences between the pro- and anti-survival groups, that is, mechanisms deregulated between pro- and anti-survival patients.
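The generate / evaluate / select loop described in the caption can be illustrated with a minimal Python sketch. This is not the SURCOMED implementation. It assumes that "high" and "low" are defined relative to each gene's median expression, that the pro-survival group is the set of patients matching every gene state in a combination, and that sampling weights are simple smoothed counts. All function and parameter names (rmst, score, sample_combo, search, min_group) are hypothetical, and the data are a synthetic toy example.

```python
import random
import statistics
from collections import Counter

def rmst(times, events, tau):
    """Restricted mean survival time: area under the Kaplan-Meier curve on [0, tau]."""
    by_time = {}  # time -> [number of deaths, number censored]
    for t, e in zip(times, events):
        by_time.setdefault(t, [0, 0])[0 if e else 1] += 1
    at_risk, surv, last_t, area = len(times), 1.0, 0.0, 0.0
    for t in sorted(by_time):
        if t > tau:
            break
        area += surv * (t - last_t)          # curve is flat between event times
        deaths, censored = by_time[t]
        surv *= 1.0 - deaths / at_risk       # Kaplan-Meier step
        at_risk -= deaths + censored
        last_t = t
    return area + surv * (tau - last_t)

def score(combo, expr, medians, times, events, tau, min_group=5):
    """Step 2: RMST difference between matching patients and the rest (assumed split rule)."""
    n = len(times)
    match = [all((expr[g][i] > medians[g]) == (s == "high") for g, s in combo)
             for i in range(n)]
    a = [i for i in range(n) if match[i]]
    b = [i for i in range(n) if not match[i]]
    if len(a) < min_group or len(b) < min_group:
        return float("-inf")                 # assumed guard against tiny groups
    pick = lambda idx, xs: [xs[i] for i in idx]
    return (rmst(pick(a, times), pick(a, events), tau)
            - rmst(pick(b, times), pick(b, events), tau))

def sample_combo(weights, size, rng):
    """Step 1: draw `size` gene states, one per gene (needs at least `size` genes)."""
    states = list(weights)
    combo = {}
    while len(combo) < size:
        g, s = rng.choices(states, weights=[weights[x] for x in states])[0]
        combo.setdefault(g, s)
    return tuple(sorted(combo.items()))

def search(expr, times, events, tau, population=40, selection=20, elite=5,
           iterations=10, size=2, seed=0):
    rng = random.Random(seed)
    medians = {g: statistics.median(v) for g, v in expr.items()}
    weights = {(g, s): 1.0 for g in expr for s in ("high", "low")}  # uniform start
    best = []                                # elite carried across iterations
    for _ in range(iterations):
        combos = {sample_combo(weights, size, rng) for _ in range(population)}
        scored = [(score(c, expr, medians, times, events, tau), c) for c in combos]
        pool = sorted(set(scored + best), reverse=True)
        selected, best = pool[:selection], pool[:elite]   # step 3
        counts = Counter(state for _, c in selected for state in c)
        weights = {k: 1.0 + counts[k] for k in weights}   # smoothed frequencies
    return best

# Toy data: 32 patients, 4 binary "genes"; survival is long only when
# G1 is high AND G2 is low (a combined effect), with no censoring.
n = 32
expr = {f"G{j}": [(i >> j) & 1 for i in range(n)] for j in range(4)}
times = [30.0 if expr["G1"][i] and not expr["G2"][i] else 10.0 for i in range(n)]
events = [1] * n
best = search(expr, times, events, tau=30.0)
print(best[0])  # (score, combination) of the top-ranked combination
```

In this toy setting neither gene alone separates the patients as well as the pair does, which is the kind of combined effect the iterative search is meant to recover.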

 

 

Quick start

 

 


Figure 1. Select the search space. The search space is the list of genes eligible for the combinations. Gene symbols can be copied and pasted into the entry box or loaded from a file, with each gene symbol on a separate line. For a genome-wide search, click the radio button labelled "genome".

 


Figure 2. Add constraints. These constraints refer to gene states that will be avoided or included by the algorithm in every combination. Constraints can be manually added individually through the interface or loaded as a set from a file. In the latter case, the state can be either "high" or "low", and the syntax should be as follows: gene_name::state

Example:

TNFSF9::low

IL2::high

CER1::low
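Before uploading a constraints file, it can be checked locally. The following Python sketch parses the gene_name::state syntax shown above. The function name parse_constraints is hypothetical, and skipping blank lines and rejecting any state other than "high" or "low" are assumptions; the website's own validation may differ.

```python
def parse_constraints(text):
    """Parse lines of the form gene_name::state into (gene, state) pairs."""
    constraints = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        gene, sep, state = line.partition("::")
        gene, state = gene.strip(), state.strip()
        if not sep or not gene or state not in ("high", "low"):
            raise ValueError(f"line {line_no}: expected gene_name::state, got {raw!r}")
        constraints.append((gene, state))
    return constraints

example = """TNFSF9::low

IL2::high

CER1::low
"""
constraints = parse_constraints(example)
print(constraints)
```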

 


Figure 3. Configure optimization. There are 5 optimization parameters that can be changed, namely, population size, selection number, elite size, iteration number, and combination size.

Population size. This parameter is the number of combinations evaluated at each iteration of the algorithm. Large populations help to prevent the optimization from becoming trapped in a local optimum, but they are more time-consuming.

Selection number. This parameter is the number of combinations selected from the population after they have been evaluated in terms of survival. We suggest using a selection number of half the population size.

Elite size. This parameter is the number of best historical combinations across iterations that are transferred directly from one generation to the next, so that the combinations with the best scores are not lost. The elite size should be smaller than the selection number. We suggest an elite size no higher than half of the selection number, to leave the optimization process some freedom.

Iteration number. This parameter is the number of rounds in which combinations are generated, evaluated, and selected by the algorithm. We suggest starting with a few exploratory runs of 3-5 iterations, varying the population size, before performing an actual analysis with 20 or more iterations. The greater the number of iterations, the longer the search takes.

Combination size. This parameter refers to the number of gene states included in each combination. On the website, this number is limited to a maximum value of 5.
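The recommendations above can be collected into a small pre-flight check. The Python sketch below is only an illustration: the class OptimizationConfig and its check method are hypothetical names, and which rules count as errors and which as warnings is an assumption based on the wording of the guide ("should" and the website limit as errors, "we suggest" as warnings).

```python
from dataclasses import dataclass

MAX_COMBINATION_SIZE = 5  # limit stated for the website

@dataclass(frozen=True)
class OptimizationConfig:
    population_size: int
    selection_number: int
    elite_size: int
    iteration_number: int
    combination_size: int

    def check(self):
        """Return (errors, warnings) as two lists of messages."""
        errors, warnings = [], []
        if min(self.population_size, self.selection_number, self.elite_size,
               self.iteration_number, self.combination_size) < 1:
            errors.append("all parameters must be positive integers")
        if self.selection_number > self.population_size:
            errors.append("selection number exceeds population size")
        if self.elite_size >= self.selection_number:
            errors.append("elite size should be smaller than selection number")
        if self.combination_size > MAX_COMBINATION_SIZE:
            errors.append(f"combination size is limited to {MAX_COMBINATION_SIZE}")
        if self.selection_number != self.population_size // 2:
            warnings.append("suggested selection number is half the population size")
        if self.elite_size * 2 > self.selection_number:
            warnings.append("suggested elite size is at most half the selection number")
        if self.iteration_number < 20:
            warnings.append("fewer than 20 iterations: treat as an exploratory run")
        return errors, warnings

# An exploratory run followed by a full analysis, following the suggestions above.
exploratory = OptimizationConfig(30, 15, 5, 4, 2)
full = OptimizationConfig(100, 50, 25, 20, 3)
for cfg in (exploratory, full):
    print(cfg, cfg.check())
```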

 


Figure 4. Run the search. Once the optimization parameters are set, click the 'Search' button and wait. We strongly recommend entering an email address in the dedicated box: the search then runs on our server, and a notification with a link to the results is sent when it is done. The user can therefore leave the website and follow the link later to access the results.


Figure 5. Identify the underlying biological mechanisms. Once the search is complete, the best combinations are displayed on the website with their corresponding survival curves. Just below the survival curves, there is a link to perform the Gene Set Enrichment Analysis (GSEA), or, alternatively, to generate and download the required files in order to run the GSEA locally with personalized settings.

 

 


Figure 6. GSEA results main page. Once the GSEA is done, the results can be explored online in the browser or downloaded (see Figure 5) and viewed locally. Links from the main page lead to the list of up- and down-regulated biological mechanisms associated with predefined gene sets, and to complementary information such as enrichment plots and statistical scores (p-values and normalized enrichment scores (NES)).

 

 

SURCOMED download

Download

 

 

 

 

 

 

 

This work is supported by Vital-IT-SIB and the Université de Lausanne.

Last modified: 2013-10-03