daSAr - Software package for dependency analysis of survival-associated regions. The code is written in the R programming language.

To cite the package:
A Faisal, R Louhimo, L Lahti, S Hautaniemi, S Kaski 
Biomarker discovery via dependency analysis of multi-view functional genomics data. 
In Proceedings of NIPS 2011 Workshop: From Statistical Genetics to Predictive Models in Personalized Medicine, Sierra Nevada, Spain 2011.

Contact: Ali Faisal (ali.faisal@aalto.fi)

This is experimental software provided as is; we welcome any comments and corrections but cannot give any guarantees about the code.

============= Dependencies =================
In R the following packages must be installed:
1. biomaRt
2. pint (v 1.4.04 or later)
3. dmt
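
A minimal sketch for checking that the packages listed above are available before running the scripts. It only reports what is missing; how to install each package depends on your R setup and is not covered here. The variable names are illustrative and not part of daSAr.

```r
# Packages listed above ("biomaRt" is the spelling R expects).
required <- c("biomaRt", "pint", "dmt")

# TRUE for each package that can be loaded from the current library paths.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- required[!installed]
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else if (utils::packageVersion("pint") < "1.4.04") {
  # The README asks for pint v 1.4.04 or later.
  message("pint is older than 1.4.04; please update it.")
}
```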

============= Running the code ===============
  - main.R: The main script to reproduce the results, provided you have access to the TCGA glioblastoma dataset. Set data-pair to either "cghexp" (copy number and gene expression) or "mthynexp" (methylation and gene expression) to select one of the two cases.
  - loaddata.R: Loads the CGH, miRNA, gene expression (GE) and methylation data and maps the features to Ensembl IDs. Intermediate preprocessed files are saved as "data_ensembl.RData" in the "data/preprocessed" folder.
  - analysis.R: The analysis pipeline in the original study used the Anduril implementation of the Kaplan–Meier (KM) estimator; to use it, install Anduril from http://www.anduril.org/anduril/site/. The KM analysis can also be performed in R, but that part is not currently included in the package.
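
Since the KM step is not included in the package, the following is only a rough sketch of how a Kaplan-Meier analysis could be done in R with the "survival" package. It is not the Anduril pipeline used in the original study, and the toy data and column names (time, status, group) are made up for illustration.

```r
library(survival)  # recommended package shipped with standard R distributions

# Toy data (invented): follow-up time, event flag and a two-level grouping,
# e.g. samples with and without an alteration in a candidate region.
toy <- data.frame(
  time   = c(5, 8, 12, 14, 20, 3, 6, 7, 9, 11),
  status = c(1, 1, 0, 1, 0, 1, 1, 1, 0, 1),  # 1 = event observed, 0 = censored
  group  = rep(c("altered", "unaltered"), each = 5)
)

# Kaplan-Meier survival estimate for each group.
km_fit <- survfit(Surv(time, status) ~ group, data = toy)
print(km_fit)

# Log-rank test for a difference between the two survival curves.
lr_test <- survdiff(Surv(time, status) ~ group, data = toy)
print(lr_test)
```

plot(km_fit) draws the two estimated survival curves.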

============= Preparing the data ==============
To re-run the analysis, create a folder named data in the directory where you saved the R scripts. Then create the following subdirectories inside the data folder:
1. arrayCGH
Save the LOWESS-normalized array CGH data, with the lowest 0.02 quantile filtered out, in this folder as copynumberMatrix.csv. Also save the corresponding annotation for the matrix as copynumberAnnotation.csv.
2. gExpression
Save the gene expression data, as normalized by TCGA, as gExpression.csv. In the data, probes mapping to multiple genes should be collapsed into a single value using their sample-wise median, and probe names need to be replaced with the corresponding Ensembl gene ID.
3. methylation
Save the methylation data as rawMethylation.csv. This file should include probe names and four columns per sample. For each sample, the first column is the Beta value, the second column is the HGNC gene name corresponding to the probe, and the third and fourth columns contain the probe's genomic locus. Also create another file named "processed.csv". In this processed version of the methylation matrix, probe names should be replaced with the respective HGNC gene names, the Beta values should be transformed into M-values (which are more normally distributed), and all the extra columns for each sample should be removed.
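
The two numerical steps described above can be sketched on toy data as follows. This assumes that "collapsing" means combining several probes of the same gene by their sample-wise median, and that M-values follow the common definition M = log2(Beta / (1 - Beta)). The helper names, the toy gene IDs and the clipping constant eps are illustrative assumptions; the package's own code may handle these steps differently.

```r
# Toy probe-level matrix (rows = probes, columns = samples); values are invented.
expr <- matrix(c(1, 3, 5,
                 2, 4, 6,
                 7, 8, 9),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("p1", "p2", "p3"), c("s1", "s2", "s3")))

# Hypothetical probe-to-gene mapping: p1 and p2 belong to the same gene.
gene_of_probe <- c(p1 = "ENSG_A", p2 = "ENSG_A", p3 = "ENSG_B")

# Collapse probes of the same gene using the sample-wise median.
# Written for matrices with at least two sample columns.
collapse_by_median <- function(mat, genes) {
  groups <- split(seq_len(nrow(mat)), genes)
  out <- t(vapply(groups,
                  function(idx) apply(mat[idx, , drop = FALSE], 2, median),
                  numeric(ncol(mat))))
  colnames(out) <- colnames(mat)
  out
}
gene_expr <- collapse_by_median(expr, gene_of_probe[rownames(expr)])

# Beta-value to M-value transform; Beta is clipped to (eps, 1 - eps)
# so that the logarithm stays finite at 0 and 1.
beta_to_m <- function(beta, eps = 1e-6) {
  beta <- pmin(pmax(beta, eps), 1 - eps)
  log2(beta / (1 - beta))
}
m_values <- beta_to_m(c(0.2, 0.5, 0.8))
```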

The transform2simcca.input function in loaddata.R does most of the above pre-processing. 
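
The folder layout described in this section can be created and checked from R along these lines. This is a sketch under two assumptions: that processed.csv belongs in the methylation folder, and that the data/preprocessed folder used by loaddata.R may need to be created by hand. The variable names are illustrative.

```r
# Run from the folder that contains main.R, loaddata.R and analysis.R.
data_dirs <- file.path("data",
                       c("arrayCGH", "gExpression", "methylation", "preprocessed"))
for (d in data_dirs) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# Input files named in this README (location of processed.csv is an assumption).
expected <- file.path("data", c(
  file.path("arrayCGH", "copynumberMatrix.csv"),
  file.path("arrayCGH", "copynumberAnnotation.csv"),
  file.path("gExpression", "gExpression.csv"),
  file.path("methylation", "rawMethylation.csv"),
  file.path("methylation", "processed.csv")
))

# Report which input files still need to be put in place.
missing_files <- expected[!file.exists(expected)]
if (length(missing_files) > 0) {
  message("Files not found yet:\n", paste(missing_files, collapse = "\n"))
}
```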

=============== Change Log ================
Version 1.1  20/06/11
Added the README file describing how to re-run the analyses of Faisal et al., 2011 NIPS PM and Riku et al., 2011 CAMDA.
