README.TXT

INSTRUCTIONS TO USE LONGP
Lu Cheng
1.2.2018

**************************************************
We are currently working on more comprehensive
documentation and instructions for the LonGP
package and will upload them shortly.

**************************************************

Download the LonGP package here.

1. Install the MATLAB version of GPstuff (>4.6).
2. Move this folder ("LonGP") into the GPstuff directory ("GPstuffX.X").
3. Start MATLAB and run "startup" in GPstuff.
4. LonGP can now be executed.

%%%%  HOW TO RUN LONGP    %%%%%

Format of input (the original data must first be preprocessed into a .mat file containing the following variables):
X: matrix in which every 2 columns correspond to one covariate
   the first column of each pair is the flag vector and the second column is the actual value; use 0 for missing values
   continuous covariates are placed at the beginning and categorical/binary covariates at the end; id must be the last covariate
y: target variable, a column vector with the same number of rows as X; missing values are set to "NaN"
nConVar: number of continuous covariates
nBinVar: number of categorical/binary covariates
varNames: names of the covariates
kernelTypeArr: kernel type for each covariate, an integer vector; a minus sign means that no interaction is allowed for this covariate
fixedVarInds: indices of covariates that are fixed by the user and not subject to selection; "id" (index nConVar+nBinVar) must be included
resDir: directory in which to store the results
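The two-columns-per-covariate layout of X can be illustrated with a small MATLAB sketch. buildLonGPX is a hypothetical helper, not part of LonGP. It assumes that the flag column is 1 for an observed value and 0 for a missing one, with missing values themselves stored as 0; the README only states that 0 is used for missing values, so check this convention against the LonGP source before relying on it.

```matlab
function X = buildLonGPX(conVars, catVars)
% buildLonGPX  Hypothetical helper (NOT part of LonGP): pack raw covariates
% into the flag/value column pairs expected in X.
%   conVars : nSample x nConVar numeric matrix, NaN marks a missing value
%   catVars : nSample x nBinVar numeric matrix, id must be the LAST column,
%             NaN marks a missing value
% ASSUMPTION: flag = 1 when the value is observed, 0 when it is missing.
raw = [conVars, catVars];          % continuous first, categorical/binary last
[nSample, nVar] = size(raw);
X = zeros(nSample, 2 * nVar);
for k = 1:nVar
    observed = ~isnan(raw(:, k));
    X(:, 2*k - 1) = double(observed);   % flag column (assumed meaning)
    vals = raw(:, k);
    vals(~observed) = 0;                % use 0 for missing values
    X(:, 2*k) = vals;                   % value column
end
end
```

For example, buildLonGPX([30; NaN; 45], [1 7; 2 7; 1 8]) returns a 3 x 6 matrix holding age, a binary covariate and id (last), in that order; its second row starts with the pair [0 0] because the age of that sample is missing.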


%%%%  HOW TO RUN PARA LONGP    %%%%%  

Format of input:
the same as for lonGP, except that y is replaced by two additional variables
yMat: target matrix of size nTarget x nSample, where each row is a target
yInd: index of the target
yInd=0: run the task manager
yInd=1~nTarget: run the main worker for target yInd; results are stored in "resDir/yInd"
yInd>nTarget: run slave yInd, with its working space in "resDir/yInd"
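The yInd convention above can be summarised as a small dispatch function. paraLonGPRole is hypothetical and does not start any job; it only reports which role a given yInd would take and which directory the README associates with it. The task manager's directory is not documented here, so the sketch returns an empty path for it.

```matlab
function [role, workDir] = paraLonGPRole(yInd, nTarget, resDir)
% paraLonGPRole  Hypothetical helper (NOT part of LonGP) mirroring the
% documented yInd convention. It launches nothing; it only classifies yInd.
if ~isscalar(yInd) || yInd < 0 || yInd ~= round(yInd)
    error('paraLonGPRole:badIndex', 'yInd must be a non-negative integer');
end
if yInd == 0
    role = 'task manager';
    workDir = '';                                % not specified in the README
elseif yInd <= nTarget
    role = 'main worker';
    workDir = fullfile(resDir, num2str(yInd));   % results in resDir/yInd
else
    role = 'slave';
    workDir = fullfile(resDir, num2str(yInd));   % working space in resDir/yInd
end
end
```

With several target column vectors of equal length (hypothetically y1 and y2), yMat = [y1, y2].' gives the required nTarget x nSample layout. Each parallel job is then started with its own yInd: 0 for the manager, 1 to nTarget for the main workers, and larger values for slaves.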


%%%%  HOW TO COLLECT LONGP RESULT    %%%%%
Modify "collectResult.m" to collect the result ("tbl"). The specified ".xlsx" file will contain the selected model, and "varExplained.txt" will contain the corresponding explained variance.


%%%%  A SIMPLE EXAMPLE    %%%%%  

We demonstrate running LonGP on a simulated dataset: y=f(age)+f(loc)+f(id)+f(age*id)+noise.
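To make the structure of this model concrete, the sketch below generates toy data of the same additive form. It is not the generator used for data.xlsx: the component functions (a sine in age, a constant shift for loc = 2, per-subject random offsets and per-subject age slopes) and the noise level are arbitrary illustrative choices, and simulateToyData is a hypothetical name.

```matlab
function [age, loc, id, y] = simulateToyData(nId, nPerId, seed)
% simulateToyData  Hypothetical generator (NOT the one behind data.xlsx) for
%   y = f1(age) + f2(loc) + f3(id) + f4(age*id) + noise
% All component functions are arbitrary illustrative choices.
rng(seed);                                         % reproducible draws
nSample = nId * nPerId;
id  = repelem((1:nId)', nPerId);                   % nPerId rows per subject
age = repmat(linspace(20, 60, nPerId)', nId, 1);   % same age grid per subject
loc = randi(2, nSample, 1);                        % binary location, 1 or 2
idOffset = randn(nId, 1);                          % per-subject offset
idSlope  = 0.02 * randn(nId, 1);                   % per-subject age trend
f1 = sin(age / 10);                                % shared age effect
f2 = 0.5 * (loc == 2);                             % location effect
f3 = idOffset(id);                                 % f(id)
f4 = idSlope(id) .* (age - mean(age));             % f(age*id) interaction
y = f1 + f2 + f3 + f4 + 0.1 * randn(nSample, 1);   % add Gaussian noise
end
```

Here age is continuous while loc and id are categorical, which matches nConVar=1 and nBinVar=2 in this example; the age*id term appears as a subject-specific age trend.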

1. Download the example data, unzip it, place it under the LonGP folder and type "cd('example')".
The file "data.xlsx" contains the original data, processed data and the target variable y.

2. Load the data "procX" and "y" into MATLAB using the "Import Data" button,
OR create empty variables X and y, then open each variable in the Variables editor and paste the data.

3. Create the following variables:
varNames = {'age','loc','id'};
kernelTypeArr=[1 3 4];
nBinVar=2;
nConVar=1;
outDir = 'tstout';

OR simply type "load('preprocData.mat')"

4. Run "lonGP(X, y, nConVar, nBinVar, kernelTypeArr, varNames, [3], outDir)", where [3] is fixedVarInds (the index of id, nConVar+nBinVar).
This takes about 1~2 hours because MCMC is computationally intensive.

5. The result files are located in the new directory "tstout", where "finalResult.mat" contains the final result.
The folder "tstout_backup" contains the example result.

6. Run the following commands to get the selected model and the explained variance:
% load results
load('tstout/finalResult.mat')

% selected model and explained variance
fprintf('selected model: %s.\n',model.description.long);
fprintf('explained variance: %s.\n',num2str(components.normEmpMagArr*100,'%.2f%% '));

7. To plot the figures, type "showExampResult". This produces figures of the shared components.
Note that plotting is application-specific and must be implemented manually for other datasets.
[Figures: age component, loc component, id component]
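Because a LonGP run can take hours, it may be worth checking hand-prepared inputs against the documented shape rules before calling lonGP. checkLonGPInputs is a hypothetical pre-flight helper, not part of LonGP. It checks only the rules stated in "HOW TO RUN LONGP" (2 columns of X per covariate, one y entry per row of X, one name and one kernel type per covariate, id included in fixedVarInds) and cannot detect problems such as covariates placed in the wrong order.

```matlab
function checkLonGPInputs(X, y, nConVar, nBinVar, kernelTypeArr, varNames, fixedVarInds)
% checkLonGPInputs  Hypothetical pre-flight check (NOT part of LonGP).
% Raises an error on the first violated shape rule; returns silently otherwise.
nVar = nConVar + nBinVar;
assert(size(X, 2) == 2 * nVar, 'X must have 2 columns per covariate');
assert(iscolumn(y) && numel(y) == size(X, 1), ...
    'y must be a column vector with one entry per row of X');
assert(numel(varNames) == nVar, 'varNames needs one name per covariate');
assert(numel(kernelTypeArr) == nVar, 'kernelTypeArr needs one entry per covariate');
assert(ismember(nVar, fixedVarInds), ...
    'fixedVarInds must include the id index nConVar+nBinVar');
end
```

For the example above, calling checkLonGPInputs(X, y, nConVar, nBinVar, kernelTypeArr, varNames, [3]) before the lonGP command should return without error if the data were loaded correctly.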