Combining time series expression datasets in pharmacogenomics studies


Simulated data results

Clustering comparison

Patient specific response

Pharmacogenomics and clinical studies that measure the temporal expression levels of patients can identify important pathways and biomarkers that are activated during disease progression or in response to treatment. However, researchers face a number of challenges when trying to combine expression profiles from these patients. Unlike lab animals or cell lines, individual patients vary in their baseline expression and in their response rate. In this paper we present a generative model for such data. Our model represents patient expression data using two levels: a gene level, which corresponds to a common response pattern, and a patient level, which accounts for patient-specific expression patterns and response rates. We infer the parameters of the model using an EM algorithm. We used our algorithm to analyze the response of multiple sclerosis patients to Interferon-$\beta$. As we show, our algorithm improves upon prior methods for combining patient data. In addition, our algorithm correctly identifies patient-specific response patterns.

Top row: Three genes expressed in a similar way in all six patients according to the posterior computed by our algorithm. Bottom row: Three genes that were expressed similarly in five of the patients, but differently in the sixth. Time is on a log scale due to the sampling rate. Note that the average computed in the bottom row (dotted line) is affected by the outlier, while the consensus computed by our algorithm (blue line) is not.
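
The contrast between the plain average and a posterior-weighted consensus can be illustrated with a small sketch. This is not the model presented in the paper: it is a simplified two-component EM, assumed here purely for illustration, in which each patient's profile for one gene either follows a shared consensus curve (narrow Gaussian noise) or is treated as an outlier (broad Gaussian noise). The function name `robust_consensus`, the parameters `sigma`, `outlier_sigma`, and `prior_common`, and the toy data are all hypothetical.

```python
import math


def robust_consensus(profiles, sigma=0.3, outlier_sigma=3.0,
                     prior_common=0.8, n_iter=20):
    """Toy EM: estimate a consensus curve and per-patient membership weights.

    profiles: list of per-patient expression lists, all of the same length.
    All parameters are illustrative assumptions, not values from the paper.
    """
    n_time = len(profiles[0])
    # Initialise with a median-like value per time point (robust start).
    consensus = [sorted(col)[len(col) // 2] for col in zip(*profiles)]
    weights = [1.0] * len(profiles)
    for _ in range(n_iter):
        # E-step: posterior probability that each patient follows the consensus.
        weights = []
        for profile in profiles:
            sq = sum((x - c) ** 2 for x, c in zip(profile, consensus))
            ll_common = (math.log(prior_common)
                         - 0.5 * sq / sigma ** 2
                         - n_time * math.log(sigma))
            ll_outlier = (math.log(1.0 - prior_common)
                          - 0.5 * sq / outlier_sigma ** 2
                          - n_time * math.log(outlier_sigma))
            diff = ll_outlier - ll_common
            # Logistic of -diff, guarded against overflow in math.exp.
            weights.append(1.0 / (1.0 + math.exp(diff)) if diff < 700 else 0.0)
        # M-step: consensus is the posterior-weighted mean at each time point.
        total = sum(weights)
        consensus = [
            sum(w * p[t] for w, p in zip(weights, profiles)) / total
            for t in range(n_time)
        ]
    return consensus, weights


# Hypothetical data: five patients share a pattern (with different baselines),
# while the sixth responds in the opposite direction.
common = [0.0, 1.0, 2.0, 1.0, 0.0]
offsets = [-0.1, -0.05, 0.0, 0.05, 0.1]
patients = [[c + o for c in common] for o in offsets]
patients.append([0.0, -1.0, -2.0, -1.0, 0.0])

average = [sum(col) / len(col) for col in zip(*patients)]
consensus, weights = robust_consensus(patients)

print("plain average:", [round(v, 2) for v in average])
print("consensus:    ", [round(v, 2) for v in consensus])
print("weights:      ", [round(w, 3) for w in weights])
```

In this toy setting the plain average is pulled toward the sixth patient, whereas the weighted consensus stays close to the pattern shared by the other five, because the outlier receives a posterior weight near zero. The per-patient weights also serve as a simple indicator of which patients deviate from the common response.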