Friday, December 11, 2015

Combining large PGx datasets from cancer cell lines


Testing cancer cell lines in vitro for drug sensitivity is a cornerstone of preclinical drug development. Large publicly available datasets can be found at The Genomics of Drug Sensitivity in Cancer Project (GDSC) and The Cancer Cell Line Encyclopedia (CCLE).

Studies that combined these large public datasets and tested them for correlation found limited concordance, which called the reliability of the data into question. The finding was reported in [PMID: 24284626], discussed in [PMID: 24284624], and followed by a confirmation study here.

A new report in Nature describes different methods to analyze the data from CCLE and GDSC and concludes that “data from either study yields similar predictors of drug response” [PMID: 26570998].

These papers demonstrate the continuing difficulty of comparing across large datasets. The problems include differing experimental protocols and measurements of drug sensitivity across studies, trouble matching drug and cell line names to ensure like-for-like comparison, discordance in the genotyping data, and drugs for which few cell lines were sensitive. As always, attention to detail in the documentation and description of the experiments can help mitigate some of these difficulties. While the development of standard testing protocols and of data curation and reporting frameworks may lead to better validation of drug response predictors going forward, there will always be a need for methods to filter the noise that is inevitable in large datasets.
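
To make the name-matching and concordance problems concrete, here is a minimal sketch in Python. It is not the method used in any of the papers cited above. The normalization rule, the function names, and the sensitivity values are all assumptions chosen for illustration, and a rank correlation is only one of several ways concordance could be measured.

```python
import re


def normalize_name(name):
    """Upper-case a name and drop punctuation so 'MCF-7' and 'mcf7' match.

    Assumption: this rule is illustrative only. Distinct cell lines whose
    names differ only in punctuation would collide and need manual curation.
    """
    return re.sub(r"[^A-Z0-9]", "", name.upper())


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns NaN if either input has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / (var_x * var_y) ** 0.5


def spearman_on_shared(study_a, study_b):
    """Match cell lines by normalized name, then rank-correlate the values."""
    a = {normalize_name(k): v for k, v in study_a.items()}
    b = {normalize_name(k): v for k, v in study_b.items()}
    shared = sorted(a.keys() & b.keys())
    x = [a[k] for k in shared]
    y = [b[k] for k in shared]
    return shared, pearson(ranks(x), ranks(y))


# Made-up sensitivity values for one drug in two hypothetical studies.
study_a = {"MCF-7": 0.5, "HT-29": 2.0, "A549": 8.0, "NCI-H460": 1.0}
study_b = {"mcf7": 0.7, "HT29": 1.5, "a-549": 9.0, "SK-MEL-28": 3.0}

shared, rho = spearman_on_shared(study_a, study_b)
print(shared, rho)
```

Only the cell lines present in both studies after normalization enter the correlation, so a drug with few shared or few sensitive lines yields an estimate based on very little data.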
