Phenotypic profiling attempts to summarize multiparametric, feature-based analysis of cellular phenotypes of each sample so that similarities between profiles reflect similarities between samples. Profiling is well established for biological readouts such as transcript expression and proteomics. Image-based profiling, however, is still an emerging technology.
This image set provides a basis for testing image-based profiling methods with respect to their ability to predict the mechanisms of action of a compendium of drugs. The image set was collected using a typical set of morphological labels and uses a physiologically relevant p53-wildtype breast-cancer model system (MCF-7) and a mechanistically distinct set of targeted and cancer-relevant cytotoxic compounds that induce a broad range of gross and subtle phenotypes.
Images
The images are of MCF-7 breast cancer cells treated for 24 h with a collection of 113 small molecules at eight concentrations. The cells were fixed, labeled for DNA, F-actin, and β-tubulin, and imaged by fluorescence microscopy as described [Caie et al. Molecular Cancer Therapeutics, 2010].
There are 39,600 image files (13,200 fields of view imaged in three channels) in TIFF format. We provide the images in 55 ZIP archives, one for each microtiter plate. The archives are ~750 MB each.
A subset of the compound-concentrations have been identified as clearly having one of 12 different primary mechanisms of action. Mechanistic classes were selected so as to represent a wide cross-section of cellular morphological phenotypes. The differences between phenotypes in some cases were very subtle: we identified 6 of the 12 mechanisms visually (Actin disruptors, Aurora kinase inhibitors, Eg5 inhibitors, Microtubule destabilizers, Microtubule stabilizers, and Epithelial); the remainder were defined based on the literature.
All compounds were tested at eight doses. The top concentration was different for many of the compounds and was carefully selected from the literature. Not all concentrations are available in this dataset. Missing concentrations are due to one of three factors:
The dose was determined to be inactive. Activity was defined by setting a threshold on the Mahalanobis distance from the set of DMSO profiles: profiles of doses that were outside this threshold were considered active. The feature space corresponded to measurements extracted by a proprietary software tool used at AstraZeneca.
The dose was determined to be overly toxic, i.e., the images had no cells or very few cells.
The images did not pass quality control (QC): they were either out of focus or contained artifacts.
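The activity criterion in the first factor can be sketched as follows. This is a minimal illustration, not the original procedure: the features came from a proprietary tool and the threshold value is not given here, so the function names, the use of NumPy, and the `threshold` argument are all assumptions.

```python
import numpy as np

def mahalanobis_to_controls(profiles, dmso_profiles):
    """Mahalanobis distance of each row of `profiles` from the DMSO profiles."""
    dmso = np.asarray(dmso_profiles, dtype=float)
    x = np.asarray(profiles, dtype=float)
    mean = dmso.mean(axis=0)
    cov = np.cov(dmso, rowvar=False)
    # The pseudo-inverse tolerates a singular covariance matrix
    # (for example, more features than control profiles).
    cov_inv = np.linalg.pinv(np.atleast_2d(cov))
    diff = x - mean
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.maximum(d2, 0.0))

def is_active(profiles, dmso_profiles, threshold):
    """Flag profiles whose distance from the DMSO set exceeds `threshold`.

    `threshold` is a hypothetical parameter; the text does not state its value.
    """
    return mahalanobis_to_controls(profiles, dmso_profiles) > threshold
```

Each row of `profiles` would be one dose's feature vector; a dose flagged `False` corresponds to the "inactive" case described above.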
Metadata
The file BBBC021_v1_image.csv contains the metadata, with the following fields:
The file BBBC021_v1_moa.csv contains the mechanisms of action of 103 compound-concentrations (38 compounds at 1–7 concentrations each). The fields are:
compound
concentration
moa
Example rows from the file:
compound    concentration    moa
PP-2        3.000000         Epithelial
emetine     0.300000         Protein synthesis
AZ258       1.000000         Aurora kinase inhibitors
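Reading the mechanism-of-action labels might look like the sketch below. It assumes the file is comma-delimited with a header row naming the three fields listed above; `load_moa` and `SAMPLE` are illustrative names, and `SAMPLE` simply restates the three example rows as CSV text.

```python
import csv
import io

# The three example rows above, written as CSV text for illustration only.
SAMPLE = """compound,concentration,moa
PP-2,3.000000,Epithelial
emetine,0.300000,Protein synthesis
AZ258,1.000000,Aurora kinase inhibitors
"""

def load_moa(handle):
    """Map (compound, concentration) -> moa from an open CSV file handle."""
    labels = {}
    for row in csv.DictReader(handle):
        key = (row["compound"], float(row["concentration"]))
        labels[key] = row["moa"]
    return labels

labels = load_moa(io.StringIO(SAMPLE))
```

For the real file, the same function could be called on `open("BBBC021_v1_moa.csv", newline="")`. Keying on the (compound, concentration) pair reflects that labels are assigned per compound-concentration, not per compound.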
NOTE: When evaluating the accuracy of MOA classification, it is critical to ensure that the cross-validation is set up correctly. MOA classification is the task of classifying the MOA of an unseen compound. Therefore, the evaluation should be a leave-one-compound-out cross-validation: in each iteration, hold out one compound (all replicates and all concentrations), train on the remaining compounds, and test on the held-out compound.
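The split described in this note can be sketched as a small generator. The function name and the list-of-names input are assumptions made for illustration; any classifier can be trained and tested on the returned indices.

```python
def leave_one_compound_out(compounds):
    """Yield (held_out, train_idx, test_idx) once per distinct compound.

    `compounds` holds one compound name per profile, so every replicate
    and every concentration of the held-out compound lands in `test_idx`.
    """
    for held_out in sorted(set(compounds)):
        test_idx = [i for i, c in enumerate(compounds) if c == held_out]
        train_idx = [i for i, c in enumerate(compounds) if c != held_out]
        yield held_out, train_idx, test_idx
```

Splitting by compound name, not by row, is what keeps other concentrations of the test compound out of the training set. scikit-learn's `LeaveOneGroupOut`, with the compound names passed as `groups`, produces equivalent splits.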
The prediction of mechanism of action was performed with restrictions on the possible match. Not-Same-Compound (NSC) does not allow a match to the same compound. Not-Same-Compound-or-Batch (NSCB) does not allow a match to the same compound or to any compound in the same batch. Evaluations were performed at the level of individual wells (Per-Well) as well as at the level of individual treatments, where replicate wells were averaged to create a profile (Per-Treatment).
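One way to picture the NSC and NSCB restrictions is a nearest-neighbor match with the disallowed profiles masked out. The sketch below is an assumption-laden illustration: the text names neither the classifier nor the distance, so 1-nearest-neighbor matching, cosine distance, and the function name `predict_moa_nsc` are all choices made here, not the documented method.

```python
import numpy as np

def predict_moa_nsc(profiles, compounds, moas, batches=None):
    """Predict each profile's MOA from its nearest allowed neighbor.

    Without `batches` this applies NSC; with `batches` it applies NSCB.
    """
    x = np.asarray(profiles, dtype=float)
    compounds = np.asarray(compounds)
    # Cosine distance is an assumption; the text does not name a metric.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.where(norms == 0, 1.0, norms)
    dist = 1.0 - unit @ unit.T
    predictions = []
    for i in range(len(x)):
        allowed = compounds != compounds[i]      # NSC: skip the same compound
        if batches is not None:                  # NSCB: also skip the same batch
            b = np.asarray(batches)
            allowed &= b != b[i]
        if not allowed.any():
            predictions.append(None)             # no admissible match exists
            continue
        candidates = np.where(allowed)[0]
        best = candidates[np.argmin(dist[i, candidates])]
        predictions.append(moas[best])
    return predictions
```

For a Per-Treatment evaluation, `profiles` would hold one averaged profile per treatment; for Per-Well, one row per well.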