Inputs

The tool requires one of the following inputs: a CSV file listing the IF images to download, a TXT/CSV file listing the proteins for which IF images will be downloaded, or a path to a TSV file located in a CM4AI RO-Crate directory. It also requires the path to a JSON file containing provenance information about the input files.

Below is the list and description of each input accepted by the tool.

  • samples.csv:

    CSV file listing the IF images to download. The file follows a specific format with the columns filename, if_plate_id, position, sample, locations, antibody, ensembl_ids, and gene_names.

    Definition of columns:

    • filename - Filename of image (string)

    • if_plate_id - ID of plate for acquired image (int)

    • position - Position in plate for acquired image (string)

    • sample - Sample number identifier for acquired image (int)

    • locations - Comma-delimited list of manual annotations for image (string)

    • antibody - Name of antibody used for acquired image (string)

    • ensembl_ids - Comma-delimited list of Ensembl IDs (string)

    • gene_names - Comma-delimited list of genes (string)

Example:

filename,if_plate_id,position,sample,status,locations,antibody,ensembl_ids,gene_names
/archive/7/7_C5_1_,7,C5,1,35,"Cytosol,Nuclear speckles",HPA005910,ENSG00000011007,ELOA
/archive/7/7_C5_2_,7,C5,2,35,"Cytosol,Nuclear speckles",HPA005910,ENSG00000011007,ELOA
/archive/7/7_E8_1_,7,E8,1,35,Nuclear speckles,HPA006628,ENSG00000239306,RBM14
/archive/7/7_E8_2_,7,E8,2,35,Nuclear speckles,HPA006628,ENSG00000239306,RBM14
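
As a rough illustration of the format above, the following Python sketch reads a samples file and checks that the documented columns are present. The helper name read_samples is hypothetical and not part of the tool. The status column appears in the example header but is not defined in the column list, so the sketch treats it as optional.

```python
import csv
import io

# Columns taken from the definitions above. "status" appears only in the
# example header, so it is not required here (an assumption).
REQUIRED_COLUMNS = [
    "filename", "if_plate_id", "position", "sample",
    "locations", "antibody", "ensembl_ids", "gene_names",
]


def split_list(value):
    """Split a comma-delimited cell into a list of trimmed items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def read_samples(handle):
    """Hypothetical helper: parse a samples CSV into a list of dicts."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError("samples file is missing columns: " + ", ".join(missing))
    rows = []
    for row in reader:
        # Convert the columns documented as int and as comma-delimited lists.
        row["if_plate_id"] = int(row["if_plate_id"])
        row["sample"] = int(row["sample"])
        for column in ("locations", "ensembl_ids", "gene_names"):
            row[column] = split_list(row[column])
        rows.append(row)
    return rows


# Two rows copied from the example above.
EXAMPLE = (
    'filename,if_plate_id,position,sample,status,locations,antibody,ensembl_ids,gene_names\n'
    '/archive/7/7_C5_1_,7,C5,1,35,"Cytosol,Nuclear speckles",HPA005910,ENSG00000011007,ELOA\n'
    '/archive/7/7_E8_1_,7,E8,1,35,Nuclear speckles,HPA006628,ENSG00000239306,RBM14\n'
)

samples = read_samples(io.StringIO(EXAMPLE))
for sample in samples:
    print(sample["filename"], sample["locations"])
```

Note that a locations value containing several annotations must be quoted in the CSV, as in the first data row, so that the embedded comma is not read as a column separator.
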
  • proteins.txt:

    List of proteins for which HPA images will be downloaded, one protein per line.

Example:

ELOA
RBM14
SRSF11
MCM3
APEX1
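
A file in this format can be read with a few lines of Python. This is only a sketch; read_proteins is a hypothetical helper, and skipping blank lines is an assumption rather than documented behavior of the tool.

```python
import io


def read_proteins(handle):
    """Hypothetical helper: return protein names, one per non-blank line."""
    return [line.strip() for line in handle if line.strip()]


# In practice the handle would come from open("proteins.txt").
proteins = read_proteins(io.StringIO("ELOA\nRBM14\n\nSRSF11\n"))
print(proteins)
```
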
  • CM4AI_TABLE_PATH:

    Path to a TSV file in a CM4AI RO-Crate directory. The directory is also expected to contain red/, blue/, green/, and yellow/ directories with images.

    The .tsv file describes each image in the data set. Each row represents one image. The columns describe the staining from which the image was taken. The TSV file is expected to have the following columns:

    • Antibody ID - the ID of the antibody used to stain the protein visible in the “green” channel. The ID can be looked up at proteinatlas.org to find more information about the antibody.

    • ENSEMBL ID - indicates the ENSEMBL ID(s) of the gene(s) of the proteins visualized in the “green” channel.

    • Treatment - refers to how the cells that are depicted in the image were treated (with Paclitaxel, Vorinostat, or untreated)

    • Well - refers to the well coordinate on the 96-well plate

    • Region - is a unique identifier for the position in the well where the cells were imaged

Example:

Antibody ID ENSEMBL ID      Treatment       Well    Region
CAB079904   ENSG00000187555 untreated       C1      R1
CAB079904   ENSG00000187555 untreated       C1      R2
CAB079904   ENSG00000187555 untreated       C1      R3
CAB079904   ENSG00000187555 untreated       C1      R5
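
The layout described above can be checked before running the tool. The sketch below assumes the TSV is tab-delimited with the five column names listed and that the color directories sit in the same directory as the TSV file. Both helper names are hypothetical.

```python
import csv
import io
import os

EXPECTED_COLUMNS = ["Antibody ID", "ENSEMBL ID", "Treatment", "Well", "Region"]
COLOR_DIRS = ["red", "blue", "green", "yellow"]


def read_cm4ai_table(handle):
    """Hypothetical helper: parse the TSV into one dict per image row."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError("TSV file is missing columns: " + ", ".join(missing))
    return list(reader)


def missing_color_dirs(table_path):
    """Return the color directories not found next to the TSV file."""
    base = os.path.dirname(os.path.abspath(table_path))
    return [d for d in COLOR_DIRS if not os.path.isdir(os.path.join(base, d))]


# One row from the example above, written with explicit tab separators.
EXAMPLE = (
    "Antibody ID\tENSEMBL ID\tTreatment\tWell\tRegion\n"
    "CAB079904\tENSG00000187555\tuntreated\tC1\tR1\n"
)

rows = read_cm4ai_table(io.StringIO(EXAMPLE))
print(rows[0]["Antibody ID"], rows[0]["Well"], rows[0]["Region"])
```
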
  • provenance.json:

    Path to a JSON file containing provenance information about the input files. This file is required; if it is omitted, the tool outputs an error message that includes an example of the file.

Example:

{
  "name": "Example input dataset",
  "organization-name": "CM4AI",
  "project-name": "Example",
  "edgelist": {
    "name": "sample edgelist",
    "author": "Krogan Lab",
    "version": "1.0",
    "date-published": "07-31-2023",
    "description": "AP-MS Protein interactions on HSC2 cell line, example dataset",
    "data-format": "tsv"
  },
  "baitlist": {
    "name": "sample baitlist",
    "author": "Krogan Lab",
    "version": "1.0",
    "date-published": "07-31-2023",
    "description": "AP-MS Baits used for Protein interactions on HSC2 cell line",
    "data-format": "tsv"
  },
  "samples": {
    "name": "u2os HPA IF images",
    "author": "Author of dataset",
    "version": "Version of dataset",
    "date-published": "Date dataset was published",
    "description": "Description of dataset",
    "data-format": "csv"
  },
  "unique": {
    "name": "u2os HPA IF images unique",
    "author": "Author of dataset",
    "version": "Version of dataset",
    "date-published": "Date dataset was published",
    "description": "Description of dataset",
    "data-format": "csv"
  }
}
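
A provenance file can be sanity-checked against the shape of the example above before it is passed to the tool. Which fields the tool actually enforces is not stated here, so the field lists below are assumptions taken from the example, and check_provenance is a hypothetical helper.

```python
import json

# Field names copied from the example above. Whether the tool requires
# all of them is an assumption.
TOP_LEVEL_FIELDS = ["name", "organization-name", "project-name"]
DATASET_FIELDS = [
    "name", "author", "version",
    "date-published", "description", "data-format",
]


def check_provenance(prov):
    """Hypothetical helper: return a list of problems found in prov."""
    problems = []
    for field in TOP_LEVEL_FIELDS:
        if field not in prov:
            problems.append(f"missing top-level field '{field}'")
    # Every nested object (edgelist, baitlist, samples, unique, ...) is
    # treated as a dataset description.
    for key, value in prov.items():
        if isinstance(value, dict):
            for field in DATASET_FIELDS:
                if field not in value:
                    problems.append(f"'{key}' is missing field '{field}'")
    return problems


# Shortened version of the example above.
example_text = """
{
  "name": "Example input dataset",
  "organization-name": "CM4AI",
  "project-name": "Example",
  "samples": {
    "name": "u2os HPA IF images",
    "author": "Author of dataset",
    "version": "Version of dataset",
    "date-published": "Date dataset was published",
    "description": "Description of dataset",
    "data-format": "csv"
  }
}
"""

problems = check_provenance(json.loads(example_text))
print(problems or "provenance looks complete")
```
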