Usage

This script facilitates the downloading of ImmunoFluorescent (IF) labeled images from the Human Protein Atlas (HPA). The tool requires an output directory to write results to and either a TSV file in CM4AI RO-Crate format, or CSV file with list of IF images to download and CSV file of unique samples.

In a project

To use cellmaps_imagedownloader in a project:

import cellmaps_imagedownloader

On the command line

For information invoke cellmaps_imagedownloadercmd.py -h

Usage

cellmaps_imagedownloadercmd.py OUTPUT_DIRECTORY [--provenance PROVENANCE_PATH] [OPTIONS]

Arguments

  • outdir

    The directory where the output will be written to.

Required

  • --provenance PROVENANCE_PATH

    Path to file containing provenance information about input files in JSON format.

Optional but either `samples`, `cm4ai_table`, `protein_list` or `cell_line` parameter is required

  • --samples SAMPLES_PATH

    CSV file with list of IF images to download. The file follow a specific format with columns such as filename, if_plate_id, position, sample, locations, antibody, ensembl_ids, and gene_names.

  • --protein_list

    List of proteins for which HPA images will be downloaded. Each protein in new line.

  • --cell_line

    Cell line for which HPA images will be downloaded. See available cell lines at https://www.proteinatlas.org/humanproteome/cell+line.

  • --cm4ai_table CM4AI_TABLE_PATH

    Path to TSV file in CM4AI RO-Crate directory.

Optional

  • --unique UNIQUE_PATH: (Deprecated: Using –samples flag only is enough) CSV file of unique samples. The file should have columns like antibody, ensembl_ids, gene_names, atlas_name, locations, and n_location.

  • --proteinatlasxml: URL or path to proteinatlas.xml or proteinatlas.xml.gz file.

  • --fake_images: If set, the first image of each color is downloaded, and subsequent images are copies of those images. If --cm4ai_table flag is set, the --fake_images flag is ignored.

  • --poolsize: If using multiprocessing image downloader, this sets the number of current downloads to run.

  • --imgsuffix: Suffix for images to download (default is .jpg).

  • --skip_existing: If set, skips download if the image already exists and has a size greater than 0 bytes.

  • --skip_failed: If set, ignores images that failed to download after retries.

  • --logconf: Path to the python logging configuration file.

  • --skip_logging: If set, certain log files will not be created.

  • --verbose, -v: Increases verbosity of logger to standard error for log messages.

  • --version: Shows the current version of the tool.

Example usage

The example file can be downloaded from cm4ai.org. Go to Products -> Data, log in, and download file for IF images with the desired treatment, then unpack the tar.gz (tar -xzvf filename.tar.gz).

cellmaps_imagedownloadercmd.py ./cellmaps_imagedownloader_outdir  --cm4ai_table path/to/downloaded/unpacked/dir --provenance examples/provenance.json

Alternatively, use the files in the example directory in the repository:

  1. samples file: CSV file with list of IF images to download (see sample samples file in examples folder)

  2. unique file: CSV file of unique samples (see sample unique file in examples folder)

  3. provenance: file containing provenance information about input files in JSON format (see sample provenance file in examples folder)

cellmaps_imagedownloadercmd.py ./cellmaps_imagedownloader_outdir  --samples examples/samples.csv --unique examples/unique.csv --provenance examples/provenance.json

Via Docker

Example usage

Coming soon...