cellmaps_imagedownloader package
cellmaps_imagedownloader.runner module
- class cellmaps_imagedownloader.runner.CM4AICopyDownloader[source]
Bases:
ImageDownloader
Copies over images from CM4AI RO-Crate
Constructor
- class cellmaps_imagedownloader.runner.CellmapsImageDownloader(outdir=None, imgsuffix='.jpg', imagedownloader=<cellmaps_imagedownloader.runner.MultiProcessImageDownloader object>, imagegen=None, imageurlgen=None, skip_logging=True, provenance=None, input_data_dict=None, provenance_utils=<cellmaps_utils.provenance.ProvenanceUtil object>, skip_failed=False, existing_outdir=False)[source]
Bases:
object
Downloads immunofluorescent (IF) images from Human Protein Atlas, storing them in an output directory that is locally registered as an RO-Crate
Constructor
- Parameters:
outdir (str) – directory where images will be downloaded to
imgsuffix (str) – suffix to append to image file names
imagedownloader (ImageDownloader) – object that will perform image downloads
imagegen (ImageGeneNodeAttributeGenerator) – gene node attribute generator for IF image data
image_url (str) – Base URL for image download from Human Protein Atlas
skip_logging (bool) – If True skip logging, if None or False do NOT skip logging
provenance (dict)
input_data_dict (dict)
provenance_utils (ProvenanceUtil) – Wrapper for fairscape-cli which is used for RO-Crate creation and population
- IMG_SUFFIX = '.jpg'
- SAMPLES_FILEKEY = 'samples'
- UNIQUE_FILEKEY = 'unique'
- static get_example_provenance(requiredonly=True, with_ids=False)[source]
Gets a dict of provenance parameters needed to add/register a dataset with FAIRSCAPE
- get_image_gene_node_attributes_file(fold)[source]
Gets full path to image gene node attribute file under output directory created when invoking run()
- Returns:
Path to file
- Return type:
- get_image_gene_node_errors_file()[source]
Gets full path to image gene node attribute errors file under output directory created when invoking run()
- Returns:
Path to file
- Return type:
- run()[source]
Downloads images to output directory specified in constructor using tsvfile for list of images to download
- Raises:
CellMapsImageDownloaderError – If there is an error
- Returns:
0 upon success, otherwise failure
- class cellmaps_imagedownloader.runner.FakeImageDownloader[source]
Bases:
ImageDownloader
Creates fake download by downloading the first image in each color from Human Protein Atlas and making renamed copies. The download_file() function is used to download the first image of each color
Constructor
- class cellmaps_imagedownloader.runner.ImageDownloader[source]
Bases:
object
Abstract class that defines interface for classes that download images
- class cellmaps_imagedownloader.runner.MultiProcessImageDownloader(poolsize=4, skip_existing=False, override_dfunc=None)[source]
Bases:
ImageDownloader
Uses multiprocess package to download images in parallel
Constructor
Warning
Exceeding a poolsize of 4 causes errors from the Human Protein Atlas site
- Parameters:
poolsize (int) – Number of concurrent downloaders to use.
skip_existing (bool) – If True skip download if image file exists and has size greater than 0
override_dfunc (function) – Function that takes a tuple (image URL, download str path) and downloads the image. If None, the download_file() function is used
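The skip_existing rule described above can be sketched as a small standalone check. This is an illustration only and not the library's own code; the helper name should_skip_download is hypothetical.

```python
import os


def should_skip_download(destfile, skip_existing=False):
    """Return True when an existing, non-empty file makes a download unnecessary.

    Hypothetical helper; not part of cellmaps_imagedownloader.
    """
    if not skip_existing:
        return False
    # Skip only when the file exists and has size greater than 0
    return os.path.isfile(destfile) and os.path.getsize(destfile) > 0
```

Treating zero-byte files as missing means an interrupted download that left an empty file behind would be retried.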
- POOL_SIZE = 4
- download_images(download_list=None)[source]
Downloads images returning a list of failed downloads
from cellmaps_imagedownloader.runner import MultiProcessImageDownloader

dloader = MultiProcessImageDownloader(poolsize=2)
d_list = [('https://images.proteinatlas.org/992/1_A1_1_red.jpg',
           '/tmp/1_A1_1_red.jpg')]
failed = dloader.download_images(download_list=d_list)
- cellmaps_imagedownloader.runner.download_file(downloadtuple)[source]
Downloads file pointed to by ‘download_url’ to ‘destfile’
Note
Default download function used by MultiProcessImageDownloader
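A function with the same calling shape, one tuple of (URL, destination path), might look like the sketch below. It uses only the Python standard library. The name simple_download and its return convention (None on success, the original tuple on failure) are assumptions for illustration; this section does not state what download_file() returns.

```python
import shutil
import urllib.request


def simple_download(downloadtuple):
    """Download downloadtuple[0] (a URL) to downloadtuple[1] (a file path).

    Hypothetical function. Returning None on success and the original
    tuple on failure is an assumed convention, not the library's.
    """
    download_url, destfile = downloadtuple
    try:
        with urllib.request.urlopen(download_url, timeout=30) as resp, \
                open(destfile, 'wb') as out:
            shutil.copyfileobj(resp, out)
    except OSError:
        # urllib's URLError and HTTPError are both subclasses of OSError
        return downloadtuple
    return None
```

A function of this shape is the kind of callable the override_dfunc parameter describes, although the return value the library expects from it should be checked against the source.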
cellmaps_imagedownloader.gene module
- class cellmaps_imagedownloader.gene.CM4AITableConverter(cm4ai=None, fileprefix='B2AI_1_', cell_line='MDA-MB-468')[source]
Bases:
object
Converts CM4AI table in an RO-Crate to samples and unique lists compatible with ImageGeneNodeAttributeGenerator
Constructor
- Parameters:
cm4ai (str) – Path to CM4AI RO-Crate, or CM4AI RO-Crate antibody_gene_table or URL where CM4AI RO-Crate can be downloaded
- get_samples_and_unique_lists()[source]
Gets samples and unique list compatible with ImageGeneNodeAttributeGenerator
- Returns:
(samples list, unique list)
- Return type:
- class cellmaps_imagedownloader.gene.GeneNodeAttributeGenerator[source]
Bases:
object
Base class for GeneNodeAttribute Generator
Constructor
- get_gene_node_attributes()[source]
Should be implemented by subclasses
- Raises:
NotImplementedError – Always
- class cellmaps_imagedownloader.gene.GeneQuery(mygeneinfo=<mygene.MyGeneInfo object>)[source]
Bases:
object
Gets information about genes from mygene
Constructor
- get_symbols_for_genes(genelist=None, scopes='_id')[source]
Queries for genes via the mygene.MyGeneInfo object passed in via the constructor
- Parameters:
- Returns:
result from mygene which is a list of dict objects where each dict is of format:
{
  'query': 'ID',
  '_id': 'ID',
  '_score': #.##,
  'ensembl': { 'gene': 'ENSEMBLEID' },
  'symbol': 'GENESYMBOL'
}
- Return type:
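A result list in the format above can be turned into a lookup from Ensembl gene id to symbol. The sketch below is a hypothetical helper, not part of the package, and the sample values are made up for illustration. It assumes every usable entry carries 'ensembl' as a dict, as in the documented format, and skips anything else.

```python
def symbols_by_ensembl_id(results):
    """Map Ensembl gene id to gene symbol for results in the documented format.

    Hypothetical helper; entries lacking a usable 'ensembl' dict or a
    'symbol' are skipped rather than treated as errors.
    """
    mapping = {}
    for entry in results:
        ensembl = entry.get('ensembl')
        symbol = entry.get('symbol')
        if not isinstance(ensembl, dict) or symbol is None:
            continue
        gene_id = ensembl.get('gene')
        if gene_id is not None:
            mapping[gene_id] = symbol
    return mapping


# Made-up sample data following the documented dict layout
sample_results = [
    {'query': 'ID1', '_id': 'ID1', '_score': 1.0,
     'ensembl': {'gene': 'ENSG00000066455'}, 'symbol': 'GOLGA5'},
    {'query': 'ID2', '_id': 'ID2', '_score': 1.0},  # no ensembl or symbol
]
print(symbols_by_ensembl_id(sample_results))
```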
- class cellmaps_imagedownloader.gene.ImageGeneNodeAttributeGenerator(samples_list=None, unique_list=None, genequery=<cellmaps_imagedownloader.gene.GeneQuery object>)[source]
Bases:
GeneNodeAttributeGenerator
Creates Image Gene Node Attributes table
Constructor
samples_list is expected to be a list of dict objects with this format:
# TODO: Move this to a separate data document
{
  'filename': HPA FILENAME,
  'if_plate_id': HPA PLATE ID,
  'position': POSITION,
  'sample': SAMPLE,
  'locations': COMMA DELIMITED LOCATIONS,
  'antibody': ANTIBODY_ID,
  'ensembl_ids': COMMA DELIMITED ENSEMBLID IDS,
  'gene_names': COMMA DELIMITED GENE SYMBOLS
}
Example:
{
  'filename': '/archive/1/1_A1_1_',
  'if_plate_id': '1',
  'position': 'A1',
  'sample': '1',
  'locations': 'Golgi apparatus',
  'antibody': 'HPA000992',
  'ensembl_ids': 'ENSG00000066455',
  'gene_names': 'GOLGA5'
}
unique_list is expected to be a list of dict objects with this format:
{
  'antibody': ANTIBODY,
  'ensembl_ids': COMMA DELIMITED ENSEMBL IDS,
  'gene_names': COMMA DELIMITED GENE SYMBOLS,
  'atlas_name': ATLAS NAME?,
  'locations': COMMA DELIMITED LOCATIONS IN CELL,
  'n_location': NUMBER OF LOCATIONS IN CELL,
}
Example:
{
  'antibody': 'HPA040086',
  'ensembl_ids': 'ENSG00000094914',
  'gene_names': 'AAAS',
  'atlas_name': 'U-2',
  'locations': 'OS,Nuclear membrane',
  'n_location': '2',
}
- Parameters:
- LINKPREFIX_HEADER = 'linkprefix'
Column labels for samples file
- SAMPLES_HEADER_COLS = ['filename', 'if_plate_id', 'position', 'sample', 'locations', 'antibody', 'ensembl_ids', 'gene_names']
- UNIQUE_HEADER_COLS = ['antibody', 'ensembl_ids', 'gene_names', 'atlas_name', 'locations', 'n_location']
Column labels for unique file
- filter_samples_by_sample_urlmap(sample_url_map)[source]
Removes samples that lack a URL as noted in sample_url_map passed in.
- Raises:
CellMapsImageDownloaderError – if internal samples list is None
- Parameters:
sample_url_map (dict) – map where key is image id and value is URL
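The filtering described above amounts to keeping only samples whose image id is a key in sample_url_map. The standalone sketch below illustrates that idea; it is not the library's implementation. The function name, the image_id_func parameter, and the use of ValueError in place of CellMapsImageDownloaderError are all assumptions.

```python
def filter_samples(samples, sample_url_map, image_id_func):
    """Return the samples whose image id appears as a key in sample_url_map.

    Hypothetical helper. image_id_func is any callable that maps a sample
    dict to its image id. ValueError stands in for the library's own
    exception type.
    """
    if samples is None:
        raise ValueError('samples list is None')
    return [s for s in samples if image_id_func(s) in sample_url_map]


# Made-up data: only the first sample has a URL in the map
samples = [{'id': '992/1_A1_1_'}, {'id': '992/1_A1_2_'}]
url_map = {'992/1_A1_1_': 'https://images.proteinatlas.org/992/1_A1_1_'}
kept = filter_samples(samples, url_map, lambda s: s['id'])
```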
- get_dicts_of_gene_to_antibody_filename()[source]
Gets a tuple of dictionaries from the sample list passed in via the constructor.
- get_gene_node_attributes(fold=1)[source]
Using samples_list and unique_list, builds a list of dict objects with updated Gene Symbols.
Format of each resulting dict:
{'name': GENE_SYMBOL, 'represents': ENSEMBL_ID, 'ambiguous': AMBIGUOUS_GENES, 'antibody': ANTIBODY, 'filename': FILENAME}
Example
{'ENSG00000066455': {'name': 'GOLGA5', 'represents': 'ensembl:ENSG00000066455', 'ambiguous': '', 'antibody': 'HPA000992', 'filename': '1_A1_2_,1_A1_1_'}}
- Returns:
(list of dict, list of errors)
- Return type:
- static get_image_id_for_sample(sample)[source]
Gets image id for sample passed in
- Parameters:
sample (dict) – Assumed to be a dict of the following format:
{'antibody': 'HPA0####', 'position': 'XXX', 'sample': 'XXX', 'if_plate_id': 'XXX'}
- Raises:
CellMapsImageDownloaderError – If sample is None, not a dict, or is missing any of these keys: antibody, position, sample, if_plate_id
- Returns:
<ANTIBODY WITH HPA0*|CAB0* REMOVED>/<IF_PLATE_ID>_<POSITION>_<SAMPLE>_
- Return type:
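The documented image id format can be reproduced with a short standalone function. This is a sketch of the format only, not the library's code: the function name is hypothetical, the regular expression is one plausible reading of "HPA0*|CAB0* removed", and ValueError stands in for CellMapsImageDownloaderError.

```python
import re


def image_id_for_sample(sample):
    """Build '<antibody number>/<if_plate_id>_<position>_<sample>_' for a sample.

    Hypothetical helper; ValueError stands in for the library's exception.
    """
    required = ('antibody', 'position', 'sample', 'if_plate_id')
    if not isinstance(sample, dict) or any(k not in sample for k in required):
        raise ValueError('sample must be a dict with keys: ' + ', '.join(required))
    # Strip a leading HPA or CAB prefix together with any zeros that follow it
    antibody = re.sub(r'^(HPA|CAB)0*', '', sample['antibody'])
    return '{}/{}_{}_{}_'.format(antibody, sample['if_plate_id'],
                                 sample['position'], sample['sample'])


example = {'antibody': 'HPA000992', 'position': 'A1',
           'sample': '1', 'if_plate_id': '1'}
print(image_id_for_sample(example))  # prints 992/1_A1_1_
```

The result lines up with the path portion of the download URL used in the download_images() example, https://images.proteinatlas.org/992/1_A1_1_red.jpg.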
- static get_samples_from_csvfile(csvfile=None)[source]
Loads samples from a CSV file into a list of dictionaries.
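Loading such a file can be sketched with the standard csv module. The example assumes the CSV has a header row using the column names listed in SAMPLES_HEADER_COLS; the function name is hypothetical and this is not the library's implementation.

```python
import csv
import io

# Column names copied from SAMPLES_HEADER_COLS in this reference
SAMPLES_HEADER_COLS = ['filename', 'if_plate_id', 'position', 'sample',
                       'locations', 'antibody', 'ensembl_ids', 'gene_names']


def samples_from_csv(handle):
    """Read samples from an open CSV file handle into a list of dicts.

    Hypothetical helper; assumes a header row containing every column
    in SAMPLES_HEADER_COLS. Extra columns are ignored.
    """
    reader = csv.DictReader(handle)
    return [{col: row[col] for col in SAMPLES_HEADER_COLS} for row in reader]


# In-memory CSV built from the documented example sample
csv_text = (
    'filename,if_plate_id,position,sample,locations,antibody,ensembl_ids,gene_names\n'
    '/archive/1/1_A1_1_,1,A1,1,Golgi apparatus,HPA000992,ENSG00000066455,GOLGA5\n'
)
samples = samples_from_csv(io.StringIO(csv_text))
```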
- get_samples_list()[source]
Gets samples_list passed in via the constructor that has been filtered by unique_list passed in via the constructor
- Returns:
list of samples set via constructor
- Return type:
- get_samples_list_image_ids()[source]
Gets a list of image ids from the samples set via constructor
- Raises:
CellMapsImageDownloaderError – if samples list in constructor is None or if there was an issue parsing a sample
- Returns:
image ids
- Return type:
cellmaps_imagedownloader.proteinatlas module
- class cellmaps_imagedownloader.proteinatlas.CM4AIImageCopyTupleGenerator(samples_list=None)[source]
Bases:
object
Gets URL to download images for given samples
- Parameters:
samples_list
- class cellmaps_imagedownloader.proteinatlas.ImageDownloadTupleGenerator(samples_list=None, reader=None, valid_image_ids=None)[source]
Bases:
object
Gets URL to download images for given samples
Constructor
- Parameters:
samples_list (list)
reader (ProteinAtlasImageUrlReader) – Used to get download URLs for images
valid_image_ids (set) – Image ids that need a download URL in format of <ANTIBODY ID minus HPA or CAB prefix>/<IMAGE ID>
- get_next_image_url(color_download_map=None)[source]
Generator function that gets the next image URL to download
- Parameters:
color_download_map – dict of colors to location on filesystem, for example {'red': '/tmp/foo/red'}
- Returns:
list of tuples (image download URL, destination file path)
- Return type:
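The pairing of image URLs with per-color destination directories can be illustrated with a standalone generator. This sketch does not use a reader as the class above does; it simply assumes URLs follow the pattern seen in the download_images() example (base URL, image id, color, suffix). The function name, base_url default, and file naming are assumptions.

```python
import os


def image_download_tuples(image_ids, color_download_map,
                          base_url='https://images.proteinatlas.org',
                          suffix='.jpg'):
    """Yield (image download URL, destination file path) tuples.

    Hypothetical generator; the URL pattern is inferred from one example
    URL and may not hold for every image.
    """
    for image_id in image_ids:
        # The file name is the part of the image id after the antibody number
        file_prefix = image_id.split('/', 1)[1]
        for color, dest_dir in color_download_map.items():
            url = '{}/{}{}{}'.format(base_url, image_id, color, suffix)
            yield url, os.path.join(dest_dir, file_prefix + color + suffix)


tuples = list(image_download_tuples(['992/1_A1_1_'], {'red': '/tmp/foo/red'}))
```

Tuples of this shape match the download_list argument shown in the download_images() example.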
- class cellmaps_imagedownloader.proteinatlas.LinkPrefixImageDownloadTupleGenerator(samples_list=None)[source]
Bases:
object
Gets URL to download images for given samples
- Parameters:
samples_list
- class cellmaps_imagedownloader.proteinatlas.ProteinAtlasImageUrlReader(reader=None)[source]
Bases:
object
Takes a proteinatlas generator to get value between <imageUrl>XXX</imageUrl> lines with the keyword _blue in them
Constructor
- Parameters:
reader (ProteinAtlasReader)
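The extraction described for this class, pulling the value between <imageUrl> tags from lines that contain _blue, can be sketched with a regular expression. The function below is a hypothetical standalone illustration that works on any iterable of text lines. It yields only the URL, which may differ from what the class itself returns, and the sample lines are made up.

```python
import re

# Non-greedy match of the text between the opening and closing tags
IMAGE_URL_RE = re.compile(r'<imageUrl>(.*?)</imageUrl>')


def blue_image_urls(lines):
    """Yield URLs found in <imageUrl> elements on lines containing '_blue'.

    Hypothetical helper; not part of cellmaps_imagedownloader.
    """
    for line in lines:
        if '_blue' not in line:
            continue
        match = IMAGE_URL_RE.search(line)
        if match:
            yield match.group(1)


# Made-up lines for illustration
sample_lines = [
    '  <imageUrl>https://images.proteinatlas.org/992/1_A1_1_blue.jpg</imageUrl>',
    '  <imageUrl>https://images.proteinatlas.org/992/1_A1_1_red.jpg</imageUrl>',
    '  <antibody id="HPA000992">',
]
urls = list(blue_image_urls(sample_lines))
```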
- class cellmaps_imagedownloader.proteinatlas.ProteinAtlasProcessor(outdir=None, proteinatlas=None, proteinlist_file=None, cell_line=None)[source]
Bases:
object
cellmaps_imagedownloader.cellmaps_imagedownloadercmd module
- cellmaps_imagedownloader.cellmaps_imagedownloadercmd.main(args)[source]
Main entry point for program
- Parameters:
args (list) – arguments passed to command line, usually sys.argv[1:]
- Returns:
return value of cellmaps_imagedownloader.runner.CellmapsImageDownloader.run() or 2 if an exception is raised
- Return type:
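The exit-code behavior described above (pass through the runner's return value, or return 2 when an exception is raised) follows a common pattern for command-line entry points. The sketch below shows that pattern in isolation. The run_func parameter is hypothetical and stands in for whatever builds and runs the downloader; this is not the package's main().

```python
import sys


def main(args, run_func):
    """Return run_func(args), or 2 if it raises an exception.

    Hypothetical sketch of the exit-code pattern only.
    """
    try:
        return run_func(args)
    except Exception as e:
        # Report the problem on stderr and signal failure with exit code 2
        sys.stderr.write('error: ' + str(e) + '\n')
        return 2


def failing_run(args):
    """Made-up runner that always fails, used to show the error path."""
    raise RuntimeError('download failed')


ok_code = main(['outdir'], lambda args: 0)
err_code = main(['outdir'], failing_run)
```

A script would typically pass the result to sys.exit() so the shell sees the same code.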
cellmaps_imagedownloader.exceptions module
Module contents
Top-level package for cellmaps_imagedownloader.