IceCat package

IceCat.IceCat submodule

class IceCat.IceCat.IceCat(log=None, xml_file=None, auth=('user', 'passwd'), data_dir='_data/')

Bases: object

Base Class for all Ice Cat Mappings. Do not call this class directly.

Parameters:

log – Optional logging.getLogger() instance.
xml_file – XML product index file. If None, the file is downloaded from the Ice Cat website.
auth – Username and password tuple, as needed for Ice Cat website authentication.
data_dir – Directory to hold downloaded reference and product XML files.

class IceCat.IceCat.IceCatCatalog(suppliers=None, categories=None, exclude_keys=['Country_Markets'], fullcatalog=False, *args, **kwargs)

Bases: IceCat.IceCat.IceCat

Parse Ice Cat catalog index file. Special handling of the input data is based on IceCAT OCI Revision date: April 24, 2015, Version 2.46:

resolve supplier IDs and category IDs to their English names

unroll ean_upcs nested structure to flat value, or list

convert attribute names according to the table (to lower case)

drop keys listed in exclude_keys, default ['Country_Markets']

discard parent layers above ‘file’ key

Parameters:

suppliers – IceCatSupplierMapping object. If None is specified, a mapping is instantiated inside the class.
categories – IceCatCategoryMapping object. If None is specified, a mapping is instantiated inside the class.
exclude_keys – A list of keys to omit from the product index.
fullcatalog – Set to True to download the full product catalog. This option requires 64-bit Python because the memory footprint exceeds 2 GB. Processing a 500k-item catalog needs about 4.5 GB of virtual memory.

Refer to IceCat class for additional arguments
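The special handling listed for IceCatCatalog can be illustrated with a small standalone sketch. This is not the library's code: the names normalize_record and flatten_ean_upcs, the sample record, and the assumed shape of the nested EAN_UPCS structure are all hypothetical, and the step that resolves IDs to names is left out.

```python
def flatten_ean_upcs(nested):
    """Flatten an assumed {'EAN_UPC': {...}} or {'EAN_UPC': [{...}, ...]} structure."""
    if not isinstance(nested, dict):
        return nested  # already flat, leave unchanged
    entries = nested.get('EAN_UPC', [])
    if isinstance(entries, dict):
        entries = [entries]  # a single entry becomes a one-item list
    values = [entry['Value'] for entry in entries]
    # One value is returned as-is, several values as a list.
    return values[0] if len(values) == 1 else values


def normalize_record(record, exclude_keys=('Country_Markets',)):
    """Drop excluded keys, lower-case the rest, and flatten the EAN/UPC entry."""
    out = {}
    for key, value in record.items():
        if key in exclude_keys:
            continue
        if key.lower() == 'ean_upcs':
            value = flatten_ean_upcs(value)
        out[key.lower()] = value
    return out


# Hypothetical input record, for illustration only.
record = {
    'Product_ID': '1234',
    'Country_Markets': {'Country_Market': {'Value': 'US'}},
    'EAN_UPCS': {'EAN_UPC': [{'Value': '0001'}, {'Value': '0002'}]},
}
normalized = normalize_record(record)
```

With this sample input, normalized keeps only the lower-cased product_id and ean_upcs keys, and ean_upcs holds a flat list of the two values.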

TYPE = 'Catalog Index'

add_product_details(keys=['ProductDescription'])

Download and parse product details. Use add_product_details_parallel() instead for much better performance.

Parameters:	keys – List of Ice Cat product detail XML keys to include in the output. Refer to Basic Usage Example.

add_product_details_parallel(keys=['ProductDescription'], connections=5)

Download and parse product details, using threads.

Parameters:

keys – List of Ice Cat product detail XML keys to include in the output. Refer to Basic Usage Example.
connections – Number of simultaneous download threads. Do not go over 100.
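As a rough illustration of what a connections limit means, the sketch below runs a download function over a list of URLs with a bounded thread pool from the Python standard library. It does not reproduce the internals of add_product_details_parallel(); fetch_all and fake_fetch are hypothetical names, and a stand-in function replaces real network access.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, connections=5):
    """Call fetch(url) for every URL using at most `connections` threads.

    Results are returned in the same order as `urls`.
    """
    with ThreadPoolExecutor(max_workers=connections) as pool:
        return list(pool.map(fetch, urls))


def fake_fetch(url):
    # Stand-in for a real download: returns the length of the URL string.
    return len(url)


results = fetch_all(['a', 'bb', 'ccc'], fake_fetch, connections=2)
```

Raising connections only helps while the remote server tolerates the extra load, which is presumably why the parameter carries an upper bound.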

baseurl = 'https://data.icecat.biz/export/freexml/EN/'

dump_to_file(filename=None)

Save product attributes to a JSON file

Parameters:	filename – File name

get_data(): Return an ordered list of product attributes

class IceCat.IceCat.IceCatCategoryMapping(log=None, xml_file=None, auth=('user', 'passwd'), data_dir='_data/')

Bases: IceCat.IceCat.IceCat

Create a dict of product category IDs to category names

Refer to IceCat class for arguments

FILENAME = 'CategoriesList.xml.gz'

TYPE = 'Categories List'

baseurl = 'https://data.icecat.biz/export/freexml/refs/'

get_cat_byId(cat_id)

Return a Product Category or False if no match

Parameters:	cat_id – Category ID
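Because a lookup returns False rather than raising an error when there is no match, callers need to check the result. The sketch below shows that pattern with a hypothetical stand-in class; IdNameMapping, get_by_id, and the sample ID and name are invented for illustration and are not part of the package.

```python
class IdNameMapping:
    """Hypothetical stand-in for an ID-to-name mapping."""

    def __init__(self, mapping):
        self._mapping = dict(mapping)

    def get_by_id(self, item_id):
        # Return the name, or False when the ID is unknown.
        return self._mapping.get(item_id, False)


# Made-up sample data.
categories = IdNameMapping({'151': 'Notebooks'})

name = categories.get_by_id('999')
# Compare against False explicitly so an empty name is not mistaken for a miss.
label = name if name is not False else 'unknown category'
```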

class IceCat.IceCat.IceCatProductDetails(keys, cleanup_data_files=True, filename=None, *args, **kwargs)

Bases: IceCat.IceCat.IceCat

Extract product detail data. It is unusual to call this class directly; it is used by add_product_details() and add_product_details_parallel().

Parameters:

keys – A list of product detail keys. Refer to Basic Usage Example.
cleanup_data_files – Whether to delete XML files after parsing.
filename – XML file with the product details.

Refer to IceCat class for additional arguments

TYPE = 'Product details'

baseurl = 'https://data.icecat.biz/'

get_data()

o = {}

class IceCat.IceCat.IceCatSupplierMapping(log=None, xml_file=None, auth=('user', 'passwd'), data_dir='_data/')

Bases: IceCat.IceCat.IceCat

Create a dict of product supplier IDs to supplier names

Refer to IceCat class for arguments

FILENAME = 'supplier_mapping.xml'

TYPE = 'Supplier Mapping'

baseurl = 'https://data.icecat.biz/export/freeurls/'

get_mfr_byId(mfr_id)

Return a Product Supplier or False if no match

Parameters:	mfr_id – Supplier ID

IceCat.IceCat.langid = '1': Process only English data

IceCat.bulk_downloader submodule

class IceCat.bulk_downloader.fetchURLs(log=None, urls=['http://www.google.com/', 'http://www.bing.com/', 'http://www.yahoo.com/'], data_dir='_data/product_xml/', auth=('goober@aol.com', 'password'), connections=5)

Bases: object

Download and save a list of URLs using parallel connections. A separate session is maintained for each download thread. If throttling is detected (broken connections), the thread is terminated in order to reduce the load on the web server. If a local file already exists for a given URL, that URL is skipped. There is currently no check for whether the remote document is newer than the local file. If the URL does not end with a file name, fetchURLs generates a default filename in the format <website>.index.html.

Parameters:

urls – A list of absolute URLs to fetch.
data_dir – Directory to save files in.
connections – Number of simultaneous download threads.
auth – Username and password tuple, if needed for website authentication.
log – An optional logging.getLogger() instance.

This class is usually called from IceCat

get_count(): Returns the number of successfully fetched URLs
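The filename and skip rules described for fetchURLs can be sketched with the standard library alone. The helper names local_filename and urls_to_fetch are hypothetical, and reading <website> as the full host name of the URL is an assumption; the actual implementation may differ.

```python
import os
import posixpath
from urllib.parse import urlparse


def local_filename(url):
    """Use the last path segment, or '<website>.index.html' if there is none."""
    parsed = urlparse(url)
    name = posixpath.basename(parsed.path)
    # Assumption: <website> is taken to be the host name of the URL.
    return name if name else parsed.netloc + '.index.html'


def urls_to_fetch(urls, data_dir):
    """Keep only the URLs whose target file does not yet exist in data_dir."""
    return [
        url for url in urls
        if not os.path.exists(os.path.join(data_dir, local_filename(url)))
    ]
```

Since only the existence of the local file is tested, a stale local copy is never refreshed, which matches the limitation noted in the class description.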
