IceCat package

IceCat.IceCat submodule

class IceCat.IceCat.IceCat(log=None, xml_file=None, auth=('user', 'passwd'), data_dir='_data/')

Bases: object

Base class for all Ice Cat mappings. Do not instantiate this class directly.

Parameters:
  • log – optional logging.getLogger() instance
  • xml_file – XML product index file. If None, the file is downloaded from the Ice Cat web site.
  • auth – Username and password tuple, as needed for Ice Cat website authentication
  • data_dir – Directory to hold downloaded reference and product XML files
class IceCat.IceCat.IceCatCatalog(suppliers=None, categories=None, exclude_keys=['Country_Markets'], fullcatalog=False, *args, **kwargs)

Bases: IceCat.IceCat.IceCat

Parse the Ice Cat catalog index file. Special handling of the input data follows IceCAT OCI version 2.46 (revision date April 24, 2015):

  • resolve supplier ID and category ID to their English names
  • unroll the nested ean_upcs structure to a flat value or a list
  • convert attribute names according to the table (to lower case)
  • drop the keys listed in exclude_keys, default ['Country_Markets']
  • discard parent layers above the 'file' key
Parameters:
  • suppliers – IceCatSupplierMapping object. If None, a mapping is instantiated inside the class.
  • categories – IceCatCategoryMapping object. If None, a mapping is instantiated inside the class.
  • exclude_keys – A list of keys to omit from the product index.
  • fullcatalog – Set to True to download the full product catalog. 64-bit Python is required for this option because the memory footprint exceeds 2 GB. You will need ~4.5 GB of virtual memory to process a 500k item catalog.

Refer to IceCat class for additional arguments
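
The special handling listed in the class description can be illustrated with a small, self-contained sketch. It is not the package's implementation: the helper names (unroll_ean_upcs, normalize_entry), the input shape of the EAN_UPCS element, and the attribute names Supplier_id and Catid are assumptions made for illustration only.

```python
def unroll_ean_upcs(value):
    """Flatten a nested EAN/UPC structure to one value or a list of values."""
    # Assumed (hypothetical) input shape:
    #   {'EAN_UPC': {'Value': '...'}} or {'EAN_UPC': [{'Value': '...'}, ...]}
    if not isinstance(value, dict):
        return value
    items = value.get('EAN_UPC', [])
    if isinstance(items, dict):
        items = [items]
    codes = [item['Value'] for item in items]
    return codes[0] if len(codes) == 1 else codes


def normalize_entry(entry, suppliers, categories,
                    exclude_keys=('Country_Markets',)):
    """Apply the documented clean-up steps to one parsed index entry."""
    out = {}
    for key, value in entry.items():
        if key in exclude_keys:          # drop excluded keys
            continue
        if key == 'EAN_UPCS':            # unroll the nested structure
            value = unroll_ean_upcs(value)
        out[key.lower()] = value         # lower-case attribute names
    # Resolve IDs to names; the key names used here are assumptions.
    if 'supplier_id' in out:
        out['supplier'] = suppliers.get(out['supplier_id'], False)
    if 'catid' in out:
        out['category'] = categories.get(out['catid'], False)
    return out


entry = {
    'Product_ID': '1001',
    'Supplier_id': '7',
    'Catid': '151',
    'EAN_UPCS': {'EAN_UPC': [{'Value': '111'}, {'Value': '222'}]},
    'Country_Markets': {'Country_Market': {'Value': 'US'}},
}
result = normalize_entry(entry, {'7': 'Acme'}, {'151': 'Notebooks'})
```

Here plain dicts stand in for the IceCatSupplierMapping and IceCatCategoryMapping objects, and the supplier and category names are made up.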

TYPE = 'Catalog Index'
add_product_details(keys=['ProductDescription'])

Download and parse product details. Use add_product_details_parallel() instead for much better performance.

Parameters: keys – List of Ice Cat product detail XML keys to include in the output. Refer to Basic Usage Example.
add_product_details_parallel(keys=['ProductDescription'], connections=5)

Download and parse product details, using threads.

Parameters:
  • keys – List of Ice Cat product detail XML keys to include in the output. Refer to Basic Usage Example.
  • connections – Number of simultaneous download threads. Do not exceed 100.
baseurl = 'https://data.icecat.biz/export/freexml/EN/'
dump_to_file(filename=None)

Save product attributes to a JSON file

Parameters: filename – File name
get_data()

Return ordered list of product attributes
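
The methods above suggest a typical call sequence: construct the catalog, add product details, save to JSON, then read the data back. The following is a hedged sketch of that sequence, not an official example. The wrapper function export_catalog and its parameters are hypothetical; the catalog class is passed in as an argument so the sketch can be tried without the package or network access.

```python
def export_catalog(catalog_cls, auth, out_file,
                   keys=('ProductDescription',), connections=5):
    """Build a catalog, add product details in parallel, and save it as JSON.

    catalog_cls is expected to behave like IceCat.IceCat.IceCatCatalog as
    documented in this section; this wrapper itself is only an illustration.
    """
    # Remaining keyword arguments are those of the IceCat base class.
    catalog = catalog_cls(auth=auth, data_dir='_data/')
    # Threaded variant, recommended over add_product_details().
    catalog.add_product_details_parallel(keys=list(keys),
                                         connections=connections)
    catalog.dump_to_file(out_file)
    return catalog.get_data()
```

With the package installed and valid credentials, a call might look like export_catalog(IceCatCatalog, ('user', 'passwd'), 'catalog.json') after importing IceCatCatalog from IceCat.IceCat. Expect the download and parsing to take considerable time and memory for a large catalog.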

class IceCat.IceCat.IceCatCategoryMapping(log=None, xml_file=None, auth=('user', 'passwd'), data_dir='_data/')

Bases: IceCat.IceCat.IceCat

Create a dict of product category IDs to category names

Refer to IceCat class for arguments

FILENAME = 'CategoriesList.xml.gz'
TYPE = 'Categories List'
baseurl = 'https://data.icecat.biz/export/freexml/refs/'
get_cat_byId(cat_id)

Return the product category name, or False if there is no match

Parameters: cat_id – Category ID
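
As a rough illustration of an ID-to-name mapping with a False fallback, the sketch below builds a dict from a gzipped XML document and looks up IDs in it. The XML layout, the element and attribute names, and the helpers build_mapping and get_by_id are invented for this example; the real CategoriesList.xml.gz schema may differ.

```python
import gzip
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified reference file content (not the real schema).
SAMPLE_XML = b"""<Categories>
  <Category ID="151" Name="Notebooks"/>
  <Category ID="153" Name="PCs"/>
</Categories>"""


def build_mapping(gz_bytes, tag='Category', id_attr='ID', name_attr='Name'):
    """Parse gzipped XML bytes into a dict of {id: name}."""
    with gzip.open(io.BytesIO(gz_bytes)) as fh:
        root = ET.parse(fh).getroot()
    return {el.get(id_attr): el.get(name_attr) for el in root.iter(tag)}


def get_by_id(mapping, item_id):
    """Return the name for item_id, or False if there is no match."""
    return mapping.get(str(item_id), False)


mapping = build_mapping(gzip.compress(SAMPLE_XML))
print(get_by_id(mapping, 151))   # Notebooks
print(get_by_id(mapping, 999))   # False
```

Returning False rather than raising an exception lets callers test the result directly, at the cost of not distinguishing a missing ID from a falsy name.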
class IceCat.IceCat.IceCatProductDetails(keys, cleanup_data_files=True, filename=None, *args, **kwargs)

Bases: IceCat.IceCat.IceCat

Extract product detail data. This class is rarely called directly; it is used by add_product_details() and add_product_details_parallel().

Parameters:
  • keys – A list of product detail keys. Refer to Basic Usage Example
  • cleanup_data_files – Whether to delete XML files after parsing.
  • filename – XML file with the product details

Refer to IceCat class for additional arguments

TYPE = 'Product details'
baseurl = 'https://data.icecat.biz/'
get_data()
o = {}
class IceCat.IceCat.IceCatSupplierMapping(log=None, xml_file=None, auth=('user', 'passwd'), data_dir='_data/')

Bases: IceCat.IceCat.IceCat

Create a dict of product supplier IDs to supplier names

Refer to IceCat class for arguments

FILENAME = 'supplier_mapping.xml'
TYPE = 'Supplier Mapping'
baseurl = 'https://data.icecat.biz/export/freeurls/'
get_mfr_byId(mfr_id)

Return the product supplier name, or False if there is no match

Parameters: mfr_id – Supplier ID
IceCat.IceCat.langid = '1'

Process only English data

IceCat.bulk_downloader submodule

class IceCat.bulk_downloader.fetchURLs(log=None, urls=['http://www.google.com/', 'http://www.bing.com/', 'http://www.yahoo.com/'], data_dir='_data/product_xml/', auth=('goober@aol.com', 'password'), connections=5)

Bases: object

Download and save a list of URLs using parallel connections. A separate session is maintained for each download thread. If throttling is detected (broken connections), the thread is terminated to reduce the load on the web server. If a local file already exists for a given URL, that URL is skipped; there is currently no check for whether the remote document is newer than the local file. If the URL does not end with a file name, fetchURLs generates a default filename in the format <website>.index.html

Parameters:
  • urls – A list of absolute URLs to fetch
  • data_dir – Directory to save files in
  • connections – Number of simultaneous download threads
  • auth – Username and password tuple, if needed for website authentication
  • log – An optional logging.getLogger() instance

This class is usually called from IceCat

get_count()

Returns the number of successfully fetched URLs
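
Two behaviours from the fetchURLs description, skipping URLs that already have a local file and generating a default file name, can be sketched as follows. This is an illustration under stated assumptions, not the class's code: it uses a standard-library thread pool instead of per-thread sessions, omits throttling detection, takes the download function as a parameter so no network access is needed, and interprets <website> as the URL's host name. The names local_filename and fetch_all are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse


def local_filename(url):
    """Derive a local file name; fall back to <host>.index.html (assumed)."""
    parsed = urlparse(url)
    name = os.path.basename(parsed.path)
    return name if name else parsed.netloc + '.index.html'


def fetch_all(urls, data_dir, download, connections=5):
    """Save each URL under data_dir; return how many were newly fetched.

    download is any callable taking a URL and returning bytes.
    """
    os.makedirs(data_dir, exist_ok=True)

    def fetch_one(url):
        path = os.path.join(data_dir, local_filename(url))
        if os.path.exists(path):      # skip URLs that already have a file
            return False
        content = download(url)
        with open(path, 'wb') as fh:
            fh.write(content)
        return True

    with ThreadPoolExecutor(max_workers=connections) as pool:
        results = list(pool.map(fetch_one, urls))
    return sum(results)
```

A second call with the same URLs and directory fetches nothing, because every local file already exists. The sketch assumes the URLs map to distinct file names.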