IceCat package
IceCat.IceCat submodule

class IceCat.IceCat.IceCat(log=None, xml_file=None, auth=('user', 'passwd'), data_dir='_data/')
Bases: object
Base class for all Ice Cat mappings. Do not call this class directly.
Parameters:
- log – optional logging.getLogger() instance
- xml_file – XML product index file. If None, the file is downloaded from the Ice Cat website.
- auth – username and password tuple, as required for Ice Cat website authentication
- data_dir – directory that holds the downloaded reference and product XML files

class IceCat.IceCat.IceCatCatalog(suppliers=None, categories=None, exclude_keys=['Country_Markets'], fullcatalog=False, *args, **kwargs)
Bases: IceCat.IceCat.IceCat
Parse the Ice Cat catalog index file. Special handling of the input data is based on IceCAT OCI, revision date April 24, 2015, version 2.46:
- resolve supplier IDs and category IDs to their English names
- unroll the nested ean_upcs structure to a flat value or list
- convert attribute names according to the table (to lower case)
- drop keys listed in exclude_keys, default ['Country_Markets']
- discard parent layers above the 'file' key
Parameters:
- suppliers – IceCatSupplierMapping object. If None, a mapping is instantiated inside the class.
- categories – IceCatCategoryMapping object. If None, a mapping is instantiated inside the class.
- exclude_keys – list of keys to omit from the product index.
- fullcatalog – set to True to download the full product catalog. This option requires 64-bit Python because the memory footprint exceeds 2 GB; processing a 500k-item catalog needs about 4.5 GB of virtual memory.
Refer to the IceCat class for additional arguments.
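
The special handling listed for IceCatCatalog can be pictured as a per-record transformation. The following is a minimal sketch, not the library's implementation: it assumes the index has already been parsed into an xmltodict-style dict (XML attributes prefixed with '@'), and the helper names, the attribute names @Supplier_id and @Catid, and the small lookup tables are all hypothetical.

```python
from typing import Any, Dict, List, Optional, Union

# Hypothetical lookup tables; IceCatCatalog would get these from
# IceCatSupplierMapping and IceCatCategoryMapping.
SUPPLIERS = {"1": "HP"}
CATEGORIES = {"151": "Notebooks"}


def find_file_records(tree: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Descend through parent layers until a 'file' key is found."""
    if "file" in tree:
        files = tree["file"]
        # A single product is parsed as a dict rather than a list.
        return files if isinstance(files, list) else [files]
    for value in tree.values():
        if isinstance(value, dict):
            found = find_file_records(value)
            if found:
                return found
    return []


def unroll_eans(ean_upcs: Dict[str, Any]) -> Union[str, List[str], None]:
    """Flatten the nested EAN_UPCS structure to one value or a list."""
    entries = ean_upcs.get("EAN_UPC", [])
    if isinstance(entries, dict):
        entries = [entries]
    values = [entry["@Value"] for entry in entries]
    if not values:
        return None
    return values[0] if len(values) == 1 else values


def normalize_record(record: Dict[str, Any],
                     exclude_keys: Optional[List[str]] = None) -> Dict[str, Any]:
    """Apply the catalog clean-up steps to one 'file' record."""
    if exclude_keys is None:
        exclude_keys = ["Country_Markets"]
    out: Dict[str, Any] = {}
    for key, value in record.items():
        if key in exclude_keys:
            continue  # e.g. drop Country_Markets
        if key == "EAN_UPCS":
            value = unroll_eans(value)
        elif key == "@Supplier_id":
            value = SUPPLIERS.get(value, value)  # supplier ID -> name
        elif key == "@Catid":
            value = CATEGORIES.get(value, value)  # category ID -> name
        # Drop the '@' attribute prefix and lower-case the name.
        out[key.lstrip("@").lower()] = value
    return out


# Usage with a tiny hypothetical index containing one product.
index = {"ICECAT-interface": {"files.index": {"file": {
    "@Supplier_id": "1",
    "@Catid": "151",
    "EAN_UPCS": {"EAN_UPC": [{"@Value": "0000000000001"},
                             {"@Value": "0000000000002"}]},
    "Country_Markets": {"Country_Market": {"@Value": "US"}},
}}}}
records = [normalize_record(r) for r in find_file_records(index)]
print(records)
```

A single EAN collapses to a plain string, while several stay a list, which mirrors the "flat value, or list" wording above.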

TYPE = 'Catalog Index'

add_product_details(keys=['ProductDescription'])
Download and parse product details. For much better performance, use add_product_details_parallel() instead.
Parameters: keys – list of Ice Cat product detail XML keys to include in the output. Refer to Basic Usage Example.

add_product_details_parallel(keys=['ProductDescription'], connections=5)
Download and parse product details using threads.
Parameters:
- keys – list of Ice Cat product detail XML keys to include in the output. Refer to Basic Usage Example.
- connections – number of simultaneous download threads. Do not exceed 100.
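
A bounded thread pool is the usual way to spread downloads over a fixed number of connections. This sketch uses concurrent.futures from the standard library; fetch_all and fake_fetch are hypothetical names, the clamp to 100 is an assumption based on the advice above, and the library's own threading (see fetchURLs) may work differently.

```python
import concurrent.futures
from typing import Callable, Dict, List

MAX_CONNECTIONS = 100  # documented upper limit for connections


def fetch_all(product_ids: List[str],
              fetch: Callable[[str], Dict[str, str]],
              connections: int = 5) -> List[Dict[str, str]]:
    """Run fetch() for every product ID on a bounded pool of threads."""
    workers = max(1, min(connections, MAX_CONNECTIONS))
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, even though the
        # downloads finish in whatever order the threads complete.
        return list(pool.map(fetch, product_ids))


def fake_fetch(product_id: str) -> Dict[str, str]:
    """Stand-in for downloading and parsing one product detail file."""
    return {"product_id": product_id,
            "ProductDescription": f"description of {product_id}"}


# connections=500 is clamped to 100 worker threads.
details = fetch_all(["101", "102", "103"], fake_fetch, connections=500)
print(len(details))
```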

baseurl = 'https://data.icecat.biz/export/freexml/EN/'

dump_to_file(filename=None)
Save product attributes to a JSON file.
Parameters: filename – file name

get_data()
Return an ordered list of product attributes.

class IceCat.IceCat.IceCatCategoryMapping(log=None, xml_file=None, auth=('user', 'passwd'), data_dir='_data/')
Bases: IceCat.IceCat.IceCat
Create a dict that maps product category IDs to category names.
Refer to the IceCat class for arguments.

FILENAME = 'CategoriesList.xml.gz'

TYPE = 'Categories List'

baseurl = 'https://data.icecat.biz/export/freexml/refs/'

get_cat_byId(cat_id)
Return the product category, or False if there is no match.
Parameters: cat_id – category ID
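
As an illustration of the ID-to-name mapping, the sketch below builds a dict from a small, hypothetical excerpt of a categories list and keeps only names whose langid is '1' (English, matching IceCat.IceCat.langid). The element and attribute names are assumptions about the file layout, and the real CategoriesList.xml.gz is gzip-compressed, so it would have to be opened with gzip.open first. Like get_cat_byId, the lookup returns False when there is no match.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Union

LANGID = "1"  # English, matching IceCat.IceCat.langid

# Hypothetical, simplified excerpt of a categories list.
SAMPLE = """<ICECAT-interface><Response><CategoriesList>
  <Category ID="151">
    <Name langid="1" Value="Notebooks"/>
    <Name langid="2" Value="Laptops (other language)"/>
  </Category>
  <Category ID="194">
    <Name langid="1" Value="Monitors"/>
  </Category>
</CategoriesList></Response></ICECAT-interface>"""


def build_category_map(xml_text: str) -> Dict[str, str]:
    """Map each category ID to its English name."""
    root = ET.fromstring(xml_text)
    mapping: Dict[str, str] = {}
    for category in root.iter("Category"):
        for name in category.findall("Name"):
            if name.get("langid") == LANGID:
                mapping[category.get("ID")] = name.get("Value")
                break  # one English name per category is enough
    return mapping


def get_cat_by_id(mapping: Dict[str, str], cat_id: str) -> Union[str, bool]:
    """Return the category name, or False if there is no match."""
    return mapping.get(cat_id, False)


categories = build_category_map(SAMPLE)
print(get_cat_by_id(categories, "151"), get_cat_by_id(categories, "999"))
```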

class IceCat.IceCat.IceCatProductDetails(keys, cleanup_data_files=True, filename=None, *args, **kwargs)
Bases: IceCat.IceCat.IceCat
Extract product detail data. This class is rarely called directly; add_product_details() and add_product_details_parallel() use it.
Parameters:
- keys – list of product detail keys. Refer to Basic Usage Example.
- cleanup_data_files – whether to delete the XML files after parsing.
- filename – XML file with the product details
Refer to the IceCat class for additional arguments.
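
The sketch below illustrates what keys and cleanup_data_files control: it pulls the attributes of each requested element out of a product detail XML file and then deletes the file. The XML layout, the ShortDesc attribute, and the extract_details helper are hypothetical; the real class may structure its output differently.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from typing import Dict, List


def extract_details(filename: str, keys: List[str],
                    cleanup_data_files: bool = True) -> Dict[str, Dict[str, str]]:
    """Collect attributes of the requested elements, then optionally delete the file."""
    tree = ET.parse(filename)
    out: Dict[str, Dict[str, str]] = {}
    for key in keys:
        element = tree.find(f".//{key}")  # first matching element anywhere
        if element is not None:
            out[key] = dict(element.attrib)
    if cleanup_data_files:
        os.remove(filename)
    return out


# Hypothetical product detail file written to a temporary location.
SAMPLE = """<ICECAT-interface><Product ID="42">
  <ProductDescription ShortDesc="Compact laptop" langid="1"/>
  <ProductFeature Presentation_Value="8 GB"/>
</Product></ICECAT-interface>"""

with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as fh:
    fh.write(SAMPLE)
    path = fh.name

details = extract_details(path, keys=["ProductDescription"])
print(details)
```

Elements not listed in keys (here ProductFeature) are ignored, which keeps the output limited to what the caller asked for.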

TYPE = 'Product details'

baseurl = 'https://data.icecat.biz/'

get_data()

o = {}

class IceCat.IceCat.IceCatSupplierMapping(log=None, xml_file=None, auth=('user', 'passwd'), data_dir='_data/')
Bases: IceCat.IceCat.IceCat
Create a dict that maps product supplier IDs to supplier names.
Refer to the IceCat class for arguments.

FILENAME = 'supplier_mapping.xml'

TYPE = 'Supplier Mapping'

baseurl = 'https://data.icecat.biz/export/freeurls/'

get_mfr_byId(mfr_id)
Return the product supplier, or False if there is no match.
Parameters: mfr_id – supplier ID

IceCat.IceCat.langid = '1'
Process only English data.

IceCat.bulk_downloader submodule

class IceCat.bulk_downloader.fetchURLs(log=None, urls=['http://www.google.com/', 'http://www.bing.com/', 'http://www.yahoo.com/'], data_dir='_data/product_xml/', auth=('goober@aol.com', 'password'), connections=5)
Bases: object
Download and save a list of URLs using parallel connections. Each download thread maintains its own session. If throttling is detected (broken connections), the thread is terminated to reduce the load on the web server. If a local file already exists for a URL, that URL is skipped; there is currently no check whether the remote document is newer than the local file. If a URL does not end with a file name, fetchURLs generates a default file name in the format <website>.index.html.
Parameters:
- urls – list of absolute URLs to fetch
- data_dir – directory to save files in
- connections – number of simultaneous download threads
- auth – username and password tuple, if needed for website authentication
- log – optional logging.getLogger() instance
This class is usually called from IceCat.
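
The skip-existing rule and the default file name can be sketched with urllib.parse. The helpers local_path and needs_download are hypothetical, the product file name 1234.xml is made up, and interpreting <website> as the URL's host name (netloc) is an assumption.

```python
import os
import posixpath
from urllib.parse import urlparse


def local_path(url: str, data_dir: str = "_data/product_xml/") -> str:
    """Map a URL to a local path; use <website>.index.html if no file name."""
    parsed = urlparse(url)
    name = posixpath.basename(parsed.path)  # URL paths always use '/'
    if not name:
        # e.g. 'http://www.google.com/' has no trailing file name.
        name = f"{parsed.netloc}.index.html"
    return os.path.join(data_dir, name)


def needs_download(url: str, data_dir: str = "_data/product_xml/") -> bool:
    """True unless a local copy exists; remote freshness is not checked."""
    return not os.path.exists(local_path(url, data_dir))


for url in ["http://www.google.com/",
            "https://data.icecat.biz/export/freexml/EN/1234.xml"]:
    print(url, "->", local_path(url), needs_download(url))
```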

get_count()
Return the number of successfully fetched URLs.