IceCat 2.1.0

The IceCat module downloads a local copy of data from the http://icecat.biz/ open catalog. The module requires login credentials for the IceCat website. The basic catalog version is free and contains about 500k products. The full catalog contains about 3 million products and is distributed under a paid subscription.

Requirements

  • Python 3.3 or later (64-bit required for full catalog import)
  • requests, urllib3, xmltodict, progressbar2 libraries
  • See requirements.txt in the source distribution for details

Features

  • For each product, the category ID and manufacturer ID are resolved to their actual names
  • Product detail data can be added to both the daily and full indexes, with configurable data fields
  • English language data import
  • The output is a flat JSON file (nested lists are flattened)
  • Fast parallel download of the product XML files using threads
  • Source data files are preserved in the filesystem for reference
  • Flexible XML field mapping
  • Tested against the live IceCat web API
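To illustrate what a "flat" JSON record means, here is a minimal sketch that collapses nested dicts and lists into a single-level dict that `json.dumps` can write directly. The `flatten` helper and its dotted `parent.index.child` key scheme are hypothetical; IceCat's own key naming may differ.

```python
def flatten(value, prefix="", sep="."):
    """Flatten nested dicts/lists into a single-level dict.

    Nested keys are joined with `sep`; list items use their index, so
    {"a": [{"b": 1}]} becomes {"a.0.b": 1}. Empty dicts/lists produce no keys.
    The key scheme is an assumption for illustration only.
    """
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        # Scalar leaf: store it under the accumulated key.
        return {prefix: value}
    flat = {}
    for key, child in items:
        name = "{}{}{}".format(prefix, sep, key) if prefix else str(key)
        flat.update(flatten(child, name, sep))
    return flat
```

For example, a product with a list of feature dicts becomes one dict whose keys are `features.0.name`, `features.0.value`, and so on, which maps cleanly onto a flat JSON object or a CSV row.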

Basic usage

from IceCat import IceCat

# setup temp data directory, output file name, auth info
data_dir = '_daily_test_data/'
auth = ('icat_user', 'icat_passwd')
output_file = 'daily.json'

# specify additional product detail keys
detail_keys = ['ProductDescription[@LongDesc]',
               'ShortSummaryDescription',
               'LongSummaryDescription',
               'ProductDescription[@ShortDesc]']

# create the catalog instance.
# this will pull the reference files and the daily product index file
catalog = IceCat.IceCatCatalog(data_dir=data_dir, auth=auth)

# add product details
# this will download and parse individual product XML for
# every item listed in the daily file
catalog.add_product_details_parallel(keys=detail_keys, connections=100)

# save the results to a JSON file
catalog.dump_to_file(output_file)
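The detail keys above read like XPath expressions: `Tag[@Attr]` selects an attribute and a plain `Tag` selects element text. The following is a rough sketch, not IceCat's implementation, of how such keys could be evaluated with the standard-library `xml.etree.ElementTree`, and how a `connections` value could map onto a thread pool size. The helpers `extract_details` and `add_details_parallel`, the product dict layout (a `"path"` entry) and the `fetch` callable are all hypothetical; a real fetch would download the product XML using the configured credentials.

```python
import re
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Matches keys of the form "Tag[@Attr]"; anything else is treated as a tag name.
_ATTR_KEY = re.compile(r"^(?P<tag>[\w-]+)\[@(?P<attr>[\w-]+)\]$")


def extract_details(xml_text, keys):
    """Return {key: value} for each detail key found in one product XML document.

    "Tag[@Attr]" yields the Attr attribute of the first descendant <Tag> that
    carries it; "Tag" yields the text of the first descendant <Tag>.
    Keys that are not found are left out of the result.
    """
    root = ET.fromstring(xml_text)
    details = {}
    for key in keys:
        match = _ATTR_KEY.match(key)
        if match:
            # ElementTree's limited XPath supports the [@attr] predicate.
            elem = root.find(".//{tag}[@{attr}]".format(**match.groupdict()))
            value = elem.get(match.group("attr")) if elem is not None else None
        else:
            elem = root.find(".//" + key)
            value = elem.text if elem is not None else None
        if value is not None:
            details[key] = value
    return details


def add_details_parallel(products, fetch, keys, connections=10):
    """Fetch each product's XML on a thread pool and merge extracted details.

    `products` is a list of dicts with a "path" entry and `fetch(path)`
    returns the XML text; both are assumptions made for this sketch.
    """
    def work(product):
        return extract_details(fetch(product["path"]), keys)

    with ThreadPoolExecutor(max_workers=connections) as pool:
        # map() yields results in input order, so they line up with `products`.
        for product, details in zip(products, pool.map(work, products)):
            product.update(details)
    return products
```

Because each task spends most of its time waiting on the network, threads give real concurrency here despite the GIL; with `connections=100`, up to 100 downloads can be in flight at once.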

Advanced usage

By default, the daily product index file is downloaded and parsed; it is usually limited to several thousand items. Processing the full catalog requires a 64-bit build of Python, because it needs to address more than 2 GB of virtual memory:

catalog = IceCat.IceCatCatalog(data_dir=data_dir,
        auth=auth,
        fullcatalog=True)
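Since a 32-bit interpreter cannot address enough memory for the full catalog, it can be worth failing fast before passing `fullcatalog=True`. A minimal check using only the standard library; the helper names are hypothetical and not part of IceCat:

```python
import struct
import sys


def is_64bit_python():
    """Return True when the running interpreter uses 64-bit pointers."""
    # struct.calcsize("P") is the size of a C pointer in bytes: 8 on 64-bit builds.
    return struct.calcsize("P") == 8


def require_64bit_for_full_catalog():
    """Exit with a clear message instead of running out of address space mid-import."""
    if not is_64bit_python():
        sys.exit("Full catalog import requires a 64-bit Python build.")
```

Calling `require_64bit_for_full_catalog()` just before constructing the catalog turns an obscure `MemoryError` late in the import into an immediate, readable error.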

To process a local XML index file instead of downloading one from IceCat:

catalog = IceCat.IceCatCatalog(data_dir=data_dir,
        auth=auth,
        xml_file="_test_data/daily.index.test.xml")
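A full-catalog index file can be large, so streaming it avoids building the whole tree in memory. The sketch below uses `xml.etree.ElementTree.iterparse`; the `file` element name and its attributes (`path`, `Product_ID`, `Catid`, `Supplier_id`) are assumptions about the index layout, and `iter_index_entries` is a hypothetical helper, not part of IceCat.

```python
import xml.etree.ElementTree as ET


def iter_index_entries(source, tag="file"):
    """Yield the attributes of each <tag> element in an index XML, one at a time.

    `source` is a filename or a binary file object. Each processed element is
    cleared after its attributes are copied, so only small empty element
    shells stay attached to the partially built tree.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield dict(elem.attrib)  # copy before clearing
            elem.clear()
```

Each yielded dict can then be filtered (for example by `Catid`) or handed to a product-detail step without holding the entire index in memory.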

Or, with local supplier and category reference data:

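import logging
log = logging.getLogger('IceCat')  # assumed: `log` accepts a standard logging.Logger

categories = IceCat.IceCatCategoryMapping(log=log,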
        xml_file="_test_data/CategoriesList.xml",
        data_dir=data_dir)
suppliers = IceCat.IceCatSupplierMapping(log=log,
        auth=auth,
        xml_file="_test_data/supplier_mapping.xml",
        data_dir=data_dir)
catalog = IceCat.IceCatCatalog(log=log,
        xml_file="_test_data/daily.index.test.xml",
        suppliers=suppliers,
        categories=categories,
        data_dir=data_dir)
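The category and supplier mappings are what turn the numeric IDs in the index into readable names. A simplified sketch of that lookup follows; the XML snippets, tag and attribute names, and the `build_mapping`/`resolve_names` helpers are made up for illustration, and the real IceCat reference files may be structured differently.

```python
import xml.etree.ElementTree as ET


def build_mapping(xml_text, tag, id_attr, name_attr):
    """Map id -> name from every <tag> element that carries both attributes."""
    root = ET.fromstring(xml_text)
    return {
        elem.get(id_attr): elem.get(name_attr)
        for elem in root.iter(tag)
        if elem.get(id_attr) is not None and elem.get(name_attr) is not None
    }


def resolve_names(product, categories, suppliers):
    """Return a copy of `product` with category and supplier names added.

    Unknown IDs resolve to None rather than raising, so one bad record
    does not stop the whole import.
    """
    resolved = dict(product)
    resolved["category"] = categories.get(product.get("Catid"))
    resolved["supplier"] = suppliers.get(product.get("Supplier_id"))
    return resolved
```

Building plain dicts once and reusing them makes each per-product lookup a constant-time operation, which matters when resolving millions of index entries.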
