Accelerating Memory-Bound Machine Learning Models on Intel® Xeon® Processors

Machine learning models must retrieve and process data from memory during training. Depending on the type, size, and shape of that data, and on the compute hardware and software stack, training time is gated by either computation speed (compute-bound) or data retrieval speed (memory-bound). This article outlines behavior symptomatic of memory-bound deep learning applications and suggests optimizations that may accelerate training by up to 100X in similar settings.
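
To make the compute-bound versus memory-bound distinction concrete, the following is a minimal sketch of a roofline-style estimate. It is an illustration under stated assumptions, not a method taken from this article: the function name `classify_bound` and the peak throughput figures are hypothetical, and the byte counts assume each operand moves through memory exactly once.

```python
def classify_bound(flops, bytes_moved, peak_flops_per_s, peak_bytes_per_s):
    """Classify a workload with a simple roofline-style estimate.

    The workload is limited by whichever takes longer: doing the
    arithmetic at peak compute rate, or moving the data at peak
    memory bandwidth.
    """
    compute_time = flops / peak_flops_per_s
    memory_time = bytes_moved / peak_bytes_per_s
    return "compute-bound" if compute_time >= memory_time else "memory-bound"


# Hypothetical machine (assumed numbers, not measurements of any real CPU):
PEAK_FLOPS = 1e12   # 1 TFLOP/s peak compute
PEAK_BW = 100e9     # 100 GB/s peak memory bandwidth

# Element-wise add of two float32 vectors of n elements:
# n FLOPs, and 12n bytes moved (read two inputs, write one output).
n = 10_000_000
elementwise = classify_bound(n, 12 * n, PEAK_FLOPS, PEAK_BW)

# Dense multiply of two m x m float32 matrices:
# 2*m^3 FLOPs, and 12*m^2 bytes moved (each matrix touched once).
m = 4096
matmul = classify_bound(2 * m**3, 12 * m**2, PEAK_FLOPS, PEAK_BW)

print(elementwise)
print(matmul)
```

Under these assumed figures, the element-wise operation performs very little arithmetic per byte moved, so data retrieval dominates, while the matrix multiply reuses each value many times, so computation dominates. Real workloads should be profiled rather than estimated this way.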
