
Arjun Bansal

Vice President and General Manager, Artificial Intelligence Software and Lab at Intel

We are excited to share neon’s v1.2 release with the community. It brings several major features (Kepler support, new macrobatch and serialization enhancements) and new examples, along with an expanded Model Zoo to help users get started with their use cases.

  • New storage format (docs) and data loader for datasets that do not fit in memory. The loader pipelines disk-to-host transfer, data decoding (if needed), data augmentation (if needed), and host-to-device transfer using thread pools, so that computation on the device is never starved for data (see the loader sketch after this list). The loader is currently fully functional for image classification; we will soon add support for additional dataset types such as video, speech, and text under the same framework.
  • New model serialization format that includes the model architecture, sketched below (we will soon release a set of tools for porting Caffe models to the neon format)
  • CPU installations now default to a multithreaded CPU backend that uses optimized BLAS libraries
  • Support for Kepler GPUs. Backward compatibility with Kepler-class GPUs lets users get started with neon on the older GPUs provided by AWS and other cloud providers (see the backend example below).
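
The pipelining idea behind the loader can be illustrated with a minimal sketch in plain Python. The stage functions below (read_from_disk, decode_and_augment, copy_to_device) are hypothetical stand-ins for neon's internals, not its actual API; the point is that a thread pool keeps the next batches in flight while the current batch is being consumed, so device compute never waits on data.

    # Minimal sketch of a pipelined loader (hypothetical stand-in names).
    from concurrent.futures import ThreadPoolExecutor

    def read_from_disk(i):        # stand-in: disk-to-host transfer
        return "raw-%d" % i

    def decode_and_augment(raw):  # stand-in: decode + augment on the host
        return raw + "-decoded"

    def copy_to_device(host):     # stand-in: host-to-device transfer
        return host + "-on-device"

    def prepare_batch(i):
        return copy_to_device(decode_and_augment(read_from_disk(i)))

    def train(num_batches=8, prefetch_depth=2):
        with ThreadPoolExecutor(max_workers=prefetch_depth) as pool:
            # keep prefetch_depth batches in flight at all times
            futures = [pool.submit(prepare_batch, i)
                       for i in range(min(prefetch_depth, num_batches))]
            for i in range(num_batches):
                batch = futures.pop(0).result()   # next prepared batch
                if i + prefetch_depth < num_batches:
                    futures.append(pool.submit(prepare_batch, i + prefetch_depth))
                print("compute step on", batch)   # device compute runs here

    if __name__ == "__main__":
        train()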
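The benefit of bundling the architecture with the weights can be shown with a generic illustration. This uses pickle and made-up layer specs to mimic the idea; it is not neon's actual on-disk format.

    # Illustration only: serialize architecture + weights together so the
    # model can be rebuilt without the original script (not neon's format).
    import pickle

    model = {
        "architecture": [                  # layer specs, enough to rebuild
            {"type": "Conv", "fshape": (11, 11, 64), "strides": 4},
            {"type": "Pooling", "fshape": 3, "strides": 2},
            {"type": "Affine", "nout": 10},
        ],
        "weights": {"layer0": [0.1, 0.2], "layer2": [0.3]},  # toy blobs
    }

    with open("model.prm", "wb") as f:
        pickle.dump(model, f)

    with open("model.prm", "rb") as f:
        restored = pickle.load(f)
    assert restored["architecture"][0]["type"] == "Conv"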
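On the API side, backends are selected with neon's gen_backend; a short example follows, with the caveat that the exact argument set may differ slightly between neon versions.

    # Selecting a neon backend (argument names may vary across versions).
    from neon.backends import gen_backend

    # multithreaded CPU backend (the new default for CPU-only installs)
    be = gen_backend(backend='cpu', batch_size=128)

    # GPU backend; as of v1.2 this covers Kepler-class cards such as the
    # Grid K520 on AWS g2.2xlarge, in addition to Maxwell GPUs
    be = gen_backend(backend='gpu', batch_size=128, device_id=0)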

Some benchmark numbers for neon vs. Caffe (using cuDNN v3), and for Nervana Cloud (Titan X) vs. AWS (Grid K520), are below (smaller numbers are better). Clearly, neon with Maxwell GPUs and our Cloud remain the recommended way to use neon for the best performance (typically 10x faster than AWS). Even though we did not prioritize optimizing for AWS, we surpassed cuDNN v3 performance for fprop (inference) for AlexNet on AWS. Also note that these networks typically train for days or weeks, and the times shown are for a single iteration, so even small per-iteration differences can add up to hours or days saved by using the Nervana Platform instead of AWS (a worked example follows below). Combined with our multi-GPU implementation, we can achieve a ~70x speedup over AWS g2.2xlarge performance.
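To make that concrete, here is a back-of-the-envelope calculation; the iteration count and per-iteration times are purely illustrative, not taken from the tables.

    # Hypothetical illustration: per-iteration savings compound over a run.
    iterations = 450000   # e.g., an AlexNet-scale training schedule
    time_a = 0.60         # seconds/iteration on the slower platform
    time_b = 0.06         # seconds/iteration at a ~10x speedup

    saved_hours = iterations * (time_a - time_b) / 3600
    print("time saved: %.0f hours" % saved_hours)  # ~68 hours, nearly 3 days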

[Benchmark tables: single-iteration times for neon vs. Caffe (cuDNN v3) and for Nervana Cloud (Titan X) vs. AWS (Grid K520), including multi-GPU results]

GoogLeNet and VGG are too large to fit on AWS GPUs. Numbers below are for Nervana Cloud (Titan X).
[Benchmark table: GoogLeNet and VGG single-iteration times on Nervana Cloud (Titan X)]

We continue to top the speed benchmarks and are continuously working on improving ease of use. Our next major milestone is extending our automatic differentiation feature beyond individual layers to full networks, which will make exploratory investigations even easier. We look forward to the creative ways in which the deep learning community will use neon. Drop us a note at products@nervanasys.com with any feedback (both positive and negative!).
