neon v1.2 release: Kepler & AWS support are back, Deep ResNets, and more

We are excited to share neon’s v1.2 release with the community. It brings several major features (Kepler support, new macrobatch and serialization enhancements), new examples, and an expanded Model Zoo to help users get started with their own use cases.

  • New storage format and data loader for datasets that do not fit in memory. The loader uses thread pools to pipeline disk-to-host transfer, data decoding (if needed), data augmentation (if needed), and host-to-device transfer, so that computation on the device is never starved for data. The loader is currently fully functional for image classification, and we will soon add support for other dataset types, such as video, speech, and text, under the same framework.
  • New model serialization format that includes the model architecture (we will soon release a set of tools for porting Caffe models to the neon format).
  • CPU installation now defaults to a multithreaded CPU backend that uses optimized BLAS libraries.
  • Support for Kepler GPUs. Backward compatibility with Kepler-class GPUs lets users get started with neon on the older GPUs offered by AWS and other cloud providers.
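The pipelined loader in the first bullet can be sketched with Python's standard `threading`, `queue`, and `concurrent.futures` modules. This is a minimal illustration under stated assumptions, not neon's actual loader: `read_macrobatch`, `decode`, `augment`, and `to_device` are hypothetical stand-ins for the real disk read, image decoding, augmentation, and host-to-device copy. Bounded queues let the reading and processing stages run up to `depth` batches ahead of the training loop, so the loop rarely waits for data.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

_DONE = object()  # sentinel marking the end of a stream


# Hypothetical stage functions standing in for a real disk read, image
# decoding, random augmentation, and host-to-device copy.
def read_macrobatch(i):
    return [f"img-{i}-{j}".encode("ascii") for j in range(4)]


def decode(raw):
    return raw.decode("ascii")


def augment(sample):
    return sample.upper()


def to_device(samples):
    return tuple(samples)


def pipelined_loader(num_batches, workers=4, depth=2):
    raw_q = queue.Queue(maxsize=depth)    # raw macrobatches read from disk
    ready_q = queue.Queue(maxsize=depth)  # processed batches ready for the device

    def reader():
        try:
            for i in range(num_batches):
                raw_q.put(read_macrobatch(i))  # blocks while the buffer is full
        finally:
            raw_q.put(_DONE)

    def processor():
        try:
            with ThreadPoolExecutor(max_workers=workers) as pool:
                while (raw := raw_q.get()) is not _DONE:
                    # decode + augment samples in parallel; map preserves order
                    samples = list(pool.map(lambda r: augment(decode(r)), raw))
                    ready_q.put(to_device(samples))
        finally:
            ready_q.put(_DONE)

    for stage in (reader, processor):
        threading.Thread(target=stage, daemon=True).start()

    # The training loop computes on each yielded batch while the background
    # threads are already reading and preparing the next ones.
    while (batch := ready_q.get()) is not _DONE:
        yield batch
```

A training loop would simply iterate, e.g. `for batch in pipelined_loader(1000): ...`. Threads (rather than processes) are a reasonable fit here on the assumption that the heavy stages, such as image decoding and device copies, run in native code that releases Python's GIL.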
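Storing the architecture alongside the weights means a saved file is enough to rebuild the model without the original training script. The sketch below illustrates that idea only; it is not neon's file format or API. The `Conv`/`Affine` classes, the `LAYER_TYPES` registry, and the use of `pickle` as the container are all hypothetical choices made for the example.

```python
import pickle
from dataclasses import dataclass


# Hypothetical layer classes; each is described fully by its constructor args.
@dataclass
class Conv:
    fshape: tuple
    nfilters: int


@dataclass
class Affine:
    nout: int


# Hypothetical registry mapping the type names stored in a file to classes.
LAYER_TYPES = {"Conv": Conv, "Affine": Affine}


def serialize(layers, params):
    """Bundle the architecture (layer types + configs) with the weights."""
    arch = [{"type": type(layer).__name__, "config": dict(vars(layer))}
            for layer in layers]
    return pickle.dumps({"version": 1, "architecture": arch, "params": params})


def deserialize(blob):
    """Rebuild the layers from the stored architecture, then return the weights."""
    data = pickle.loads(blob)
    layers = [LAYER_TYPES[d["type"]](**d["config"]) for d in data["architecture"]]
    return layers, data["params"]
```

The explicit registry keeps the set of constructible layer types under the loader's control, and a version field leaves room for the format to evolve. Note that `pickle` should only be used on files from trusted sources.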

Some benchmark numbers for neon vs. Caffe (using cuDNN v3), and for Nervana Cloud (Titan X) vs. AWS (Grid K520), are below (smaller numbers are better). Neon on Maxwell GPUs and our Cloud remain the recommended ways to use neon for the best performance (typically 10x faster than AWS). Although we did not prioritize optimizing for AWS, neon still surpassed cuDNN v3 fprop (inference) performance for AlexNet on AWS. Also note that these networks typically train for several days or weeks, while these are times for a single iteration, so even small per-iteration differences can add up to hours or days saved by training on the Nervana Platform instead of AWS. Combined with our multi-GPU implementation, we can achieve a ~70x speedup over AWS g2.2xlarge performance.

[Benchmark tables: time per iteration, neon vs. Caffe (cuDNN v3) and Nervana Cloud (Titan X) vs. AWS (Grid K520)]

GoogLeNet and VGG are too large to fit on AWS GPUs. Numbers below are for Nervana Cloud (Titan X).
[Benchmark table: GoogLeNet and VGG, time per iteration on Nervana Cloud (Titan X)]
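To make the point about per-iteration times concrete: the saving over a full run is the per-iteration difference multiplied by the total number of iterations. The helper below is a small worked example; the timings and run length are illustrative assumptions, not figures from the benchmarks in this post (roughly 10,000 iterations per epoch corresponds to ImageNet at a batch size of 128).

```python
def hours_saved(slow_s_per_iter, fast_s_per_iter, iters_per_epoch, epochs):
    """Wall-clock hours saved over a full training run."""
    total_iters = iters_per_epoch * epochs
    return (slow_s_per_iter - fast_s_per_iter) * total_iters / 3600.0


# Illustrative numbers only: a 0.05 s/iteration gap over 90 epochs
saved = hours_saved(0.30, 0.25, iters_per_epoch=10_000, epochs=90)
print(f"{saved:.1f} hours")  # 12.5 hours
```

Under these assumptions, a gap of just 50 ms per iteration adds up to about half a day of training time.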

We continue to top the speed benchmarks and are continuously working to improve ease of use. Our next major milestone is extending our automatic differentiation feature beyond individual layers to full networks, which will make exploratory investigations even easier. We look forward to the creative ways in which the deep learning community will use neon. Drop us a note with any feedback (both positive and negative!).