Adaptable Deep Learning Solutions with nGraph™ Compiler and ONNX*

Artificial intelligence methods and deep learning techniques based on neural networks continue to gain adoption in more industries. As neural network architectures grow in complexity, they gain new capabilities, and the range of problems they can solve grows at an increasing rate. With so many constantly changing variables at play, it is important that developers have common ground on which to collaborate to improve or adapt their solutions. Community-supported projects like ONNX are introducing standards that help frameworks interoperate and accommodate new architectures. Intel's open-source nGraph Library and Compiler suite was an early supporter of ONNX. The nGraph team has already released a Python importer for running inference with ONNX-formatted models and plans to support the newly released ONNXIFI interface soon.

As algorithms make use of an ever larger landscape of increasingly high-quality data, it can be difficult to predict what your future machine learning requirements will be. At Intel, we aim to give machine learning and AI developers maximum flexibility for software integration: the freedom to create or use an optimized, scalable end-to-end system with any framework while avoiding hardware lock-in. Using the most efficient and flexible implementations of algorithms for whatever part of the stack you are working on, from the cloud to bare metal, is increasingly important.

Many businesses offer services that harness deep learning to process user requests. A common use case starts with training a specialized neural network on a big data set in a lab or on a large computing cluster. The trained network then becomes part of a solution, which must be deployed on scalable and affordable cloud or edge infrastructure. Cloud infrastructure equipped with dedicated neural network accelerators or GPUs is still rare and expensive; most data centers offer servers based on Intel CPUs. In this blog, we present a general overview of ONNX and nGraph and share some example code that can help anyone become acquainted with the work done thus far.

What is ONNX?

The Open Neural Network Exchange (ONNX) is a standard file format for storing neural networks. Models trained in a variety of frameworks can be exported to an ONNX file and later read with another framework for further processing. ONNX is an open standard backed by large industry players such as Microsoft, Facebook, and Amazon, as well as a broad community of users. Support for ONNX is being built into a growing number of deep learning frameworks, including PyTorch*, Microsoft*'s Cognitive Toolkit (CNTK), Caffe2*, and Apache MXNet*. Tools for using ONNX with many other frameworks, such as TensorFlow*, Apple* CoreML, Chainer*, and scikit-learn*, are also under active development. ONNX files are also useful for analyzing and visualizing networks in tools such as Netron.

ONNX aims to be the standard for interoperability between different types of deep learning software and for the exchange of models within the machine learning community. Thanks to ONNX, we can use any one of the compatible frameworks for designing, training, debugging, and deploying our neural networks. When the model is ready, we can export it to an ONNX file and run inference in an application.

What is nGraph?

nGraph is a suite of compiler, library, and runtime tools (APIs) for building custom deep learning solutions. The nGraph Compiler is Intel's computational graph compiler for neural networks, able to transform a deep learning model into an optimized, executable function that runs efficiently on a variety of hardware, including Intel® Architecture Processors (CPUs), the Intel® Nervana™ Neural Network Processor (Intel® Nervana™ NNP), graphics cards (GPUs), and other backends. nGraph provides both a C++ API for framework developers and a Python API which can be used to run inference on models imported from ONNX.

nGraph uses the Intel® Math Kernel Library for Deep Neural Networks (Intel MKL-DNN), and provides a significant performance boost on CPUs, such as those running in a cloud datacenter.

nGraph can also be used as a backend by deep learning frameworks such as MXNet*, TensorFlow*, and neon™. Because nGraph optimizes the computation of an entire graph, in some scenarios it can outperform even versions of frameworks optimized to use MKL-DNN directly.

Using nGraph-ONNX

The following example shows how easy it is to export a trained model from PyTorch to ONNX and use it to run inference with nGraph. More information about exporting ONNX models from PyTorch can be found here.

Start by exporting the ResNet-50 model from PyTorch’s model zoo to an ONNX file:

   from torch.autograd import Variable
   import torch.onnx
   import torchvision

   # ImageNet input has 3 channels and 224x224 resolution
   imagenet_input = Variable(torch.randn(1, 3, 224, 224))

   # Download ResNet (or construct your model)
   model = torchvision.models.resnet50(pretrained=True)

   # Export model to an ONNX file
   torch.onnx.export(model, imagenet_input, 'resnet.onnx')

This should create a resnet.onnx file containing the model. Try opening the file in Netron to inspect it. For detailed information about exporting ONNX files from frameworks such as PyTorch, Caffe2, CNTK, MXNet, TensorFlow, and Apple CoreML, tutorials are located here.

After the export is complete, you can import the model to nGraph using the ngraph-onnx companion tool, which is also open source and available on GitHub.

   from ngraph_onnx.onnx_importer.importer import import_onnx_file

   # Import the ONNX file
   models = import_onnx_file('resnet.onnx')

   # Import produces a list of models defined in the ONNX file
   [{'inputs': [<Parameter: '0' ([1, 3, 224, 224], float)>],
     'name': 'output',
     'output': <Add: 'output' ([1, 1000])>}]

nGraph’s Python API is easy to use and allows you to run inference on the imported model:

   import ngraph as ng

   # Create an nGraph runtime environment
   runtime = ng.runtime(backend_name='CPU')

   # Select the first model and compile it to a callable function
   model = models[0]
   resnet = runtime.computation(model['output'], *model['inputs'])

   # Load your input as a numpy array (here we just use dummy data)
   import numpy as np
   picture = np.ones([1, 3, 224, 224])

   # Run inference on the input data
   resnet(picture)
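
In the example above, `picture` is just dummy data. For a real photograph, the pixels must first be converted into the [1, 3, 224, 224] layout the model expects. The sketch below is an illustration in plain NumPy (the `preprocess` helper is ours, not part of the nGraph or ONNX APIs) of the usual ImageNet preprocessing for a 224x224 RGB image: scale to [0, 1], normalize with the standard ImageNet channel means and standard deviations, and reorder from HWC to NCHW.

```python
import numpy as np

def preprocess(image_hwc_uint8):
    """Convert a 224x224 RGB uint8 image (HWC layout) into a normalized
    [1, 3, 224, 224] float32 NCHW batch, as ResNet-50 expects."""
    # Standard ImageNet channel statistics
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)

    img = image_hwc_uint8.astype(np.float32) / 255.0  # scale to [0, 1]
    img = (img - mean) / std                          # normalize per channel
    img = img.transpose(2, 0, 1)                      # HWC -> CHW
    return img[np.newaxis, ...]                       # add batch dimension

# Example with a dummy mid-gray image in place of a decoded photo
picture = preprocess(np.full((224, 224, 3), 128, dtype=np.uint8))
print(picture.shape)  # (1, 3, 224, 224)
```

The resulting array can be passed directly to the compiled `resnet` computation in place of the `np.ones` placeholder.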

During the first run, your model is compiled to an executable function, which is then reused on every subsequent inference call. For optimal results, set up a server which loads and compiles the model once and then waits for incoming requests.
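The `resnet(picture)` call returns raw class scores of shape [1, 1000], one per ImageNet category. A typical final step, sketched here in plain NumPy (the `top_k` helper is our illustration, not part of nGraph), is to apply a softmax and pick out the highest-probability classes:

```python
import numpy as np

def top_k(scores, k=5):
    """Turn a [1, 1000] array of raw class scores into the indices and
    softmax probabilities of the k highest-scoring classes."""
    logits = scores[0]
    # Numerically stable softmax
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()
    best = np.argsort(probs)[::-1][:k]  # indices of the k largest probabilities
    return [(int(i), float(probs[i])) for i in best]

# Dummy scores in place of a real network output
dummy_scores = np.zeros((1, 1000))
dummy_scores[0, 42] = 10.0
print(top_k(dummy_scores, k=1))  # class 42 has the highest probability
```

Mapping the resulting indices to human-readable labels requires the standard ImageNet class list, which ships with most frameworks' model zoos.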

Performance advantages of using nGraph

Depending on the hardware platform and framework you're using, the performance benefits of using nGraph can be quite significant. Figure 1 below compares running inference on a ResNet-50 model natively in PaddlePaddle, PyTorch, Caffe2, and CNTK with running the ONNX version of the same model in nGraph. See the Configuration Details in the footnotes for how we achieved these numbers.


Figure 1: Inference latency for ResNet-50 using various frameworks compared to running the same model via ONNX in nGraph. Batch size=1, input size=3x224x224. Smaller bars (shorter inference times) are better.

In conclusion, we are excited about the results we’ve obtained thus far: vastly improved latency performance over native implementations of inference solutions across multiple frameworks. Inference applications that are designed, built, tested, and deployed to make use of ONNX and our nGraph Python APIs provide a significant performance advantage and can help developers adapt and evolve their AI platforms and solutions anywhere on the stack.

Configuration Details

Hardware configuration:

2S Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz (28 cores), HT enabled, turbo enabled, 384GB (12 * 32GB) DDR4 ECC SDRAM RDIMM @ 2666MHz (Micron* part no. 36ASF4G72PZ-2G6D1), 960GB SSD 2.5in SATA 3.0 6Gb/s Intel SSDSC2KB96, ethernet adapter: Intel PCH Integrated 10 Gigabit Ethernet Controller

Software configuration:

Ubuntu 16.04.4 LTS (GNU/Linux 4.4.0-127-generic x86_64).

Software Release Versions:

ngraph - commit 6ccfbeb
ngraph-onnx - commit 6ad1f49
onnx - commit 410530e
Caffe2 - version 0.8.1, commit 2063fc7
CNTK - version 2.5.1
PyTorch - version 0.4.0-cp35-cp35m-linux_x86_64
PaddlePaddle - commit 94a741d

Measurement:

inference on ResNet-50 with batch size 1, median performance based on 10000 repeats
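
The measurement methodology above (the median over many repeated batch-1 inferences) can be reproduced with a small standard-library harness. The sketch below is a simplified illustration, with a trivial stand-in workload where the compiled `resnet` callable from the earlier example would go; the actual benchmark scripts are linked below.

```python
import statistics
import time

def median_latency(fn, repeats=10000):
    """Call fn() repeatedly and return the median wall-clock time in milliseconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)

# Stand-in for resnet(picture); in the real benchmark this would be the
# compiled nGraph computation running one batch-1 inference.
workload = lambda: sum(i * i for i in range(1000))

print('median latency: %.3f ms' % median_latency(workload, repeats=100))
```

Using the median rather than the mean makes the figure robust to occasional outliers such as the first (compilation) run or OS scheduling noise.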

Scripts:

https://github.com/NervanaSystems/ngraph-onnx/tree/pub_blog_benchmarks/benchmarks

Prerequisites:

Please make sure the following are installed on your system:

  • git
  • git-lfs
  • Docker

Command lines:

git clone -b pub_blog_benchmarks \
  https://github.com/NervanaSystems/ngraph-onnx/
cd ngraph-onnx/benchmarks
./run_benchmarks.sh -s 10000

Notices and Disclaimers

Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase.  For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.

Intel, Intel Nervana, Xeon, and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries

© Intel Corporation

*Other names and brands may be claimed as the property of others.