Preview Release: Intel® Nervana™ Graph
Nov 17, 2016
The field of deep learning is moving at a rapid pace. Practitioners need tools that are flexible enough to keep up. Theano popularized the notion of computational graphs as a powerful abstraction, and more recently, TensorFlow iterated on that concept. Together, they demonstrate some first steps in unlocking the potential of deep learning, but we now need even more from our tools to bring about the next generation of complex network topologies.
In working with our customers in healthcare, finance, agriculture, and the automotive industries, we have found that modern data scientists need:
To enable these capabilities, as tool builders, we need:
From our years of experience maintaining one of the fastest deep learning libraries, and over a year iterating on graph-based designs, we now share a preview release of the Nervana Graph (ngraph) to address these aims. This release is composed of three parts:
Let us consider each of these in turn and the way they empower users.
The computational graphs of Theano and TensorFlow require a user to reason about the underlying tensor shapes while constructing the graph. This is tedious and error-prone for the user, and it prevents a compiler from reordering axes to match the assumptions of particular hardware platforms.
Instead, the ngraph API enables users to define a set of named axes, attach them to tensors during graph construction, and specify them by name (rather than position) when needed. These axes can be named according to the particular domain of the problem at hand to help a user with these tasks. This flexibility then allows the necessary reshaping/shuffling to be inferred by the transformer before execution. Additionally, these inferred tensor axis orderings can be optimized across the entire computational graph to match the ordering preferences of the underlying runtimes/hardware platforms, improving cache locality and execution time.
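The named-axes idea can be sketched in a few lines of plain Python. This toy NamedTensor is illustrative only, not the actual ngraph API:

```python
import numpy as np

# Toy illustration of the named-axes idea (not the real ngraph API):
# tensors carry axis names, so a consumer can request any axis order
# and the reshuffle is inferred rather than hand-coded by the user.
class NamedTensor:
    def __init__(self, data, axes):
        assert data.ndim == len(axes)
        self.data, self.axes = data, tuple(axes)

    def with_axes(self, axes):
        """Return a view transposed to the requested axis order."""
        perm = [self.axes.index(name) for name in axes]
        return NamedTensor(self.data.transpose(perm), axes)

# A batch of 2 RGB images, 4x5 pixels, stored batch-major...
t = NamedTensor(np.zeros((2, 3, 4, 5)), ("N", "C", "H", "W"))
# ...retrieved channel-last, as some backends prefer, without the user
# tracking positional indices.
print(t.with_axes(("N", "H", "W", "C")).data.shape)  # (2, 4, 5, 3)
```

A transformer would perform this kind of permutation once, globally, for the whole graph rather than per tensor as here.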
These capabilities underline one of the tenets of ngraph: operate at a high enough layer of abstraction that transformers can make execution efficient without needing a “sufficiently smart compiler,” while also allowing users and frontends to more easily compose these building blocks together.
Most applications and users don’t need the full flexibility offered by the ngraph API, so we are also introducing a higher level neon API which offers a user a composable interface with the common building blocks to construct deep learning models. This includes objects like common optimizers, metrics, and layer types such as linear, batch norm, convolutional, and RNN. We also illustrate these with example networks training on MNIST digits, CIFAR-10 images, and the Penn Treebank text corpus.
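The composability this describes can be sketched as layers that are callables chained by a container. The Sequential class below is an illustrative stand-in, not the actual neon interface:

```python
# Illustrative sketch of composable building blocks (not the real neon
# API): each layer is a callable, and a container chains them so a model
# is just the composition of its parts.
class Sequential:
    def __init__(self, layers):
        self.layers = layers

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

# Stand-ins for real layers such as Linear or BatchNorm:
scale = lambda xs: [v * 2 for v in xs]
shift = lambda xs: [v + 1 for v in xs]

model = Sequential([scale, shift])
print(model([1, 2, 3]))  # [3, 5, 7]
```

Real layers would of course carry learnable parameters and operate on ngraph tensors rather than Python lists.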
This next generation of the neon deep learning API, together with the ngraph backend machinery, will eventually replace our current neon library, while still offering the same world-leading performance and extensive open model catalog as before. We will make this transition when performance, stability, and the available models and tooling match what is currently available. We expect this to occur sometime in the next several months.
We also realize that users already know and use existing frameworks today and may want to keep using, or combine, models written in other frameworks. To that end, we demonstrate the capability to convert existing TensorFlow models into ngraphs and execute them using ngraph transformers. This importer supports a variety of common operation types today and will expand in future releases. We also plan on implementing compatibility with other frameworks in the near future, so stay tuned.
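One hedged sketch of how such an importer can be structured: walk the source graph and dispatch each op type to a builder for the target graph. The op names and tuple-encoded "graph" below are placeholders, not the real importer's types:

```python
# Illustrative importer skeleton (not the actual TensorFlow importer):
# a table maps source op types to builders that emit target-graph nodes,
# so supporting a new op means adding one table entry.
OP_BUILDERS = {
    "Add": lambda a, b: ("add", a, b),
    "MatMul": lambda a, b: ("dot", a, b),
}

def import_node(op_type, *inputs):
    try:
        return OP_BUILDERS[op_type](*inputs)
    except KeyError:
        raise NotImplementedError(f"unsupported op: {op_type}")

print(import_node("Add", "x", "y"))  # ('add', 'x', 'y')
```

Unsupported ops fail loudly, which mirrors how a partial importer must behave while its coverage grows.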
Additionally, we wish to stress that because ngraph offers the core building blocks of deep learning computation and multiple high performance backends, adding frontends is a straightforward affair and improvements to a backend (or new backends) are automatically leveraged by all existing and future frontends. So users get to keep using their preferred syntax while benefiting from the shared compilation machinery.
Making sure that models execute quickly with minimal memory overhead is critical given the millions or even billions of parameters and weeks of training time used by state of the art models. Given our experience building and maintaining one of the fastest deep learning libraries, we appreciate the complexities of modern deep learning performance:
With these realities in mind, we designed ngraph transformers to automate and abstract these details away from frontends through clean APIs, while still giving power users room to tweak, and without limiting the flexible abstractions available for model creation.
In ngraph, we believe the key to achieving these goals rests in standing on the shoulders of giants in modern compiler design to promote flexibility and experimentation in choosing the set and order of compiler optimizations for a transformer to use. These operating principles increase the flexibility of our tools while reducing complexity. This makes it easier for contributors to add backend code to support exotic models without needing to understand or modify assumptions made elsewhere in the system.
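The pass-pipeline idea can be illustrated with a toy constant-folding pass over a tuple-encoded expression tree. Assumptions: binary ops only, integer constants; none of this is the actual ngraph pass machinery:

```python
# Toy illustration of composable compiler passes: each pass maps an
# expression tree to an expression tree, and a transformer simply folds
# its configured pipeline over the graph, in a chosen order.
def fold_constants(node):
    if isinstance(node, tuple):
        op, a, b = node
        a, b = fold_constants(a), fold_constants(b)
        if op == "+" and isinstance(a, int) and isinstance(b, int):
            return a + b  # both operands known: evaluate at compile time
        return (op, a, b)
    return node

def run_pipeline(graph, passes):
    for p in passes:
        graph = p(graph)
    return graph

graph = ("+", ("+", 1, 2), "x")  # (1 + 2) + x
print(run_pipeline(graph, [fold_constants]))  # ('+', 3, 'x')
```

Because passes share one interface, experimenting with their set and order is a one-line change to the pipeline list.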
Each ngraph transformer (or backend in LLVM parlance) targets a particular hardware backend and acts as an interface to compile an ngraph into a computation that is ready to be evaluated by the user as a function handle.
Today, ngraph ships with a transformer for GPU and CPU execution, and in the future we plan on implementing heterogeneous device transformers with distributed training support.
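The transformer contract described above (compile a graph into a callable computation) can be mimicked with a toy tree-walking evaluator. Real transformers emit optimized device code; everything below is illustrative:

```python
# Illustrative transformer contract: given a result node and its
# parameter placeholders, return a Python callable. Here "compilation"
# is a trivial interpreter over tuple-encoded addition nodes.
def make_computation(result, *params):
    def evaluate(node, env):
        if isinstance(node, tuple):
            op, a, b = node
            if op == "+":
                return evaluate(a, env) + evaluate(b, env)
            raise NotImplementedError(op)
        return env.get(node, node)  # parameter lookup or literal

    def computation(*args):
        return evaluate(result, dict(zip(params, args)))

    return computation

x_plus_one = ("+", "x", 1)
plus_one = make_computation(x_plus_one, "x")
print([plus_one(i) for i in range(5)])  # [1, 2, 3, 4, 5]
```

The function-handle interface is what lets frontends stay oblivious to whether the backend interprets, JIT-compiles, or offloads the graph.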
For an example of building and executing ngraphs, please see the walkthrough in our documentation, but we include here a “hello world” example, which will print the numbers 1 through 5.
import ngraph as ng
import ngraph.transformers as ngt
# Build a graph
x = ng.placeholder(())
x_plus_one = x + 1
# Construct a transformer
transformer = ngt.make_transformer()
# Define a computation
plus_one = transformer.computation(x_plus_one, x)
# Run the computation
for i in range(5):
    print(plus_one(i))
As this is a preview release, we have much work left to do. Currently we include working examples of:
Following Nervana’s acquisition by Intel, we have a rapidly growing team of world-class experts spanning compilers, distributed systems, systems software and deep learning contributing to this project. We are actively working towards:
With the rapid pace of development in the deep learning community, we realize that a project like this won’t succeed without community participation. That’s why we’re putting this preview release out: to get early feedback and encourage people like you to join us in defining the next wave of deep learning tooling. Towards this, we’ve also decided to release our entire commit history to show our trajectory and the many previous approaches we’ve tried to get here. We also encourage hardware developers to get involved to make ngraph the gold reference in performance for all hardware platforms.