Training Generative Adversarial Networks in Flexpoint

Jun 19, 2017


With the recent flood of breakthrough products using deep learning for image classification, speech recognition and text understanding, it’s easy to think deep learning is just about supervised learning. But supervised learning requires labels, which most of the world’s data does not have. Instead, unsupervised learning, which extracts insights from unlabeled data, will open deep learning to a diverse set of applications.

There are obvious use cases, such as using generative models for texture generation or super-resolution. Even more interesting are the possibilities for semi-supervised learning: by learning efficient data representations, models that today require millions of images or thousands of hours of speech could achieve similar performance with just a handful of labeled examples. In this blog post, we will explore how this research ties in with our work on high-performance, low bit-width deep learning hardware.


The Twitter Cortex team uses GANs for super-resolution. Left: the original image; right: the high-resolution version produced by the network.


Unsupervised deep learning has gained substantial momentum with Generative Adversarial Networks (GANs), which pose network training as a two-player game between two competing networks. One of the networks, the generator, learns to transform low-dimensional noise to mimic the training data (e.g. images). The second network, the discriminator, learns to distinguish fake images produced by the generator from real images in the training data. Thus the cost function of the GAN is based on a simple binary classification problem: the discriminator is trained to classify as accurately as possible, and the generator is optimized to confuse the discriminator as much as possible. In a perfectly trained GAN with sufficient capacity, the generator outputs data whose statistics are indistinguishable from the real data and the discriminator performs at pure chance level; this state is stable in theory, though convergence to it is often tricky to obtain in practice. GANs were invented by Ian Goodfellow in 2014, while he was a grad student in Yoshua Bengio’s lab in Montreal. Ian has a lot of practical information about training GANs in his NIPS 2016 Tutorial. A particularly popular flavor of this model is the DC-GAN, developed by researchers at FAIR.
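The two-player objective described above can be sketched in a few lines of numpy. The helper names below (`discriminator_loss`, `generator_loss`) are illustrative, not from neon; they show the standard binary cross-entropy form of the game, with the commonly used non-saturating generator loss.

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: push scores for real data toward 1, fake toward 0.
    d_real/d_fake are discriminator outputs in (0, 1)."""
    eps = 1e-12  # numerical guard against log(0)
    return -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))

def generator_loss(d_fake):
    """Non-saturating generator loss: the generator tries to make the
    discriminator assign high scores to its fakes."""
    eps = 1e-12
    return -np.mean(np.log(d_fake + eps))
```

At the chance-level equilibrium (all discriminator outputs equal to 0.5), the discriminator loss settles at 2·log 2 ≈ 1.386, which is a handy sanity check when monitoring training.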


One dog is real, one is generated by the DC-GAN algorithm. Left is the real dog, right is the generated dog.

The Wasserstein GAN (W-GAN), developed by Martin Arjovsky at NYU’s Courant Institute of Mathematical Sciences together with Facebook researchers, marked a major recent milestone in GAN research. The W-GAN has two big advantages: it is easier to train than a standard GAN because its cost function provides a more robust gradient signal, and that cost function (an estimate of the Wasserstein-1 distance) can be used to monitor convergence, making it much easier to design a good model and find the right set of hyperparameters.
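The W-GAN losses are even simpler than the standard GAN losses: the critic (the W-GAN analogue of the discriminator) outputs unbounded scores, and the difference of its mean scores on real and fake data estimates the Wasserstein-1 distance. A minimal sketch, with hypothetical helper names and the weight clipping used in the original W-GAN to enforce the Lipschitz constraint:

```python
import numpy as np

def critic_loss(c_real, c_fake):
    """Negative of the Wasserstein-1 estimate; minimizing this trains the
    critic to separate real from fake scores as far as possible."""
    return np.mean(c_fake) - np.mean(c_real)

def wgan_generator_loss(c_fake):
    """The generator tries to raise the critic's scores on its fakes."""
    return -np.mean(c_fake)

def clip_weights(w, c=0.01):
    """Clip critic weights to [-c, c] after each update, as in the
    original W-GAN, to keep the critic approximately 1-Lipschitz."""
    return np.clip(w, -c, c)
```

The negated critic loss is the quantity plotted as the learning curve: as training converges, this Wasserstein-1 estimate shrinks, which is what makes the W-GAN cost useful for monitoring convergence.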

GANs are popular for generative models of images, but haven’t yet reached the holy grail of generating the full distribution of natural images. They do much better when trained on images from a particular class, such as birds, flowers, faces, and, for unfathomable reasons, bedrooms, for which a set of 3 million images is available from Princeton’s large-scale scene understanding (LSUN) dataset.

Unsupervised Learning meets low precision

At Nervana, we follow the cutting edge of machine learning research very closely, so we can optimally support new models in the next generation of AI hardware we are developing. Our team of data scientists implements new models in our neon deep learning framework, where we can run them through our suite of simulators. One of the main differences between Nervana hardware and other accelerators is that we use the Flexpoint data format, which combines the hardware-friendly aspects of fixed point with the “it just works” user friendliness of floating point.

There has been a lot of research into low precision data types for deep learning, ranging from 16-bit floating point all the way down to binary neural networks, which we have blogged about previously. These networks often require significant changes from their 32-bit counterparts, whereas Flexpoint is designed to work without any changes to the network or training procedure.
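To make the fixed-point/floating-point trade-off concrete, here is a toy numpy sketch of the shared-exponent idea behind Flexpoint: all entries of a tensor share a single power-of-two exponent, chosen so the largest value fits in the mantissa range, and individual entries are stored as integers at that scale. This is an illustrative simplification written for this post, not Nervana’s actual Flexpoint arithmetic or simulator.

```python
import numpy as np

def flex_quantize(x, mantissa_bits=16):
    """Quantize a tensor to a shared-exponent fixed-point representation.
    One exponent is picked per tensor; entries become signed integers
    scaled by 2**exp. Toy sketch only."""
    max_int = 2 ** (mantissa_bits - 1) - 1        # largest signed mantissa
    amax = np.max(np.abs(x))
    if amax == 0:
        return x.copy()                            # all-zero tensor: nothing to do
    exp = np.ceil(np.log2(amax / max_int))         # shared power-of-two exponent
    scale = 2.0 ** exp
    mantissa = np.clip(np.round(x / scale), -(max_int + 1), max_int)
    return mantissa * scale                        # dequantized values
```

With a 16-bit mantissa the rounding error per entry is at most half the shared scale, which for well-conditioned activation and weight tensors is small enough that training curves track the 32-bit floating point baseline, as the experiment below shows.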

To prove the point, we took our W-GAN implementation and trained it on the LSUN bedroom dataset both in 32-bit floating point and in Flexpoint with a 16-bit mantissa and a 5-bit exponent. The only changes we made to the original model were to use uniform instead of Gaussian noise (since it’s a little faster to sample) and a noise dimensionality of 128 instead of 100 (we really like powers of two). Neither of these changes seems to affect the quality of the results, which are shown below. The samples generated by the two models are shown after every epoch of training for a fixed set of noise inputs, so the content of the generated images changes frequently at the beginning and then stabilizes over the course of training. Results from the model trained in Flexpoint are visually indistinguishable, and in fact inspecting the learning curve shows that convergence is unchanged.

Right: Real LSUN images; Left: Learning Curves in floating point and Flexpoint



Right: Images generated from Flex 16+5 model; Left: Images generated from 32-bit floating point model


As far as we know, GANs have not yet received any attention in reduced bit-width deep learning research, yet we were able to train one without making changes to the model or our (simulated) hardware. The code we used for training this model in 32-bit and 16-bit floating point (although unfortunately not the Flexpoint simulator tools) is open source and available on GitHub as part of our neon examples.

“Training Generative Adversarial Networks in Flexpoint” was written by Urs Köster and Xin Wang.
