Intel AI showcased at Neural Information Processing Systems (NIPS)

Dec 04, 2017


Jessica Rosenthal

Sr. Content Marketing and Creative Lead, Artificial Intelligence Products Group

Today marks the beginning of the thirty-first annual Conference on Neural Information Processing Systems (NIPS 2017), an interdisciplinary conference that brings together researchers in all aspects of neural and statistical information processing and computation, and their applications. The Intel AI team will be presenting papers and posters and participating in numerous workshops throughout the week.

 


Poster Sessions

Mon, Dec 4th 12:20 pm – 2:00 pm | WiML poster session / Pacific Ballroom
Sparse 3D Convolutional Networks for Efficient Object Classification
Xiaofan Xu (Movidius Machine Learning Group, Intel) and Ananya Gupta (University of Manchester)

Currently, most 3D CNNs are composed of multiple 3D convolutional layers with millions of parameters. Based on the premise that 3D data is inherently sparse, we believe that the learnt features should be sparse as well in order to represent the data. Our work on sparsifying the weights and kernels of these 3D networks aims to make them more efficient, leading to faster inference.
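As a rough illustration of the sparsification idea (the paper's exact pruning scheme is not given here, so the global magnitude threshold below is an assumption), a bank of 3D convolution kernels can be sparsified by zeroing its smallest-magnitude weights:

```python
import numpy as np

def sparsify(weights, sparsity=0.9):
    """Zero out the smallest-magnitude entries so `sparsity` of them become zero."""
    threshold = np.percentile(np.abs(weights), sparsity * 100)
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

# A bank of 3D convolution kernels with shape (out_channels, in_channels, D, H, W).
kernels = np.random.randn(64, 32, 3, 3, 3).astype(np.float32)
pruned, mask = sparsify(kernels, sparsity=0.9)
print("non-zero fraction:", mask.mean())  # ~0.1 after pruning
```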

Mon, Dec 4th 6:30 pm – 10:30 pm | Pacific Ballroom #75
Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks
Arjun K Bansal (Intel), William Constable (Intel), Oguz Elibol (Intel), Stewart Hall (Intel), Luke Hornof (Intel), Amir Khosrowshahi (Intel), Carey Kloss (Intel), Urs Köster (Intel), Marcel Nassar (Intel), Naveen Rao (Intel), Xin Wang (Intel), Tristan Webb (Intel)

Deep neural networks are commonly developed and trained in 32-bit floating point format. Significant gains in performance and energy efficiency could be realized by training and inference in numerical formats optimized for deep learning. Despite advances in limited precision inference in recent years, training of neural networks in low bit-width remains a challenging problem. Here we present the Flexpoint data format, aiming at a complete replacement of 32-bit floating point format training and inference, designed to support modern deep network topologies without modifications. Flexpoint tensors have a shared exponent that is dynamically adjusted to minimize overflows and maximize available dynamic range. We validate Flexpoint by training AlexNet, a deep residual network and a generative adversarial network, using a simulator implemented with the neon deep learning framework. We demonstrate that 16-bit Flexpoint closely matches 32-bit floating point in training all three models, without any need for tuning of model hyperparameters. Our results suggest Flexpoint as a promising numerical format for future hardware for training and inference.
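As a rough sketch of the shared-exponent idea (an illustration of the concept only, not Intel's Flexpoint implementation), a tensor can be stored as 16-bit integer mantissas plus one exponent chosen so that the largest value just fits:

```python
import numpy as np

MANTISSA_BITS = 16
MAX_INT = 2 ** (MANTISSA_BITS - 1) - 1  # signed 16-bit mantissa range

def to_flex(x):
    """Quantize a float tensor to (int16 mantissas, shared exponent)."""
    max_abs = np.max(np.abs(x))
    # Pick the exponent so that max_abs / 2**exp fits within the mantissa range.
    exp = int(np.ceil(np.log2(max_abs / MAX_INT))) if max_abs > 0 else 0
    mantissa = np.round(x / 2.0 ** exp).astype(np.int16)
    return mantissa, exp

def from_flex(mantissa, exp):
    return mantissa.astype(np.float32) * 2.0 ** exp

x = np.random.randn(4, 4).astype(np.float32)
m, e = to_flex(x)
print("max abs quantization error:", np.max(np.abs(from_flex(m, e) - x)))
```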

Tues, Dec 5th 6:30 pm – 10:30 pm | Pacific Ballroom #122
Learning to Inpaint for Image Compression
Mohammad Haris Baig (Dartmouth College), Vladlen Koltun (Intel Labs), Lorenzo Torresani (Dartmouth)

We study the design of deep architectures for lossy image compression. We present two architectural recipes in the context of multi-stage progressive encoders and empirically demonstrate their importance on compression performance. Specifically, we show that: (a) predicting the original image data from residuals in a multi-stage progressive architecture facilitates learning and leads to improved performance at approximating the original content and (b) learning to inpaint (from neighboring image pixels) before performing compression reduces the amount of information that must be stored to achieve a high-quality approximation. Incorporating these design choices in a baseline progressive encoder yields an average reduction of over 60% in file size with similar quality compared to the original residual encoder.
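For context, the multi-stage progressive (residual) coding loop that the paper builds on can be sketched as below; `encode_stage`/`decode_stage` are hypothetical stand-ins for learned networks (here just coarse quantization), and the paper's recipe (a) would instead have each stage predict the original image from the residual input:

```python
import numpy as np

def encode_stage(x):      # placeholder for a learned encoder (coarse quantization here)
    return np.round(x * 8) / 8

def decode_stage(code):   # placeholder for a learned decoder
    return code

def progressive_encode(image, n_stages=3):
    reconstruction = np.zeros_like(image)
    residual = image.copy()
    for _ in range(n_stages):
        code = encode_stage(residual)
        reconstruction = reconstruction + decode_stage(code)
        residual = image - reconstruction   # each stage codes what is still missing
    return reconstruction

img = np.random.rand(32, 32)
print("reconstruction error:", np.abs(progressive_encode(img) - img).mean())
```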

Tues, Dec 5th 6:30 pm – 10:30 pm | Pacific Ballroom #138
Geometric Matrix Completion with Recurrent Multi-Graph Neural Networks
Federico Monti (Università della Svizzera italiana) · Michael Bronstein (USI Lugano / Tel Aviv University / Intel) · Xavier Bresson (NTU)

Matrix completion models are among the most common formulations of recommender systems. Recent works have shown a performance boost for these techniques when introducing the pairwise relationships between users/items in the form of graphs, and imposing smoothness priors on these graphs. However, such techniques do not fully exploit the local stationarity structures of user/item graphs, and the number of parameters to learn is linear w.r.t. the number of users and items. We propose a novel approach to overcome these limitations by using geometric deep learning on graphs. Our matrix completion architecture combines graph convolutional neural networks and recurrent neural networks to learn meaningful statistical graph-structured patterns and the non-linear diffusion process that generates the known ratings. This neural network system requires a constant number of parameters independent of the matrix size. We apply our method on both synthetic and real datasets, showing that it outperforms state-of-the-art techniques.
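For reference, a common graph-regularized matrix completion objective that such geometric approaches build on (not the paper's full recurrent architecture) couples the data-fit term with smoothness terms over the user and item graph Laplacians $L_r$ and $L_c$:

$$\min_{X}\;\big\|\Omega \circ (X - M)\big\|_F^2 \;+\; \lambda_r\,\mathrm{tr}\!\left(X^\top L_r X\right) \;+\; \lambda_c\,\mathrm{tr}\!\left(X L_c X^\top\right),$$

where $M$ holds the observed ratings and $\Omega$ masks the known entries.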

Thurs, Dec 7th 4:15 pm – 4:35 pm | Hall A (Deep Reinforcement Learning Symposium)
RAIL: Risk-Averse Imitation Learning
Anirban Santara (IIT Kharagpur), Abhishek Naik (IIT Madras), Prof. Balaraman Ravindran (IIT Madras), Dipankar Das (Intel Labs, India), Dheevatsa Mudigere (Intel Labs, India), Sasikanth Avancha (Intel Labs, India), Bharat Kaul (Intel Labs, India)

Imitation learning algorithms learn viable policies by imitating an expert’s behavior when reward signals are not available. Generative Adversarial Imitation Learning (GAIL) is a state-of-the-art algorithm for learning policies when the expert’s behavior is available as a fixed set of trajectories. We evaluate in terms of the expert’s cost function and observe that the distribution of trajectory-costs is often more heavy-tailed for GAIL-agents than the expert at a number of benchmark continuous-control tasks. Thus, high-cost trajectories, corresponding to tail-end events of catastrophic failure, are more likely to be encountered by the GAIL-agents than the expert. This makes the reliability of GAIL-agents questionable when it comes to deployment in risk-sensitive applications like robotic surgery and autonomous driving. In this work, we aim to minimize the occurrence of tail-end events by minimizing tail risk within the GAIL framework. We quantify tail risk by the Conditional-Value-at-Risk (CVaR) of trajectories and develop the Risk-Averse Imitation Learning (RAIL) algorithm. We observe that the policies learned with RAIL show lower tail-end risk than those of vanilla GAIL. Thus the proposed RAIL algorithm appears as a potent alternative to GAIL for improved reliability in risk-sensitive applications.
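For reference, the tail-risk measure used here, the Conditional Value-at-Risk of the trajectory cost $C$ at level $\alpha$, is the expected cost over the worst $(1-\alpha)$ fraction of trajectories:

$$\mathrm{VaR}_\alpha(C) = \inf\{c : P(C \le c) \ge \alpha\}, \qquad \mathrm{CVaR}_\alpha(C) = \mathbb{E}\big[\,C \mid C \ge \mathrm{VaR}_\alpha(C)\,\big].$$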

Fri, Dec 8th 8:00 am – 8:45 pm | 103 A+B (Competition Track)
Anil Thomas (Intel), Oguz Elibol (Intel)

Showcasing Intel AI Lab’s 3rd place winning work in the Defense Against Adversarial Attack competition and our 4th place winning work in the Targeted Adversarial Attack competition.

Fri, Dec 8th 10:20 am – 10:50 am | 104 A (Machine Learning for Health (ML4H) Workshop)
Wrist Sensor Fusion Enables Robust Gait Quantification Across Walking Scenarios
Zeev Waks (Intel), Itzik Mazeh (Intel), Chen Admati (Intel), Michal Afek (Intel), Yonatan Dolan (Intel), Avishai Wagner (Intel)

Quantifying step abundance via single wrist-worn accelerometers is a common approach for encouraging active lifestyle and tracking disease status. Nonetheless, step counting accuracy can be hampered by fluctuations in walking pace or demeanor. Here, we assess whether the use of various sensor fusion techniques, each combining bilateral wrist accelerometer data, may increase step count robustness. By collecting data from 27 healthy subjects, we find that high-level step fusion leads to substantially improved accuracy across diverse walking scenarios. Gait cycle analysis illustrates that wrist devices can recurrently detect steps proximal to toe-off events. Collectively, our study suggests that dual-wrist sensor fusion may enable robust gait quantification in free-living environments.
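As an illustrative sketch of what a high-level fusion rule can look like (the function and tolerance below are assumptions, not the paper's exact method), step events detected independently on each wrist can be merged so that near-coincident detections count only once:

```python
def fuse_steps(left_times, right_times, tol=0.3):
    """Merge two sorted lists of step timestamps (seconds) into one fused list."""
    events = sorted([(t, 'L') for t in left_times] + [(t, 'R') for t in right_times])
    fused, last_t = [], None
    for t, _ in events:
        if last_t is None or t - last_t > tol:   # keep only the first detection per step
            fused.append(t)
            last_t = t
    return fused

left = [0.0, 1.1, 2.2, 3.3]     # step times detected from the left wrist
right = [0.55, 1.65, 2.75]      # step times detected from the right wrist
print("fused step count:", len(fuse_steps(left, right)))
```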

Fri, Dec 8th 11:45 am – 12:30 pm | Grand Ballroom A
Sequence modeling using a memory controller extension for LSTM
Itamar Ben-Ari (Intel), Ravid Shwartz-Ziv (Intel)

The Long Short-term Memory (LSTM) recurrent neural network is a powerful model for time series forecasting and various temporal tasks. In this work we extend the standard LSTM architecture by augmenting it with an additional gate which produces a memory control vector signal inspired by the Differentiable Neural Computer (DNC) model. This vector is fed back to the LSTM instead of the original output prediction. By decoupling the LSTM prediction from its role as a memory controller we allow each output to specialize in its own task. The result is that our LSTM prediction is dependent on its memory state and not the other way around (as in standard LSTM). We demonstrate our architecture on two time-series forecast tasks and show that our model achieves up to 8% lower loss than the standard LSTM model.
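A minimal numpy sketch of one possible reading of this extension (our interpretation of the abstract, not the authors' implementation): a fifth gate produces a memory-control vector m, which replaces the output h as the recurrent feedback signal, so the prediction no longer has to double as the memory controller:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_mc_step(x, m_prev, c_prev, W, b):
    """One step; W maps concat([x, m_prev]) to five gate pre-activations."""
    z = np.concatenate([x, m_prev]) @ W + b
    f, i, o, g, mgate = np.split(z, 5)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)
    g = np.tanh(g)
    c = f * c_prev + i * g
    h = o * np.tanh(c)                 # output / prediction path
    m = np.tanh(mgate) * np.tanh(c)    # memory-control vector, fed back instead of h
    return h, m, c

hidden, in_dim = 8, 4
rng = np.random.default_rng(0)
W = rng.normal(size=(in_dim + hidden, 5 * hidden)) * 0.1
b = np.zeros(5 * hidden)
m, c = np.zeros(hidden), np.zeros(hidden)
for t in range(10):
    h, m, c = lstm_mc_step(rng.normal(size=in_dim), m, c, W, b)
print("prediction:", h[:3])
```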

Sat, Dec 9th 5:00 pm – 7:00 pm | Hall C (Bayesian Deep Learning)
Unsupervised Deep Structure Learning by Recursive Independence Testing
Raanan Y. Yehezkel Rohekar (Intel), Guy Koren (Intel), Shami Nisimov (Intel), Gal Novik (Intel)

We introduce a principled approach for unsupervised structure learning of deep, feed-forward, neural networks. We propose a new interpretation for depth and inter-layer connectivity where conditional independencies in the input distribution are encoded hierarchically in the network structure. Neurons in deeper layers encode low-order (small condition sets) independencies and have a wide scope of the input, whereas neurons in the first layers encode higher-order (larger condition sets) independencies and have a narrower scope. Thus, the depth of the network is equal to the maximal order of independence in the input distribution. Moreover, this results in structures allowing neurons to connect to neurons in any deeper layer, skipping intermediate layers. The proposed algorithm constructs three main graphs: 1) a deep generative-latent-graph, learned recursively from data using a conditional independence oracle, 2) a stochastic inverse, and 3) a discriminative graph constructed from the stochastic inverse. We prove that conditional-dependency relations in the learned generative latent graph are preserved in both the stochastic inverse and the class-conditional discriminative graph. Finally, a deep neural network structure is constructed from the discriminative graph. We demonstrate on image classification benchmarks that the deepest layers (convolutional and dense layers) of common convolutional networks can be replaced by significantly smaller learned structures, achieving high classification accuracy. Our structure learning algorithm requires a small computational cost and runs efficiently on a standard desktop CPU.

 


Workshops

Mon, Dec 4th 11:00 am – 4:45 pm & Thurs, Dec 7th 12:45 pm – 4:35 pm | Long Beach Convention Center, Room 104
Women in Machine Learning
This technical workshop gives female faculty, research scientists, and graduate students in the machine learning community an opportunity to meet, network and exchange ideas, participate in career-focused panel discussions with senior women in industry and academia and learn from each other.

Thurs, Dec 7th 4:15 pm – 4:35 pm | Long Beach Convention Center, Room 104
Representation Learning in Large Attributed Graphs
Graphs (networks) are ubiquitous and allow us to model entities (nodes) and the dependencies (edges) between them. Graph data is often observed directly in the natural world (e.g., biological or social networks) or constructed from non-relational data by deriving a metric space between entities and retaining only the most significant edges. Learning a useful feature representation from graph data lies at the heart of many machine learning tasks such as classification, anomaly detection, and link prediction, among many others. Many existing techniques use random walks as a basis for learning features or estimating the parameters of a graph model for a downstream prediction task. Examples include recent node embedding methods such as DeepWalk and node2vec, as well as graph-based deep learning algorithms. However, these approaches are inherently transductive and do not generalize to unseen nodes and other graphs. Furthermore, most of these approaches lack support for rich graph data with attributes and structural features. In this work, we discuss a generic framework for inductive network representation learning based on the notion of attributed random walks. This is achieved by learning functions that generalize to new nodes and graphs. We show that our proposed framework is more accurate on a variety of graphs.
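A toy sketch of the attributed-random-walk idea (the node attributes, binning rule, and function names here are illustrative assumptions): walks are recorded over attribute-derived types rather than node identities, so the learned features can transfer to unseen nodes and graphs:

```python
import random

def node_type(attrs):
    """Map a node's attribute values to a discrete type label."""
    degree_bin = 'hi' if attrs['degree'] > 2 else 'lo'
    return f"{attrs['color']}-{degree_bin}"

def attributed_walk(graph, attrs, start, length=5):
    walk, node = [], start
    for _ in range(length):
        walk.append(node_type(attrs[node]))   # record the type, not the node ID
        node = random.choice(graph[node])
    return walk

graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
attrs = {n: {'color': 'red' if n % 2 else 'blue', 'degree': len(graph[n])} for n in graph}
print(attributed_walk(graph, attrs, start=0))
```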

Fri, Dec 8th 6:30 pm | Renaissance Hotel
Intel® Movidius™ Neural Compute Stick
Learn how to use the Intel® Movidius™ Neural Compute Stick to deploy deep neural networks at the edge.  

Fri, Dec 8th 6:30 pm – 10:30 pm | 104C
NERSC Workshop on Deep Learning for Physical Sciences
The Deep Learning for Physical Sciences (DLPS) workshop invites researchers to contribute papers that demonstrate progress in the application of machine and deep learning techniques to real-world problems in physical sciences (including the fields and subfields of astronomy, chemistry, Earth science, and physics).

Sat, Dec 9th 9:00 am – 6:00 pm | Grand Ballroom A
Hierarchical RL Workshop
The goal of this workshop is to improve cohesion and synergy among the research community and increase its impact by promoting better understanding of the challenges and potential of HRL. This workshop further aims to bring together researchers studying both theoretical and practical aspects of HRL, for a joint presentation, discussion, and evaluation of some of the numerous novel approaches to HRL developed in recent years.

Sat, Dec 9th 5:00 pm – 7:00 pm | Hall C
Bayesian Deep Learning Workshop
While deep learning has been revolutionary for machine learning, most modern deep learning models cannot represent their uncertainty nor take advantage of the well studied tools of probability theory. This has started to change following recent developments of tools and techniques combining Bayesian approaches with deep learning. The intersection of the two fields has received great interest from the community over the past few years, with the introduction of new deep learning models that take advantage of Bayesian techniques, as well as Bayesian models that incorporate deep learning elements [1-11]. In fact, the use of Bayesian techniques in deep learning can be traced back to the 1990s, in seminal works by Radford Neal [12], David MacKay [13], and Dayan et al. [14]. These gave us tools to reason about deep models’ confidence, and achieved state-of-the-art performance on many tasks. However, earlier tools did not adapt when new needs arose (such as scalability to big data), and were consequently forgotten. Such ideas are now being revisited in light of new advances in the field, yielding many exciting new results.


Spotlighted Demos

Coach is a research framework for developing reinforcement learning algorithms, targeted at researchers and students. It features 17 built-in state-of-the-art algorithms and a modular design that makes it easy to develop new, more complex agents by reusing existing building blocks.

 

Natural Language Processing Enabled by Intel® Nervana™ Platform
Intel provides an open and flexible framework for ML researchers to create models and solutions for NLP. Our researchers have created NLP components and libraries for use by Intel customers and internal R&D.

 

Accelerating Deep Workloads on IA
BigDL is a distributed deep learning library for Apache Spark*. Write your deep learning applications as standard Apache Spark programs and run them directly on top of existing Apache Spark or Hadoop* clusters. BigDL provides better TCO than bringing up and hosting a dedicated GPU cluster.
We are also showcasing optimized deep learning frameworks (Caffe, TensorFlow, MXNet, neon™, Theano) and performance libraries: the Compute Library for Deep Neural Networks (clDNN), the Intel Math Kernel Library for Deep Neural Networks (Intel MKL-DNN), the Intel Data Analytics Acceleration Library (Intel DAAL), and Python.

 

The Intel® Nervana™ Neural Network Processor (NNP) is a new architecture built from the ground up for neural networks. The goal of this new architecture is to provide the needed flexibility to support all deep learning primitives while making core hardware components as efficient as possible. The NNP gives neural network designers powerful tools for solving larger and more difficult problems. It minimizes data movement and maximizes data re-use in order to leverage all computational resources.

 

SSD detects objects in images using a single deep neural network. MobileNets are a series of networks optimized to run on mobile devices. The Single Shot MultiBox Detector (SSD) with MobileNets as the feature extractor enables a scalable object detection solution for mobile form factors. This demo shows MobileNet-SSD accelerated on the Movidius Neural Compute Stick (NCS). Several images (from disk or camera) are sent to the application, which runs them through the MobileNet-SSD network loaded onto the NCS. The output shows the network detecting and classifying (using MobileNet) the objects in each image, which are then displayed with bounding boxes, classification labels, and prediction accuracy.
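A minimal sketch of the inference loop on the NCS, assuming the NCSDK v1 Python API (mvncapi), a graph file already compiled with the SDK tools, and a preprocessed input frame; the demo's actual pre- and post-processing of detections are omitted:

```python
import numpy as np
from mvnc import mvncapi as mvnc

devices = mvnc.EnumerateDevices()            # find attached Neural Compute Sticks
device = mvnc.Device(devices[0])
device.OpenDevice()

with open('mobilenet_ssd.graph', 'rb') as f:  # graph compiled for the NCS
    graph = device.AllocateGraph(f.read())

image = np.random.rand(300, 300, 3).astype(np.float16)   # stand-in for a real frame
graph.LoadTensor(image, 'frame-0')                        # send the frame to the stick
output, user_obj = graph.GetResult()                      # blocking read of raw detections
print("raw SSD output size:", output.shape)

graph.DeallocateGraph()
device.CloseDevice()
```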

 

Intel AI partnered with the NASA Frontier Development Lab team to collaborate on the AI space resource exploration mission challenge: Lunar Water and Volatiles. The purpose of the challenge was to use AI to determine the location of, and most promising access points to, vital lunar water in terms of cost effectiveness and engineering constraints. Water on the moon is typically found near the poles, but most of it lies in permanently shadowed regions that lack proper illumination, which also introduces issues with co-registration and artifacts in the imagery. Craters are a common feature that can be used to map these difficult areas and aid exploration of the poles. Crater detection has typically been a manual process; with the help of AI it can be transformed into an automated one.
We performed object localization/classification for lunar crater detection using our implementation of the Single Shot MultiBox Detector (SSD) in neon™.

 

Scaling Low Power AI on NCS and Raspberry Pi: Multi-stick Demo
Developers may wonder whether they can plug multiple Neural Compute Sticks into a hub and run them all simultaneously to further speed up a neural network that is processing successive frames of video – and the answer is a resounding ‘yes’! Here we demonstrate four NCS devices in a hub, each running the same object recognition network as successive images are dispatched across the sticks. This demo showcases the scalability available when prototyping neural networks with the NCS.
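A minimal sketch of the scaling pattern, with one worker thread per stick pulling frames from a shared queue; `open_stick` and `infer` are stub placeholders for the per-device setup and inference calls (e.g., the mvncapi calls sketched earlier):

```python
import queue
import threading
import numpy as np

def open_stick(stick_id):
    # Stub: in the real demo this would open the device and allocate the graph.
    return f"graph-on-stick-{stick_id}"

def infer(graph, frame):
    # Stub: in the real demo this would call LoadTensor/GetResult on the stick.
    return (graph, float(frame.sum()))

frames, results = queue.Queue(), queue.Queue()

def worker(stick_id):
    graph = open_stick(stick_id)
    while True:
        idx, frame = frames.get()
        if frame is None:                 # poison pill: stop this worker
            break
        results.put((idx, infer(graph, frame)))

workers = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for w in workers:
    w.start()
for idx in range(16):                      # stand-in for successive video frames
    frames.put((idx, np.random.rand(300, 300, 3)))
for _ in workers:
    frames.put((None, None))               # one pill per stick
for w in workers:
    w.join()
print("processed", results.qsize(), "frames across 4 sticks")
```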

 

SigOpt is an Optimization-as-a-Service platform that seamlessly tunes AI and ML model configuration parameters via an ensemble of optimization algorithms behind a simple API. This results in captured performance that may otherwise be left on the table by conventional techniques while also reducing the time and cost for developing and optimizing new models. Our demo shows how you can tune the hyperparameters and architecture of AI models better and faster than standard techniques by using SigOpt and Intel® Nervana™ Platform.
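A minimal sketch of a SigOpt tuning loop, assuming the SigOpt Python client and an API token; `train_and_score` is a hypothetical stand-in for training a model with the suggested hyperparameters and returning its validation metric:

```python
from sigopt import Connection

def train_and_score(assignments):
    # Hypothetical: build and train a model with these hyperparameters,
    # then return the validation metric. Toy objective used for illustration.
    return -((assignments['log_lr'] + 2.5) ** 2)

conn = Connection(client_token="YOUR_SIGOPT_TOKEN")
experiment = conn.experiments().create(
    name="model tuning (sketch)",
    parameters=[dict(name="log_lr", type="double", bounds=dict(min=-5.0, max=-1.0))],
)
for _ in range(20):
    suggestion = conn.experiments(experiment.id).suggestions().create()
    value = train_and_score(suggestion.assignments)
    conn.experiments(experiment.id).observations().create(
        suggestion=suggestion.id, value=value)
```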

 


News Updates

At the Intel AI NIPS 2017 Afterparty, we had our first public unveiling of the Intel Nervana NNP.


See these demos and more at Booth 209. We will also have our AI experts and recruiting representatives on-hand to answer your questions.

Interested in a career? Learn more about Intel AI’s openings here.

