Intel AI showcased at Neural Information Processing Systems (NIPS)
Dec 04, 2017
Mon, Dec 4th 12:20 pm – 2:00 pm | WiML poster session / Pacific Ballroom
Sparse 3D Convolutional Networks for Efficient Object Classification
Xiaofan Xu (Movidius Machine Learning Group, Intel) and Ananya Gupta (University of Manchester)
Currently, most 3D CNNs are composed of multiple 3D convolutional layers with millions of parameters. Based on the premise that 3D data is inherently sparse, we believe that the learnt features should be sparse as well in order to represent the data. Our work on sparsifying the weights and kernels of these 3D networks aims to make them more efficient, leading to faster inference.
Mon, Dec 4th 6:30 pm – 10:30 pm | Pacific Ballroom #75
Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks
Arjun K Bansal (Intel), William Constable (Intel), Oguz Elibol (Intel), Stewart Hall (Intel), Luke Hornof (Intel), Amir Khosrowshahi (Intel), Carey Kloss (Intel), Urs Köster (Intel), Marcel Nassar (Intel), Naveen Rao (Intel), Xin Wang (Intel), Tristan Webb (Intel)
Deep neural networks are commonly developed and trained in 32-bit floating point format. Significant gains in performance and energy efficiency could be realized by training and inference in numerical formats optimized for deep learning. Despite advances in limited precision inference in recent years, training of neural networks in low bit-width remains a challenging problem. Here we present the Flexpoint data format, aiming at a complete replacement of 32-bit floating point format training and inference, designed to support modern deep network topologies without modifications. Flexpoint tensors have a shared exponent that is dynamically adjusted to minimize overflows and maximize available dynamic range. We validate Flexpoint by training AlexNet, a deep residual network and a generative adversarial network, using a simulator implemented with the neon deep learning framework. We demonstrate that 16-bit Flexpoint closely matches 32-bit floating point in training all three models, without any need for tuning of model hyperparameters. Our results suggest Flexpoint as a promising numerical format for future hardware for training and inference.
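The shared-exponent idea in the abstract above can be illustrated with a minimal sketch. It is not the Flexpoint implementation: the function names `to_shared_exponent` and `from_shared_exponent` are hypothetical, and the exponent is simply chosen from the largest magnitude currently in the tensor, which is an assumption made here for illustration rather than the exponent management scheme described in the paper.

```python
import math


def to_shared_exponent(values, mantissa_bits=16):
    """Quantize floats to signed integer mantissas that share one exponent.

    Illustrative only: the exponent is picked from the current maximum
    magnitude, which is an assumption and not the paper's algorithm.
    """
    max_mant = 2 ** (mantissa_bits - 1) - 1  # 32767 for 16 bits
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return [0 for _ in values], 0
    # Smallest exponent for which the largest value still fits in the mantissa.
    exponent = math.ceil(math.log2(max_abs / max_mant))
    scale = 2.0 ** exponent
    # Clamp as a guard against overflow from floating-point rounding.
    mantissas = [max(-max_mant, min(max_mant, round(v / scale))) for v in values]
    return mantissas, exponent


def from_shared_exponent(mantissas, exponent):
    """Reconstruct floats from integer mantissas and the shared exponent."""
    scale = 2.0 ** exponent
    return [m * scale for m in mantissas]


if __name__ == "__main__":
    mantissas, exponent = to_shared_exponent([0.5, -1.0, 0.25])
    print(mantissas, exponent)
    print(from_shared_exponent(mantissas, exponent))
```

Because every element uses the same exponent, values much smaller than the largest element lose precision or round to zero, which is why the choice of exponent matters for dynamic range.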
Tues, Dec 5th 6:30 pm – 10:30 pm | Pacific Ballroom #122
Learning to Inpaint for Image Compression
Mohammad Haris Baig (Dartmouth College), Vladlen Koltun (Intel Labs), Lorenzo Torresani (Dartmouth)
We study the design of deep architectures for lossy image compression. We present two architectural recipes in the context of multi-stage progressive encoders and empirically demonstrate their importance on compression performance. Specifically, we show that: (a) predicting the original image data from residuals in a multi-stage progressive architecture facilitates learning and leads to improved performance at approximating the original content and (b) learning to inpaint (from neighboring image pixels) before performing compression reduces the amount of information that must be stored to achieve a high-quality approximation. Incorporating these design choices in a baseline progressive encoder yields an average reduction of over 60% in file size with similar quality compared to the original residual encoder.
Tues, Dec 5th 6:30 pm – 10:30 pm | Pacific Ballroom #138
Geometric Matrix Completion with Recurrent Multi-Graph Neural Networks
Federico Monti (Università della Svizzera italiana) · Michael Bronstein (USI Lugano / Tel Aviv University / Intel) · Xavier Bresson (NTU)
Matrix completion models are among the most common formulations of recommender systems. Recent works have shown a boost in the performance of these techniques when introducing the pairwise relationships between users/items in the form of graphs, and imposing smoothness priors on these graphs. However, such techniques do not fully exploit the local stationarity structures of user/item graphs, and the number of parameters to learn is linear w.r.t. the number of users and items. We propose a novel approach to overcome these limitations by using geometric deep learning on graphs. Our matrix completion architecture combines graph convolutional neural networks and recurrent neural networks to learn meaningful statistical graph-structured patterns and the non-linear diffusion process that generates the known ratings. This neural network system requires a constant number of parameters independent of the matrix size. We apply our method on both synthetic and real datasets, showing that it outperforms state-of-the-art techniques.
Thurs, Dec 7th 4:15 pm – 4:35 pm | Hall A (Deep Reinforcement Learning Symposium)
RAIL: Risk-Averse Imitation Learning
Anirban Santara (IIT Kharagpur), Abhishek Naik (IIT Madras), Prof. Balaraman Ravindran (IIT Madras), Dipankar Das (Intel Labs, India), Dheevatsa Mudigere (Intel Labs, India), Sasikanth Avancha (Intel Labs, India), Bharat Kaul (Intel Labs, India)
Imitation learning algorithms learn viable policies by imitating an expert’s behavior when reward signals are not available. Generative Adversarial Imitation Learning (GAIL) is a state-of-the-art algorithm for learning policies when the expert’s behavior is available as a fixed set of trajectories. We evaluate in terms of the expert’s cost function and observe that the distribution of trajectory-costs is often more heavy-tailed for GAIL-agents than the expert at a number of benchmark continuous-control tasks. Thus, high-cost trajectories, corresponding to tail-end events of catastrophic failure, are more likely to be encountered by the GAIL-agents than the expert. This makes the reliability of GAIL-agents questionable when it comes to deployment in risk-sensitive applications like robotic surgery and autonomous driving. In this work, we aim to minimize the occurrence of tail-end events by minimizing tail risk within the GAIL framework. We quantify tail risk by the Conditional-Value-at-Risk (CVaR) of trajectories and develop the Risk-Averse Imitation Learning (RAIL) algorithm. We observe that the policies learned with RAIL show lower tail-end risk than those of vanilla GAIL. Thus the proposed RAIL algorithm appears as a potent alternative to GAIL for improved reliability in risk-sensitive applications.
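As a rough illustration of the tail-risk measure named in the abstract, the sketch below computes an empirical Value-at-Risk and Conditional-Value-at-Risk from a list of trajectory costs. It is a simple sample-based estimator, not the RAIL algorithm itself; the function names and the `alpha` confidence-level parameter are assumptions introduced here.

```python
import math


def value_at_risk(costs, alpha=0.9):
    """Empirical VaR: the cost at the alpha-quantile of the sorted sample."""
    ordered = sorted(costs)
    index = math.ceil(alpha * len(ordered)) - 1
    index = max(0, min(len(ordered) - 1, index))
    return ordered[index]


def conditional_value_at_risk(costs, alpha=0.9):
    """Empirical CVaR: the mean of all costs at or above the VaR threshold."""
    threshold = value_at_risk(costs, alpha)
    tail = [c for c in costs if c >= threshold]
    return sum(tail) / len(tail)


if __name__ == "__main__":
    # Hypothetical trajectory costs; one rare, very costly failure.
    costs = [1, 1, 1, 1, 1, 1, 1, 1, 1, 20]
    print(value_at_risk(costs, alpha=0.5))
    print(conditional_value_at_risk(costs, alpha=0.5))
```

Two sets of trajectories can have similar typical costs while differing sharply in CVaR, which is the kind of heavy-tailed behavior the abstract describes.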
Fri, Dec 8th 8:00 am – 8:45 pm | 103 A+B (Competition Track)
Anil Thomas (Intel), Oguz Elibol (Intel)
Fri, Dec 8th 10:20 am – 10:50 am | 104 A (Machine Learning for Health (ML4H) Workshop)
Wrist Sensor Fusion Enables Robust Gait Quantification Across Walking Scenarios
Zeev Waks (Intel), Itzik Mazeh (Intel), Chen Admati (Intel), Michal Afek (Intel), Yonatan Dolan (Intel), Avishai Wagner (Intel)
Quantifying step abundance via single wrist-worn accelerometers is a common approach for encouraging active lifestyle and tracking disease status. Nonetheless, step counting accuracy can be hampered by fluctuations in walking pace or demeanor. Here, we assess whether the use of various sensor fusion techniques, each combining bilateral wrist accelerometer data, may increase step count robustness. By collecting data from 27 healthy subjects, we find that high-level step fusion leads to substantially improved accuracy across diverse walking scenarios. Gait cycle analysis illustrates that wrist devices can recurrently detect steps proximal to toe-off events. Collectively, our study suggests that dual-wrist sensor fusion may enable robust gait quantification in free-living environments.
Fri, Dec 8th 11:45 am – 12:30 pm | Grand Ballroom A
Sequence modeling using a memory controller extension for LSTM
Itamar Ben-Ari (Intel), Ravid Shwartz-Ziv (Intel)
The Long Short-term Memory (LSTM) recurrent neural network is a powerful model for time series forecasting and various temporal tasks. In this work we extend the standard LSTM architecture by augmenting it with an additional gate which produces a memory control vector signal inspired by the Differentiable Neural Computer (DNC) model. This vector is fed back to the LSTM instead of the original output prediction. By decoupling the LSTM prediction from its role as a memory controller we allow each output to specialize in its own task. The result is that our LSTM prediction is dependent on its memory state and not the other way around (as in standard LSTM). We demonstrate our architecture on two time-series forecast tasks and show that our model achieves up to 8% lower loss than the standard LSTM model.
Sat, Dec 9th 5:00 pm – 7:00 pm | Hall C (Bayesian Deep Learning)
Unsupervised Deep Structure Learning by Recursive Independence Testing
Raanan Y. Yehezkel Rohekar (Intel), Guy Koren (Intel), Shami Nisimov (Intel), Gal Novik (Intel)
We introduce a principled approach for unsupervised structure learning of deep, feed-forward, neural networks. We propose a new interpretation for depth and inter-layer connectivity where conditional independencies in the input distribution are encoded hierarchically in the network structure. Neurons in deeper layers encode low-order (small condition sets) independencies and have a wide scope of the input, whereas neurons in the first layers encode higher-order (larger condition sets) independencies and have a narrower scope. Thus, the depth of the network is equal to the maximal order of independence in the input distribution. Moreover, this results in structures allowing neurons to connect to neurons in any deeper layer, skipping intermediate layers. The proposed algorithm constructs three main graphs: 1) a deep generative-latent-graph, learned recursively from data using a conditional independence oracle, 2) a stochastic inverse, and 3) a discriminative graph constructed from the stochastic inverse. We prove that conditional-dependency relations in the learned generative latent graph are preserved in both the stochastic inverse and the class-conditional discriminative graph. Finally, a deep neural network structure is constructed from the discriminative graph. We demonstrate on image classification benchmarks that the deepest layers (convolutional and dense layers) of common convolutional networks can be replaced by significantly smaller learned structures, achieving high classification accuracy. Our structure learning algorithm requires a small computational cost and runs efficiently on a standard desktop CPU.
Mon, Dec 4th 11:00 am – 4:45 pm & Thurs, Dec 7th 12:45 pm – 4:35 pm | Long Beach Convention Center, Room 104
Women in Machine Learning
This technical workshop gives female faculty, research scientists, and graduate students in the machine learning community an opportunity to meet, network and exchange ideas, participate in career-focused panel discussions with senior women in industry and academia and learn from each other.
Thurs, Dec 7th 4:15 pm – 4:35 pm | Long Beach Convention Center, Room 104
Representation Learning in Large Attributed Graphs
Graphs (networks) are ubiquitous and allow us to model entities (nodes) and the dependencies (edges) between them. Graph data is often observed directly in the natural world (e.g., biological or social networks) or constructed from non-relational data by deriving a metric space between entities and retaining only the most significant edges. Learning a useful feature representation from graph data lies at the heart of many machine learning tasks, such as classification, anomaly detection, and link prediction, and is key to their success. Many existing techniques use random walks as a basis for learning features or estimating the parameters of a graph model for a downstream prediction task. Examples include recent node embedding methods such as DeepWalk and node2vec, as well as graph-based deep learning algorithms. However, these approaches are inherently transductive and do not generalize to unseen nodes and other graphs. Furthermore, most of these approaches lack support for rich graph data with attributes and structural features. In this work, we discuss a generic framework for inductive network representation learning based on the notion of attributed random walks. This is achieved by learning functions that generalize to new nodes and graphs. We show that our proposed framework is more accurate on a variety of graphs.
Fri, Dec 8th 6:30 pm | Renaissance Hotel
Intel® Movidius™ Neural Compute Stick
Learn how to use the Intel® Movidius™ Neural Compute Stick to deploy deep neural networks at the edge.
Fri, Dec 8th 6:30 pm – 10:30 pm | 104C
NERSC Workshop on Deep Learning for Physical Sciences
The Deep Learning for Physical Sciences (DLPS) workshop invites researchers to contribute papers that demonstrate progress in the application of machine and deep learning techniques to real-world problems in physical sciences (including the fields and subfields of astronomy, chemistry, Earth science, and physics).
Sat, Dec 9th 9:00 am – 6:00 pm | Grand Ballroom A
Hierarchical RL Workshop
The goal of this workshop is to improve cohesion and synergy among the research community and increase its impact by promoting better understanding of the challenges and potential of HRL. This workshop further aims to bring together researchers studying both theoretical and practical aspects of HRL, for a joint presentation, discussion, and evaluation of some of the numerous novel approaches to HRL developed in recent years.
Sat, Dec 9th 5:00 pm – 7:00 pm | Hall C
Bayesian Deep Learning Workshop
While deep learning has been revolutionary for machine learning, most modern deep learning models cannot represent their uncertainty nor take advantage of the well-studied tools of probability theory. This has started to change following recent developments of tools and techniques combining Bayesian approaches with deep learning. The intersection of the two fields has received great interest from the community over the past few years, with the introduction of new deep learning models that take advantage of Bayesian techniques, as well as Bayesian models that incorporate deep learning elements [1-11]. In fact, the use of Bayesian techniques in deep learning can be traced back to the 1990s, in seminal works by Radford Neal, David MacKay, and Dayan et al. These gave us tools to reason about deep models’ confidence, and achieved state-of-the-art performance on many tasks. However, earlier tools did not adapt when new needs arose (such as scalability to big data), and were consequently forgotten. Such ideas are now being revisited in light of new advances in the field, yielding many exciting new results.
At the Intel AI NIPS 2017 Afterparty, we had our first public unveiling of the Intel Nervana NNP.