Myriad™ X: Evolving low power VPUs for Deep Neural Networks

MaryT_Intel · 12-20-2019

Today saw the announcement of Intel’s newest VPU: Myriad™ X. As a specialized Vision Processing Unit, Myriad X contributes to developing and deploying advanced AI applications by being able to understand and make decisions on visual input data. In this third-generation VPU, we’ve implemented an all-new accelerator dedicated to running neural networks at the edge, which we call the Neural Compute Engine. With power consumption hovering around 1 Watt, Myriad X delivers previously unseen levels of DNN performance while remaining efficient enough to run inside compact devices such as smartphones, drones, wearable devices and smart security cameras. Myriad X joins the Myriad VPU family alongside the Myriad 2 VPU, which broke new ground in delivering DNN compute for untethered, small form factor devices. With the huge amount of innovation going on inside the Intel® Nervana™ team in the space of deep learning, I wanted to take the opportunity to share a bit of information about how Intel® Movidius is complementing this important work at the edge through Myriad™ X.

Having set the bar with Myriad 2, we wanted to take things to the next level with Myriad X. With competitors re-working incumbent CPU and GPU architectures with optimizations for deep neural networks, we realized that we needed to smash through current performance numbers by a large margin. Drawing on lessons from our previous architectures, plus a wealth of experience gained from implementing actual DNN workloads for customers in commercial deployment, we set to work on the crowning feature of Myriad X: the Neural Compute Engine. The Neural Compute Engine marks the first time anyone has implemented a dedicated DNN hardware accelerator on a VPU. While previous generations of our technology could use some individual accelerators to boost DNN performance, the Neural Compute Engine can run entire neural networks solely in hardware. When firing on all cylinders, SHAVE cores and Neural Compute Engine included, Myriad X delivers up to a trillion operations per second for deep neural networks. And remember, this is all from a chip designed to operate at around a single Watt.
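
To put those two headline figures side by side, here is a rough back-of-the-envelope sketch. It treats "up to a trillion operations per second" and "around a single Watt" as exact round numbers, which they are not, and the 2 billion operations per inference is a purely hypothetical network cost chosen for illustration rather than a measured workload.

```python
# Back-of-the-envelope arithmetic only; not a benchmark.
# Both constants are the round figures quoted in the text, taken at face value.
PEAK_OPS_PER_SECOND = 1e12   # "up to a trillion operations per second"
POWER_WATTS = 1.0            # "around a single Watt"

# Hypothetical network cost, assumed for illustration only.
EXAMPLE_OPS_PER_INFERENCE = 2e9


def energy_per_op_picojoules(ops_per_second, watts):
    """Average energy spent per operation, in picojoules."""
    joules_per_op = watts / ops_per_second
    return joules_per_op * 1e12


def max_inferences_per_second(ops_per_second, ops_per_inference):
    """Upper bound on inference rate if every peak operation were useful."""
    return ops_per_second / ops_per_inference


if __name__ == "__main__":
    print(energy_per_op_picojoules(PEAK_OPS_PER_SECOND, POWER_WATTS))
    print(max_inferences_per_second(PEAK_OPS_PER_SECOND, EXAMPLE_OPS_PER_INFERENCE))
```

Under these assumptions the budget works out to roughly one picojoule per operation. Real workloads rarely sustain peak throughput, so the inference rate is an upper bound at best.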

While the Neural Compute Engine is impressive, delivering high-performance inference at low power isn’t quite as simple as putting a new block onto a chip. The Myriad X VPU brings together four other important design elements to achieve its blistering performance:

Flexible SHAVE Processors: The raw performance of Myriad’s SHAVE processors, which reaches the hundreds of GFLOPS, complements the Neural Compute Engine’s fixed-function hardware acceleration. As deep neural network layer types and topologies evolve, the programmability of the SHAVE cores provides the balance between efficiency and future proofing.
Massively Parallel Central Memory: Deep neural networks create large volumes of intermediate data. Keeping all of this on chip enables our customers to vastly reduce the off-chip bandwidth that would otherwise create performance bottlenecks. Myriad X features a proprietary on-chip memory design that minimizes the cost of moving intermediate data, a crucial performance requirement as we see data transfer costs beginning to outstrip compute costs from an energy perspective.
Flexible Precision: The Myriad X VPU has native hardware support for mixed precision, which is part of what allows it to run deep neural networks at low power. Both 16-bit and 32-bit floating-point datatypes, as well as u8 and unorm8 types, are supported, allowing developers to find the right balance of accuracy and performance.
Optimized Libraries & Frameworks: We have been working on a development kit that includes dedicated software libraries that go hand-in-hand with the architecture to support sustained performance on matrix multiplication and multidimensional convolution. In addition, we’ve introduced a tool that allows virtually automatic porting of trained PC models to Myriad-based architectures.
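
The memory and precision points above can be made a little more concrete with a small sketch. It does not use any Myriad tooling or API. The feature-map shape is hypothetical, and the unorm8 encoding shown is the conventional "unsigned normalized" definition (an 8-bit code c standing for c / 255), which is assumed here rather than taken from Myriad documentation.

```python
import struct

# Storage cost per element for the datatypes named in the text.
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "u8": 1}

# Hypothetical intermediate feature map: (channels, height, width).
EXAMPLE_SHAPE = (64, 112, 112)


def tensor_bytes(shape, dtype):
    """Bytes needed to hold one tensor of the given shape and datatype."""
    count = 1
    for dim in shape:
        count *= dim
    return count * BYTES_PER_ELEMENT[dtype]


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def to_unorm8(value):
    """Encode a value in [0, 1] as an 8-bit code (assumed convention: code / 255)."""
    clamped = min(max(value, 0.0), 1.0)
    return int(clamped * 255 + 0.5)


def from_unorm8(code):
    """Decode an 8-bit unsigned normalized code back to a float in [0, 1]."""
    return code / 255.0


if __name__ == "__main__":
    for dtype in ("fp32", "fp16", "u8"):
        print(dtype, tensor_bytes(EXAMPLE_SHAPE, dtype), "bytes")
    print("0.1 as fp16:", to_fp16(0.1))
    print("0.25 as unorm8:", to_unorm8(0.25), "->", from_unorm8(to_unorm8(0.25)))
```

Halving or quartering the bytes per element shrinks the intermediate data that has to be stored and moved by the same factor, at the price of a small rounding error per value. That is the accuracy-versus-performance trade-off the precision options expose.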

Movidius has over a decade of experience applying the heterogeneous approach described above to demanding workloads at low power. Solving these problems was no easy task, but we are delighted to share the latest fruits of our team’s hard work and expertise in the form of Myriad X. Furthermore, we can’t wait to see what delivering this kind of compute power in an energy-efficient package means for the development of exciting new implementations of AI in new form factors and product categories.

As we see the Intel® Nervana™ team blazing a trail in optimizing machine learning workloads in the realm of training and cloud inference, we believe Myriad™ X provides a powerful complement to the innovations occurring in Intel’s artificial intelligence and machine learning teams, reinforcing Intel’s philosophy of the “virtuous cycle” of machine intelligence. With the introduction of the Myriad™ X VPU, we are confident standing by our claim that “If it’s smart and connected, it’s best with Intel®”.

You can learn more about the Myriad™ X VPU by visiting: https://software.intel.com/en-us/iot/hardware/vision-accelerator-movidius-vpu#specifications

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.