Technology

Enabling Machine Intelligence at High Performance & Low Power

Intel® Movidius™ VPUs are uniquely designed for high performance at ultra-low power for computer vision and AI workloads

Unique VPU Architecture

The design principles for Intel® Movidius™ Myriad™ VPUs follow from a careful balance of programmable vector processors, dedicated hardware accelerators, and a memory architecture optimized for data flow. Myriad VPUs feature a software-controlled, multi-core, multi-ported memory subsystem and caches that can be configured to support a wide range of workloads. This proprietary technology delivers exceptionally high sustained on-chip data and instruction bandwidth to the array of SHAVE processors, two CPUs, and high-performance video hardware accelerators.

To guarantee sustained high performance while minimizing power, the proprietary Movidius SHAVE (Streaming Hybrid Architecture Vector Engine) processor pairs wide and deep register files with a Variable-Length Long Instruction-Word (VLLIW) that controls multiple functional units, including extensive SIMD capability, for high parallelism and throughput at both the functional-unit and processor level. The SHAVE processor is a hybrid stream-processor architecture that combines the best features of GPUs, DSPs, and RISC, with 8/16/32-bit integer and 16/32-bit floating-point arithmetic as well as unique features such as hardware support for sparse data structures. The architecture is designed to maximize performance-per-watt while maintaining ease of programmability, especially for computer vision and machine learning workloads.
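To make the SIMD multiply-accumulate pattern described above concrete, the following portable C sketch shows the kind of wide vector MAC that a SHAVE-class engine is built to sustain every cycle. It is a conceptual illustration only; the 8-lane width and the function names are assumptions, not Movidius intrinsics or ISA.

#include <stddef.h>

#define LANES 8  /* assumed SIMD width, for illustration only */

/* One "vector MAC": acc[i] += a[i] * b[i] across all lanes. On a VLIW core
 * this whole loop body would issue as a single wide instruction, alongside
 * loads and stores packed into the same instruction word. */
static void vec_mac(float acc[LANES], const float a[LANES], const float b[LANES])
{
    for (size_t i = 0; i < LANES; ++i)
        acc[i] += a[i] * b[i];
}

/* Dot product of two length-n vectors built from repeated vector MACs,
 * the primitive underlying convolution and matrix multiplication. */
float dot(const float *a, const float *b, size_t n)
{
    float acc[LANES] = {0};
    for (size_t i = 0; i + LANES <= n; i += LANES)
        vec_mac(acc, &a[i], &b[i]);

    float sum = 0.0f;
    for (size_t i = 0; i < LANES; ++i)            /* horizontal reduction */
        sum += acc[i];
    for (size_t i = n - (n % LANES); i < n; ++i)  /* leftover tail elements */
        sum += a[i] * b[i];
    return sum;
}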


Deep Neural Networks on Myriad™ VPUs

To deploy deep learning applications on-device, performance and precision at very low power are critical. Intel's Movidius™ Myriad™ VPU platform has a number of key elements suited to running deep neural networks.

  • Performance: Myriad's SHAVE processor engines deliver the hundreds of GFLOPS of fundamental matrix-multiplication compute required by deep learning networks of various topologies. 
  • On-chip RAM: deep neural networks create large volumes of intermediate data. Keeping this data on chip lets customers vastly reduce the external memory bandwidth that would otherwise create performance bottlenecks (see the tiled matrix-multiplication sketch after this list). 
  • Flexible precision: industry-leading performance at best-in-class power efficiency is supported by Myriad's native support for mixed precision. Both 16-bit and 32-bit floating-point datatypes are supported, as well as u8 and unorm8 types. Dedicated hardware accelerators add the flexibility needed to achieve high performance for convolution computation.
  • High-performance libraries: the development kit includes dedicated software libraries that go hand-in-hand with the architecture to sustain performance on matrix multiplication and multidimensional convolution.
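The on-chip RAM point above can be illustrated with a blocked matrix multiply that stages working tiles into a small scratch buffer sized to fit in local memory, so the inner loops never touch external DRAM. This is a conceptual sketch in portable C; the tile size and buffer layout are assumptions for illustration, not the scheme used by the Myriad libraries.

#include <stddef.h>
#include <string.h>

#define TILE 32  /* assumed tile edge; three TILE x TILE float buffers fit in a few KB of local RAM */

/* C = A * B for n x n row-major matrices, n a multiple of TILE for brevity. */
void matmul_tiled(float *C, const float *A, const float *B, size_t n)
{
    /* Stand-ins for on-chip scratch memory. */
    static float a_loc[TILE][TILE], b_loc[TILE][TILE], c_loc[TILE][TILE];

    for (size_t bi = 0; bi < n; bi += TILE)
        for (size_t bj = 0; bj < n; bj += TILE) {
            memset(c_loc, 0, sizeof c_loc);        /* accumulator tile stays local */
            for (size_t bk = 0; bk < n; bk += TILE) {
                /* Stage one tile of A and B into local memory (one DMA each on
                 * a real device), then reuse them TILE times from fast SRAM. */
                for (size_t i = 0; i < TILE; ++i) {
                    memcpy(a_loc[i], &A[(bi + i) * n + bk], TILE * sizeof(float));
                    memcpy(b_loc[i], &B[(bk + i) * n + bj], TILE * sizeof(float));
                }
                for (size_t i = 0; i < TILE; ++i)
                    for (size_t k = 0; k < TILE; ++k)
                        for (size_t j = 0; j < TILE; ++j)
                            c_loc[i][j] += a_loc[i][k] * b_loc[k][j];
            }
            for (size_t i = 0; i < TILE; ++i)      /* write the finished tile back once */
                memcpy(&C[(bi + i) * n + bj], c_loc[i], TILE * sizeof(float));
        }
}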
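For readers unfamiliar with the u8 and unorm8 types mentioned in the flexible-precision point, the short sketch below shows what a unorm8 value is: a real number in [0, 1], such as a normalized pixel or activation, stored in a single byte and converted to floating point by a scale of 255. The helper names are illustrative, not part of any Movidius API.

#include <stdint.h>
#include <math.h>

static float unorm8_to_float(uint8_t v)   /* 0..255 -> 0.0..1.0 */
{
    return (float)v / 255.0f;
}

static uint8_t float_to_unorm8(float f)   /* clamp to [0, 1], then round */
{
    if (f < 0.0f) f = 0.0f;
    if (f > 1.0f) f = 1.0f;
    return (uint8_t)lrintf(f * 255.0f);
}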