Using a Convolutional Neural Network to Classify 3D Voxelized Point Clouds on a Neural Compute Stick


Performing object recognition on occluded 3D point-cloud volumes that depict real-world scenes containing everyday objects is an important problem in computer vision. It presents multiple challenges, including:

•Significant memory requirements for these volumetric representations.

•Real-time recognition of objects on mobile, power-constrained or autonomous platforms.

CNNs have been shown to be powerful classification tools for many real-world computer vision tasks. However, the lack of readily available training data and the high memory requirements are two factors that hinder the training and accuracy of 3D CNNs.

In this work:

•A 3D sexaquaternary-based voxelized point-cloud dataset is created, containing 10 different 3D objects placed in different scenes. This dataset can reduce the memory footprint as well as increase the efficiency of the CNN.

•To reconcile the high computational demands of real-time applications with low power-supply and low energy-consumption requirements, the CNN model trained on our dataset is ported to a very low-power, low-cost Fathom NCS based on the Myriad2 MA2450 VPU.

3D objects + Scene

VOLA Representation

VOLA uses a one-bit-per-voxel format to compress the volume contents. The intended use case for this representation is occupancy, i.e. indicating whether a voxel is completely solid or completely empty.
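The memory saving of a one-bit-per-voxel occupancy grid can be illustrated with a minimal sketch. It assumes NumPy and a flat bit-packed layout; it is not VOLA's actual storage format, and the helper names `pack_occupancy` and `unpack_occupancy`, as well as the random test grid, are hypothetical and used only for illustration.

```python
import numpy as np

def pack_occupancy(grid):
    """Pack a boolean occupancy grid into bytes, one bit per voxel."""
    bits = np.asarray(grid, dtype=bool).ravel()
    return np.packbits(bits)

def unpack_occupancy(packed, shape):
    """Recover the boolean occupancy grid from its packed form."""
    n = int(np.prod(shape))
    return np.unpackbits(packed, count=n).astype(bool).reshape(shape)

# Synthetic 64x64x64 grid with roughly 10% of voxels occupied (illustrative only).
rng = np.random.default_rng(0)
grid = rng.random((64, 64, 64)) < 0.1

packed = pack_occupancy(grid)
packed_bytes = packed.nbytes                      # 64**3 / 8 bytes
float32_bytes = grid.astype(np.float32).nbytes    # 64**3 * 4 bytes
restored = unpack_occupancy(packed, grid.shape)

print(packed_bytes, float32_bytes, float32_bytes // packed_bytes)
```

Under these assumptions a 64×64×64 volume occupies 32 KiB when packed, against 1 MiB when each voxel is stored as a 32-bit float, and the round trip recovers the original grid exactly.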

Data Generation

Data Format

3D CNN Architecture

3D CNN architecture layout for objects with background. The input is a 64×64×64 volume in VOLA format, with each voxel represented by 1 bit. The input passes through 3 convolutional layers and 1 fully connected layer. The convolutional layers use 8×8×8 kernels with stride 1. The fully connected layer outputs 10 units, one per object class.
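To make the layer arithmetic concrete, the sketch below computes the spatial size along one axis after each convolutional layer. It assumes no padding and no pooling between layers, neither of which is stated here, so the resulting sizes are illustrative rather than a description of the trained network. The function names are hypothetical.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution along one axis."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

def stack_output_sizes(input_size, num_layers, kernel_size, stride=1, padding=0):
    """Sizes along one axis at the input and after each stacked conv layer."""
    sizes = [input_size]
    for _ in range(num_layers):
        sizes.append(conv_output_size(sizes[-1], kernel_size, stride, padding))
    return sizes

# 64-voxel input axis, three conv layers, 8x8x8 kernels, stride 1.
# Assumption: zero padding and no pooling between layers.
sizes = stack_output_sizes(64, num_layers=3, kernel_size=8, stride=1, padding=0)
print(sizes)  # [64, 57, 50, 43]
```

With a different padding or with pooling layers in between, the same helper gives the corresponding sizes by changing the arguments.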

CNN Inference on Neural Compute Stick

The Neural Compute Stick (NCS) is a low-cost, low-power USB device based on the Myriad2 MA2450 VPU. It supports loading networks designed and trained in the common deep learning framework Caffe.

Averaged over 1000 inferences on 64×64×64 voxelized point clouds, the lightweight CNN takes only 11 ms per inference on the Fathom NCS, powered by the Myriad2 MA2450 with 12 Streaming Hybrid Architecture Vector Engines (SHAVEs), while requiring only 1.2 W.
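The reported figures can be turned into energy and throughput estimates with simple arithmetic. The sketch below assumes that the 1.2 W figure is the average power drawn over the full 11 ms of a single inference and that inferences run one at a time; both are assumptions, and the function names are hypothetical.

```python
def energy_per_inference_mj(power_watts, latency_ms):
    """Energy in millijoules: watts times milliseconds gives millijoules."""
    return power_watts * latency_ms

def throughput_per_second(latency_ms):
    """Inferences per second when inferences run back to back, one at a time."""
    return 1000.0 / latency_ms

# Figures reported above: 11 ms per inference at 1.2 W.
energy_mj = energy_per_inference_mj(1.2, 11.0)
rate = throughput_per_second(11.0)

print(f"{energy_mj:.1f} mJ per inference, {rate:.1f} inferences per second")
```

Under these assumptions each inference costs about 13.2 mJ, and the device sustains roughly 91 inferences per second.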
