
Vehicle counting with NM500 Neuromem


Introduction

I recently purchased a Nepes NM500 ‘Brilliant USB’, which packs four NM500 ‘neuromorphic’ memory chips behind a USB interface. Nepes provide an SDK for Windows, Linux and Mac OS but unfortunately no realistic example code, so I wanted to build something real and share my experiences.

[Image: the Nepes Brilliant USB]

What is the NM500 and why use it?

So basically each chip is an FPGA implementation of a 576-node, single-layer, fully connected RBF neural network with a 256-byte input vector and a single 16-bit output. You can chain chips together to increase the number of neurons, but the input and output are always fixed at 256 bytes and one short respectively. You can also segment the network into a number of ‘contexts’, splitting it into sub-networks, for example to interpret different sensors with different models. Given that the neural network is relatively simple compared with modern deep learning models, you might ask why bother with a hardware implementation? The main advantages, as far as I can discern, are :-

  1. It’s extremely low power and so ideally suited to IoT / Arduino use cases, where a software neural network would consume considerably more power and probably run slower on a low-power microcontroller.
  2. It’s auto-learning. Using a TPU or TensorFlow Lite means you can’t train on-device; instead you train on heavyweight hardware like GPUs and then extract a model which can only be used for inference at runtime. The NM500 lets you mix inference and learning at runtime, and the learning happens instantly on-chip without the need for epochs of backpropagation. This is a very attractive feature.

But of course there are some limitations :-

  1. The 256-byte input limit, e.g. a 16 x 16 greyscale image
  2. The single short output, typically used to indicate an inferred class (both limits are made concrete in the sketch below)
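
To make those fixed dimensions concrete, here’s the capacity arithmetic for the Brilliant USB as plain C++ constants (just restating the figures above; the SDK’s context-selection API varies, so I won’t guess at that part) :-

// Fixed NM500 geometry, per the figures quoted above
constexpr int NEURONS_PER_CHIP = 576;
constexpr int NUM_CHIPS = 4;                                  // the Brilliant USB chains 4 chips
constexpr int TOTAL_NEURONS = NUM_CHIPS * NEURONS_PER_CHIP;   // 2,304 neurons in total
constexpr int INPUT_BYTES = 256;                              // e.g. a 16 x 16 greyscale image
// The output is always a single 16-bit category (a C 'short').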

When you consider a typical object-detection application, e.g. identifying people within a video feed, you typically have five outputs: object class, x coordinate, y coordinate, width and height. So clearly that isn’t possible with a single output value. Object detection also typically uses many convolutional and pooling layers, none of which you can achieve with this simple hardware. So let’s consider a relatively simple image recognition problem instead :-

Counting Cars passing my house

I have an IP camera outside our house which I currently use with Darknet YOLO running on a Jetson Nano to detect people and record them approaching the property. Now I’m going to reuse the same RTSP video feed, but instead use the NM500 to count cars passing the house, either heading uphill or heading downhill. So this is an image classification problem. Firstly, here’s the feed from the camera, which I can read with OpenCV :-

[Image: the full camera feed]

Notice the small section at the end of the driveway? I take a 160×160 pixel square at offset (420, 0), i.e. about 40% along the top edge. Here’s an extract :-

[Image: the extracted 160×160 region of interest]

This extract shows a car heading down the hill, which I want the system to classify as one of :-

  • Nothing – i.e. empty road in all lighting conditions
  • People – one or more people wandering past (often with dogs :) )
  • Cars heading uphill
  • Cars heading downhill

So we have four classes into which we want our images to be classified.
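
In code these become small integer categories. The definitions below are my reconstruction: the names match the training calls later in the post, and the numbering matches the log output further down, with 0 reserved for ‘not recognised’ :-

// Category IDs taught to the NM500 (numbering inferred from the log output;
// 0 is reserved for "not identified")
enum {
    CLASS_NOTHING      = 1,
    CLASS_PEOPLE       = 2,
    CLASS_CAR_UPHILL   = 3,
    CLASS_CAR_DOWNHILL = 4
};
#define NUM_CLASSES 4   // also used later as the 'k' in classification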


Image Processing

Before using the neural network, we need to get our data into a useful 256-byte vector. As you’ve probably guessed, that’s a 16 x 16 greyscale image. Our ‘Region of Interest’ above is 160 x 160 in 24-bit RGB, so we need to downscale and convert it to that format :-

// Extract the 160x160 region of interest from the full frame
frame(Rect(420, 0, 160, 160)).copyTo(roi);
// Downscale to the NM500's 16x16 input size
resize(roi, currentRoi, cv::Size(16, 16));
// OpenCV frames are BGR, so use BGR2GRAY for the correct channel weights
cvtColor(currentRoi, currentRoi, cv::COLOR_BGR2GRAY);

So now our ‘currentRoi’ matrix contains the downscaled image, which is suitable to pass directly to the NM500 hardware. However, we can make the task considerably easier. There’s still a lot of detail in this image which doesn’t help with the problem, and which might therefore require a lot of training data for a good model to form.

Instead, we take two consecutive greyscale frames and ‘subtract’ them using OpenCV :-

// Calculate difference image - to show movement
absdiff(currentRoi, lastRoi, diff);

So now we end up with something like this :-

[Image: the 16×16 difference image]

with the white pixels indicating movement and the black pixels indicating no change from the previous frame. This gives us an ideal vector for the neural network, with most of the background (uninteresting) data removed.
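
Putting the preprocessing together, the per-frame bookkeeping looks roughly like this. It’s a minimal sketch: the RTSP URL is a placeholder, and the classification call is elided :-

#include <opencv2/opencv.hpp>
using namespace cv;

int main() {
    VideoCapture cap("rtsp://camera.example/stream");   // placeholder URL, not the real feed
    Mat frame, roi, currentRoi, lastRoi, diff;

    while (cap.read(frame)) {
        // Extract and downsample the region of interest, exactly as above
        frame(Rect(420, 0, 160, 160)).copyTo(roi);
        resize(roi, currentRoi, Size(16, 16));
        cvtColor(currentRoi, currentRoi, COLOR_BGR2GRAY);

        if (!lastRoi.empty()) {
            // Difference against the previous frame to isolate movement
            absdiff(currentRoi, lastRoi, diff);
            // ... feed 'diff' to the NM500 here (see the next section)
        }
        currentRoi.copyTo(lastRoi);   // keep this frame for the next iteration
    }
    return 0;
}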


Classifying the Image

The Nepes SDK is pretty easy to use, so we can feed the pixel data directly into the hardware :-

nm_classify_req req;
// Copy the 16x16 diff image straight into the request vector
memcpy(req.vector, diff.ptr(0), 256);
req.vector_size = 256;
req.k = NUM_CLASSES;    // number of nearest matches to return

nm_classify(target, &req);
if (req.status == NM_CLASSIFY_IDENTIFIED) {
    cat = req.category[0];    // best match first
} else {
    cat = 0;                  // not recognised
}

Note that I’m just using memcpy to copy the image data directly into the request vector. The winning category is returned in the first element of the category array.
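
Wrapped into a small helper it looks like this; a sketch reusing only the request fields shown above, with ‘target’ left as the SDK device handle used throughout :-

// Classify one 16x16 diff image; returns the winning category,
// or 0 if no committed neuron recognised the vector.
uint16_t classify_diff(const cv::Mat &diff) {
    nm_classify_req req;
    memcpy(req.vector, diff.ptr(0), 256);     // raw 8-bit pixels, row-major
    req.vector_size = 256;
    req.k = NUM_CLASSES;                      // up to 4 nearest matches
    nm_classify(target, &req);                // 'target' is the SDK device handle
    return (req.status == NM_CLASSIFY_IDENTIFIED) ? req.category[0] : 0;
}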


Training the Network

The SDK provides an nm_learn function, very like the classify function, where you pass a vector and a class category and it associates the two. There are also functions for saving and loading the network state, although I haven’t used these and just retrain the network from scratch at startup, since it’s so fast. To make this easier, I created a function to load every image from a folder and associate it with a class; then I can just add new images to fine-tune the system :-

// Initialise the network with training images
train_images(target, "/home/awhaley/CLionProjects/CameraNM500/images/Nothing", CLASS_NOTHING);
train_images(target, "/home/awhaley/CLionProjects/CameraNM500/images/People", CLASS_PEOPLE);
train_images(target, "/home/awhaley/CLionProjects/CameraNM500/images/CarUphill", CLASS_CAR_UPHILL);
train_images(target, "/home/awhaley/CLionProjects/CameraNM500/images/CarDownhill", CLASS_CAR_DOWNHILL); 
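
For reference, train_images is little more than a directory walk. Here’s a sketch under two assumptions: the saved training images are already 16×16 greyscale diffs, and nm_learn takes a request mirroring nm_classify_req with a category field (check the SDK header for the real layout) :-

#include <opencv2/opencv.hpp>
#include <cstring>
#include <string>
#include <vector>
// plus the Nepes SDK header that declares nm_learn / nm_learn_req

// Teach every image in 'folder' to the network as 'category'.
// 'nm_device' stands in for whatever handle type the SDK uses for 'target'.
void train_images(nm_device target, const std::string &folder, int category) {
    std::vector<cv::String> files;
    cv::glob(folder + "/*.png", files);                 // file extension is an assumption

    for (const auto &f : files) {
        cv::Mat img = cv::imread(f, cv::IMREAD_GRAYSCALE);
        if (img.empty() || img.total() != 256) continue;    // expect 16x16 greyscale diffs

        nm_learn_req req;                               // assumed layout, mirroring nm_classify_req
        std::memcpy(req.vector, img.ptr(0), 256);
        req.vector_size = 256;
        req.category = category;
        nm_learn(target, &req);                         // learning is committed on-chip instantly
    }
}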

You could obviously build something more interactive, adding new images at runtime and having the system learn them whilst it’s still classifying, if you wish. This, I think, is the biggest strength of the NM500.
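
As a sketch of that idea (same assumed nm_learn_req layout as above), you could map a few keys to the categories and teach the current frame on the fly, in the middle of the inference loop :-

// Hypothetical online learning, inside the main inference loop: press '1'..'4'
// while watching the feed to teach the NM500 the current frame's true class.
int key = cv::waitKey(1);
if (key >= '1' && key <= '4') {
    nm_learn_req req;                         // assumed layout, mirroring nm_classify_req
    memcpy(req.vector, diff.ptr(0), 256);     // the diff image we just classified
    req.vector_size = 256;
    req.category = key - '0';                 // 1..4 = CLASS_NOTHING .. CLASS_CAR_DOWNHILL
    nm_learn(target, &req);                   // no retraining pass needed
}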


Putting it all together and conclusions

Here’s the final thing working :-

[Image: the finished system running]

Here’s some sample log output :-

[0202] ( 7s): 3 - Car going uphill [ 0, 26, 102, 75,]
[0203] ( 3s): 3 - Car going uphill [ 0, 26, 103, 75,]
[0204] ( 0s): 2 - People [ 0, 27, 103, 75,]
[0205] ( 1s): 4 - Car going downhill [ 0, 27, 103, 76,]
[0206] ( 0s): 2 - People [ 0, 28, 103, 76,]
[0207] ( 8s): 4 - Car going downhill [ 0, 28, 103, 77,]
[0208] ( 20s): 2 - People [ 0, 29, 103, 77,]
[0209] ( 0s): 4 - Car going downhill [ 0, 29, 103, 78,]
[0210] ( 0s): 3 - Car going uphill [ 0, 29, 104, 78,]
[0211] ( 5s): 3 - Car going uphill [ 0, 29, 105, 78,]
[0212] ( 1s): 4 - Car going downhill [ 0, 29, 105, 79,]
[0213] ( 13s): 2 - People [ 0, 30, 105, 79,]
[0214] ( 4s): 3 - Car going uphill [ 0, 30, 106, 79,]
[0215] ( 14s): 3 - Car going uphill [ 0, 30, 107, 79,]
[0216] ( 23s): 4 - Car going downhill [ 0, 30, 107, 80,]
[0217] ( 18s): 4 - Car going downhill [ 0, 30, 107, 81,]
[0218] ( 20s): 3 - Car going uphill [ 0, 30, 108, 81,]
[0219] ( 9s): 4 - Car going downhill [ 0, 30, 108, 82,]
[0220] ( 11s): 3 - Car going uphill [ 0, 30, 109, 82,]
[0221] ( 0s): 2 - People [ 0, 31, 109, 82,]
[0222] ( 0s): 4 - Car going downhill [ 0, 31, 109, 83,]
[0223] ( 2s): 3 - Car going uphill [ 0, 31, 110, 83,]
[0224] ( 0s): 2 - People [ 0, 32, 110, 83,]
[0225] ( 2s): 4 - Car going downhill [ 0, 32, 110, 84,]
[0226] ( 2s): 4 - Car going downhill [ 0, 32, 110, 85,]
[0227] ( 44s): 3 - Car going uphill [ 0, 32, 111, 85,]
[0228] ( 0s): 2 - People [ 0, 33, 111, 85,]
[0229] ( 5s): 3 - Car going uphill [ 0, 33, 112, 85,]
[0230] ( 2s): 4 - Car going downhill [ 0, 33, 112, 86,]
[0231] ( 2s): 3 - Car going uphill [ 0, 33, 113, 86,]

I currently have around 100 training images per class and I’d say it’s probably 95% accurate at classification. This has used 92 neurons out of my available 2,304, so there’s certainly plenty of scope for improving the performance. The performance burden on the system seems negligible, as I’m getting about 20fps with or without inference taking place, so most of the work is probably manipulating the images to get them into a useful format.

This highlights a big limitation of the hardware: the single layer and the limited input and output vectors mean that quite a lot of preprocessing is required to get an image into a suitable state. Doing this kind of manipulation on an Arduino or RPi Zero is likely to be prohibitively expensive for real-time image problems. So I think the NM500 is a very nice concept but probably too limited for real-world use. We would really need a much larger scale (more neurons and a bigger input vector) and more elaborate neuron types, e.g. convolutional layers, for image processing. I’m really looking forward to playing with future ‘neuromorphic’ hardware.

Here’s the code: https://github.com/azw413/CameraNM500