I recently sat down to benchmark the new accelerator hardware now appearing on the market, intended to speed up machine learning inferencing on the edge. To give myself a rough yardstick for comparison, I also ran the same benchmarks on the Raspberry Pi.
However, a lot of people complained that I should have used TensorFlow Lite for those benchmarks rather than TensorFlow. Enough people said it, in fact, that I felt I really should see how much faster TensorFlow Lite was than ‘vanilla’ TensorFlow on the Raspberry Pi.
So, here goes…
Headline results from benchmarking
Using TensorFlow Lite we see a considerable speed increase when compared with the original results from our previous benchmarks using full TensorFlow.
We see an approximately ×2 increase in inferencing speed between the original TensorFlow figures and the new results using TensorFlow Lite.
A single 3888×2916 pixel test image was used, containing two recognisable objects in the frame: a banana 🍌 and an apple 🍎. The image was resized down to 300×300 pixels before being presented to the model, and each model was run 10,000 times before an average inferencing time was taken. The first inferencing run, which takes longer due to loading overheads, was discarded.
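The timing methodology described above (one warm-up run discarded, then an average over many runs) can be sketched as a small, framework-agnostic helper. This is a minimal sketch, not the script used to produce these results: `run_inference` is a hypothetical stand-in for whatever callable actually invokes the model on the already-resized 300×300 image.

```python
import time
from statistics import mean
from typing import Callable


def benchmark(run_inference: Callable[[], object], runs: int = 10_000) -> float:
    """Return the mean wall-clock time per call, in milliseconds.

    run_inference is a hypothetical zero-argument callable wrapping one
    model invocation on an already-preprocessed (e.g. 300x300) image.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")

    # The first call includes one-off loading overheads, so run it once
    # up front and leave it out of the average.
    run_inference()

    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        run_inference()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return mean(timings_ms)


if __name__ == "__main__":
    # Dummy workload standing in for a real model invocation.
    avg_ms = benchmark(lambda: sum(range(10_000)), runs=100)
    print(f"average inference time: {avg_ms:.3f} ms")
```

Using `time.perf_counter()` rather than `time.time()` gives a monotonic, high-resolution clock, which matters when individual runs are short.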
Comparing our new results with our previously obtained benchmark figures, we see that using TensorFlow Lite for inferencing on an unaccelerated Raspberry Pi brings inferencing times very roughly into line with those seen on the NVIDIA Jetson Nano when using normal TensorFlow models, before optimisation with NVIDIA’s TensorFlow with TensorRT library.
This rather strongly suggests that unoptimised ‘vanilla’ TensorFlow models are mostly running on the Jetson Nano’s CPU, a 64-bit quad-core ARM Cortex-A57, rather than being offloaded to the GPU as you’d expect.
While it’s still extremely early days, TensorFlow Lite has recently introduced support for GPU-accelerated inferencing, and running models using TensorFlow Lite with GPU support should reduce inferencing times on the Jetson Nano. Taking our new Raspberry Pi results as a yardstick, we should expect the gap between the Jetson Nano and Google’s Coral hardware to close significantly at that point.
Heating and Cooling
As we observed last time, the Raspberry Pi got hot enough during benchmarking that it suffered from thermal throttling of the CPU. This time we observed even higher external temperatures than before.
External temperatures were measured after an extended test run of 50,000 inferences had completed, using a laser infrared thermometer with an accuracy of ±2°C for temperatures ≤100°C.
The CPU temperatures were as reported by the operating system using the following command line invocation.
Last time, the Raspberry Pi reached a temperature of 74°C during extended testing, which meant that it suffered from thermal throttling of the CPU and came close to the 80°C point at which additional incremental throttling occurs. This time temperatures were higher, peaking at around 78°C.
As before, I’d recommend that if you intend to run inferencing for extended periods on the Raspberry Pi, you should add at least a passive heatsink to avoid throttling the CPU. A small fan might even be a good idea. Because let’s face it, CPU throttling can spoil your day.
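If you want to keep an eye on the CPU temperature yourself during a long run, a minimal sketch is below. It assumes the standard Linux sysfs thermal interface, `/sys/class/thermal/thermal_zone0/temp`, which on Raspbian reports the SoC temperature in millidegrees Celsius. It is an alternative to the command-line invocation mentioned above, not a reproduction of it; the 80°C threshold comes from the discussion above, and the 5°C warning margin is an arbitrary choice.

```python
from pathlib import Path

# Standard Linux sysfs location; on Raspbian this is the SoC temperature.
DEFAULT_SENSOR = Path("/sys/class/thermal/thermal_zone0/temp")

# Point at which, as discussed above, additional throttling kicks in.
THROTTLE_THRESHOLD_C = 80.0


def read_cpu_temp_c(sensor: Path = DEFAULT_SENSOR) -> float:
    """Read the sensor file (millidegrees Celsius) and return degrees Celsius."""
    millidegrees = int(sensor.read_text().strip())
    return millidegrees / 1000.0


def near_throttle(temp_c: float, margin_c: float = 5.0) -> bool:
    """True if temp_c is within margin_c degrees of (or above) the threshold."""
    return temp_c >= THROTTLE_THRESHOLD_C - margin_c


if __name__ == "__main__":
    temp = read_cpu_temp_c()
    print(f"CPU temperature: {temp:.1f}°C (near throttle: {near_throttle(temp)})")
```

Polling this in a background thread while the benchmark runs would let you see when the Pi approaches the throttling point.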
Adding TensorFlow Lite on the Raspberry Pi to our benchmarks hasn’t changed the overall result: the Coral Dev Board and USB Accelerator still have a clear lead, with MobileNet models running between three and four times faster than on their direct competitors. But it’s really interesting to see how much using TensorFlow Lite, and accepting the restrictions that this lightweight framework places on you, increases performance.
While I was expecting things to run faster, a factor of two is pretty impressive.
Both here and in our previous benchmarks, I felt that approaching things in a relatively direct way, and trying as much as possible to keep the playing field level between platforms, was the best way to get a baseline for how they all performed relative to each other.
However, there is obviously a great deal you can do with optimisation, both of the model you’re running and of how you go about running it, to improve on the inferencing speeds I talked about here and in my original benchmarking piece. I’m not unaware of that, and I’ll be interested to see how others can improve on the work I’ve done here.
Yes, you can get these models to run faster. Now show us how to do that.
Go ahead and download the latest release of Raspbian Lite and set up your Raspberry Pi. Unless you’re using wired networking, or have a display and keyboard attached to the Raspberry Pi, at a minimum you’ll need to put the Raspberry Pi onto your wireless network and enable SSH.
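If you’re setting the Pi up headless, one common approach with Raspbian is to add two files to the boot partition of the freshly flashed SD card before first boot: an empty file called `ssh`, which enables the SSH server, and a `wpa_supplicant.conf` with your wireless credentials, which Raspbian moves into `/etc/wpa_supplicant/` on boot. The sketch below writes both; the mount point, SSID, passphrase, and country code are placeholders you’d need to replace.

```python
from pathlib import Path

WPA_TEMPLATE = """\
ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1
country={country}

network={{
    ssid="{ssid}"
    psk="{psk}"
}}
"""


def prepare_headless_boot(boot_dir: Path, ssid: str, psk: str, country: str = "GB") -> None:
    """Enable SSH and configure wireless on a mounted Raspbian boot partition."""
    if not boot_dir.is_dir():
        raise FileNotFoundError(f"boot partition not mounted at {boot_dir}")
    # An empty file named 'ssh' tells Raspbian to enable the SSH server on boot.
    (boot_dir / "ssh").touch()
    # Raspbian moves this file into /etc/wpa_supplicant/ on first boot.
    (boot_dir / "wpa_supplicant.conf").write_text(
        WPA_TEMPLATE.format(country=country, ssid=ssid, psk=psk)
    )


if __name__ == "__main__":
    # Hypothetical mount point (macOS); on Linux it is often /media/<user>/boot.
    prepare_headless_boot(Path("/Volumes/boot"), ssid="MyNetwork", psk="MyPassphrase")
```

Once the card boots, the Pi should join your network and accept SSH connections.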
Once you’ve set up your Raspberry Pi go ahead and power it on, and then open up a Terminal window on your laptop and SSH into the Raspberry Pi.
To convert the model from TensorFlow to TensorFlow Lite you’ll need to know what the input and output nodes of the model are called. The easiest way to figure this out is to use the summarize_graph tool to inspect the model and provide guesses about likely input and output nodes. Unfortunately, if you’ve previously installed TensorFlow using pip then this tool isn’t going to be available; you’ll have to go back and install it from source to have access to the C++ tools.
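If you can’t build `summarize_graph`, the heuristic it uses is simple enough to approximate: `Placeholder` ops are reported as likely inputs, and nodes whose outputs nothing else consumes (excluding ops such as `Const` or `NoOp`) are reported as likely outputs. The sketch below applies that heuristic to a hypothetical, simplified list of `(name, op, inputs)` nodes; with TensorFlow installed you would build that list from the `node` field of a loaded `GraphDef`, where each `NodeDef` has `name`, `op`, and `input`. It is an approximation, not a port of the tool.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Node(NamedTuple):
    """Simplified, hypothetical stand-in for a TensorFlow NodeDef."""
    name: str
    op: str
    inputs: Tuple[str, ...] = ()


# Op types that are rarely meaningful model outputs.
UNLIKELY_OUTPUT_OPS = {"Const", "Assign", "NoOp", "Placeholder"}


def _source_node(input_ref: str) -> str:
    """Turn '^name' (control edge) or 'name:1' (output index) into 'name'."""
    return input_ref.lstrip("^").split(":")[0]


def guess_inputs_and_outputs(nodes: Iterable[Node]) -> Tuple[List[str], List[str]]:
    nodes = list(nodes)
    # Every node that some other node reads from.
    consumed = {_source_node(ref) for node in nodes for ref in node.inputs}
    inputs = [n.name for n in nodes if n.op == "Placeholder"]
    outputs = [
        n.name
        for n in nodes
        if n.name not in consumed and n.op not in UNLIKELY_OUTPUT_OPS
    ]
    return inputs, outputs


if __name__ == "__main__":
    # Toy graph, loosely shaped like an SSD detector.
    graph = [
        Node("normalized_input_image_tensor", "Placeholder"),
        Node("weights", "Const"),
        Node("conv", "Conv2D", ("normalized_input_image_tensor", "weights")),
        Node("TFLite_Detection_PostProcess", "TFLite_Detection_PostProcess", ("conv",)),
    ]
    print(guess_inputs_and_outputs(graph))
    # → (['normalized_input_image_tensor'], ['TFLite_Detection_PostProcess'])
```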
This command takes the input tensor normalized_input_image_tensor after resizing each camera image frame to 300×300 pixels. The outputs of the quantised model represent four arrays: detection_boxes, detection_classes, detection_scores, and num_detections.
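Once the converted model has run, those four output arrays still need interpreting. The sketch below assumes the conventions of the TensorFlow Object Detection API’s SSD post-processing op: boxes are `[ymin, xmin, ymax, xmax]` normalised to the range 0 to 1, and only the first `num_detections` entries are valid. It takes plain Python sequences (in practice, the arrays read back from the interpreter with the batch dimension removed); the 0.5 score threshold is an arbitrary choice.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Detection(NamedTuple):
    class_id: int
    score: float
    # Pixel coordinates in the original image: (left, top, right, bottom).
    box: Tuple[int, int, int, int]


def decode_detections(
    boxes: Sequence[Sequence[float]],
    classes: Sequence[float],
    scores: Sequence[float],
    num_detections: float,
    image_width: int,
    image_height: int,
    min_score: float = 0.5,
) -> List[Detection]:
    results = []
    # num_detections is typically returned as a float; only that many rows are valid.
    for i in range(int(num_detections)):
        if scores[i] < min_score:
            continue
        ymin, xmin, ymax, xmax = boxes[i]
        # Scale normalised coordinates back up to the full-size image
        # (assumes a plain resize to 300x300, with no cropping or padding).
        box = (
            round(xmin * image_width),
            round(ymin * image_height),
            round(xmax * image_width),
            round(ymax * image_height),
        )
        results.append(Detection(int(classes[i]), float(scores[i]), box))
    return results
```

Because the coordinates are normalised, the same boxes can be mapped back onto the original 3888×2916 image even though the model only saw a 300×300 version.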
You can follow a similar process for the quantised version of the MobileNet SSD V2 model from the Coral Model Zoo, and invoke the same toco command line to convert it to a TensorFlow Lite model.
As I tried hard to make clear in my previous article, putting these platforms on an even footing and directly comparing them is not a trivial task. Hopefully this goes some way to proving that.
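Because the conversion command is long and differs only in file names between the two models, it can help to generate it. The sketch below builds an argument list for `toco`. The flag names and values are an assumption based on the TensorFlow Object Detection API’s TensorFlow 1.x instructions for converting SSD graphs exported for TensorFlow Lite; check them against the version of TensorFlow you built, as the converter’s flags have changed between releases. The file names are placeholders.

```python
import shlex
from typing import List

# Output tensors of the SSD post-processing op, in the order
# boxes, classes, scores, num_detections.
SSD_OUTPUT_ARRAYS = ",".join(
    ["TFLite_Detection_PostProcess"]
    + [f"TFLite_Detection_PostProcess:{i}" for i in range(1, 4)]
)


def toco_args(input_file: str, output_file: str, quantized: bool = True) -> List[str]:
    """Build a toco argument list for a 300x300 SSD graph (flags are assumed)."""
    args = [
        "toco",
        f"--input_file={input_file}",
        f"--output_file={output_file}",
        "--input_shapes=1,300,300,3",
        "--input_arrays=normalized_input_image_tensor",
        f"--output_arrays={SSD_OUTPUT_ARRAYS}",
        # TFLite_Detection_PostProcess is a custom op.
        "--allow_custom_ops",
    ]
    if quantized:
        args += [
            "--inference_type=QUANTIZED_UINT8",
            "--mean_values=128",
            "--std_values=128",
            "--change_concat_input_ranges=false",
        ]
    else:
        args.append("--inference_type=FLOAT")
    return args


if __name__ == "__main__":
    # Placeholder file names; print a shell-safe command line.
    print(" ".join(shlex.quote(a) for a in toco_args("tflite_graph.pb", "detect.tflite")))
```

Returning a list rather than a single string means it can also be passed straight to `subprocess.run` without shell quoting issues.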
Links to getting started guides
If you’re interested in getting started with any of the accelerator hardware I used during my first benchmark I’ve put together getting started guides for the Google, Intel, and NVIDIA hardware I used there.