Using a ESP32 Thing and a WiFi camera to create a robot that you control from your browser.
I've been continuing to explore how to use the ESP32 Thing. In a previous Enginursday, I built an OLED Clock that uses the ESP32 to automatically get the time from the internet. This week I wanted to build a robot that you can control from your web browser using the arrow keys on your computer. For a full tutorial on how to make your own, stay tuned for a tutorial on our learn.sparkfun.com page, but for a brief overview continue reading below.
The robot is focused around our shadow chassis, and the motors are controlled with our Serial Controlled Motor Driver (SCMD). Some of the other parts used are a 2200mAh battery, a USB-A Female Breakout (to avoid having to hack apart the USB cable to provide power to the camera), and a 5V DC/DC Converter. In addition to the ESP32 Thing, I used our Motion Shield to store the HTML file that the ESP32 serves to client when they connect.
Having the HTML file on the SD card sped up the development time because this way I could make changes to the file without having to wait for Arduino to compile the code for the ESP32. The DC/DC converter is important because I needed to regulate the 6-8.4V battery voltage down to 5V to not only extend the battery life with the increased efficiency of a switching regulator, but also because the camera and ESP32 will draw 500mA of current, which would normally cause a linear regulator to get quite toasty.
Getting started with this project left one big question in my head, "Can the ESP32 handle a video stream?". And the answer I ended up with was: not sure, but it doesn't matter. Something I wasn't sure about was how the ESP32 would handle a link when a client connected to the web server. Would the ESP32 see the link to the camera's image stream and store the image in memory? How big of an image could I display before the ESP32 ran out of memory and crashed? Or is it up to the client to go to the image stream and create the image?
To find out the answer, I found a public IP camera and embedded the video stream into my HTML page. What I discovered was that it doesn't matter if it's an image or a video link – all the ESP32 does is send the HTML code to the browser, and it's up to the browser to figure out what needs to be displayed where. So when the browser saw a link, the browser went to the URL and retrieved the image or video.
For the ESP32 and HTML code, you can find it on my GitHub here. In the HTML code, I check for keyboard presses and releases (specifically the arrow keys). When a key is pressed, a text string is saved in a separate page using an XML request. When text is sent to the page, the ESP32 scans the request to see what action it should take. These could be to drive forward/backward or turn left/right.
The hard part was trying to embed the video stream into the HTML file. Each camera manufacturer has their own way to stream video. Some of the older cameras use a MJPEG from a HTTP link, but the newer cameras use what's called a Real Time Streaming Protocol (RTSP), instead of HTTP. An RTSP link isn't native to HTML5 and requires a Javascript library to convert the stream. To keep the file size smaller, I found that the camera can grab snapshots of the image from a HTTP link, and with significantly less code I was able to create a video stream by taking snapshots at a regular time interval to make it look like a video stream.
Build your own!
When I was first planning out this post, I wanted to drive it around our office and record my screen to see how much fun it was to drive around, and look at all of the reactions of people as they stare at it driving past their department. As it traveled, it picked up few fashion accessories along its way:
But just watching me drive around isn't really all that fun. What would be more fun, is to let YOU drive the robot around. You can access the robot from ~~this link~~. Unfortunately, it only seems to work on Firefox and Safari. In order to view the IP camera stream you need to log into the camera, which can be embedded in the URL for the camera, but Chrome and Edge will block passing the credentials for security reasons. If you're asked for a user name and password, enter guest for the user name and password for the password. The camera is capable of 30FPS, but in real world conditions, the frame rate varies from ~12-25FPS. The controls are pretty simple:
Arrow Keys:
First, I don't know what is going to happen. I don't know how many users can connect before the ESP32 crashes, or how many users can access the camera at any given time. Most importantly, be patient and share. I don't have anything fancy going on in the code to create a queue of users with a time limit on driving. If one person tells it to drive forward, and another then tells it to drive in reverse, it will execute the last command that was received. So try and let other people take turns driving it.
Second, you are confined to Engineering. We have barriers set up to keep the robot in my department, trying to drive over them will just get the robot stuck. I'm going to try and keep an eye on it, but once it gets stuck, it might be stuck for a while, so it might be best to try and steer clear of the barriers.
The page might go down occasionally. It might be that the battery is being replaced, or the ESP32 and/or camera have crashed due to the number of users. Give it a couple of minutes, and it should be back online. I'm only planning on having the robot online from 8:30am-5:00pm (MDT) on Thursday 3/22 and Friday 3/23.
And that's it! Have fun driving around our offices. If you want to drive by my office and say hi, my office is the door with a life sized (-ish) Han Solo vinyl sticker.