IoT, or the Internet of Things, is an emerging technology that lets us control hardware devices over the Internet. Next-generation homes will become increasingly self-controlled and automated because of the comfort this brings, particularly in private homes.
Touchless gestures are the new frontier in the world of human-machine interfaces: you can control a computer, microcontroller, robot, or other device simply by swiping your palm over a sensor. Most phones now offer gesture controls for opening and closing apps, starting music, answering calls, and so on. It is a genuine time-saver, and controlling a gadget with gestures also looks amazing.
In this blog, we will create a Gesture Controlled Virtual Mouse with ESP32-CAM and OpenCV. The mouse tracking and clicking activities can be controlled wirelessly using the ESP32 Camera Module and a Python application.
To get started, you’ll need a solid understanding of Python, image processing, embedded systems, and the Internet of Things. First, we’ll learn how to control mouse tracking and clicking, and what is required to run the Python program. We’ll start by testing the entire Python script with a webcam or a laptop’s built-in camera.
In the second part, the ESP32-CAM module will replace the PC camera as the input device: the same Python code will take its video frames from the ESP32-CAM (or any other networked camera) instead.
What are Gestures?
A gesture is a movement made with a part of your body, most often the hands, to convey emotion or information. It is a form of nonverbal communication in which visible body movements carry a message. By detecting these movements, we can control devices without touching the actual hardware.
The movements that can be recognized here are left, right, up, down, forward, backward, clockwise, anticlockwise, and waving. You can also combine them: right-left, left-right, up-down, down-up, forward-backward, and backward-forward.
Hand gestures are a widely recognized language and one of the most powerful and expressive forms of human communication, expressive enough to serve people who are deaf or unable to speak.
Hardware Required:
- ESP32-CAM Board: AI-Thinker ESP32 Camera Module
- FTDI Module: USB-to-TTL Converter Module
- USB Cable: 5V Mini-USB Data Cable
- Jumper Wires: Female-to-Female Connectors
Controlling Mouse Tracking & Clicks with PC Camera
Before moving on to the ESP32-CAM, let’s develop the Gesture Controlled Virtual Mouse using the PC camera and image recognition.
Installing Python & Required Libraries
To show the live video stream on our computer, we need a Python script that retrieves the video frames. The first step is to install Python: download version 3.7.8 from python.org. A few of the libraries will not work unless you use this specific version (or downgrade to it).
- Once Python is downloaded and installed, open a command prompt and run the following command:

```
python --version
```

- The output should report version 3.7.8.
- Now we have to install a few libraries. Run the following commands one after another until all of them are installed:

```
pip install numpy
pip install opencv-python
pip install autopy
pip install mediapipe
```
- If you installed the correct Python version, these libraries should install without issue.
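To confirm that everything installed correctly, you can run a quick import check before moving on (a minimal sketch; the exact version numbers printed will vary on your machine):

```python
# Quick sanity check: if any of these imports fail,
# re-run the corresponding pip install command.
import numpy
import cv2
import mediapipe
import autopy

print("numpy:", numpy.__version__)
print("opencv:", cv2.__version__)
print("mediapipe:", mediapipe.__version__)
print("autopy imported successfully")
```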
Source Code/Program
- Create a folder, and inside it create a new Python file called track_hand.py.
- Copy the code below into it and save the file.
```python
import cv2
import mediapipe as mp
import time
import math
import numpy as np


class handDetector():
    def __init__(self, mode=False, maxHands=1, modelComplexity=1,
                 detectionCon=0.5, trackCon=0.5):
        self.mode = mode
        self.maxHands = maxHands
        self.modelComplex = modelComplexity
        self.detectionCon = detectionCon
        self.trackCon = trackCon
        self.mpHands = mp.solutions.hands
        self.hands = self.mpHands.Hands(self.mode, self.maxHands, self.modelComplex,
                                        self.detectionCon, self.trackCon)
        self.mpDraw = mp.solutions.drawing_utils
        # Landmark indices of the five fingertips (thumb, index, middle, ring, pinky)
        self.tipIds = [4, 8, 12, 16, 20]

    def findHands(self, img, draw=True):
        # MediaPipe expects RGB, while OpenCV delivers BGR
        imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        self.results = self.hands.process(imgRGB)
        if self.results.multi_hand_landmarks:
            for handLms in self.results.multi_hand_landmarks:
                if draw:
                    self.mpDraw.draw_landmarks(img, handLms, self.mpHands.HAND_CONNECTIONS)
        return img

    def findPosition(self, img, handNo=0, draw=True):
        xList = []
        yList = []
        bbox = []
        self.lmList = []
        if self.results.multi_hand_landmarks:
            myHand = self.results.multi_hand_landmarks[handNo]
            for id, lm in enumerate(myHand.landmark):
                # Landmarks are normalized; convert them to pixel coordinates
                h, w, c = img.shape
                cx, cy = int(lm.x * w), int(lm.y * h)
                xList.append(cx)
                yList.append(cy)
                self.lmList.append([id, cx, cy])
                if draw:
                    cv2.circle(img, (cx, cy), 5, (255, 0, 255), cv2.FILLED)
            xmin, xmax = min(xList), max(xList)
            ymin, ymax = min(yList), max(yList)
            bbox = xmin, ymin, xmax, ymax
            if draw:
                cv2.rectangle(img, (xmin - 20, ymin - 20), (xmax + 20, ymax + 20),
                              (0, 255, 0), 2)
        return self.lmList, bbox

    def fingersUp(self):
        fingers = []
        # Thumb: compare x coordinates, since the thumb extends sideways
        if self.lmList[self.tipIds[0]][1] > self.lmList[self.tipIds[0] - 1][1]:
            fingers.append(1)
        else:
            fingers.append(0)
        # Other four fingers: the tip is above the middle joint when raised
        for id in range(1, 5):
            if self.lmList[self.tipIds[id]][2] < self.lmList[self.tipIds[id] - 2][2]:
                fingers.append(1)
            else:
                fingers.append(0)
        return fingers

    def findDistance(self, p1, p2, img, draw=True, r=15, t=3):
        x1, y1 = self.lmList[p1][1:]
        x2, y2 = self.lmList[p2][1:]
        cx, cy = (x1 + x2) // 2, (y1 + y2) // 2
        if draw:
            cv2.line(img, (x1, y1), (x2, y2), (255, 0, 255), t)
            cv2.circle(img, (x1, y1), r, (255, 0, 255), cv2.FILLED)
            cv2.circle(img, (x2, y2), r, (255, 0, 255), cv2.FILLED)
            cv2.circle(img, (cx, cy), r, (0, 0, 255), cv2.FILLED)
        length = math.hypot(x2 - x1, y2 - y1)
        return length, img, [x1, y1, x2, y2, cx, cy]


def main():
    pTime = 0
    cap = cv2.VideoCapture(0)
    detector = handDetector()
    while True:
        success, img = cap.read()
        img = detector.findHands(img)
        lmList, bbox = detector.findPosition(img)
        if len(lmList) != 0:
            print(lmList[4])
            fingers = detector.fingersUp()  # only valid when a hand is detected
        cTime = time.time()
        fps = 1 / (cTime - pTime)
        pTime = cTime
        cv2.putText(img, str(int(fps)), (10, 70), cv2.FONT_HERSHEY_PLAIN, 3,
                    (255, 0, 255), 3)
        cv2.imshow("Image", img)
        cv2.waitKey(1)


if __name__ == "__main__":
    main()
```
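Note that track_hand.py can be tested on its own: thanks to the `if __name__ == "__main__":` guard, running the file directly opens the webcam, draws the hand landmarks, and prints the thumb-tip coordinates, which is a quick way to confirm the detector works before building the mouse script.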
- Make a new Python file called final.py inside the same folder.
- Copy the code below into it, but before saving, make the following change:
- Adjust the wCam and hCam variables to match the width and height of your webcam.
```python
import numpy as np
import track_hand as htm
import time
import autopy
import cv2

wCam, hCam = 1280, 720   # camera resolution; adjust to match your webcam
frameR = 100             # frame reduction: margin of the active tracking area
smoothening = 7          # larger value = steadier but slower cursor

pTime = 0
plocX, plocY = 0, 0      # previous cursor location
clocX, clocY = 0, 0      # current cursor location

cap = cv2.VideoCapture(0)
cap.set(3, wCam)
cap.set(4, hCam)
detector = htm.handDetector(maxHands=1)
wScr, hScr = autopy.screen.size()

while True:
    # 1. Find hand landmarks
    fingers = [0, 0, 0, 0, 0]
    success, img = cap.read()
    img = detector.findHands(img)
    lmList, bbox = detector.findPosition(img)

    # 2. Get the tips of the index and middle fingers
    if len(lmList) != 0:
        x1, y1 = lmList[8][1:]
        x2, y2 = lmList[12][1:]

        # 3. Check which fingers are up
        fingers = detector.fingersUp()
        cv2.rectangle(img, (frameR, frameR), (wCam - frameR, hCam - frameR),
                      (255, 0, 255), 2)

        # 4. Only index finger up: moving mode
        if fingers[1] == 1 and fingers[2] == 0:
            # 5. Convert coordinates from camera space to screen space
            x3 = np.interp(x1, (frameR, wCam - frameR), (0, wScr))
            y3 = np.interp(y1, (frameR, hCam - frameR), (0, hScr))
            # 6. Smoothen values
            clocX = plocX + (x3 - plocX) / smoothening
            clocY = plocY + (y3 - plocY) / smoothening
            # 7. Move the mouse
            autopy.mouse.move(wScr - clocX, clocY)
            cv2.circle(img, (x1, y1), 15, (255, 0, 255), cv2.FILLED)
            plocX, plocY = clocX, clocY

        # 8. Both index and middle fingers up: clicking mode
        if fingers[1] == 1 and fingers[2] == 1:
            # 9. Find the distance between the fingertips
            length, img, lineInfo = detector.findDistance(8, 12, img)
            print(length)
            # 10. Click the mouse if the distance is short
            if length < 40:
                cv2.circle(img, (lineInfo[4], lineInfo[5]), 15, (0, 255, 0), cv2.FILLED)
                autopy.mouse.click()

    # 11. Frame rate
    cTime = time.time()
    fps = 1 / (cTime - pTime)
    pTime = cTime
    cv2.putText(img, str(int(fps)), (20, 50), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 0), 3)

    # 12. Display
    cv2.imshow("Image", img)
    cv2.waitKey(1)
```
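If you are not sure which resolution your webcam actually delivers, you can ask OpenCV before setting wCam and hCam (a minimal sketch; how accurately these properties are reported depends on your camera driver):

```python
import cv2

# Open the default camera and query the driver for the current frame size.
cap = cv2.VideoCapture(0)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
print(f"Camera resolution: {width} x {height}")
cap.release()
```

You may also want to experiment with the smoothening value: each frame the cursor moves 1/smoothening of the remaining distance to the target, so a larger value gives a steadier but slower pointer.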
Testing
Now run final.py; you should see something similar to the image below.
The landmark overlay should track the entire hand, including the fingers.
- The pointer moves when you move your index finger inside the pink bounding area. To click, raise the middle finger alongside the index finger and bring the two fingertips together over the spot where the cursor sits.
- You’ve now completed half of the task. Let’s move on to the device, or embedded, part.
ESP32 CAM Module
The ESP32-based camera module was developed by AI-Thinker. The controller features a Wi-Fi + Bluetooth/BLE chip and is powered by a 32-bit CPU. It has 520 KB of internal SRAM and 4 MB of external PSRAM. Its GPIO pins support UART, SPI, I2C, PWM, ADC, and DAC.
The module works with the OV2640 camera, which has a resolution of 1600 x 1200 pixels. A 24-pin gold-plated connector links the camera to the ESP32-CAM board. The board also accepts a 4 GB SD card, on which captured photographs are saved.
ESP32-CAM Features:
- Among the smallest 802.11b/g/n Wi-Fi + BT SoC modules.
- Low-power 32-bit CPU that can also serve as the application processor.
- Clock speed up to 160 MHz, with computing power up to 600 DMIPS.
- Built-in 520 KB SRAM, plus 4 MB of external PSRAM.
- Supports UART/SPI/I2C/PWM/ADC/DAC.
- Supports OV2640 and OV7670 cameras, with a built-in flash lamp.
- Supports image upload over Wi-Fi.
- Supports TF (microSD) cards.
- Supports multiple sleep modes.
- Embedded LwIP and FreeRTOS.
- Supports STA/AP/STA+AP operation modes.
- Supports Smart Config/AirKiss technology.
- Supports local serial-port and remote (FOTA) firmware upgrades.
ESP32-CAM FTDI Connection
- There is no programmer chip on the PCB, so any USB-to-TTL module can be used to program this board. FTDI-style modules based on the CP2102 or CP2104 (or a similar chip) are widely available.
- Connect the FTDI Module to the ESP32 CAM Module as shown below.
| ESP32-CAM | FTDI Programmer |
| --- | --- |
| GND | GND |
| 5V | VCC |
| U0R | TX |
| U0T | RX |
| GPIO0 | GND |
Connect the ESP32-CAM’s 5V and GND pins to the FTDI module’s VCC and GND. Likewise, connect the FTDI’s RX to U0T and its TX to U0R. Most importantly, connect GPIO0 to GND; this puts the board into programming mode. You can remove this jumper once programming is complete.
Project PCB Gerber File & PCB Ordering Online
If you don’t want to put the circuit together on a breadboard and would prefer a PCB, one is available: the PCB board for the ESP32-CAM was designed with EasyEDA’s online circuit schematic & PCB design tool, and it appears as shown below.
The Gerber file for the PCB is given below. You can simply download it and order the PCB from https://www.nextpcb.com/
Download Gerber File: ESP32-CAM Multipurpose PCB
Now visit the NextPCB official website at https://www.nextpcb.com/.
- Upload the Gerber file to the website and place an order. The PCB quality is excellent, which is why many people entrust NextPCB with their PCB and PCBA needs.
- The components can be assembled on the PCB Board.
Installing ESP32CAM Library
A different streaming approach will be used instead of the general ESP web server example, so another camera library is required. The esp32cam library provides an object-oriented API for using the OV2640 camera on the ESP32 microcontroller; it is a wrapper around the esp32-camera library.
Download the ZIP library, as shown in the image, from the following GitHub link.
After downloading, add the library to the Arduino IDE by following the steps below:
Open Arduino IDE -> Sketch -> Include Library -> Add .ZIP Library… -> navigate to the downloaded ZIP file -> Add
Source Code/Program for ESP32 CAM Module
The source code/program for the ESP32-CAM gesture controlled mouse can be found in the library’s examples. Go to File -> Examples -> esp32cam -> WifiCam.
You must make one small adjustment to the code before uploading it: change the SSID and password variables to match the Wi-Fi network you’re using.
Compile the code and upload it to the ESP32-CAM board. However, you must follow a few steps each time you upload.
- Make sure the GPIO0 pin is shorted to ground when you press the upload button.
- If you notice dots and dashes in the console during uploading, press the reset button immediately.
- Once the code has been uploaded, remove the jumper between GPIO0 and ground and press the reset button once more.
- If there is still no output on the Serial Monitor, press the reset button again.
You should now see output similar to the image below.
So that’s it for the ESP32-CAM section. The ESP32-CAM is now broadcasting live video, so make a note of the IP address displayed on the Serial Monitor.
Python Code + Gesture Controlled Virtual Mouse with ESP32-CAM
Now we’ll finish up the Gesture Controlled Virtual Mouse with ESP32-CAM project. Return to the final.py code and make the necessary modifications, or simply paste in the code provided.
```python
import urllib.request  # required for fetching frames from the ESP32-CAM

import numpy as np
import track_hand as htm
import time
import autopy
import cv2

url = "http://192.168.1.61/cam-hi.jpg"

wCam, hCam = 800, 600
frameR = 100
smoothening = 7

pTime = 0
plocX, plocY = 0, 0
clocX, clocY = 0, 0

# The local webcam capture is no longer needed:
# cap = cv2.VideoCapture(0)
# cap.set(3, wCam)
# cap.set(4, hCam)

detector = htm.handDetector(maxHands=1)
wScr, hScr = autopy.screen.size()

while True:
    # 1. Fetch a frame from the ESP32-CAM instead of the local webcam
    fingers = [0, 0, 0, 0, 0]
    img_resp = urllib.request.urlopen(url)
    imgnp = np.array(bytearray(img_resp.read()), dtype=np.uint8)
    img = cv2.imdecode(imgnp, -1)

    img = detector.findHands(img)
    lmList, bbox = detector.findPosition(img)

    # 2. Get the tips of the index and middle fingers
    if len(lmList) != 0:
        x1, y1 = lmList[8][1:]
        x2, y2 = lmList[12][1:]

        # 3. Check which fingers are up
        fingers = detector.fingersUp()
        cv2.rectangle(img, (frameR, frameR), (wCam - frameR, hCam - frameR),
                      (255, 0, 255), 2)

        # 4. Only index finger up: moving mode
        if fingers[1] == 1 and fingers[2] == 0:
            # 5. Convert coordinates from camera space to screen space
            x3 = np.interp(x1, (frameR, wCam - frameR), (0, wScr))
            y3 = np.interp(y1, (frameR, hCam - frameR), (0, hScr))
            # 6. Smoothen values
            clocX = plocX + (x3 - plocX) / smoothening
            clocY = plocY + (y3 - plocY) / smoothening
            # 7. Move the mouse
            autopy.mouse.move(wScr - clocX, clocY)
            cv2.circle(img, (x1, y1), 15, (255, 0, 255), cv2.FILLED)
            plocX, plocY = clocX, clocY

        # 8. Both index and middle fingers up: clicking mode
        if fingers[1] == 1 and fingers[2] == 1:
            # 9. Find the distance between the fingertips
            length, img, lineInfo = detector.findDistance(8, 12, img)
            print(length)
            # 10. Click the mouse if the distance is short
            if length < 40:
                cv2.circle(img, (lineInfo[4], lineInfo[5]), 15, (0, 255, 0), cv2.FILLED)
                autopy.mouse.click()

    # 11. Frame rate
    cTime = time.time()
    fps = 1 / (cTime - pTime)
    pTime = cTime
    cv2.putText(img, str(int(fps)), (20, 50), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 0), 3)

    # 12. Display
    cv2.imshow("Image", img)
    cv2.waitKey(1)
```
Make sure you adjust the url variable in the preceding code to match the IP address displayed on the Arduino IDE Serial Monitor. Also change the wCam and hCam variables to match the resolution being streamed.
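If you are unsure what resolution the ESP32-CAM is streaming, a quick way to check is to fetch a single frame and print its shape (a minimal sketch; replace the URL with the one shown on your Serial Monitor):

```python
import urllib.request
import numpy as np
import cv2

url = "http://192.168.1.61/cam-hi.jpg"  # replace with your ESP32-CAM's IP address

# Fetch one JPEG frame and decode it, exactly as the main loop does.
img_resp = urllib.request.urlopen(url)
imgnp = np.array(bytearray(img_resp.read()), dtype=np.uint8)
img = cv2.imdecode(imgnp, -1)

h, w = img.shape[:2]
print(f"Stream resolution: {w} x {h}")  # use these values for wCam and hCam
```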
When you run the code, the ESP32-CAM’s wireless stream, complete with mouse tracking, should be visible and functional.
With that, we’ve created our wireless Gesture Controlled Virtual Mouse with ESP32-CAM and OpenCV.
Conclusion:
I hope you all now understand how to design a Gesture Controlled Virtual Mouse with the ESP32-CAM and OpenCV. We will be back soon with more informative blogs.