
DIY AI Camera with Google Vision & ESP32 CAM Module

In this blog, we will design an AI camera utilizing the Google Vision API and the ESP32 CAM Module. This is essentially a full test of the Google Vision API with the ESP32 Camera for AI and Machine Learning applications. The developed AI camera can recognize items in the captured frame and display the frame as well as the detected labels on the TFT LCD screen.

The Google Vision API makes it simple for developers to integrate vision detection features like image labeling, face and landmark identification, optical character recognition (OCR), and explicit content tagging into their apps. We’ll use the ESP32 Camera Module to implement the same Google Vision features. The ESP32 CAM module was chosen because it is an excellent choice for image processing IoT applications.

Since there are so many phases in this project, it will take some time and patience. We’ll write the Arduino code for the ESP32 CAM Module and include libraries such as TFT_eSPI, TJpg_Decoder, and ArduinoJson. The next step is to install NodeJS and the Google Vision API client, as well as configure various GCP settings. This post covers the entire hardware setup as well as the Arduino and NodeJS code. As a result, building a DIY AI Camera with Google Vision and an ESP32 CAM Module should be straightforward.

Hardware Required:

  • ESP32-CAM Board: AI-Thinker ESP32 Camera Module
  • TFT LCD Display: ILI9341 2.8″ 240×320 SPI Display
  • Push Button
  • FTDI Module: USB-to-TTL Converter Module
  • USB Cable: 5V Mini-USB Data Cable
  • Jumper Wires: Female-to-Female Connectors

ESP32 CAM Module

The ESP32-based Camera Module was developed by AI-Thinker. The controller combines a Wi-Fi + Bluetooth/BLE chip with a 32-bit CPU. It has 520 KB of internal SRAM and 4 MB of external PSRAM. Its GPIO pins support UART, SPI, I2C, PWM, ADC, and DAC.

The module is compatible with the OV2640 Camera Module, which has a camera resolution of 1600 x 1200 pixels. A 24-pin gold plated connector links the camera to the ESP32 CAM Board. A 4GB SD Card can be used on the board. The photographs captured are saved on the SD Card.

ESP32-CAM Features 

  • The smallest 802.11b/g/n Wi-Fi + BT SoC module.
  • Low-power 32-bit CPU that can also serve as the application processor.
  • Clock speed up to 160 MHz, with total computing power up to 600 DMIPS.
  • Built-in 520 KB SRAM, external 4 MB PSRAM.
  • Supports UART/SPI/I2C/PWM/ADC/DAC.
  • Supports OV2640 and OV7670 cameras, with a built-in flash lamp.
  • Supports image upload over Wi-Fi.
  • Supports TF (microSD) card.
  • Supports multiple sleep modes.
  • Embedded lwIP and FreeRTOS.
  • Supports STA/AP/STA+AP operation modes.
  • Supports Smart Config/AirKiss technology.
  • Supports local serial-port and remote (FOTA) firmware upgrades.

ESP32-CAM FTDI Connection

  • There is no programmer chip on the board, so any USB-to-TTL module can be used to program it. FTDI modules based on the CP2102, CP2104, or similar chips are widely available.
  • Connect the FTDI Module to the ESP32 CAM Module as shown below.
ESP32 CAM FTDI Module Connection

ESP32-CAM     FTDI Programmer
GND           GND
5V            VCC
U0R           TX
U0T           RX
GPIO0         GND

Connect the ESP32’s 5V and GND pins to the FTDI module’s VCC and GND. Likewise, connect the FTDI module’s TX to U0R and its RX to U0T. Most importantly, you must connect GPIO0 to GND; this puts the device into programming mode. You can remove that connection once programming is complete.

Project Schematic Design

The given schematic can be used to program the ESP32 CAM Module. However, the Google Vision API with ESP32 Camera project has a somewhat different architecture: an ILI9341 2.8″ TFT LCD display is used to show the captured image. The project’s connection diagram is shown below.

Google Vision ESP32 CAM

The connections between the LCD display and the ESP32 CAM are as follows.

Sl. No.   2.8″ SPI LCD Display   ESP32 CAM
1         VCC                    3.3V
2         GND                    GND
3         CS                     IO2
4         RESET                  IO16
5         D/C                    IO15
6         SDI                    IO13
7         SCK                    IO14
8         LED                    VCC
9         SDO                    IO12

This project also uses a push button for image capture. One leg of the push button is connected to the ESP32 CAM’s IO4 pin, while the other leg is tied to VCC. When the button is pressed, IO4 reads a high logic level and the image is captured.
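Because a mechanical button bounces, the final Arduino code later in this post debounces this input in software: a new reading is only accepted as a press after it has stayed stable for a debounce interval. As a rough sketch of that logic (in JavaScript, with a hypothetical `createDebouncer` helper and the current time passed in explicitly so it can be simulated), it works like this:

```javascript
// Model of the debounce logic used in the sketch's buttonEvent():
// a changed reading resets a timer; the reading is only accepted once it
// has been stable for longer than `debounceDelay` milliseconds.
function createDebouncer(debounceDelay = 50) {
  let buttonState = 0;       // last accepted (debounced) state
  let lastButtonState = 0;   // last raw reading
  let lastDebounceTime = 0;  // when the raw reading last changed
  return function update(reading, now) {
    let pressed = false;
    if (reading !== lastButtonState) lastDebounceTime = now; // bounce: restart timer
    if (now - lastDebounceTime > debounceDelay && reading !== buttonState) {
      buttonState = reading;
      if (buttonState === 1) pressed = true; // rising edge accepted as a press
    }
    lastButtonState = reading;
    return pressed;
  };
}

const update = createDebouncer(50);
update(1, 0);                // edge seen at t=0, timer reset
console.log(update(1, 10));  // false: only stable for 10 ms
console.log(update(1, 100)); // true: stable for >50 ms, press registered
```

This is why holding the button briefly (rather than the lightest possible tap) reliably triggers a capture.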

This is my cardboard-based setup. All of the components fit neatly inside the package.

Only a TFT LCD display and a push-button switch are found on the box’s top side. The TFT LCD is used to display the captured images along with the AI-detected labels.

The bottom side of the box has only a cutout for the camera lens.

PCB Design + Gerber Files + PCB Ordering Online

If you don’t want to put the circuit together on a breadboard and instead prefer a PCB, this is the PCB for you. EasyEDA online Circuit Schematics & PCB Design tool was used to create the PCB Board for the ESP32 CAM AI Board. The diagram looks somewhat like this.

After that, the schematic is transferred to a PCB. The PCB’s top and bottom views are provided below.

The Gerber File for the PCB is also given below. You can simply download the Gerber File from the following link.

Download Gerber File: ESP32-CAM AI Camera PCB

Now you can visit the NextPCB official website by clicking here: NextPCB.

You can now upload the Gerber File and place an order on the website. The PCB quality is excellent. That is why the majority of people entrust NextPCB with their PCB and PCBA needs.

Flow of Data

We’ve covered the entire process here, from detecting the object to displaying labels on the screen. We have an ESP32 CAM module that captures an image of the surroundings or an object and then transfers it to a TFT screen via the SPI protocol so that it may be shown.

The same image is then transmitted to the NodeJS server, along with the authentication ID.

Here the engine that detects the Object or creates labels for the object(s) in the image frame is Google Cloud Vision API.

The image is sent to the Vision AI API via the NodeJS server. However, in order to connect with the API, you must first authenticate with the Authentication ID. Once the frame has been sent, the API returns the labels to the server, which then sends them to the ESP-CAM, where they are shown on the TFT-Screen.
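The labels travel back as a plain JSON array of `{description, score}` objects (this is the format the NodeJS server later in this post returns). As a rough sketch with hypothetical sample labels, the step that turns that payload into the `description:score` lines drawn on the TFT can be modeled like this:

```javascript
// Sketch of the label round-trip data format. The sample labels below are
// hypothetical -- real values come from the Google Cloud Vision API.
const payload = JSON.stringify([
  { description: 'Cat', score: 0.97 },
  { description: 'Whiskers', score: 0.91 },
]);

// Mirrors what the Arduino parsingResult() function does with ArduinoJson:
// parse the array, then build one "description:score" string per label.
// (toFixed(2) matches Arduino's default two-decimal float-to-String output.)
function toDisplayLines(json) {
  return JSON.parse(json).map(l => `${l.description}:${l.score.toFixed(2)}`);
}

console.log(toDisplayLines(payload)); // [ 'Cat:0.97', 'Whiskers:0.91' ]
```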

Arduino Libraries Installation

Now, in order to drive the TFT screen and parse data from the server, we’ll need to install a few libraries using the Arduino Library Manager. To open the Library Manager, press Ctrl+Shift+I; depending on your system, it may take a few seconds to open. Now type the names of the libraries into the search bar and install them.

1. TFT_eSPI by Bodmer: https://github.com/Bodmer/TFT_eSPI

2. TJpg_Decoder by Bodmer: https://github.com/Bodmer/TJpg_Decoder

3. ArduinoJson by Benoit Blanchon: https://github.com/bblanchon/ArduinoJson
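Note that TFT_eSPI is configured at compile time by editing the User_Setup.h file inside the library folder; the driver and pin defines there must match your wiring. The fragment below is a sketch for the ILI9341 wiring used in this project. The pin numbers follow the connection table above, while the SPI frequency shown is an assumption you may need to tune for your display:

```cpp
// User_Setup.h fragment for TFT_eSPI (edit the copy in the library folder).
// Pin numbers follow this project's wiring table; adjust if yours differs.
#define ILI9341_DRIVER        // 2.8" 240x320 SPI display

#define TFT_MISO 12           // display SDO
#define TFT_MOSI 13           // display SDI
#define TFT_SCLK 14           // display SCK
#define TFT_CS    2           // chip select
#define TFT_DC   15           // data/command
#define TFT_RST  16           // reset

#define LOAD_GFXFF            // needed for the FreeSerifBold9pt7b font used below
#define SPI_FREQUENCY 27000000 // assumed value; lower it if the image is unstable
```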

Code for TFT Display Test

The frames captured by the camera must now be displayed on the TFT screen. Once the libraries are installed, upload the code below.

#include "esp_camera.h"
#include <TJpg_Decoder.h>
#include <SPI.h>
#include <TFT_eSPI.h>
 
#define PWDN_GPIO_NUM 32
#define RESET_GPIO_NUM    -1
#define XCLK_GPIO_NUM      0
#define SIOD_GPIO_NUM 26
#define SIOC_GPIO_NUM 27
 
#define Y9_GPIO_NUM   35
#define Y8_GPIO_NUM   34
#define Y7_GPIO_NUM   39
#define Y6_GPIO_NUM   36
#define Y5_GPIO_NUM   21
#define Y4_GPIO_NUM   19
#define Y3_GPIO_NUM   18
#define Y2_GPIO_NUM        5
#define VSYNC_GPIO_NUM    25
#define HREF_GPIO_NUM 23
#define PCLK_GPIO_NUM 22
 
#define GFXFF 1
#define FSB9 &FreeSerifBold9pt7b
 
TFT_eSPI tft = TFT_eSPI();
 
bool tft_output(int16_t x, int16_t y, uint16_t w, uint16_t h, uint16_t* bitmap)
{
   // Stop further decoding as image is running off bottom of screen
  if ( y >= tft.height() ) return 0;
 
  // This function will clip the image block rendering automatically at the TFT boundaries
  tft.pushImage(x, y, w, h, bitmap);
 
  // This might work instead if you adapt the sketch to use the Adafruit_GFX library
  // tft.drawRGBBitmap(x, y, bitmap, w, h);
 
  // Return
  return 1;
}
 
void setup() {
  Serial.begin(115200);
  delay(1000);
  
  Serial.println();
 
  Serial.println("INIT DISPLAY");
  tft.begin();
  tft.setRotation(3);
  tft.setTextColor(0xFFFF, 0x0000);
  tft.fillScreen(TFT_CYAN);
  tft.setFreeFont(FSB9);
 
  TJpgDec.setJpgScale(1);
  TJpgDec.setSwapBytes(true);
  TJpgDec.setCallback(tft_output);
  
  Serial.println("INIT CAMERA");
  camera_config_t config;
  config.ledc_channel = LEDC_CHANNEL_0;
  config.ledc_timer = LEDC_TIMER_0;
  config.pin_d0 = Y2_GPIO_NUM;
  config.pin_d1 = Y3_GPIO_NUM;
  config.pin_d2 = Y4_GPIO_NUM;
  config.pin_d3 = Y5_GPIO_NUM;
  config.pin_d4 = Y6_GPIO_NUM;
  config.pin_d5 = Y7_GPIO_NUM;
  config.pin_d6 = Y8_GPIO_NUM;
  config.pin_d7 = Y9_GPIO_NUM;
  config.pin_xclk = XCLK_GPIO_NUM;
  config.pin_pclk = PCLK_GPIO_NUM;
  config.pin_vsync = VSYNC_GPIO_NUM;
  config.pin_href = HREF_GPIO_NUM;
  config.pin_sscb_sda = SIOD_GPIO_NUM;
  config.pin_sscb_scl = SIOC_GPIO_NUM;
  config.pin_pwdn = PWDN_GPIO_NUM;
  config.pin_reset = RESET_GPIO_NUM;
  config.xclk_freq_hz = 10000000;
  config.pixel_format = PIXFORMAT_JPEG;
  //init with high specs to pre-allocate larger buffers
  if(psramFound()){
    config.frame_size = FRAMESIZE_QVGA; // 320×240
    config.jpeg_quality = 10;
    config.fb_count = 2;
  } else {
    config.frame_size = FRAMESIZE_SVGA;
    config.jpeg_quality = 12;
    config.fb_count = 1;
  }
 
 
  // camera init
  esp_err_t err = esp_camera_init(&config);
  if (err != ESP_OK) {
    Serial.printf("Camera init failed with error 0x%x", err);
    return;
  }
}
 
camera_fb_t* capture(){
  camera_fb_t *fb = NULL;
  esp_err_t res = ESP_OK;
  fb = esp_camera_fb_get();
  return fb;
}
 
void showingImage(){
  camera_fb_t *fb = capture();
  if(!fb || fb->format != PIXFORMAT_JPEG){
    Serial.println("Camera capture failed");
    esp_camera_fb_return(fb);
    return;
  }else{
    TJpgDec.drawJpg(0,0,(const uint8_t*)fb->buf, fb->len);
    esp_camera_fb_return(fb);
  }
}
 
void loop() {
    showingImage();
}

When uploading the code, make sure the IO0 pin is grounded. When the text Connecting.... appears in the upload log, press the RST button on the ESP32-CAM board to start the upload.

The live camera image should now be visible on the screen.

Google Vision API

Through REST and RPC APIs, Vision API provides strong pre-trained machine learning models. Assign labels to photos and sort them into millions of predefined categories easily. Detect objects and faces, interpret printed and handwritten text, and enrich your image database with useful metadata.

It also makes it simple for developers to implement vision detection capabilities into their apps, such as image labeling, face, and landmark identification, optical character recognition (OCR), and explicit content tagging.

NodeJS Installation

We will now use NodeJS to set up a server. Install the latest version of NodeJS from nodejs.org.

Download the software that is compatible with your system from this page. Make sure you have the LTS version downloaded.

GCP and API Setup

Before you start, make sure you’ve added your credit/debit card to GCP. Even though the project is completely free, a billing account is required because we’re using GCP features.

To begin, click here to get to Google Cloud Platform (GCP). Also, start a new project.

  • Here, select New Project.
  • Add a name to the project and leave the organization as No Organization.
  • After you’ve enabled billing and created a project, the Vision API must be enabled for it. To do so, go to this link here and enable the API.

Now we have to create a service account for authentication. Go to this link.

  • Select the newly created project.
  • Give the service account a name and an ID (you only have to enter the name; the service account ID is generated automatically from it). Then click Create and Continue.
  • Select a role from the drop-down menu.
  • Click Basic, then Owner under Quick access.
  • Click Continue.
  • Now press the Done button.

Now we create a service account key:

  1. Click the email address for the service account you created in the Cloud Console.
  2. Select Keys.
  3. After clicking Add key, click Create new key.
  4. Click Create. A JSON key file will be downloaded to your computer.
  5. Close the window.

To begin, navigate to the directory where you want to store the project and start the server. Open a command prompt in that location and run the following commands:

• npm install --save @google-cloud/vision

• set GOOGLE_APPLICATION_CREDENTIALS=KEY_PATH

For example: set GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/service-account-file.json" (on Linux/macOS, use export instead of set).

  • Now, in the same directory, create a new file with the “.js” suffix (say test.js)
  • Also, create a folder called “resources”  and put an image called “test.jpg” in it (that you want to test for object detection).
  • Now open test.js and paste the code below
'use strict';
 
function main() {
  // [START vision_quickstart]
  async function quickstart() {
    // Imports the Google Cloud client library
    const vision = require('@google-cloud/vision');
 
    // Creates a client
    const client = new vision.ImageAnnotatorClient();
 
    // Performs label detection on the image file
    const [result] = await client.labelDetection('./resources/test.jpg');
    const labels = result.labelAnnotations;
    console.log('Labels:');
    labels.forEach(label => { console.log(label); });
  }
  quickstart();
  // [END vision_quickstart]
}
 
process.on('unhandledRejection', err => {
  console.error(err.message);
  process.exitCode = 1;
});
 
main(...process.argv.slice(2));
  • Now run the above code by typing “node test.js” into the command prompt (make sure you’re in the correct directory).
  • The labels detected in the test.jpg image will be printed to the console.
  • Congratulations, the hard part is complete; now we replicate it behind a server for the ESP32-CAM.
  • Make a new folder called VisionServer, and inside it, make another folder called resources, as well as a file called server.js.
  • Open server.js and paste the code below
var fs = require('fs');
const http = require('http');
const server = http.createServer();
const filePath = './resources/test.jpeg';
 
server.on('request', (request, response)=>{
    if(request.method == 'POST' && request.url === "/imageUpdate"){
 
        // No text encoding here: the incoming JPEG is binary data
        var ImageFile = fs.createWriteStream(filePath);
        request.on('data', function(data){
            ImageFile.write(data);
        });
 
        request.on('end', async function(){
            ImageFile.end();
            const labels = await labelAPI();
            response.writeHead(200, {'Content-Type' : 'application/json'});
            response.end(JSON.stringify(labels));
        });
 
    }else{
        console.log("error");
        response.writeHead(405, {'Content-Type' : 'text/plain'});
        response.end();
    }
});
 
async function labelAPI() {
  var o = [];
  // Imports the Google Cloud client library
  const vision = require('@google-cloud/vision');
 
  // Creates a client
  const client = new vision.ImageAnnotatorClient();
 
  // Performs label detection on the image file
  const [result] = await client.labelDetection(filePath);
  const labels = result.labelAnnotations;
  
  labels.forEach(label => {
    o.push({description: label.description, score: label.score});
  });
  return o;
}
 
const port = 8888;
server.listen(port);
console.log(`Listening at ${port}`);

Save the code and launch it by going to the proper directory and typing “node server.js” in the command prompt.

Final Arduino Code

The final Arduino code for the AI Camera with Google Vision and ESP32 CAM Module can be found here. So, launch the Arduino IDE and make a few changes to the code below.

To begin, enter the SSID and password for the network to which your laptop is connected.

const char* ssid = "**********";
const char* password = "**********";

Now we must update the server’s IP address, which in this case is our computer’s local IP address (find it with ipconfig on Windows or ifconfig on Linux/macOS), as seen in the following line of code.

client.begin("http://192.168.116.56:8888/imageUpdate");

Here is the final code that you have to upload to the ESP32 CAM Board.

#include "esp_camera.h"
#include <TJpg_Decoder.h>
#include <SPI.h>
#include <TFT_eSPI.h>
#include <WiFi.h>
#include <HTTPClient.h>
#include <ArduinoJson.h>
 
#define PWDN_GPIO_NUM 32
#define RESET_GPIO_NUM    -1
#define XCLK_GPIO_NUM      0
#define SIOD_GPIO_NUM 26
#define SIOC_GPIO_NUM 27
 
#define Y9_GPIO_NUM   35
#define Y8_GPIO_NUM   34
#define Y7_GPIO_NUM   39
#define Y6_GPIO_NUM   36
#define Y5_GPIO_NUM   21
#define Y4_GPIO_NUM   19
#define Y3_GPIO_NUM   18
#define Y2_GPIO_NUM        5
#define VSYNC_GPIO_NUM    25
#define HREF_GPIO_NUM 23
#define PCLK_GPIO_NUM 22
 
#define GFXFF 1
#define FSB9 &FreeSerifBold9pt7b
 
TFT_eSPI tft = TFT_eSPI();
 
const char* ssid = "**********";
const char* password = "**********";
const unsigned long timeout = 30000; // 30 seconds
 
const int buttonPin = 4;    // the number of the pushbutton pin
int buttonState;        
int lastButtonState = LOW;  
unsigned long lastDebounceTime = 0;  // the last time the output pin was toggled
unsigned long debounceDelay = 50;    // the debounce time; increase if the output flickers
bool isNormalMode = true;
 
bool tft_output(int16_t x, int16_t y, uint16_t w, uint16_t h, uint16_t* bitmap)
{
   // Stop further decoding as image is running off bottom of screen
  if ( y >= tft.height() ) return 0;
 
  // This function will clip the image block rendering automatically at the TFT boundaries
  tft.pushImage(x, y, w, h, bitmap);
 
  // This might work instead if you adapt the sketch to use the Adafruit_GFX library
  // tft.drawRGBBitmap(x, y, bitmap, w, h);
 
  // Return
  return 1;
}
 
 
void setup() {
  Serial.begin(115200);
  delay(1000);
  
  Serial.println();
  pinMode(buttonPin, INPUT);
 
  Serial.println("INIT DISPLAY");
  tft.begin();
  tft.setRotation(3);
  tft.setTextColor(0xFFFF, 0x0000);
  tft.fillScreen(TFT_YELLOW);
  tft.setFreeFont(FSB9);
 
  TJpgDec.setJpgScale(1);
  TJpgDec.setSwapBytes(true);
  TJpgDec.setCallback(tft_output);
  
  Serial.println("INIT CAMERA");
  camera_config_t config;
  config.ledc_channel = LEDC_CHANNEL_0;
  config.ledc_timer = LEDC_TIMER_0;
  config.pin_d0 = Y2_GPIO_NUM;
  config.pin_d1 = Y3_GPIO_NUM;
  config.pin_d2 = Y4_GPIO_NUM;
  config.pin_d3 = Y5_GPIO_NUM;
  config.pin_d4 = Y6_GPIO_NUM;
  config.pin_d5 = Y7_GPIO_NUM;
  config.pin_d6 = Y8_GPIO_NUM;
  config.pin_d7 = Y9_GPIO_NUM;
  config.pin_xclk = XCLK_GPIO_NUM;
  config.pin_pclk = PCLK_GPIO_NUM;
  config.pin_vsync = VSYNC_GPIO_NUM;
  config.pin_href = HREF_GPIO_NUM;
  config.pin_sscb_sda = SIOD_GPIO_NUM;
  config.pin_sscb_scl = SIOC_GPIO_NUM;
  config.pin_pwdn = PWDN_GPIO_NUM;
  config.pin_reset = RESET_GPIO_NUM;
  config.xclk_freq_hz = 10000000;
  config.pixel_format = PIXFORMAT_JPEG;
  //init with high specs to pre-allocate larger buffers
  if(psramFound()){
    config.frame_size = FRAMESIZE_QVGA; // 320×240
    config.jpeg_quality = 10;
    config.fb_count = 2;
  } else {
    config.frame_size = FRAMESIZE_SVGA;
    config.jpeg_quality = 12;
    config.fb_count = 1;
  }
 
  // camera init
  esp_err_t err = esp_camera_init(&config);
  if (err != ESP_OK) {
    Serial.printf("Camera init failed with error 0x%x", err);
    return;
  }
}
 
bool wifiConnect(){
  unsigned long startingTime = millis();
  WiFi.begin(ssid, password);
 
  while(WiFi.status() != WL_CONNECTED){
    delay(500);
    if((millis() - startingTime) > timeout){
      return false;
    }
  }
  return true;
}
 
void buttonEvent(){
  int reading = digitalRead(buttonPin);
  if (reading != lastButtonState) {
    lastDebounceTime = millis();
  }
 
  if ((millis() - lastDebounceTime) > debounceDelay) {
    if (reading != buttonState) {
      buttonState = reading;
      
      if (buttonState == HIGH) {
        isNormalMode = !isNormalMode;
 
        //Additional Code
        if(!isNormalMode)
          sendingImage();
        //  
      }
    }
  }
  lastButtonState = reading;
}
 
camera_fb_t* capture(){
  camera_fb_t *fb = NULL;
  esp_err_t res = ESP_OK;
  fb = esp_camera_fb_get();
  return fb;
}
 
void showingImage(){
  camera_fb_t *fb = capture();
  if(!fb || fb->format != PIXFORMAT_JPEG){
    Serial.println("Camera capture failed");
    esp_camera_fb_return(fb);
    return;
  }else{
    TJpgDec.drawJpg(0,0,(const uint8_t*)fb->buf, fb->len);
    esp_camera_fb_return(fb);
  }
}
 
void parsingResult(String response){
  DynamicJsonDocument doc(1024);
  deserializeJson(doc, response);
  JsonArray array = doc.as<JsonArray>();
  int yPos = 4;
  //tft.setRotation(1);
  for(JsonVariant v : array){
    JsonObject object = v.as<JsonObject>();
    const char* description = object["description"];
    float score = object["score"];
    String label = "";
    label += description;
    label += ":";
    label += score;
    
    tft.drawString(label, 8, yPos, GFXFF);
    yPos += 16;
  }
  //tft.setRotation(3);
}
 
void postingImage(camera_fb_t *fb){
  HTTPClient client;
  client.begin("http://192.168.116.56:8888/imageUpdate");
  client.addHeader("Content-Type", "image/jpeg");
  int httpResponseCode = client.POST(fb->buf, fb->len);
  if(httpResponseCode == 200){
    String response = client.getString();
    parsingResult(response);
  }else{
    //tft.setRotation(1);
    //Error
    tft.drawString("Check Your Server!!!", 8, 4, GFXFF);
    //tft.setRotation(3);
  }
 
  client.end();
  WiFi.disconnect();
}
 
void sendingImage(){
  camera_fb_t *fb = capture();
  if(!fb || fb->format != PIXFORMAT_JPEG){
    Serial.println("Camera capture failed");
    esp_camera_fb_return(fb);
    return;
  }else{
    TJpgDec.drawJpg(0,0,(const uint8_t*)fb->buf, fb->len);
    //tft.setRotation(1);
    tft.drawString("Wifi Connecting!", 8, 4, GFXFF);
    //tft.setRotation(3);
    if(wifiConnect()){
      //tft.drawString("Wifi Connected!", 8, 4, GFXFF);
      TJpgDec.drawJpg(0,0,(const uint8_t*)fb->buf, fb->len);
      postingImage(fb);
    }else{
      //tft.setRotation(1);
      tft.drawString("Check Wifi credential!", 8, 4, GFXFF);
      //tft.setRotation(3);
    }
    esp_camera_fb_return(fb);
  }
}
 
void loop() {
  buttonEvent();
  
  if(isNormalMode)
    showingImage();
  
}

Testing DIY AI Camera using Google Vision & ESP32 CAM

  • When you press and hold the button for about a second, the flash fires and the image is captured.
  • After a few seconds, the labels appear, indicating that the AI camera has started working.
  • Here are some samples of other images taken with this DIY AI Camera.

Conclusion

I hope all of you now understand how to design a DIY AI Camera with Google Vision & ESP32 CAM Module. We at MATHA ELECTRONICS will be back soon with more informative blogs.
