Google-Cloud-Vision-API

Jul 13, 2021

Google Cloud Vision API enables developers to understand the content of an image by encapsulating powerful machine learning models in an easy-to-use REST API. It quickly classifies images into thousands of categories (e.g., “sailboat”, “lion”, “Eiffel Tower”), detects individual objects and faces within images, and finds and reads printed words contained within images.

Getting Started

The Vision API has a very broad scope, so we will discuss only two features and their API implementation here:

Text Detection (OCR)
Label Detection

Let's first look at the API implementation schema to understand how it is used.

API Implementation Schema

Endpoint

The Vision API consists of a single endpoint:

https://vision.googleapis.com/v1/images

that supports a single method, annotate, which is called with an HTTP POST request:

POST - https://vision.googleapis.com/v1/images:annotate
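
As a minimal sketch of the call above, the following Python builds (but does not send) the POST request using only the standard library. The helper name build_post_request and the empty placeholder body are illustrative assumptions, not part of the Vision API.

```python
import json
import urllib.request

ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_post_request(body: dict) -> urllib.request.Request:
    """Wrap a JSON body in a POST request aimed at images:annotate."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        ANNOTATE_URL,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder body; a real call needs at least one image and one feature.
request = build_post_request({"requests": []})
print(request.get_method(), request.full_url)
```

Passing the resulting object to urllib.request.urlopen would send it, but a real call also needs valid credentials.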

Authentication

If your client application does not use OAuth 2.0, then it must include an API key when it calls an API that’s enabled within a Google Cloud Platform project. The application passes this key into all API requests as a key=API_key parameter, such as:

POST https://vision.googleapis.com/v1/images:annotate?key=YOUR_API_KEY
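
One way to attach the key in code is to let the standard library do the URL encoding, so that any special characters in the key are escaped. This is only a sketch; the function name annotate_url_with_key is a hypothetical helper, and YOUR_API_KEY stays a placeholder.

```python
from urllib.parse import urlencode

ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"


def annotate_url_with_key(api_key: str) -> str:
    """Return the annotate URL with the API key as a query parameter."""
    return f"{ANNOTATE_URL}?{urlencode({'key': api_key})}"


print(annotate_url_with_key("YOUR_API_KEY"))
```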

Since we are only testing these APIs on the Google Vision API console, we can work without authentication.

Body

The body of your POST request is a JSON object with a requests array; each element of that array is an object of type AnnotateImageRequest, such as:

{
  "image": {
    object(Image)
  },
  "features": [
    {
      object(Feature)
    }
  ],
  "imageContext": {
    object(ImageContext)
  }
}
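
The schema above can be assembled programmatically. The sketch below builds a body with a single AnnotateImageRequest wrapped in the requests array used by the sample requests in this article; imageContext is left out because it is optional. The helper name build_annotate_body and the BASE64_IMAGE_DATA placeholder are assumptions made for illustration.

```python
import json


def build_annotate_body(image: dict, feature_types: list[str]) -> dict:
    """Assemble a request body holding a single AnnotateImageRequest."""
    annotate_image_request = {
        "image": image,
        "features": [{"type": feature_type} for feature_type in feature_types],
    }
    return {"requests": [annotate_image_request]}


# BASE64_IMAGE_DATA stands in for real base64-encoded image content.
body = build_annotate_body({"content": "BASE64_IMAGE_DATA"}, ["TEXT_DETECTION"])
print(json.dumps(body, indent=2))
```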

Console

You can try the requests mentioned below on the Google Vision API console.

1. Text Detection (OCR)

The Vision API can detect and extract text from images. For example, a photograph might contain a street sign or traffic sign. The JSON response includes the entire extracted string, as well as the individual words and their bounding boxes.
Here we discuss only one of the features that support OCR:

TEXT_DETECTION

Input Image

Sample Request

The image can be sent as a base64-encoded string, a Google Cloud Storage file location, or a publicly accessible URL. ImageContext is optional and rarely needed, so we are skipping it.

{
  "requests": [
    {
      "image": {
        "content": "/9j/7QBEUGhvdG9zaG9...base64-encoded-image-content...fXNWzvDEeYxxxzj/Coa6Bax//Z"
      },
      "features": [
        {
          "type": "TEXT_DETECTION"
        }
      ]
    }
  ]
}
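
The image object in such a request can be produced in code. The following sketch covers the inline base64 form used above and the URI form; the helper names are hypothetical, and gs://my-bucket/sign.jpg is a made-up Cloud Storage location used only for illustration.

```python
import base64


def image_from_bytes(raw: bytes) -> dict:
    """Inline the image as base64 text in the "content" field."""
    return {"content": base64.b64encode(raw).decode("ascii")}


def image_from_uri(uri: str) -> dict:
    """Reference an image by URI (a gs:// location or a public URL)."""
    return {"source": {"imageUri": uri}}


# The first three bytes of a JPEG file encode to "/9j/", which is why
# base64 JPEG content such as the sample above starts with those characters.
print(image_from_bytes(b"\xff\xd8\xff"))
print(image_from_uri("gs://my-bucket/sign.jpg"))  # hypothetical bucket and file
```

In practice the bytes would come from reading the image file in binary mode.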

Sample Response

{
  "responses": [
    {
      "textAnnotations": [
        {
          "locale": "en",
          "description": "ABBEY\nROAD NW8\nCITY OF WESTMINSTER\n",
          "boundingPoly": {
            "vertices": [
              {
                "x": 45,
                "y": 43
              },
              {
                "x": 269,
                "y": 43
              },
              {
                "x": 269,
                "y": 178
              },
              {
                "x": 45,
                "y": 178
              }
            ]
          }
        },
        {
          "description": "ABBEY",
          "boundingPoly": {
            "vertices": [
              ...
            ]
          }
        },
        {
          "description": "ROAD",
          "boundingPoly": {
            "vertices": [
             ...
            ]
          }
        },
        {
          "description": "NW8",
          "boundingPoly": {
            "vertices": [
             ...
            ]
          }
        },
        {
          "description": "CITY",
          "boundingPoly": {
            "vertices": [
             ...
            ]
          }
        },
        {
          "description": "OF",
          "boundingPoly": {
            "vertices": [
             ...
            ]
          }
        },
        {
          "description": "WESTMINSTER",
          "boundingPoly": {
            "vertices": [
             ...
            ]
          }
        }
      ]
    }
  ]
}
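
In the sample response above, the first textAnnotations entry carries the whole extracted string and the remaining entries carry one word each. A minimal sketch for separating the two, and for measuring a bounding box, could look like this. The function names are hypothetical, the sample dictionary is a trimmed copy of the response above, and defaulting a missing coordinate to 0 is a defensive assumption.

```python
def split_text_annotations(response: dict) -> tuple[str, list[str]]:
    """Return (full_text, words) from one entry of "responses"."""
    annotations = response.get("textAnnotations", [])
    if not annotations:
        return "", []
    full_text = annotations[0]["description"]
    words = [annotation["description"] for annotation in annotations[1:]]
    return full_text, words


def box_size(annotation: dict) -> tuple[int, int]:
    """Return (width, height) of an annotation's bounding polygon."""
    vertices = annotation["boundingPoly"]["vertices"]
    xs = [vertex.get("x", 0) for vertex in vertices]
    ys = [vertex.get("y", 0) for vertex in vertices]
    return max(xs) - min(xs), max(ys) - min(ys)


# Trimmed copy of the sample response; word-level boxes are left out.
sample = {
    "textAnnotations": [
        {
            "locale": "en",
            "description": "ABBEY\nROAD NW8\nCITY OF WESTMINSTER\n",
            "boundingPoly": {
                "vertices": [
                    {"x": 45, "y": 43},
                    {"x": 269, "y": 43},
                    {"x": 269, "y": 178},
                    {"x": 45, "y": 178},
                ]
            },
        },
        {"description": "ABBEY"},
        {"description": "ROAD"},
        {"description": "NW8"},
        {"description": "CITY"},
        {"description": "OF"},
        {"description": "WESTMINSTER"},
    ]
}

full_text, words = split_text_annotations(sample)
print(words)
print(box_size(sample["textAnnotations"][0]))
```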

2. Label Detection

The Vision API can detect and extract information about entities within an image, across a broad group of categories.
Labels can identify objects, locations, activities, animal species, products, and more.
We need to specify LABEL_DETECTION as the value of features.type, just as we did for TEXT_DETECTION above:

LABEL_DETECTION

The API implementation is the same as described above, so we move straight to the example:

Input Image

Sample Request

{
  "requests": [
    {
      "image": {
        "source": {
          "imageUri": "https://github.com/farhanh/raw-images/blob/master/ferris-wheel.jpg"
        }
      },
      "features": [
        {
          "type": "LABEL_DETECTION"
        }
      ]
    }
  ]
}

Sample Response

{
  "responses": [
    {
      "labelAnnotations": [
        {
          "mid": "/m/017rgb",
          "description": "ferris wheel",
          "score": 0.84832066
        },
        {
          "mid": "/m/010jjr",
          "description": "amusement park",
          "score": 0.8101249
        },
        {
          "mid": "/m/01d74z",
          "description": "night",
          "score": 0.8036025
        },
        {
          "mid": "/m/05b0n7k",
          "description": "outdoor recreation",
          "score": 0.68825835
        },
        {
          "mid": "/m/02jf28",
          "description": "fair",
          "score": 0.6566326
        }
      ]
    }
  ]
}
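
Each label comes with a score, so a common follow-up step is to keep only the labels above some confidence level. The sketch below does this for the sample response above; the function name labels_above and the 0.8 threshold are arbitrary choices for illustration, not values recommended by the API.

```python
def labels_above(response: dict, min_score: float) -> list[tuple[str, float]]:
    """Return (description, score) pairs with score >= min_score."""
    return [
        (label["description"], label["score"])
        for label in response.get("labelAnnotations", [])
        if label["score"] >= min_score
    ]


# Copy of the labelAnnotations from the sample response.
sample = {
    "labelAnnotations": [
        {"mid": "/m/017rgb", "description": "ferris wheel", "score": 0.84832066},
        {"mid": "/m/010jjr", "description": "amusement park", "score": 0.8101249},
        {"mid": "/m/01d74z", "description": "night", "score": 0.8036025},
        {"mid": "/m/05b0n7k", "description": "outdoor recreation", "score": 0.68825835},
        {"mid": "/m/02jf28", "description": "fair", "score": 0.6566326},
    ]
}

confident = labels_above(sample, 0.8)
print([name for name, _ in confident])
```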

Please reach out to me at contact@farhanhabib.com if you have any questions, and don’t forget to applaud if you like this article. Thanks!

Written by Mohammad Farhan Habib