Module 8: Getting started with computer vision

The Computer Vision Service

The Computer Vision service is designed to help you extract information from images. It provides functionality that you can use for:

Description and tag generation - determining an appropriate caption for an image, and identifying relevant “tags” that can be used as keywords to indicate its subject.
Object detection - detecting the presence and location of specific objects within the image.
Face detection - detecting the presence, location, and features of human faces in the image.
Image metadata, color, and type analysis - determining the format and size of an image, its dominant color palette, and whether it contains clipart.
Category identification - identifying an appropriate categorization for the image, and whether it contains any known celebrities or landmarks.
Brand detection - detecting the presence of any known brands or logos.
Moderation rating - determining whether the image includes any adult or violent content.
Optical character recognition - reading text in the image.
Smart thumbnail generation - identifying the main region of interest in the image to create a smaller “thumbnail” version.

You can provision Computer Vision as a single-service resource, or you can use the Computer Vision API in a multi-service Cognitive Services resource.

Image Analysis

To analyze an image, you can use the Analyze Image REST method or the equivalent method in the SDK for your preferred programming language. In the request, you specify the visual features you want to include in the analysis and, if you select categories, whether to include details of celebrities or landmarks. The method returns a JSON document containing the requested information.
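As a rough illustration of the REST route, the sketch below builds an Analyze Image request using only the Python standard library. It assumes the v3.2 REST path (`/vision/v3.2/analyze`), the `visualFeatures` and `details` query parameters, and the `Ocp-Apim-Subscription-Key` header; the function names, endpoint, and key are placeholders invented for this example, so check them against the API version you have provisioned.

```python
import json
import urllib.parse
import urllib.request


def build_analyze_request(endpoint, key, image_url, features, details=None):
    """Build (but do not send) a POST request for the Analyze Image operation.

    The path and parameter names assume the v3.2 REST API.
    """
    params = {"visualFeatures": ",".join(features)}
    if details:
        # Details (celebrities, landmarks) only apply when Categories is requested.
        params["details"] = ",".join(details)
    query = urllib.parse.urlencode(params)
    url = f"{endpoint.rstrip('/')}/vision/v3.2/analyze?{query}"
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def analyze_image(request):
    """Send the request and decode the JSON document the service returns."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as `build_analyze_request(endpoint, key, image_url, ["Categories", "Description"], details=["Landmarks"])` produces a request that `analyze_image` can send once a real endpoint and key are supplied.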

You can also use scoped functions to retrieve specific subsets of the image features, including the image description, tags, and objects in the image.
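One way to picture the scoped functions is as separate REST operations, one per feature subset. The mapping below is a sketch that assumes the v3.2 paths `describe`, `tag`, and `detect`; the dictionary and helper names are hypothetical and exist only for this example.

```python
# Assumed v3.2 REST paths for the scoped operations mentioned above.
SCOPED_OPERATIONS = {
    "description": "describe",  # caption and description tags
    "tags": "tag",              # tags only
    "objects": "detect",        # object detection only
}


def scoped_operation_url(endpoint, feature):
    """Return the URL of the scoped operation for one feature subset."""
    try:
        path = SCOPED_OPERATIONS[feature]
    except KeyError:
        raise ValueError(f"no scoped operation known for {feature!r}") from None
    return f"{endpoint.rstrip('/')}/vision/v3.2/{path}"
```

Requests to these URLs would carry the same key header and image payload as a full analysis; only the path changes.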

The JSON response for image analysis looks similar to this:

{
  "categories": [ { "name": "_outdoor_mountain", "confidence": 0.9 } ],
  "adult": { "isAdultContent": false, … },
  "tags": [ { "name": "outdoor", "confidence": 0.9 }, { "name": "mountain", "confidence": 0.9 } ],
  "description": {
    "tags": [ "outdoor", "mountain" ],
    "captions": [ { "text": "A mountain with snow", "confidence": 0.9 } ]
  },
  "metadata": { "width": 60, "height": 30, "format": "Jpeg" },
  "faces": [],
  "brands": [],
  "color": { "dominantColorForeground": "Brown", … },
  "imageType": { "clipArtType": 0, … },
  "objects": [ { "rectangle": { "x": 20, "y": 25, "w": 10, "h": 20 }, "object": "mountain", "confidence": 0.9 } ]
}
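Once the response has been decoded into a dictionary, pulling out the parts you need is ordinary dictionary access. The helper below is a minimal sketch: `summarize_analysis` and its `min_confidence` threshold are made up for this example, and the caption string is read from `text` (the key the v3.2 REST response uses, as far as I know) with a fallback to `name`.

```python
def summarize_analysis(result, min_confidence=0.5):
    """Reduce an Analyze Image result to its best caption, tags, and objects."""
    captions = result.get("description", {}).get("captions", [])
    # Pick the caption the service is most confident about, if there is one.
    best = max(captions, key=lambda c: c["confidence"], default=None)
    caption = best.get("text", best.get("name")) if best else None

    # Keep only tags at or above the chosen confidence threshold.
    tags = [
        tag["name"]
        for tag in result.get("tags", [])
        if tag["confidence"] >= min_confidence
    ]

    # Pair each detected object with its bounding rectangle.
    objects = [(obj["object"], obj["rectangle"]) for obj in result.get("objects", [])]

    return {"caption": caption, "tags": tags, "objects": objects}
```

Sections that were not requested are simply absent from the response, which is why every lookup uses `get` with an empty default.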

Smart-Cropped Thumbnails

Thumbnails are often used to provide smaller versions of images in applications and websites. For example, a tourism site might display a list of tourist attractions in a city with a small, representative thumbnail image for each attraction; and only display the full image when the user selects the “details” page for an individual attraction.

The Computer Vision service enables you to create a thumbnail with different dimensions (and aspect ratio) from the source image, and optionally to use image analysis to determine the region of interest in the image (its main subject) and make that the focus of the thumbnail. This ability to determine the region of interest is particularly useful when cropping the image to change its aspect ratio.

You can generate thumbnails with a width and height up to 1024 pixels, with a recommended minimum size of 50x50 pixels.
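The size limits can be checked before a request is made. The sketch below assumes the v3.2 REST path `/vision/v3.2/generateThumbnail` with `width`, `height`, and `smartCropping` query parameters, and it assumes a lower bound of 1 pixel, which the text above does not state. The function and constant names are hypothetical.

```python
import urllib.parse

MAX_SIDE = 1024            # upper limit for width and height
RECOMMENDED_MIN_SIDE = 50  # recommended, not enforced


def thumbnail_url(endpoint, width, height, smart_cropping=True):
    """Build the URL for a thumbnail request, rejecting out-of-range sizes."""
    for name, value in (("width", width), ("height", height)):
        # Lower bound of 1 is an assumption; the upper bound comes from the text.
        if not 1 <= value <= MAX_SIDE:
            raise ValueError(f"{name} must be between 1 and {MAX_SIDE}, got {value}")
    params = {
        "width": width,
        "height": height,
        # Smart cropping asks the service to focus on the region of interest.
        "smartCropping": str(smart_cropping).lower(),
    }
    query = urllib.parse.urlencode(params)
    return f"{endpoint.rstrip('/')}/vision/v3.2/generateThumbnail?{query}"


def below_recommended_size(width, height):
    """Report whether either side falls under the recommended 50-pixel minimum."""
    return width < RECOMMENDED_MIN_SIDE or height < RECOMMENDED_MIN_SIDE
```

Keeping the recommended minimum as a separate check rather than an error reflects the wording above: 50x50 is a recommendation, while 1024 is a limit.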

