Module 11: Reading Text in Documents and Images

 

Computer Vision Options for Reading Text

The Computer Vision service offers two APIs that you can use to read text.

  • The OCR API:

    • Use this API to read small to medium volumes of text from images.

    • The API can read text in multiple languages.

    • Results are returned immediately from a single function call.

  • The Read API:

    • Use this API to read small to large volumes of text from images and PDF documents.

    • This API uses a newer model than the OCR API, resulting in greater accuracy.

    • The Read API can read printed text in multiple languages, and handwritten text in English.

    • The initial function call returns an asynchronous operation ID, which must be used in a subsequent call to retrieve the results.

Using the OCR API

    To use the OCR API, call the Ocr REST function (or the equivalent SDK method), passing the image URL or binary image data. You can specify the language of the text to be detected (the default is en, for English) and, optionally, the detectOrientation parameter to return information about the orientation of the text in the image.
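    As a concrete sketch, the call can be assembled as a plain HTTP request. The v3.2 REST path shown here, and the placeholder endpoint and key, are assumptions for illustration:

```python
# Sketch: build (not send) a Computer Vision OCR request.
# The endpoint and key below are placeholders, not real credentials.

def build_ocr_request(endpoint, key, image_url, language="en", detect_orientation=True):
    """Return the URL, query parameters, headers, and JSON body for an OCR call."""
    return {
        "url": f"{endpoint}/vision/v3.2/ocr",
        "params": {
            "language": language,
            "detectOrientation": str(detect_orientation).lower(),
        },
        "headers": {
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        "json": {"url": image_url},
    }

request = build_ocr_request(
    "https://<your-resource>.cognitiveservices.azure.com",  # placeholder endpoint
    "<your-key>",                                           # placeholder key
    "https://example.com/sign.jpg",
)
# Send with, e.g.: requests.post(request["url"], params=request["params"],
#                                headers=request["headers"], json=request["json"])
```

    Passing binary image data instead of a URL works the same way, except the body is the raw bytes and the Content-Type changes to application/octet-stream.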

    The resulting JSON describes the detected text, broken down into regions of the image, then lines, and then individual words, like this:

    {
      "language": "en",
      "textAngle": 0.00000,
      "orientation": "Up",
      "regions": [
        {
          "boundingBox": "462,379,497,75",
          "lines": [
            {
              "boundingBox": "462,379,497,74",
              "words": [
                { "boundingBox": "462,379,41,73", "text": "Hello" },
                { "boundingBox": "523,379,153,73", "text": "World!" }
              ]
            }
          ]
        }
      ]
    }
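    To see how the nesting works in practice, here is a small Python sketch that parses the sample result above and joins the individual words back into lines of text:

```python
import json

# Sample OCR result from above: regions -> lines -> words.
ocr_result = json.loads("""
{ "language": "en", "textAngle": 0.0, "orientation": "Up",
  "regions": [ { "boundingBox": "462,379,497,75",
    "lines": [ { "boundingBox": "462,379,497,74",
      "words": [ { "boundingBox": "462,379,41,73", "text": "Hello" },
                 { "boundingBox": "523,379,153,73", "text": "World!" } ] } ] } ] }
""")

def extract_lines(result):
    """Join the words of each detected line back into a single string."""
    return [" ".join(w["text"] for w in line["words"])
            for region in result.get("regions", [])
            for line in region.get("lines", [])]

print(extract_lines(ocr_result))  # ['Hello World!']
```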

Using the Read API

To use the Read API, call the Read REST function (or equivalent SDK method), passing the image URL or binary data, and optionally specifying the language the text is written in (with a default value of en for English).

The Read function returns an operation ID, which you can use in a subsequent call to the Get Read Results function in order to retrieve details of the text that has been read. Depending on the volume of text, you may need to poll the Get Read Results function multiple times before the operation is complete.
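The poll-until-complete pattern can be sketched generically. Here `get_result` stands in for the real Get Read Results call (simulated below with a canned sequence of responses); the status values follow the JSON this module shows:

```python
import time

def poll_until_done(get_result, interval=1.0, max_attempts=30):
    """Call get_result repeatedly until the operation reaches a terminal status."""
    for _ in range(max_attempts):
        result = get_result()
        if result["status"] in ("succeeded", "failed"):
            return result
        time.sleep(interval)
    raise TimeoutError("Read operation did not complete in time")

# Simulated service: the operation completes on the third poll.
responses = iter([{"status": "notStarted"},
                  {"status": "running"},
                  {"status": "succeeded"}])
print(poll_until_done(lambda: next(responses), interval=0)["status"])  # succeeded
```

In real code, `get_result` would be the Get Read Results call made with the operation ID returned by Read, and you would keep a sensible polling interval rather than 0.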

The results from the Read API are similar to the OCR API, except that the text is broken down by page, then line, and then word. Additionally, the text values are included at both the line and word levels, making it easier to read entire lines of text if you don't need to extract text at the individual word level.


{
  "status": "succeeded",
  "createdDateTime": "2019-10-03T14:32:04Z",
  "lastUpdatedDateTime": "2019-10-03T14:38:14Z",
  "analyzeResult": {
    "version": "v3.0",
    "readResults": [
      {
        "page": 1,
        "language": "en",
        "angle": 49.59,
        "width": 600,
        "height": 400,
        "unit": "pixel",
        "lines": [
          {
            "boundingBox": [ 20,61,204,64,204,84,20,81 ],
            "text": "Hello world!",
            "words": [
              { "boundingBox": [ 20,62,48,62,48,83,20,82 ], "text": "Hello", "confidence": 0.91 },
              { "boundingBox": [ 51,62,105,63,105,83,51,83 ], "text": "world!", "confidence": 0.164 }
            ]
          }
        ]
      }
    ]
  }
}
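Because the text is already present at the line level, collecting whole lines requires no word-by-word reassembly. A sketch parsing the sample result above:

```python
import json

# Sample Read API result from above (abbreviated to the fields used here).
read_result = json.loads("""
{ "status": "succeeded",
  "analyzeResult": { "version": "v3.0",
    "readResults": [ { "page": 1, "language": "en",
      "lines": [ { "text": "Hello world!",
        "words": [ { "text": "Hello", "confidence": 0.91 },
                   { "text": "world!", "confidence": 0.164 } ] } ] } ] } }
""")

# Pull the line-level text directly, page by page.
lines = [line["text"]
         for page in read_result["analyzeResult"]["readResults"]
         for line in page["lines"]]
print(lines)  # ['Hello world!']
```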


The Form Recognizer Service

The Form Recognizer service enables you to extract data from forms, including a semantic understanding of the fields in the form and their corresponding values.

The Form Recognizer service provides the following capabilities:

  • Use prebuilt models to extract data from:

    • Receipts

    • Invoices

    • Business cards

  • Train custom models from your own forms using:

    • Unsupervised learning (with unlabeled forms)

    • Supervised learning (with labeled forms)

  • Prebuilt Models

    Form Recognizer includes prebuilt models that you can use for common form extraction tasks.

    To use the prebuilt models, use the REST API (or SDK) to call the model-specific function to start the analysis process and receive a result ID. You can then make a subsequent call to the model-specific Get Results function, passing the result ID to retrieve the results.
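    Both steps hinge on a result identifier. The analyze call typically returns it in an Operation-Location response header pointing at the results URL; a small helper can pull the ID out. The URL below is a hypothetical placeholder following the shape of the v2.1 receipt endpoint:

```python
def result_id_from_operation_location(header_value):
    """Extract the trailing result ID from an Operation-Location header value."""
    return header_value.rstrip("/").rsplit("/", 1)[-1]

# Hypothetical header value returned by a prebuilt receipt analyze call.
loc = ("https://<resource>.cognitiveservices.azure.com/formrecognizer/v2.1/"
       "prebuilt/receipt/analyzeResults/3f2504e0-4f89-11d3-9a0c-0305e82c3301")
print(result_id_from_operation_location(loc))  # 3f2504e0-4f89-11d3-9a0c-0305e82c3301
```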

    Receipts

    The prebuilt model for receipts enables you to extract common receipt fields, including:

    • MerchantName

    • MerchantAddress

    • MerchantPhoneNumber

    • TransactionDate

    • TransactionTime

    • Items

      • Quantity

      • Name

      • TotalPrice

    • Subtotal

    • Tax

    • Tip

    • Total

    Invoices

    The prebuilt model for invoices extracts text and tables such as you commonly find in invoices, and identifies named fields such as:

    • CustomerName

    • CustomerId

    • PurchaseOrder

    • InvoiceId

    • InvoiceDate

    • DueDate

    • VendorName

    • VendorAddress

    • VendorAddressRecipient

    • CustomerAddress

    • CustomerAddressRecipient

    • BillingAddress

    • BillingAddressRecipient

    • ShippingAddress

    • ShippingAddressRecipient

    • SubTotal

    • TotalTax

    • InvoiceTotal

    • AmountDue

    • ServiceAddress

    • ServiceAddressRecipient

    • RemittanceAddress

    • RemittanceAddressRecipient

    • ServiceStartDate

    • ServiceEndDate

    • PreviousUnpaidBalance

    Business Cards

    The prebuilt model for business cards extracts information such as:

    • ContactNames

      • FirstName

      • LastName

    • CompanyNames

    • Departments

    • JobTitles

    • Emails

    • Websites

    • Addresses

    • MobilePhones

    • Faxes

    • WorkPhones

    • OtherPhones

  • Training Custom Models without Labels (Unsupervised)

    If the prebuilt models don't provide what you need, you can use Form Recognizer to train a custom model based on your own sample forms.

    The simplest way to train a custom model is to use an unsupervised learning technique in which you train the model using unlabeled sample forms. Form Recognizer analyzes the forms to determine their text layout, and detects key-value pairs and tables that contain the form data. This layout and field mapping information is then used to train a model that can extract data from similar forms.

    To train a model with unlabeled sample forms:

    1. Upload at least 5 sample image or PDF forms to an Azure Storage blob container.

    2. Generate a shared access signature (SAS) URL for the container.

    3. Use the Train Custom Model REST API function (or equivalent SDK method) to start training using the forms, passing the SAS URL for the container.

    4. Use the Get Custom Model REST API function (or equivalent SDK method) to get the trained model ID.
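    Steps 2-4 boil down to a couple of HTTP calls. As a sketch of step 3 (the v2.1 REST path and body fields here are assumptions based on the Form Recognizer v2.x REST API, and the endpoint, key, and SAS URL are placeholders):

```python
# Build (not send) the Train Custom Model request. The SAS URL points at the
# blob container holding the sample forms; useLabelFile stays False for
# unsupervised training with unlabeled forms.
def build_train_request(endpoint, key, sas_url, use_label_file=False):
    return {
        "url": f"{endpoint}/formrecognizer/v2.1/custom/models",
        "headers": {
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        "json": {"source": sas_url, "useLabelFile": use_label_file},
    }

req = build_train_request(
    "https://<your-resource>.cognitiveservices.azure.com",        # placeholder
    "<your-key>",                                                 # placeholder
    "https://<account>.blob.core.windows.net/forms?<sas-token>",  # placeholder
)
# Send with, e.g.: requests.post(req["url"], headers=req["headers"], json=req["json"])
```

    For supervised training with labeled forms (described below), the same call is made with use_label_file set to True.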

  • Training Custom Models with Labels (Supervised)

    Training a custom form recognizer model with unlabeled forms often provides adequate results; but if your form is complex, or you need to define explicit field mappings, you can use a supervised learning approach and train your model using labeled forms.

    To train a custom model using labeled sample forms:

    1. Store sample forms in an Azure blob container, along with JSON files containing layout and label field information.

      • You can generate an ocr.json file for each sample form using Form Recognizer's Analyze Layout function. Additionally, you need a single fields.json file describing the fields you want to extract, and a labels.json file for each sample form mapping the fields to their location in that form.

    2. Generate a shared access signature (SAS) URL for the container.

    3. Use the Train Custom Model REST API function (or equivalent SDK method) with the useLabelFile parameter set to true to train the model.

    4. Use the Get Custom Model REST API function (or equivalent SDK method) to get the trained model ID.

    Alternatively, a simpler approach is to use the Form Recognizer Sample Labeling Tool to connect to the SAS URL where the sample forms are stored, interactively label the sample forms to generate the necessary field mapping files, and train the model.

  • Using a Custom Model

    To extract form data using a custom model, use the Analyze Form REST API function (or equivalent SDK method) with your custom model ID. This function starts the form analysis and returns a result ID, which you can pass in a subsequent call to the Get Analyze Form Result function to retrieve the results.

    The specific structure of the results depends on the fields in your forms, and the approach used to train your model. If you trained the model using unlabeled sample forms, the results are returned in a pageResults list, as shown below. If you used labeled forms to train the model, the results are returned in the documentResults list.

{
  "status": "succeeded",
  "createdDateTime": "2020-08-21T00:46Z",
  "lastUpdatedDateTime": "2020-08-21T00:46Z",
  "analyzeResult": {
    "version": "2.0.0",
    "readResults": [ { ... } ],
    "pageResults": [
      {
        "page": 1,
        "keyValuePairs": [
          {
            "Key": { "Text": "Order Date", ... },
            "Value": { "Text": "01/01/2021", ... },
            ...
          }
        ]
      }
    ],
    "documentResults": [ ... ]
  }
}
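For a model trained on unlabeled forms, the keyValuePairs structure flattens naturally into a dictionary. A sketch using just the fields shown in the sample above:

```python
# keyValuePairs from the sample pageResults above (abbreviated to the
# fields shown in the sample).
page_results = [
    {
        "page": 1,
        "keyValuePairs": [
            {"Key": {"Text": "Order Date"}, "Value": {"Text": "01/01/2021"}},
        ],
    }
]

def key_values(page_results):
    """Flatten keyValuePairs across pages into a plain {key: value} dict."""
    return {
        pair["Key"]["Text"]: pair["Value"]["Text"]
        for page in page_results
        for pair in page["keyValuePairs"]
    }

print(key_values(page_results))  # {'Order Date': '01/01/2021'}
```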


LABS

Extract Data from Forms

Suppose a company needs to automate a data entry process. Currently an employee might manually read a purchase order and enter the data into a database. You want to build a model that will use machine learning to read the form and produce structured data that can be used to automatically update a database.

Form Recognizer is a cognitive service that enables users to build automated data processing software. This software can extract text, key/value pairs, and tables from form documents using optical character recognition (OCR). Form Recognizer has pre-built models for recognizing invoices, receipts, and business cards. The service also provides the capability to train custom models. In this exercise, we will focus on building custom models.


Read Text in Images

Optical character recognition (OCR) is a subset of computer vision that deals with reading text in images and documents. The Computer Vision service provides two APIs for reading text, which you'll explore in this exercise.
