Module 5: Natural Language Processing / Language Understanding
Introduction to Language Understanding
Natural language processing (NLP) is an area of artificial intelligence that deals with interpreting the semantic meaning of input in everyday language (spoken or written), such as you might use to communicate with another human. NLP solutions require a language model to discern the meaning of natural language input. This activity is often referred to as natural language understanding (NLU).
The Language Understanding service enables you to train a language model (referred to as a Language Understanding conversation app) that can interpret natural language utterances to determine the user's intent, and any entities to which the intent should be applied. For example, a user might submit the utterance “Switch the kitchen light on”, and the Language Understanding model might interpret the intent as "TurnOnDevice" (the user wants to turn something on) and detect the “kitchen light” entity as being the thing to which the intent should be applied.
You can use the REST interface or SDK to write code that defines, trains, and publishes a Language Understanding model; but it is more common to use the Language Understanding portal to create and manage Language Understanding solutions.
Having trained and published a Language Understanding model to determine intents and entities from utterances, you can use the model in your own software applications to interpret natural language input and perform the appropriate corresponding action - for example, by turning the kitchen light on.
Language Understanding Resources in Azure
To use the Language Understanding service to develop a natural language understanding solution, you require two kinds of resource in your Azure subscription:
An Authoring resource (which must be a Language Understanding - Authoring resource) that you can use to train your language understanding model.
A Prediction resource (which can be a Language Understanding - Prediction or Cognitive Services resource) to host your trained model and process requests from client applications.
Authoring resources can be created in one of three global geographic areas:
Asia Pacific (the resource is created in the Australia East Azure region)
Europe (the resource is created in the West Europe Azure region)
US (the resource is created in the West US Azure region)
To deploy a model, your prediction resource must be in an Azure location within the geographical area served by the authoring resource:
Asia Pacific:
Australia East
Europe:
France Central
North Europe
West Europe
UK South
US:
All other locations
Intents and Utterances
Utterances are the phrases that users might enter when interacting with an application that leverages your Language Understanding model. An intent represents a task or action the user wants to perform; more simply, it is the meaning of an utterance. You create a model by defining intents and associating them with one or more utterances.
For example, consider the following list of intents and associated utterances:
GetTime:
“What time is it?”
“What is the time?”
“Tell me the time”
GetWeather:
“What is the weather forecast?”
“Do I need an umbrella?”
“Will it snow?”
TurnOnDevice:
“Turn the light on.”
“Switch on the light.”
“Turn on the fan”
None:
“Hello”
“Goodbye”
In a Language Understanding model, you must define the intents that you want your model to understand, so spend some time thinking about the domain your model must support and the kinds of actions or information that users might request. In addition to the intents that you define, every model includes a None intent that you should use to explicitly identify utterances that a user might submit but for which there is no specific action required (for example, conversational greetings like "hello"), or that fall outside the scope of the model's domain.
After you've identified the intents your model must support, it's important to capture a variety of different example utterances for each intent. Collect utterances that you think users will enter, including utterances that mean the same thing but are constructed in a variety of different ways. Keep these guidelines in mind:
Capture a variety of different examples, or alternative ways of saying the same thing
Vary the length of the utterances from short, to medium, to long
Vary the location of the noun or subject of the utterance. Place it at the beginning, the end, or somewhere in between
Use correct grammar and incorrect grammar in different utterances to offer good training data examples
Entities
Entities are used to add specific context to intents. For example, you might define a TurnOnDevice intent that can be applied to multiple devices, and use entities to define the different devices.
Consider the following utterances, intents, and entities:
Utterance | Intent | Entities
What is the time? | GetTime | (none)
What time is it in London? | GetTime | Location (London)
What's the weather forecast for Paris? | GetWeather | Location (Paris)
Will I need an umbrella tonight? | GetWeather | Time (tonight)
What's the forecast for Seattle tomorrow? | GetWeather | Location (Seattle), Time (tomorrow)
Turn the light on. | TurnOnDevice | Device (light)
Switch on the fan. | TurnOnDevice | Device (fan)
Entity types
You can define entities in a number of ways:
Machine learned entities are the most flexible kind of entity, and should be used in most cases. You define a machine learned entity with a suitable name, and then associate words or phrases with it in training utterances. When you train your model, it learns to match the appropriate elements in the utterances with the entity.
List entities are useful when you need an entity with a specific set of possible values - for example, days of the week. You can include synonyms in a list entity definition, so you could define a DayOfWeek entity that includes the values “Sunday”, "Monday", “Tuesday”, and so on; each with synonyms like "Sun", “Mon”, "Tue", and so on.
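As an illustration only (not the service's actual implementation), a list entity behaves like a mapping from canonical values to their synonyms. The following hypothetical Python sketch shows how a matched word resolves to its canonical value:

```python
# Hypothetical sketch of a list entity: canonical values mapped to synonyms.
day_of_week = {
    "Sunday": ["Sun"],
    "Monday": ["Mon"],
    "Tuesday": ["Tue", "Tues"],
}

def resolve(value, entity):
    # Return the canonical value for an exact or synonym match (case-insensitive).
    for canonical, synonyms in entity.items():
        if value.lower() == canonical.lower() or value.lower() in (s.lower() for s in synonyms):
            return canonical
    return None

print(resolve("Mon", day_of_week))  # Monday
```

When the model recognizes any of the listed values or synonyms in an utterance, the entity resolves to the canonical value.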
Regular Expression or RegEx entities are useful when an entity can be identified by matching a particular format of string. For example, a date in the format MM/DD/YYYY, or a flight number in the format AB-1234.
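For example, the two formats mentioned above can be expressed as regular expressions. This Python sketch is purely illustrative; the actual RegEx entity is defined in the Language Understanding portal rather than in client code:

```python
import re

# Illustrative regular expressions for the formats described above.
date_pattern = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")   # MM/DD/YYYY
flight_pattern = re.compile(r"\b[A-Z]{2}-\d{4}\b")    # AB-1234

utterance = "Is flight AB-1234 on 04/21/2023 on time?"
print(date_pattern.search(utterance).group())    # 04/21/2023
print(flight_pattern.search(utterance).group())  # AB-1234
```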
Pattern.any() entities are used with patterns, which are discussed in the next topic.
Patterns and Pattern.any() Entities
If your model requires multiple intents for which typical utterances are likely to be similar, you can use patterns to disambiguate the intents with minimal utterance samples.
For example, consider the following utterances:
“Turn the kitchen light on.”
“Is the kitchen light on?”
“Turn the kitchen light off.”
The utterances are syntactically very similar, with only a few differences in words or punctuation. However, they represent three different intents (which could be named TurnOnDevice, GetDeviceStatus, and TurnOffDevice). Moreover, the intents could apply to a wide range of entity values. In addition to "kitchen light", the intent might apply to "living room light", "bedside lamp", "fan", "television", or any other device that the model might need to support.
One approach to training the model would be to associate every possible combination of utterance and entity with all three intents. However, a more efficient approach is to define patterns that include utterance templates, like this:
TurnOnDevice:
“Turn the {DeviceName} on.”
“Switch the {DeviceName} on.”
“Turn on the {DeviceName}.”
GetDeviceStatus:
“Is the {DeviceName} on[?]”
TurnOffDevice:
“Turn the {DeviceName} off.”
“Switch the {DeviceName} off.”
“Turn off the {DeviceName}.”
These utterances include a placeholder for a Pattern.any() entity named DeviceName, reducing the number of utterances required to train the model. Note that the patterns can make use of optional elements, such as punctuation (for example, [?]) to provide additional cues that help identify the appropriate intent.
The patterns defined in the utterance templates, including the position of the Pattern.any() entity and any optional words or punctuation, help the model identify the intents and entity values from fewer samples:
Utterance | Intent | Entity
Turn the kitchen light on. | TurnOnDevice | DeviceName (kitchen light)
Is the bedroom lamp on? | GetDeviceStatus | DeviceName (bedroom lamp)
Switch the TV off. | TurnOffDevice | DeviceName (TV)
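To build intuition for how templates work, consider this loose analogy in Python (the Language Understanding service does not use regular expressions for patterns; this is only a way to visualize how the fixed words anchor the intent while the {DeviceName} placeholder captures the entity):

```python
import re

def template_to_regex(template):
    # Illustrative conversion of an utterance template to a regex:
    # {EntityName} becomes a named capture group, and [x] optional
    # elements become optional groups.
    pattern = re.escape(template)
    pattern = re.sub(r"\\{(\w+)\\}", r"(?P<\1>.+)", pattern)
    pattern = re.sub(r"\\\[(.+?)\\\]", r"(?:\1)?", pattern)
    return re.compile("^" + pattern + "$", re.IGNORECASE)

template = template_to_regex("Is the {DeviceName} on[?]")
match = template.match("Is the bedroom lamp on?")
print(match.group("DeviceName"))  # bedroom lamp
```

The optional [?] element means the template matches the utterance with or without the question mark, mirroring how optional punctuation works in patterns.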
Prebuilt Models
You can create your own language models by defining all the intents and utterances it requires, but often you can leverage prebuilt model elements that encapsulate common intents and entities.
The Language Understanding service provides prebuilt model elements at three different levels of granularity:
Prebuilt Domains define complete language understanding models that include predefined intents, utterances, and entities. Prebuilt domains include Calendar, Email, Weather, RestaurantReservation, HomeAutomation, and others.
Prebuilt Intents include predefined intents and utterances, such as CreateCalendarEntry, SendEmail, TurnOn, AddToDo, and others.
Prebuilt Entities define commonly used entities, such as Age, Email, PersonName, Number, Geography, DateTime, and others.
Leveraging prebuilt model elements can significantly reduce the time it takes to develop a language understanding solution.
Training, Testing, Publishing, and Reviewing
Creating a language understanding model is an iterative process with the following activities:
Train a model to learn intents and entities from sample utterances.
Test the model interactively, or by submitting a batch of utterances with known intent labels and comparing the predicted intents to the known label.
Publish a trained model to a prediction resource and use it from client applications.
Review the predictions made by the model based on user input and apply active learning to correct misidentified intents or entities and improve the model.
By following this iterative approach, you can improve the language model over time based on actual user input, helping you develop solutions that reflect the way users indicate their intents using natural language.
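The batch testing activity described above amounts to comparing predicted intents against known labels. In this sketch, the predict function is a hypothetical stand-in for a call to the published prediction endpoint:

```python
# Hypothetical batch test: utterances with known intent labels.
labeled = [
    ("What time is it?", "GetTime"),
    ("Will it snow?", "GetWeather"),
    ("Turn on the fan", "TurnOnDevice"),
]

def predict(utterance):
    # Placeholder for a call to the published prediction endpoint;
    # here we simulate one misclassified utterance.
    return {
        "What time is it?": "GetTime",
        "Will it snow?": "GetWeather",
        "Turn on the fan": "GetWeather",
    }[utterance]

# Compare predicted intents to the known labels.
correct = sum(1 for text, label in labeled if predict(text) == label)
accuracy = correct / len(labeled)
print(f"Accuracy: {accuracy:.2f}")  # Accuracy: 0.67
```

Misclassified utterances identified this way indicate intents that need more (or more varied) training utterances.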
Create a Language Understanding App
The Language Understanding service enables you to define an app that encapsulates a language model that applications can use to interpret natural language input from users, predict the user's intent (what they want to achieve), and identify any entities to which the intent should be applied.
For example, a language understanding app for a clock application might be expected to process input such as:
What is the time in London?
This kind of input is an example of an utterance (something a user might say or type), for which the desired intent is to get the time in a specific location (an entity); in this case, London.
Note: The task of the language understanding app is to predict the user's intent, and identify any entities to which the intent applies. It is not its job to actually perform the actions required to satisfy the intent. For example, the clock application can use a language app to discern that the user wants to know the time in London; but the client application itself must then implement the logic to determine the correct time and present it to the user.
You can improve a Language Understanding app based on historical utterances submitted to the endpoint. This practice is called active learning.
Publishing Configuration Options
When you publish a Language Understanding app, you can select various publishing options.
Publishing slot
Every Language Understanding app has two publishing slots:
Staging. Use this slot to publish and test new versions of your language model without disrupting production applications.
Production. Use this slot for “live” models that are used by production applications.
Publish settings
Regardless of which slot you publish your Language Understanding app to, you can configure the following publish settings to enable specific behavior:
Sentiment Analysis. Enable this to include a sentiment score from 0 (negative) to 1 (positive) in predictions. This score reflects the sentiment of the input utterance.
Spelling correction. Enable this to use the Bing Spell Check service to correct the spelling of input utterances before intent prediction.
Speech Priming. Enable this if you plan to use the language model with the Speech service. This option sends the model to the Speech service ahead of prediction to improve intent recognition from spoken input.
Processing Predictions
To consume your Language Understanding model in a client application, you can use the REST APIs or one of the programming language-specific SDKs.
Regardless of the approach used, requests for predictions are sent to a published slot (production or staging) and include the following parameters:
query - the utterance text to be analyzed.
show-all-intents - indicates whether to include all identified intents and their scores, or only the most likely intent.
verbose - indicates whether to include additional metadata in the results, such as the start index and length of strings identified as entities.
log - indicates whether to record queries and results for use in active learning.
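Assuming the v3.0 prediction REST endpoint, a request with these parameters can be sketched in Python as follows (the endpoint, app ID, and key shown are placeholders):

```python
from urllib.parse import urlencode

# Placeholder endpoint, app ID, and key -- replace with your own values.
endpoint = "https://westus.api.cognitive.microsoft.com"
app_id = "11111111-2222-3333-4444-555555555555"
prediction_key = "YOUR-PREDICTION-KEY"

params = urlencode({
    "query": "Turn on the kitchen light",
    "show-all-intents": "true",
    "verbose": "true",
    "log": "true",
    "subscription-key": prediction_key,
})

# Requests are sent to a published slot (here, the production slot).
url = f"{endpoint}/luis/prediction/v3.0/apps/{app_id}/slots/production/predict?{params}"
print(url)
```

An HTTP GET request to this URL returns the prediction results as JSON.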
Prediction results
The prediction results consist of a hierarchy of information that your application must parse. When using the REST interface, the results are in JSON form. SDKs present the results as an object hierarchy based on the underlying JSON.
A typical response might look similar to this:
{
  "query": "What's the time in Edinburgh?",
  "prediction": {
    "topIntent": "GetTime",
    "intents": {
      "GetTime": { "score": 0.9978 },
      ...
    },
    "entities": {
      "location": ["Edinburgh"],
      ...
    }
  }
}
The prediction results include the query utterance, the top (most likely) intent along with its confidence score, and the entities that were detected, which are provided as an object for each entity (for example, location) with a list of the detected instances of that entity (for example, "Edinburgh"). Depending on the options specified in the request, the results may also include any other intents that were identified as possible matches, and details about the location of each entity in the utterance string.
Note: It's important to emphasize that the Language Understanding service enables your application to identify the intent of the user (in this case, to find out the current time in Edinburgh). It is the responsibility of the client application to then perform whatever logic is necessary to fulfill the intent (so the Language Understanding model does not return the actual time in Edinburgh; it simply indicates to the client application that this is the information the user wants).
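A response like the sample above can be parsed programmatically. This Python sketch uses a simplified version of the sample response (with the ellipses removed so it is valid JSON):

```python
import json

# Simplified sample prediction response.
response = json.loads("""
{
  "query": "What's the time in Edinburgh?",
  "prediction": {
    "topIntent": "GetTime",
    "intents": {"GetTime": {"score": 0.9978}},
    "entities": {"location": ["Edinburgh"]}
  }
}
""")

# Extract the top intent, its confidence score, and any detected locations.
prediction = response["prediction"]
top_intent = prediction["topIntent"]
score = prediction["intents"][top_intent]["score"]
locations = prediction["entities"].get("location", [])
print(top_intent, score, locations)  # GetTime 0.9978 ['Edinburgh']
```

The client application would then branch on the top intent and use the entity values to fulfill the request.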
Using a Container
So far we've considered the use of the Language Understanding service by consuming a prediction resource endpoint in your Azure subscription.
Like many other cognitive services, the Language Understanding service can also be deployed as a container, running in a local Docker host, an Azure Container Instance (ACI), or in an Azure Kubernetes Service (AKS) cluster.
The easiest way to manage the deployment of a Language Understanding container is to use the Docker command line tool.
Downloading the container image
The first step in using Language Understanding in a container is to use the docker command line tool to download the Language Understanding container image, like this:
docker pull mcr.microsoft.com/azure-cognitive-services/language/luis:latest
Export the Language Understanding app
Before you can deploy your Language Understanding app in a container, you need to export it in the appropriate packaged format. You can export a model from the Language Understanding portal by selecting the Export for container option, or you can export a published app directly from its endpoint using an HTTP GET request.
The exported package is in *.gz (GZIP) format, which is what the container image expects.
Run the container
To run the container, use the docker run command. For example, a command similar to the following runs the container in a local Docker instance (substitute your own folder paths, billing endpoint URI, and API key):
docker run --rm -it -p 5000:5000 --memory 4g --cpus 2 \
--mount type=bind,src={input folder},target=/input \
--mount type=bind,src={output folder},target=/output \
mcr.microsoft.com/azure-cognitive-services/language/luis:latest \
Eula=accept \
Billing={ENDPOINT_URI} \
ApiKey={API_KEY}
Note the following options when using docker run:
The mount parameters enable the container to access local folders. Specifically, the input mount must reference the folder containing your exported Language Understanding app package, and the output folder is where the service will write logs (including Language Understanding query logs that you can use for active learning).
The Eula, Billing, and ApiKey parameters are used the same way they are for any Cognitive Services container - specifying acceptance of the license agreement, the prediction endpoint to which usage data should be sent for billing, and a valid subscription key for your prediction resource.
Using Multiple Language Models
As you build more sophisticated natural language solutions, you may want to leverage multiple language models, each designed for a specific language domain. The challenge in doing this is determining which model to use to predict the intent for a given utterance.
The Dispatch tool is a command line utility that you can use to create a Language Understanding app that defines intents corresponding to a second tier of Language Understanding apps. When a user submits an utterance, the dispatch language app predicts which second-tier language model can best service the request, and returns that intent to the calling application. The application can then submit the query to the Language Understanding app identified by the dispatch app to determine the intent and entities.
You can find the Dispatch utility in its GitHub repo. To try it out for yourself, you can follow the tutorial in the Language Understanding documentation.
Speech and Language Understanding Integration
While many applications work with text-based natural language input, it's also common to see applications that engage with users through speech; for example, digital assistants on smart phones, home automation devices, and in-car systems.
The Speech SDK is most commonly used with the Speech service, but it also offers integration with the Language Understanding service; enabling you to use a language model to predict intents from spoken input.
To use the Speech SDK with a Language Understanding model, enable the Speech Priming publishing setting for your Language Understanding endpoint, and use the Speech SDK to write code that uses your Language Understanding prediction resource, as described in the next topic.
Intent Recognition with the Speech SDK
To use a Language Understanding model from the Speech SDK, your code should follow this pattern:
Use a SpeechConfig object to encapsulate the information required to connect to your Language Understanding prediction resource (not a Speech resource). Specifically, the SpeechConfig must be configured with the location and key of the Language Understanding prediction resource.
Optionally, use an AudioConfig to define the input source for the speech to be analyzed. By default, this is the default system microphone, but you can also specify an audio file.
Use the SpeechConfig and AudioConfig to create an IntentRecognizer object, and add the model and the intents you want to recognize to its configuration.
Use the methods of the IntentRecognizer object to submit utterances to the Language Understanding prediction endpoint. For example, the RecognizeOnceAsync() method submits a single spoken utterance.
Process the response. In the case of the RecognizeOnceAsync() method, the result is an IntentRecognitionResult object that includes the following properties:
Duration
IntentId
OffsetInTicks
Properties
Reason
ResultId
Text
If the operation was successful, the Reason property has the enumerated value RecognizedIntent, and the IntentId property contains the top intent name. Full details of the Language Understanding prediction can be found in the Properties property, which includes the full JSON prediction.
Other possible values for Reason include RecognizedSpeech, which indicates that the speech was successfully transcribed (the transcription is in the Text property) but no matching intent was identified. If the result is NoMatch, the audio was successfully parsed but no speech was recognized, and if the result is Cancelled, an error occurred (in which case, you can check the Properties collection for the CancellationReason property to determine what went wrong).
You can integrate the Speech service with the Language Understanding service to create applications that can intelligently determine user intents from spoken input.