Revolutionizing Personal Assistants Through Understanding Actionable Requests in Human-to-Human Interactions

Intelligent Personal Assistant Apps

Intelligent Personal Assistant Applications (IPAAs) are increasingly in use and are becoming an essential part of many people’s lives. IPAAs are designed to help humans with day-to-day tasks, queries, and actions, such as initiating a call to someone or setting a reminder to bring something somewhere.

Human-to-Human Personal Assistant Usages

Most of the popular personal assistant applications today are designed to interact with humans, that is, they carry out commands or answer queries made by a user. The user may convey those commands or queries using natural language speech or text. This type of interaction is referred to as Human-to-Machine (H2M) interaction. A significant step towards integrating IPAAs further into people’s lives is enriching them with the ability to understand interpersonal conversation. A new type of IPAA aims to help fulfill user requests that are conveyed in Human-to-Human (H2H) interactions. These requests often originate from other people interacting with the user (e.g., a spouse, team member, or friend) and are transferred through textual communications such as SMS or IM.

A typical example of a human-to-human request may be to pick up someone from a specific location at a specific time, for example, “don’t forget to pick up John from school at 4 pm”. In this case, the IPAA’s task is to detect the semantic elements of the request, such as whom to pick up, from which location, and at what time. Finally, the IPAA needs to create a suitable reminder and help the user fulfill the request.

Another example could be a request to call someone back depending on a given condition, such as “call me when you leave work please.” In this case, the IPAA’s task is to detect the semantic elements: whom to call and when, and create a corresponding reminder.
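To make the goal concrete, the semantic elements detected in the two example requests above could be represented roughly as follows. This is a minimal sketch; the field names and format are illustrative and are not the system’s actual output schema:

```python
# Illustrative only: one possible structured representation of detected
# semantic elements. Field names here are hypothetical.
pickup_request = {
    "predicate": "pick up",
    "direct_object": "John",            # whom to pick up
    "location": "school",               # from which location
    "time": "4 pm",                     # at what time
}

callback_request = {
    "predicate": "call",
    "indirect_object": "me",            # whom to call
    "condition": "when you leave work", # when to call
}
```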

The Intel AI Lab team, in cooperation with Intel Labs, has developed a model for detecting the semantic elements of human-to-human requests. This model is based on the “intent_extraction” model[1] published as part of Intel’s nlp-architect open source library. The output of this model is handled by the midu application, a personal time management reference app developed by Intel’s wearable software team. midu receives these elements from the human-to-human request detection model through a dedicated API and further resolves them to the actual places, events, activities, etc. These, in turn, are used in the midu application: they are added to the user’s timeline and triggered as contextual reminders, in accordance with their semantic “meaning” as extracted and resolved through this process. Figures 1 and 2 show the H2H request comprehension process, expressed in midu’s user experience as a timeline entry and reminder.

Figure 1 – midu application – analyzing human-to-human textual messages, detecting and understanding the request and its semantic elements, and creating a suitable timeline reminder.

Figure 2 – Analyzing an outgoing message containing a bring request. Semantic resolution and contextual triggering is performed by Intel’s midu technology.

The Challenges of Understanding Human-to-Human Interactions

The fundamental challenge for IPAAs is semantic element resolution: mapping the words of a message to the actual people, places, times, and events they refer to.

Although this challenge is not yet fully solved, and no off-the-shelf system is 100% accurate, it is widely addressed by H2M systems, and H2H systems share it as well. Our work builds on top of H2M industry knowledge and practices in resolving semantic elements. However, understanding H2H requests raises three additional challenges:

The first challenge is to convert the informal language of text messages to formal language. Text messages may include informal language such as acronyms, abbreviations, and misspellings, for example, writing “u” for “you” or “pls” for “please”.

The second challenge is filtering out bot messages. In some text messaging systems, bots send automatic messages containing advertisements or reminders. Those messages may be falsely detected as requests from other humans, and need to be filtered out.
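As a rough illustration of this filtering step, a simple first-pass filter might combine sender metadata with template-like phrasing. The heuristics, patterns, and field names below are hypothetical, not the system’s actual logic, which would more likely rely on a trained classifier:

```python
import re

# Hypothetical patterns typical of automated messages.
BOT_PATTERNS = [
    re.compile(r"reply STOP to unsubscribe", re.I),
    re.compile(r"your verification code is \d+", re.I),
    re.compile(r"limited time offer", re.I),
]

def looks_like_bot_message(sender_id: str, text: str) -> bool:
    """Return True if the message should be filtered out before request detection."""
    # Assumption: bot senders are flagged upstream, e.g. SMS short codes.
    if sender_id.startswith("shortcode:"):
        return True
    return any(p.search(text) for p in BOT_PATTERNS)
```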

The third challenge is detecting whether the message is indeed a request to perform an action. Since the vast majority of H2H messages are not requests to perform actions, the potential for falsely detecting action requests is high. Note that this challenge is currently bypassed by existing H2M systems by requiring the user to add a “wake-up word” before the command/request. The “wake-up word” is usually the system’s name, as in “OK Google, please call John”. Future H2M systems may want to omit the requirement for a “wake-up word”, in which case they will face the same challenge as H2H systems: trying to detect whether the message is indeed a request/command.

An effective approach to overcoming this challenge is to break it down into sub-challenges. Table 1 shows a breakdown of these sub-challenges along with textual examples:

Table 1 – H2H request-detection challenges

System Architecture and Method

The developed system is designed to overcome the H2H semantic comprehension challenges. The system includes three modules, each containing one or more blocks. Figure 3 illustrates the system’s architecture.

Following is a description of the system’s modules and their functionality:

  1. H2H preprocessing module: The text messages that are the input to this module undergo:
    1. Text normalization for converting the informal language of text messages to formal language.
    2. Bot filtering to filter out bot messages.

The text normalization component is based on supervised Neural Machine Translation (NMT), in which the training data comprises pairs of informal text messages and their corresponding formal text messages. At inference, the model takes an informal text message as input and outputs its predicted formal form.
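As a rough illustration of this setup, a character-level encoder-decoder of the following shape could serve as the normalization model. This is a minimal sketch assuming PyTorch; the GRU layers and all hyperparameters are illustrative choices, not the architecture actually used by the system:

```python
import torch.nn as nn

class Seq2SeqNormalizer(nn.Module):
    """Character-level encoder-decoder: informal message in, formal message out."""
    def __init__(self, vocab_size: int, emb_dim: int = 64, hid_dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, src, tgt):
        # Encode the informal message, e.g. "pls call me when u leave wrk".
        _, hidden = self.encoder(self.embed(src))
        # Teacher-forced decoding of the formal target,
        # e.g. "Please call me when you leave work."
        dec_out, _ = self.decoder(self.embed(tgt), hidden)
        return self.out(dec_out)  # per-step logits over output characters
```

Training would minimize cross-entropy between these logits and the formal-side characters of each (informal, formal) pair.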

  2. Semantic elements detection module (also called slot classification): This module’s goal is to detect the main semantic elements of a request: the subject, direct and indirect objects, time, and location. The module extracts part-of-speech tags, word embeddings, and character embeddings as input to a deep bidirectional LSTM (BiLSTM) neural network classifier (a minimal sketch follows this list).
  3. H2H validation module: This is a post-processing module designed to handle the main challenges of H2H comprehension. It verifies that the message is indeed a request to perform an action. The module includes components for validating the tense of the request, as well as for validating that the request is neither negated nor conditional and that it is not a question. In addition, the module includes a component for verifying that the request is semantically valid. This component utilizes Multi-Layer Perceptron (MLP) based Word Sense Disambiguation (WSD) to detect the meanings of the extracted semantic elements.
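The sketch below illustrates the kind of tagger described in module 2: word, character, and POS features feeding a BiLSTM that emits a slot label per token. It is a minimal sketch assuming PyTorch, with illustrative dimensions; it is not the published intent_extraction model:

```python
import torch
import torch.nn as nn

class SlotTagger(nn.Module):
    """BiLSTM slot classifier over word, character, and POS embeddings."""
    def __init__(self, word_vocab, char_vocab, pos_vocab, n_tags,
                 word_dim=100, char_dim=25, pos_dim=16, hid_dim=128):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab, word_dim)
        self.pos_emb = nn.Embedding(pos_vocab, pos_dim)
        self.char_emb = nn.Embedding(char_vocab, char_dim)
        # Character BiLSTM: one feature vector per word, built from its characters.
        self.char_lstm = nn.LSTM(char_dim, char_dim,
                                 bidirectional=True, batch_first=True)
        self.bilstm = nn.LSTM(word_dim + pos_dim + 2 * char_dim, hid_dim,
                              bidirectional=True, batch_first=True)
        self.tag_out = nn.Linear(2 * hid_dim, n_tags)

    def forward(self, words, pos, chars):
        # chars: (batch, seq_len, max_word_len) character IDs per word.
        b, s, w = chars.shape
        c = self.char_emb(chars).view(b * s, w, -1)
        _, (h, _) = self.char_lstm(c)
        # Concatenate final forward/backward states into one vector per word.
        char_feat = torch.cat([h[0], h[1]], dim=-1).view(b, s, -1)
        x = torch.cat([self.word_emb(words), self.pos_emb(pos), char_feat], dim=-1)
        out, _ = self.bilstm(x)
        return self.tag_out(out)  # per-token logits over slot labels
```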

Figure 3 – System Architecture

To test the system, the Intel AI Lab team assembled a dataset of 500 human-to-human messages. 385 of the 500 messages include requests to perform actions, whereas 115 messages do not. The messages were manually generated for the purpose of creating the dataset, and were also manually tagged. This dataset can be downloaded from the NLP Architect library, an NLP library we introduced in May. Please note that this is a test dataset; in order to train a system, a separate training dataset should be assembled. Table 2 describes the dataset and its tagging.

Table 2 – The dataset description

Experiments

The system’s evaluation with the above dataset included two sets of tests. The first test aimed to measure the quality of handling the challenges described in Table 1, that is, to what extent the system detects messages that include requests to perform actions and filters out messages that do not. Table 3 shows the request detection evaluation test results.

Table 3 – Request Detection Evaluation Results

We see that 89.7% of the messages that were classified by the system as including requests did, in fact, include requests to perform an action. This high precision is achieved mainly by the system’s ability to detect and filter out messages that would otherwise be falsely detected as requests.
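For reference, the precision reported here follows the standard definition, true positives divided by all predicted positives. With scikit-learn and toy labels (not the actual dataset), it would be computed like this:

```python
from sklearn.metrics import precision_score, recall_score

# Toy labels for illustration only: 1 = message contains an actionable request.
gold = [1, 1, 0, 1, 0, 1]
pred = [1, 1, 1, 0, 0, 1]

# precision = TP / (TP + FP); recall = TP / (TP + FN)
print("precision:", precision_score(gold, pred))  # 0.75 on this toy data
print("recall:   ", recall_score(gold, pred))     # 0.75 on this toy data
```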

The second test aimed to measure the quality of the semantic elements detection (i.e., the slot classification task). The system is configured to detect three main semantic elements: subjects, direct objects, and indirect objects. For each of those elements, the system extracts the head of the element. Table 4 shows the evaluation results of semantic elements detection in requests.

Table 4 – Request Semantic Elements Detection Evaluation Results

Future Work

In this work, the request was extracted from a single message that included the request predicate (e.g., call, bring, send). For future work, we plan to add the ability to extract a request from the full context of the H2H conversation. This will enable the extraction of semantic elements that are related to the request but are mentioned in other messages during the conversation. For example, a message asking to pick someone up from a specific location may be followed by a later message stating the requested pickup time.

Conclusions

A large step towards integrating IPAAs more fully into people’s everyday lives is enabling them to understand natural human-to-human language. In this work, we focused on understanding human-to-human requests. Understanding such requests raises various natural language processing challenges. We showed that by mapping the challenges and designing a dedicated model for each one, it is possible to achieve high precision in detecting requests. This enables the incorporation of human-to-human request comprehension algorithms in next-generation IPAAs such as the midu application, resulting in the autonomous creation of timeline reminders.

Notices and Disclaimers

Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

[1] This model performs the semantic elements detection stage for our algorithm.