Deep Learning Foundations to Enable Natural Language Processing Solutions

Natural language processing (NLP) is one of the most familiar AI capabilities, having become ubiquitous through consumer digital assistants and chatbots as well as commercial applications like textual analysis of financial or legal records. Intel technology is enabling a variety of NLP applications through the advancement of hardware and software capabilities for deep learning and development of modular NLP components.

A Rising Tide of Deep Learning Capabilities

Many advances in NLP in recent years have been driven by the general advancement of deep learning: more powerful compute resources, greater access to useful data sets, and advances in neural network topologies and training paradigms. These deep learning advances first drove improvements in computer vision applications, but they have greatly benefited the NLP field as well.

On the deep learning layers side, residual layers, highway layers, and dense connections were designed to make it easier for signal and gradient to reach all the layers in a deep network. With these, state-of-the-art computer vision results were achieved by leveraging the representational power of deep networks. They have improved performance on NLP tasks as well, for example through densely connected recurrent layers for language models [Godin 2017].
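
As a rough illustration of how these skip-style connections keep a direct path open for signal and gradient, here is a minimal pure-Python sketch of a residual layer and a highway layer. The helper names, the ReLU transform, and the toy weights are assumptions made for this example; it is not code from any Intel library or from the cited papers.

```python
import math

def dense(x, weights, bias):
    """Affine transform: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(v):
    return [max(0.0, a) for a in v]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def residual_layer(x, weights, bias):
    """y = x + F(x): the identity path lets the input bypass F."""
    fx = relu(dense(x, weights, bias))
    return [xi + fi for xi, fi in zip(x, fx)]

def highway_layer(x, w_h, b_h, w_t, b_t):
    """y = T(x) * H(x) + (1 - T(x)) * x, where T is a learned gate."""
    h = relu(dense(x, w_h, b_h))
    t = sigmoid(dense(x, w_t, b_t))
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]

# Toy values chosen only for illustration.
x = [1.0, -2.0]
zero_w = [[0.0, 0.0], [0.0, 0.0]]
zero_b = [0.0, 0.0]

# With F(x) = 0, the residual layer reduces to the identity mapping.
res_out = residual_layer(x, zero_w, zero_b)

# A strongly negative gate bias closes the gate, so the input is carried through.
hw_out = highway_layer(x, zero_w, zero_b, zero_w, [-50.0, -50.0])
```

In both cases the layer can fall back to passing its input through unchanged, which is what makes very deep stacks easier to train.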

Further empirical investigations comparing convolution layers, recurrent layers, and temporal convolution layers that combine both ideas have produced state-of-the-art results on a range of language datasets [Gehring 2017] [Bai 2018]. Being able to use these layers flexibly allows developers to experiment with various options when tackling a particular NLP problem.
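
To make the temporal convolution idea concrete, here is a minimal sketch of a causal, optionally dilated, one-dimensional convolution, the kind of operation such layers are built on. The function name and the kernel convention (`kernel[0]` weights the current step, `kernel[k]` weights the step `k * dilation` positions back) are assumptions for this example, not an API from the cited work.

```python
def causal_conv1d(sequence, kernel, dilation=1):
    """Convolve so that the output at step t depends only on inputs at steps <= t."""
    outputs = []
    for t in range(len(sequence)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # positions before the sequence start act as zero padding
                acc += weight * sequence[idx]
        outputs.append(acc)
    return outputs

# Each output sums the current value and the previous one.
outputs = causal_conv1d([1.0, 2.0, 3.0, 4.0], [1.0, 1.0])  # [1.0, 3.0, 5.0, 7.0]

# With dilation=2 the kernel reaches two steps back instead of one.
dilated = causal_conv1d([1.0, 2.0, 3.0, 4.0], [1.0, 1.0], dilation=2)  # [1.0, 2.0, 4.0, 6.0]
```

Like a recurrent layer, the output never looks at future tokens; like a convolution, every time step can be computed independently.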

On the deep learning topologies side, an auto-encoder model can be adapted as a sequence-to-sequence model to handle sequential language data. An attention mechanism addresses how a decoding network should respond to the input encodings over time. A pointer network, a variant of the attention model, specializes in finding locations in the input sequence, which provides a new mechanism for machine reading comprehension [Wang 2017] and text summarization [See 2017]. By adding fast weights, [Ba 2016] incorporate the concept of short-term associative memory into long-term sequence learning.
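
The following sketch shows one simple form of attention, assuming dot-product scoring between a decoder state and each encoder state. The `point` helper is a deliberately simplified stand-in for the pointer idea: it just returns the most attended input position. All names and values are hypothetical, and published pointer networks use learned scoring functions rather than this bare dot product.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1 (shifted for numerical stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Return (weights, context) for dot-product attention over the encoder states."""
    scores = [sum(d * e for d, e in zip(decoder_state, enc))
              for enc in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

def point(weights):
    """Pointer-style readout: index of the most attended input position."""
    return max(range(len(weights)), key=weights.__getitem__)

# Three toy encoder states, one per input token.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [-1.0, 5.0]

weights, context = attend(decoder_state, encoder_states)
pointed = point(weights)  # the decoder state aligns best with the second input
```

The context vector is what a decoder would consume at this step, while the weights themselves can be read as a location in the input, which is the property that reading comprehension and summarization models exploit.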

On the training paradigm side, unsupervised learning, which leverages the training data itself, and transfer learning, which builds representations that can be applied to one task after another, have made their way from the computer vision domain and are boosting progress in NLP.

Since these deep learning models share many lower-level components, deep learning-based NLP solutions can share software and hardware with solutions for computer vision and many other AI capabilities. Optimizations to general software stacks for deep learning will be reflected in the performance of deep learning NLP solutions. Intel’s AI portfolio of hardware and software solutions provides great examples of these deep learning advances running on Intel® architecture. Recent work on our hardware and optimizations for widely used deep learning frameworks has provided optimized performance for running commonly used models and computing motifs on Intel® Xeon® Scalable processors. Intel also actively contributes these optimizations back to the open frameworks so that every developer gets this performance out of the box.

Building a Flexible, Modular Stack for NLP Use Cases

This sharing of components gives us a new perspective when building foundations for NLP use cases, since deep learning-based NLP models often have common building blocks, such as deep learning (DL) layers and DL topologies. Some low-level capabilities are also needed by multiple applications. An open and flexible stack that makes these foundational elements available is particularly suitable for solving a variety of NLP problems.

In contrast, the traditional approach to machine learning or deep learning was to solve one particular problem at a time. Today, because the deep learning community is providing many useful fundamental building blocks, enterprise users and data scientists can go in the other direction: learning and building these foundations first, then adapting them to suit a variety of problems.

There are several major benefits to this shift. First, these reusable components help us gradually build structural assets, so we can achieve faster time to value by reusing what has been built before. Second, the functionalities and solutions built atop Intel’s unified hardware and software portfolio benefit from active software development and improvement. Finally, experimenting with available building blocks can yield surprising new approaches and applications that we may not have discovered with our earlier problem-solving focus.

A flexible and modular stack still allows users to combine traditional NLP approaches with deep learning based approaches, and provides different levels of abstraction for different groups of users. Many varied enterprise use cases are demonstrating the potential of NLP and its foundational components. Several examples are provided below, but they are by no means the end of the story.

Topic Analysis

The financial industry faces a huge knowledge management challenge posed by the number of documents that must be processed and understood on a daily basis. It is difficult to extract key insights, such as the competitiveness of a certain product, from pages and pages of text.

NLP topic analysis can now be used to quickly parse a large collection of documents and identify the topics that different parts of each document are associated with. Different users will care about different topics, such as valuation of a certain company, competitiveness, leadership, or macroeconomics. NLP topic analysis enables users to filter to specific topics of interest and obtain more condensed information.

To leverage large amounts of unlabeled data, the model can be pre-trained on contiguous text, and the resulting representations can then be transferred to topic analysis or other tasks. An overview of some of the approaches involved in such a solution was given in an earlier blog post. From the NLP building block perspective, we used a sequence-to-sequence topology, LSTMs, and transferred and fine-tuned word embeddings, combined with named entity recognition and other components.

Trend Analysis

Many domains—healthcare, manufacturing, finance, etc.—face the challenge of identifying time-based trends in large textual data sets. By combining capabilities such as text normalization, noun-phrase chunking and extraction, language models, corpus TF-IDF, and grouping using word vectors, we can quickly produce a solution that can extract keywords and estimates of importance from groups of documents. Then, by comparing these extracted keywords over time, we can detect useful trends, such as how weather changes can cause inventory shortages or what areas of academic research attract more contribution and attention over time.
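
The keyword-comparison step can be sketched as follows. This minimal example covers only the TF-IDF scoring and the comparison across time periods, assuming documents have already been normalized and tokenized; noun-phrase chunking, language models, and word-vector grouping are left out. The function names, period labels, and toy corpus are all hypothetical.

```python
import math
from collections import Counter

def keyword_scores_by_period(corpus):
    """Map {period: [tokenized docs]} to {period: Counter of summed TF-IDF scores}."""
    all_docs = [doc for docs in corpus.values() for doc in docs]
    n_docs = len(all_docs)
    # Document frequency is computed over the whole corpus, not per period.
    doc_freq = Counter(term for doc in all_docs for term in set(doc))
    result = {}
    for period, docs in corpus.items():
        totals = Counter()
        for doc in docs:
            for term, count in Counter(doc).items():
                tf = count / len(doc)
                idf = math.log(n_docs / doc_freq[term])
                totals[term] += tf * idf
        result[period] = totals
    return result

def rising_terms(earlier, later, k=1):
    """Return the k terms whose summed score grew the most between two periods."""
    growth = {term: later[term] - earlier.get(term, 0.0) for term in later}
    return sorted(growth, key=growth.get, reverse=True)[:k]

# Toy corpus, invented for illustration only.
corpus = {
    "2017-Q1": [["inventory", "stable", "demand"],
                ["demand", "steady", "inventory"]],
    "2017-Q2": [["storm", "shortage", "inventory", "shortage"],
                ["storm", "shortage", "delays"]],
}

scores = keyword_scores_by_period(corpus)
rising = rising_terms(scores["2017-Q1"], scores["2017-Q2"], k=2)
```

Terms that appear in every period, such as "inventory" here, receive a low inverse document frequency, so the terms that surface as rising are the ones that are both frequent in the later period and distinctive to it.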

Sentiment Analysis

Sentiment analysis capabilities are often used in competitive analysis, communication strategy optimization, and product or market analysis. A solution that also offers fine-grained analysis of sentiment gives enterprise users even more actionable insights. For example, this more targeted sentiment analysis could find that reviews of a certain product indicate generally positive views of its power consumption but negative views of its reliability. For such fine-grained sentiment analysis, we utilized components such as part-of-speech (POS) tagging, text normalization, dependency parsing, and term expansion. The same words can convey different sentiments in different domains, so a mechanism for domain adaptation becomes critical as well.
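
As a toy illustration of aspect-level sentiment, the sketch below scores each aspect by the opinion words that share a clause with it. Splitting on punctuation and "but" is a crude stand-in for the dependency parsing mentioned above, and the aspect list and opinion lexicon are hypothetical. Passing a different lexicon per domain is one simple way to picture domain adaptation.

```python
import re

def aspect_sentiment(review, aspects, lexicon):
    """Score each aspect by summing the polarity of opinion words in the same clause."""
    scores = {}
    # Crude clause segmentation; a real system would use a dependency parse.
    clauses = re.split(r"\bbut\b|[,.;]", review.lower())
    for clause in clauses:
        words = re.findall(r"[a-z']+", clause)
        polarity = sum(lexicon.get(word, 0) for word in words)
        for aspect in aspects:
            if aspect in clause:
                scores[aspect] = scores.get(aspect, 0) + polarity
    return scores

# Hypothetical aspects and a tiny hand-written opinion lexicon.
aspects = ["power consumption", "reliability"]
lexicon = {"great": 1, "poor": -1}

review = "The power consumption is great, but reliability is poor."
scores = aspect_sentiment(review, aspects, lexicon)
# {'power consumption': 1, 'reliability': -1}
```

A single document-level score would average these two opinions away; keeping them attached to their aspects is what makes the result actionable.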

Flexible Building Blocks on Versatile Architecture

Given the projections of a tremendous NLP market, how do we build the solutions, software, and hardware to enable and make use of those opportunities? At Intel, we would like to build technologies that can continue to innovate and improve, that give us an open and flexible platform to research, practice, and apply algorithms, and that can efficiently scale to multiple applications, eventually leading to impactful business insights.

At Intel AI Lab, our team of NLP researchers and developers is building an open and flexible library of NLP components to enable multiple NLP use cases for our partners and customers. It allows us to efficiently incorporate new results from our research and data science into the comprehensive stack while reusing what we’ve built and optimized. We will continue to work to optimize these components for increased deep learning capabilities.

The flexible, reliable, high-performance Intel AI product portfolio provides the hardware, framework tools, and software for these NLP applications, as well as for other AI and advanced analytics workflows. For more detail on these natural language processing use cases, please look for my session at O’Reilly AI Beijing, “Deep Learning-powered Natural Language Processing”. To learn more about AI on Intel architecture, please visit https://ai.intel.com.

References

[Ba 2016] Fast Weights to Attend to the Recent Past

[Gehring 2017] Convolutional Sequence to Sequence Learning

[Godin 2017] Improving Language Modeling using Densely Connected Recurrent Neural Networks

[See 2017] Get To The Point: Summarization with Pointer-Generator Networks

[Wang 2017] Machine Comprehension Using Match-LSTM and Answer Pointer

[Bai 2018] An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling