NLP Architect Version 0.3 Release

Intel® AI Lab is excited to release a set of new features for NLP Architect in version 0.3 of the library. Back in May 2018 at the inaugural Intel® AI DevCon (Intel® AIDC) in San Francisco, Intel AI Lab announced its first release of NLP Architect, an open source library for natural language processing (NLP) research and development. We have continued to develop and add new features, including additional NLP models and solutions in release 0.2. Today’s release continues on this path.
Now let’s look at some of the highlights of this release.

NLP Application: Topics and Trends Analysis

For the current release of NLP Architect, we have created an additional application: Topics and Trends Analysis. Users are often overwhelmed with information and are primarily interested in seeing phrases that summarize the key topics instead of combing through the data. Topical datasets can change over time as subtopics are added, changed or removed. Seeing the trend of extracted phrases is also of interest for users who want to know how their data is changing. The Topics and Trends Analysis application tackles both of these tasks.

An example of cold (red) and hot (green) trends

This application consists of two major parts, as seen in the figure below.

The top process is called Topic Analysis. This module uses NLP Architect’s noun phrase extractor to extract key phrases from text files and then applies a scoring algorithm to estimate how prominent each phrase is in the context of the processed corpora.
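As a rough illustration of this first step (not the actual NLP Architect implementation, which has its own noun phrase extractor and scoring algorithm), the sketch below extracts noun phrases with spaCy and scores each phrase by its relative frequency in the corpus:

```python
# Minimal sketch of the topic-analysis idea: extract noun phrases and score
# them by relative corpus frequency. Illustrative only; the NLP Architect
# solution uses its own extractor and scoring algorithm.
from collections import Counter

import spacy

nlp = spacy.load("en_core_web_sm")  # requires the small English model

def extract_topics(documents):
    """Return noun phrases ranked by a simple prominence score."""
    counts = Counter()
    for text in documents:
        doc = nlp(text)
        for chunk in doc.noun_chunks:
            phrase = chunk.lemma_.lower().strip()
            if len(phrase) > 2:
                counts[phrase] += 1
    total = sum(counts.values()) or 1
    # Prominence here is just relative frequency in the processed corpus.
    return {phrase: count / total for phrase, count in counts.items()}

corpus = ["Deep learning accelerates natural language processing research.",
          "Natural language processing models benefit from deep learning."]
topics = extract_topics(corpus)
for phrase, score in sorted(topics.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{phrase}: {score:.3f}")
```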

The second process, the trend analysis step, requires two collections of key phrases extracted with the Topic Analysis algorithm. It ingests all the generated data and presents a collection of visual aids for analyzing emerging (hot) and outdated (cold) topics, exploring clusters of topics, and further processing the topics. We created a UI alongside these algorithms so users can try the application on their own data.
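The sketch below illustrates the core of this second step under a simplifying assumption: given two scored phrase collections (for example, from an earlier and a later time period), phrases whose scores rise are hot and phrases whose scores fall are cold. The real application adds clustering, filtering, and the visual aids described above; the function name and threshold here are illustrative only.

```python
# Hedged sketch of the trend step: compare phrase scores across two periods.
def trend_report(old_scores, new_scores, threshold=0.01):
    """Classify phrases as hot (rising) or cold (falling) between periods."""
    hot, cold = [], []
    for phrase in set(old_scores) | set(new_scores):
        delta = new_scores.get(phrase, 0.0) - old_scores.get(phrase, 0.0)
        if delta > threshold:
            hot.append((phrase, delta))
        elif delta < -threshold:
            cold.append((phrase, delta))
    hot.sort(key=lambda kv: -kv[1])
    cold.sort(key=lambda kv: kv[1])
    return hot, cold

# Toy scores; real inputs would come from two runs of the topic-analysis step.
old = {"neural machine translation": 0.02, "rule-based parsing": 0.05}
new = {"neural machine translation": 0.07, "rule-based parsing": 0.01}
hot, cold = trend_report(old, new)
print("hot:", hot)
print("cold:", cold)
```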

Similar to the Term Set Expansion solution made available in the previous release of NLP Architect, this integrated trend and topic analysis application demonstrates a representative workflow and shows how some of the building blocks in NLP Architect can be combined into more end-to-end applications for specific use cases.

To learn how to run Topics and Trend Analysis on your data, visit the documentation website.

More Demos of NLP Components

We understand NLP model results are often hard to visualize, so we added demonstrations for four components of NLP Architect: Named Entity Recognition, Intent Extraction, Dependency Parser, and Machine Reading Comprehension. Check out the new UI in the screenshots below and visit this guide to learn how to fire up NLP Architect’s demos.

New front page of NLP Architect Demos

NER demo with new UI

Sparse Neural Machine Translation Model

Neural Machine Translation (NMT) models are often over-parameterized and tend to be very large. Compressed models will serve as a good baseline for the developer community to develop optimized software kernels that leverage sparsity and quantization for efficient inference on limited-resource devices.

We integrated the pruning mechanism proposed by Zhu and Gupta into Google’s NMT architecture and demonstrated how to prune the GNMT model to up to 90% sparsity during training while maintaining comparable accuracy (up to a 1.5-point loss in BLEU score). In addition, we showed how to further compress the highly sparse models by uniform quantization of the weights to an 8-bit integer format.
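For a concrete picture of the mechanism, the sketch below implements the cubic sparsity schedule from Zhu and Gupta, simple magnitude pruning, and uniform symmetric 8-bit quantization on a NumPy array. It is a standalone illustration of the idea, not the GNMT training integration shipped with NLP Architect; the function names and hyperparameters are illustrative.

```python
# Illustrative sketch of gradual magnitude pruning (Zhu & Gupta) plus
# uniform 8-bit quantization; not the library's actual training code.
import numpy as np

def target_sparsity(step, start_step, end_step, initial=0.0, final=0.9):
    """Cubic sparsity schedule: ramps from `initial` to `final` sparsity."""
    if step < start_step:
        return initial
    if step >= end_step:
        return final
    progress = (step - start_step) / (end_step - start_step)
    return final + (initial - final) * (1.0 - progress) ** 3

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude weights until `sparsity` is reached."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    return weights * (np.abs(weights) > threshold)

def quantize_uniform_int8(weights):
    """Uniform symmetric quantization of weights to 8-bit integers."""
    scale = max(np.abs(weights).max(), 1e-8) / 127.0
    return np.round(weights / scale).astype(np.int8), scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256))
for step in (0, 5000, 10000, 20000):
    s = target_sparsity(step, start_step=0, end_step=20000)
    pruned = magnitude_prune(w, s)
    achieved = 1 - np.count_nonzero(pruned) / w.size
    print(step, round(s, 3), round(achieved, 3))

q, scale = quantize_uniform_int8(pruned)  # int8 weights plus one fp32 scale
print(q.dtype, scale)
```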

Further documentation, examples for training and running inference, and two pre-trained sparse models can be found here.

Semantic Relation Identification

We implemented several semantic relation identification models in NLP Architect. These models detect semantic relations between two entities based on external knowledge resources. For example, the terms Big Blue and IBM are connected by a redirect entry in Wikipedia, which implies that they are synonyms even though their surface forms are unrelated. In this version, we utilized external resources such as Wikipedia, WordNet, VerbOcean, word embeddings, and pre-processed databases. We deployed these semantic relation identifiers in a Cross Document Coreference model that detects whether mentions of an entity in two different documents refer to the same entity. These are early-stage models; additional documentation and configuration details can be found here.
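To make the idea concrete, the sketch below shows two of the simpler signals such identifiers can draw on: WordNet synset overlap and cosine similarity of pre-trained word embeddings. The actual NLP Architect identifiers also consult Wikipedia redirects, VerbOcean, and pre-processed databases, and are combined inside the cross-document coreference model; the helper functions and threshold here are hypothetical illustrations.

```python
# Hedged sketch of two simple relation signals: WordNet synset overlap and
# word-embedding cosine similarity. Illustrative only.
import numpy as np
from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

def wordnet_synonyms(term_a, term_b):
    """True if the two terms share at least one WordNet synset."""
    synsets_a = set(wn.synsets(term_a.replace(" ", "_")))
    synsets_b = set(wn.synsets(term_b.replace(" ", "_")))
    return bool(synsets_a & synsets_b)

def embedding_related(vec_a, vec_b, threshold=0.6):
    """True if cosine similarity of two pre-trained embeddings is high."""
    cos = np.dot(vec_a, vec_b) / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b))
    return cos >= threshold

print(wordnet_synonyms("car", "automobile"))  # True: shared synset
print(embedding_related(np.array([0.9, 0.1]), np.array([0.8, 0.2])))
```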

We invite developers and researchers to try out NLP Architect, and we look forward to receiving feedback, feature requests and pull request contributions from all users. For a full list of additions and changes in this new version, please see the v0.3 release notes.

Acknowledgments

Credits go to our team of NLP researchers and developers at the Intel AI Lab: Peter Izsak, Anna Bethke, Daniel Korat, Amit Yaccobi, Jonathan Mamou, Shira Guskin, Sharath Nittur Sridhar, Oren Pereg, Alon Eirew, Sapir Tsabari, Yael Green, Chinnikrishna Kothapalli, Yinyin Liu, Guy Boudoukh, Ofir Zafrir and Maneesh Tewani.

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.
Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
© Intel Corporation.