Introducing NLP Architect by Intel AI Lab
May 23, 2018
Many recent advances in Natural Language Processing (NLP) and Natural Language Understanding (NLU) have been driven by deep learning, helped by more powerful compute resources, greater access to useful data sets, and advances in neural network topologies and training paradigms. At Intel AI Lab, our team of NLP researchers and developers has been exploring state-of-the-art deep learning topologies and techniques for NLP and NLU. Today, we would like to introduce NLP Architect, an open source library that we are sharing with the community as a platform for future research and collaboration.
In the current version of NLP Architect, we’ve collected these features that we found interesting from both research perspectives and practical applications, including:
All the above models are provided with end-to-end examples of training and inference. In addition, we’ve included functionality that is often needed when deploying these models, such as data pipelines, common function calls, and NLP-related utilities. The library is modular for easy integration. We see these components as a set of building blocks for implementing NLP use cases, chosen based on our practical research experience.
This open and flexible library of NLP components provides the foundation for us to enable NLP solutions with our partners and customers. We are actively incorporating new results from our research and data science into this stack so that everyone can re-use what we’ve built and optimized. The library also gives us a platform for analyzing and optimizing Intel software and hardware on NLP workloads.
Some of the components that ship with pre-trained models are exposed as REST service APIs through the NLP Architect server. The server is designed to provide predictions across the different models in NLP Architect, and it includes a web front-end that visualizes the model annotations. Currently, we provide two services: BIST dependency parsing and NER annotation. We also provide a template for developers who want to add a new service.
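To give a feel for how a client might talk to such a REST service, here is a minimal sketch using only the Python standard library. Everything specific in it is an assumption made for illustration and is not taken from this post: the URL, the `/inference` route, the `model_name` and `docs` fields, and the helper names `build_request` and `annotate`. Please check the project documentation for the actual request format.

```python
import json
from urllib import request

# Hypothetical values: the real host, port, route, and payload schema are
# not given in this post. Treat all of them as placeholders.
SERVER_URL = "http://localhost:8080/inference"  # assumed address and route


def build_request(model_name, docs, url=SERVER_URL):
    """Build (but do not send) a JSON POST request for one service."""
    payload = {
        # Assumed service identifier, e.g. "ner" for the NER service.
        "model_name": model_name,
        # Assumed document format: one entry per input text.
        "docs": [{"id": i, "doc": text} for i, text in enumerate(docs)],
    }
    body = json.dumps(payload).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def annotate(model_name, docs, url=SERVER_URL):
    """Send the request and decode the JSON reply (needs a running server)."""
    with request.urlopen(build_request(model_name, docs, url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Splitting request construction from sending keeps the payload easy to inspect without a running server. With a server available, a call such as `annotate("ner", ["Intel AI Lab released NLP Architect."])` would return whatever JSON annotations that service produces.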
Developers can start by downloading the code from our GitHub repository and following the instructions to install NLP Architect. Comprehensive documentation for all the core modules and end-to-end examples is also available. We look forward to receiving feedback, feature requests, and pull requests from all users.
In our previous blog, we discussed how building a stack of NLP components based on the latest DL technologies gives us a foundation for tackling many applications for our partners and customers. It also enables us to continuously incorporate new results from our research and data science into the stack. In future releases, we plan to demonstrate these advantages with solutions including sentiment extraction, topic and trend analysis, term set expansion, and relation extraction. We are also researching unsupervised and semi-supervised methods that will be introduced into interpretable NLU models and domain-adaptive NLP solutions.
Credits go to our team of NLP researchers and developers at Intel AI Lab: Peter Izsak, Anna Bethke, Daniel Korat, Amit Yaccobi, Andy Keller, Jonathan Mamou, Shira Guskin, Sharath Nittur Sridhar, Oren Pereg, Alon Eirew, Sapir Tsabari, Yael Green, and Chinnikrishna Kothapalli.
Notices and Disclaimers
Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
© Intel Corporation