A Case Study of Text Analytics in AllenNLP, Google, AWS, Azure and IBM Watson

Recently, I was introduced to the Allen Institute for AI and was impressed by AllenNLP. This Natural Language Processing (NLP) project is an open-source deep learning toolkit with a set of pre-trained core models and applications, mainly for NLP tasks such as Semantic Role Labeling, Named Entity Recognition (NER), and Textual Entailment. In this article, I review this solution and compare the performance of its NER model with the text analytics APIs in Google Cloud, Amazon AWS, Microsoft Azure, and IBM Watson.

The NER model implemented in AllenNLP is described in the paper “Deep contextualized word representations.” The authors introduce a new type of deep contextualized word representation (ELMo) that improves on pre-trained word vectors. The main limitation of traditional word vectors is that each word gets a single representation regardless of context. For instance, Boston represents a location in this sentence: “I ate pizza in Boston”; while it is part of an organization name in “I ate at Boston Pizza”. AllenNLP addresses this issue with contextual features: each word is represented in the context of its usage.
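
To make this concrete, here is a minimal sketch, assuming an AllenNLP 0.x installation (whose ElmoEmbedder ships with the package) plus SciPy, that embeds both sentences and compares the vectors assigned to “Boston”. With a static word vector the two would be identical; the contextual vectors are not:

from allennlp.commands.elmo import ElmoEmbedder
from scipy.spatial.distance import cosine

# Downloads the default pre-trained ELMo weights on first use
elmo = ElmoEmbedder()

# embed_sentence returns an array of shape (3 layers, num_tokens, 1024)
vectors_1 = elmo.embed_sentence(["I", "ate", "pizza", "in", "Boston"])
vectors_2 = elmo.embed_sentence(["I", "ate", "at", "Boston", "Pizza"])

# Compare the top-layer vectors of "Boston" in the two contexts
similarity = 1 - cosine(vectors_1[2][4], vectors_2[2][3])
print(similarity)  # below 1.0: the two representations differ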

The other interesting feature of AllenNLP is that it provides a framework in which you can easily evaluate your model or deploy it in production. For example, their NER model is stored in an S3 bucket at:

https://s3-us-west-2.amazonaws.com/allennlp/models/ner-model-2018.04.26.tar.gz

Now, with the help of the AllenNLP package (installable with pip install allennlp), you can swiftly call this model in Python and run a prediction:

from allennlp.predictors.predictor import Predictor

# Load the pre-trained NER model directly from the S3 bucket
predictor = Predictor.from_path("https://s3-us-west-2.amazonaws.com/allennlp/models/ner-model-2018.04.26.tar.gz")
predictor.predict(
  sentence="Your subject sentence to be tagged"
)


The outcome is a dictionary including logits, mask, tags, and words. Let’s run our previous Boston examples:

from allennlp.predictors.predictor import Predictor

predictor = Predictor.from_path("https://s3-us-west-2.amazonaws.com/allennlp/models/ner-model-2018.04.26.tar.gz")

result_1st_sentence = predictor.predict(
  sentence="I ate pizza in Boston"
)

result_2nd_sentence = predictor.predict(
  sentence="I ate at Boston Pizza"
)

print(result_1st_sentence['words'], result_2nd_sentence['words'])
print(result_1st_sentence['tags'], result_2nd_sentence['tags'])

Here is the result:


['I', 'ate', 'pizza', 'in', 'Boston'] ['I', 'ate', 'at', 'Boston', 'Pizza']
['O', 'O', 'O', 'O', 'U-LOC'] ['O', 'O', 'O', 'B-ORG', 'L-ORG']

As we can see, the model correctly identified Boston as a location in the first sentence and as part of an organization name in the second. The tags follow the BILOU scheme: B marks the beginning of a multi-token entity, I a token inside it, L its last token, U a single-token (unit) entity, and O a token outside any entity.
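
Turning the tags back into entity strings takes only a few lines. The helper below is my own sketch, not part of the AllenNLP API:

def extract_entities(words, tags):
    """Group BILOU-tagged tokens into (entity text, entity type) pairs."""
    entities, current, label = [], [], None
    for word, tag in zip(words, tags):
        if tag == "O":               # outside any entity
            continue
        prefix, entity_type = tag.split("-")
        if prefix == "U":            # single-token entity
            entities.append((word, entity_type))
        elif prefix == "B":          # beginning of a multi-token entity
            current, label = [word], entity_type
        elif prefix == "I":          # token inside the entity
            current.append(word)
        elif prefix == "L":          # last token: emit the full span
            current.append(word)
            entities.append((" ".join(current), label))
            current, label = [], None
    return entities

print(extract_entities(['I', 'ate', 'at', 'Boston', 'Pizza'],
                       ['O', 'O', 'O', 'B-ORG', 'L-ORG']))
# [('Boston Pizza', 'ORG')]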


For this particular model, I extended my test to a real-life example. Some years ago, I was working on Syndicated Loan Agreements extracted from SEC 10-Q, 10-K, and 8-K filings. As part of my project, I had to extract the names of the borrowers and lenders from each agreement. Those names are usually presented at the top of the contract. Here are a couple of examples:

U.S. $180,000,000

CREDIT AGREEMENT

Dated as of December 19, 1996

among

KATZ MEDIA CORPORATION,

as Borrower,

THE LENDERS PARTY HERETO,

and

DLJ CAPITAL FUNDING, INC.,

as Syndication Agent,

and

THE FIRST NATIONAL BANK OF BOSTON,

as Administrative Agent

ARRANGED BY:

DONALDSON, LUFKIN & JENRETTE SECURITIES CORPORATION

TABLE OF CONTENTS


Or:


FIVE-YEAR CREDIT AGREEMENT

Dated as of June 18, 2004

$1,500,000,000

-------------------------

J.P. MORGAN SECURITIES INC.

CITIGROUP GLOBAL MARKETS INC.,

as Co-Arrangers and Joint Bookrunners,

JPMORGAN CHASE BANK,

CITICORP USA, INC.,

as Co-Syndication Agents,

BNP PARIBAS

BANK OF AMERICA, N.A.

BARCLAYS BANK PLC

THE ROYAL BANK OF SCOTLAND PLC

as Co-Documentation Agents,

and

THE BANK OF NOVA SCOTIA,

as Administrative Agent
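
To get a feel for what the AllenNLP tagger returns on text like this, you can feed a header fragment to the same predictor and group the tags with the extract_entities helper sketched above (the header is abbreviated here, and the exact output depends on the model):

header = ("DLJ CAPITAL FUNDING, INC., as Syndication Agent, and "
          "THE FIRST NATIONAL BANK OF BOSTON, as Administrative Agent")
result = predictor.predict(sentence=header)
print(extract_entities(result['words'], result['tags']))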


Working on more than 3,700 contracts, I had no choice but to automate the process. To do so, I developed a long and complicated text processing function with NLTK to decompose the headers and extract the names. It worked for almost 65% of the contracts, which is not very accurate but still helpful. With that experience in hand, I conducted an entity identification process with AllenNLP on a sample of 100 contract headers. I repeated the same exercise with four other vendors: AWS Comprehend from Amazon, Watson Natural Language Understanding from IBM, Azure Text Analytics from Microsoft, and Cloud Natural Language from Google. For each of the names in the headers, I assigned 1 point for a correct identification, 0.5 for a partial identification, and -1 for a misidentification. Here are examples of my tests:

Entities                                              Google   IBM   AWS  AllenNLP  Azure
YORK INTERNATIONAL CORPORATION                             1     1     0         0      0
CITIBANK, N.A.                                             1     1     1         1      0
JPMORGAN CHASE BANK                                        1     1     0         1      0
BANK OF TOKYO-MITSUBISHI TRUST COMPANY                     1     0     0       0.5      1
FLEET NATIONAL BANK                                      0.5     1     0       0.5      0
NORDEA BANK FINLAND PLC                                    1     1     0         1      0
SALOMON SMITH BARNEY INC.                                  0     1     1         1      1
J.P.MORGAN SECURITIES, INC.                                1     0     1         1      0
False Entities                                            -3    -3     0        -4     -3
CITIGROUP GLOBAL MARKETS INC.                              1     1     1         1      1
JPMORGAN CHASE BANK                                        1     1     1         1      1
CITICORP USA, INC                                          0     1     1         1      1
BNP PARIBAS                                                1     0     1       0.5      1
BANK OF AMERICA                                            1     0     0       0.5      0
BARCLAYS BANK PLC                                          1     1     1         1    0.5
THE ROYAL BANK OF SCOTLAND PLC                             1     1     0         1    0.5
THE BANK OF NOVA SCOTIA                                    1     1     0         1      0
False Entities                                            -1     1     0        -7      0
DLJ CAPITAL FUNDING, INC                                   0     0     1         1    0.5
THE FIRST NATIONAL BANK OF BOSTON                          1     0     0         1      0
DONALDSON, LUFKIN & JENRETTE SECURITIES CORPORATION        1   0.5     0         1      1
False Entities                                            -3     0     0        -1     -4
THE CHASE MANHATTAN BANK                                   1   0.5     1       0.5      0
THE BANK OF NOVA SCOTIA                                    1     1     0       0.5      0
CREDIT SUISSE FIRST BOSTON                                 1     1     0       0.5      0
FLEET NATIONAL BANK                                        1     1     0         1      0
False Entities                                            -1     0    -2        -5      0
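
To be explicit about the aggregation, the scores in the next table can be computed roughly as follows. This is a sketch: the normalization by one point per true entity is my reading of the setup, and the sample call uses the first contract group above, scored for Google:

def aggregate(points, false_entity_penalty):
    # Best case is 1 point per true entity in the header
    max_score = len(points)
    without_negatives = sum(points) / max_score
    with_negatives = (sum(points) + false_entity_penalty) / max_score
    return without_negatives, with_negatives

print(aggregate([1, 1, 1, 1, 0.5, 1, 0, 1], -3))  # (0.8125, 0.4375)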


And the results, after running the APIs on 100 contract headers:

                                Google   IBM   AWS  AllenNLP  Azure
Score without negative points      88%   69%   52%       82%    40%
Score with negative points         55%   63%   42%       19%    12%


First of all, please note that I ran the models right out of the box and did no text trimming or model tuning. Also, the focus is on accuracy, so other factors such as response speed or computation cost are not considered. Having said that, the table shows that IBM’s Watson has the best overall performance with a 63% score, followed by Google and AWS. However, if we put the negative scores aside, Google has the best performance with an 88% score. More interestingly, AllenNLP comes second with a very good score of 82%. The reason I also considered the scores without the penalty is that, based on my observations, filtering false entities is an easy task: most of the misidentifications are terms such as “Inc” or “Agent”, which are fairly easy to filter out.
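
A post-filter along the following lines would remove most of them. This is a sketch; the stop-term list is illustrative rather than the one I actually used:

# Drop "entities" that are bare stop-terms; the list is illustrative only
STOP_TERMS = {"inc", "inc.", "agent", "borrower", "lender", "lenders"}

def drop_false_entities(entities):
    # entities: (text, type) pairs as produced by extract_entities above
    return [(text, etype) for text, etype in entities
            if text.lower() not in STOP_TERMS]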


Overall, I believe that AllenNLP performs really well compared to the other major players in text analytics. More importantly, since it is open source and transparent, you can fine-tune the model and improve its performance for your own application, which is a significant advantage over the others. Finally, the significant differences in the models’ performance on my particular case show that building a robust text analytics system involves far more than calling a vendor’s API.
