Intelligent Agents: Machine Reading Comprehension
On January 5th this year, an AI model outperformed humans at reading comprehension for the first time. The SLQA+ (ensemble) model from Alibaba recorded an Exact Match (EM) score of 82.44 against the human score of 82.304 on the SQuAD dataset.
It turns out that Microsoft's r-net+ (ensemble) model had achieved 82.650 two days prior to that, and since then two other models have also gone on to beat the human EM score. While none of the models have yet beaten the human F1 score (the harmonic mean of precision and recall) of 91.21, these events further underline the frantic pace at which RC models are evolving. That is great news, because Reading Comprehension (RC) is a key element of intelligent agent systems.
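To make the two leaderboard metrics concrete, here is a small sketch of SQuAD-style scoring: EM checks whether the normalized prediction matches the gold answer exactly, while F1 measures token overlap as the harmonic mean of precision and recall. The normalization below (lowercasing, dropping punctuation and articles) follows the spirit of the official evaluation script, but is my own simplified version.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold):
    """EM: 1 if the normalized strings are identical, else 0."""
    return int(normalize(prediction) == normalize(gold))

def f1_score(prediction, gold):
    """Token-level F1: harmonic mean of precision and recall."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Because F1 rewards partial overlap, a model can trail humans on EM while scoring close on F1, which is why both numbers are reported on the leaderboard.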
Intelligent Agents & Machine Reading Comprehension
Building intelligent agents that can answer open-domain (or even closed-domain) questions with high accuracy has been a key goal of most AI labs. Intelligent agents with RC and Question-Answering (QA) abilities can help AI personal assistants like Alexa, Google Assistant, Siri, and Cortana perform better, and can help enterprises deploy intelligent agent bots that supplement human agents or directly process chat and messaging traffic, and maybe even voice to some extent.
Machine Comprehension/Machine Reading Comprehension/Machine Reading models enable computers to read a document and answer general questions about it. While this is a relatively elementary task for a human, it is not that straightforward for AI models. There are multiple NTM (Neural Turing Machine), Memory Network, and Attention models available for Reading Comprehension. The list of SQuAD models can be accessed here.
As a first step towards building our intelligent agent system (humanly.ai), we are also building a Machine Reading system. Our implementation is based on the BiDAF (Bi-Directional Attention Flow) ensemble model and Textual Entailment. It is still a work in progress (EM 67%, F1 77%), and it sometimes gives funny answers, but you can try it out here.
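The core idea of BiDAF is an attention layer that flows in both directions: context-to-query attention asks which query words matter for each context word, and query-to-context attention asks which context words matter most to the query. The NumPy sketch below illustrates that layer only, not our full model; the real BiDAF similarity is a trainable function of the concatenation [h; u; h∘u], and I substitute a plain dot product here for illustration.

```python
import numpy as np

def bidaf_attention(H, U):
    """Bi-directional attention between context H (T x d) and query U (J x d).

    A sketch: dot-product similarity stands in for BiDAF's trainable
    similarity function.
    """
    S = H @ U.T                                 # similarity matrix (T x J)
    # Context-to-query: for each context word, attend over query words.
    a = np.exp(S - S.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)           # softmax over the query axis
    U_tilde = a @ U                             # (T x d) attended query
    # Query-to-context: which context words matter most to the query.
    m = S.max(axis=1)                           # best match per context word
    b = np.exp(m - m.max())
    b /= b.sum()                                # softmax over the context axis
    h_tilde = (b[:, None] * H).sum(axis=0)      # (d,) attended context
    H_tilde = np.tile(h_tilde, (H.shape[0], 1))
    # Query-aware context representation G = [H; U~; H*U~; H*H~].
    return np.concatenate([H, U_tilde, H * U_tilde, H * H_tilde], axis=1)
```

The output G feeds a modelling layer (BiLSTMs in the paper) that predicts the start and end of the answer span.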
One of the basic challenges we faced was handling questions that require a yes/no type of answer (a further inference between the question, the answer, and the document), hence the implementation of the Textual Entailment module. The other observation was the need to respond in full sentences ("Yes, Narendra Modi is the Prime Minister of India" instead of a bare "Yes" to the question "Is Narendra Modi the Prime Minister of India?"); as the next product increment, we are currently planning to implement a Seq2Seq model to format our responses.
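Until the Seq2Seq formatter lands, the behaviour we want can be approximated with simple templates. The function below is a rule-based stand-in of my own, not our actual implementation: for questions of the form "Is X the Y?", it echoes the proposition back with the verdict.

```python
def full_sentence_answer(question, verdict):
    """Turn a yes/no verdict into a full-sentence reply.

    A rule-based stand-in for a learned Seq2Seq formatter: handles only
    questions that start with "Is ..."; everything else falls back to a
    bare yes/no.
    """
    q = question.strip().rstrip("?")
    if q.lower().startswith("is "):
        proposition = q[3:]
        prefix = "Yes" if verdict else "No"
        verb = "is" if verdict else "is not"
        # Split "X the Y" into subject and predicate where possible.
        subject, _, rest = proposition.partition(" the ")
        if rest:
            return f"{prefix}, {subject} {verb} the {rest}."
        return f"{prefix}, {proposition} {verb} true."
    return "Yes." if verdict else "No."
```

A Seq2Seq model would learn this mapping from (question, answer) pairs instead of hand-written rules, and so generalize to question shapes no template anticipates.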
But one major challenge all Machine Reading systems face, especially in practical implementations for specific domains or verticals, is the absence of supervised learning data (labelled data) for that domain. All the contemporary Reading Comprehension models are built on supervised training data, with labelled questions and answers, paragraphs containing the answer, and so on. So when it comes to new domains, whilst enterprises have artefacts and data, the absence of labelled data presents a challenge.
We are currently experimenting with an ensemble of Machine Reading Comprehension models, each trained on a specific dataset, so that the learning is incremental. While the model's scores are improving, the need for labelled domain data to train the MRC model in the first place still persists. Towards this problem, I came across two very neat solutions from Microsoft that attempt domain transfer, SynNet and ReasoNet, which we intend to explore further.
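The simplest way to combine answers from several MRC models is a vote over predicted answer strings, with confidence as a tie-breaker. This is a sketch of the general idea, not our exact ensembling scheme; it assumes each model exposes its best answer together with a scalar confidence.

```python
from collections import Counter

def ensemble_answer(predictions):
    """Combine answers from several MRC models by majority vote.

    predictions: list of (answer_text, confidence) pairs, one per model.
    Ties in vote count are broken by summed confidence.
    """
    votes = Counter(answer for answer, _ in predictions)
    summed_conf = {}
    for answer, conf in predictions:
        summed_conf[answer] = summed_conf.get(answer, 0.0) + conf
    return max(votes, key=lambda a: (votes[a], summed_conf[a]))
```

More refined schemes vote over answer spans rather than strings, or average span-boundary probabilities, but the voting skeleton is the same.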
The ‘two-stage Synthesis Network’, or SynNet, is first trained on supervised data for a given vertical, where it learns to identify patterns of critical information (named entities, knowledge points, etc.) as candidate answers and then generate questions around those answers. Once trained, it can generate pseudo questions and answers against artefacts from a new domain, which can then be used to train the MRC model on that domain.
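The two-stage pipeline can be caricatured in a few lines. In SynNet both stages are learned networks (an IOB-style answer tagger and a seq2seq question generator); in this toy sketch of mine, runs of capitalized words stand in for the tagger and a cloze-style rewrite stands in for the generator.

```python
def synthesize_qa_pairs(paragraph):
    """Two-stage synthesis, sketched: (1) pick candidate answer spans,
    (2) generate a question for each span.

    Toy stand-ins: capitalized runs approximate the learned answer tagger,
    and replacing the span with "what" approximates question generation.
    """
    text = paragraph.rstrip(".")
    # Stage 1: crude answer synthesis -- runs of capitalized words.
    spans, current = [], []
    for word in text.split():
        if word[:1].isupper():
            current.append(word)
        elif current:
            spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    # Stage 2: crude question synthesis -- cloze the span out.
    return [(text.replace(span, "what") + "?", span) for span in spans]
```

The pseudo (question, answer) pairs produced this way over a new domain's artefacts become the supervised data the MRC model is then fine-tuned on.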
The Reasoning Network, or ReasoNet, essentially uses reinforcement learning to dynamically figure out when it has enough information to answer a question and should stop reading. This is a departure from the usual approach of using a fixed number of turns when inferring the relationship between the question, the artefacts, and the answer. It has also performed exceptionally well on the SQuAD dataset.
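The control flow of that idea looks like the loop below. In ReasoNet the termination gate is a learned network trained end-to-end with reinforcement learning; here a simple confidence accumulator of my own stands in for the internal state, just to show the stop-when-ready behaviour versus a fixed turn count.

```python
def reasonet_read(evidence_gains, threshold=0.9, max_turns=10):
    """Multi-turn reading with a termination gate, sketched.

    Each turn attends over memory and updates the internal state (here, a
    scalar confidence). The gate stops as soon as confidence clears
    `threshold`, instead of always running a fixed number of turns.
    Returns (final_confidence, turns_used).
    """
    confidence = 0.0
    turn = 0
    for gain in evidence_gains[:max_turns]:
        turn += 1
        confidence += (1 - confidence) * gain   # toy state update
        if confidence >= threshold:             # termination gate fires
            break
    return confidence, turn
```

An easy question (one strong evidence hit) terminates in a single turn, while a multi-hop question keeps reading until the accumulated evidence suffices.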
We shall overcome
As various models continue to emerge, it is a reasonable guess that sooner rather than later (especially catalyzed by the availability of so many rapidly growing datasets; MS MARCO V2, btw, becomes available on 01/03/2018) Machine Comprehension models will overcome the key challenges and get us closer to the goal of intelligent agents that can be trained on standard documents and answer general questions, as humans do (which also happens to be the byline for humanly.ai, btw :P).
I do hope you were able to look past the blatant plugs for humanly.ai :P and found the post useful for getting a basic understanding of Machine Comprehension. As always, do leave your comments and thoughts, including any aspects I might have missed; I will be more than happy to incorporate them.
Disclaimer: The above post in no way claims copyright over any of the images or literature presented.