Computer systems are getting quite good at understanding what people say, but they also have some major weak spots. Among them is the fact that they have trouble with words that have multiple or complex meanings. A new system called ELMo adds this critical context to words, producing better understanding across the board.
To illustrate the problem, think of the word "queen." When you and I are talking and I say that word, you know from context whether I'm talking about Queen Elizabeth, or the chess piece, or the matriarch of a hive, or RuPaul's Drag Race.
This ability of words to have multiple meanings is called polysemy, and really, it's the rule rather than the exception. Which meaning is intended can usually be determined reliably from the phrasing — "God save the queen!" versus "I saved my queen!" — and of course all this informs the topic, the structure of the sentence, whether you're expected to respond, and so on.
Machine learning systems, however, don't really have that level of flexibility. The way they tend to represent words is much simpler: they look at all the different definitions of a word and come up with a sort of average — a complex representation, to be sure, but not one that reflects the word's true complexity. When it's critical that the correct meaning of a word gets through, they can't be relied on.
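To see why that's a problem, here's a toy sketch in Python — with made-up vectors, not ELMo or any real model — showing that a static lookup table hands back the identical vector for "queen" in both example sentences, while even a crude context-aware blend produces two different vectors:

```python
# Toy illustration (not ELMo): why one static vector per word cannot
# separate the senses of "queen". All vectors here are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["god", "save", "the", "queen", "i", "saved", "my"]
static = {w: rng.normal(size=8) for w in vocab}  # one fixed vector per word

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

royal = ["god", "save", "the", "queen"]
chess = ["i", "saved", "my", "queen"]

# Static lookup: "queen" gets the identical vector in both sentences.
print(cosine(static["queen"], static["queen"]))  # 1.0 — senses indistinguishable

# A crude stand-in for a contextual encoder: blend the word's vector with
# the average of its sentence. Real models learn this mixing from data.
def contextual(tokens, word):
    sentence_avg = np.mean([static[t] for t in tokens], axis=0)
    return 0.5 * static[word] + 0.5 * sentence_avg

print(cosine(contextual(royal, "queen"), contextual(chess, "queen")))  # below 1.0
```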
ELMo ("Embeddings from Language Models"), however, lets the system handle polysemy with ease; as evidence of its utility, it was awarded best paper honors at NAACL last week.
At its heart it uses its training data (a huge collection of text) to determine whether a word has multiple meanings and how those different
meanings are signaled in language.
For instance, you could probably tell in my example "queen" sentences above, despite their being very similar, that one was about royalty and the other about a game. That's because the way they are written contains clues that let your own context-detection engine tell you which queen is which.
Informing a system of these differences can be done by manually annotating the text corpus from which it learns — but who wants to go through millions of words making a note of which queen is which?
"We were looking for a method that would significantly reduce the need for human annotation," explained Matthew Peters, lead author of the paper. "The goal was to learn as much as we can from unlabeled data."
In addition, he said, traditional language learning systems "compress all that meaning for a single word into a single vector. So we started by questioning the basic assumption: let's not learn a single vector; let's have an infinite number of vectors, because the meaning is highly dependent on the context."
ELMo learns this information by ingesting the full sentence in which the word appears; it would learn that when a king is mentioned alongside a queen, it's likely royalty or a game, but never a beehive. When it sees pawn, it knows it's chess; jack implies cards; and so on.
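As a concrete sketch of that behavior, the snippet below queries a pretrained ELMo model through the Allen Institute's allennlp package (the 0.x series that shipped the Elmo module). The options and weight file paths are placeholders for the files AI2 published, and the exact similarity score will depend on those weights.

```python
# Hedged sketch: compare ELMo's contextual vectors for "queen" in two
# sentences. File paths are placeholders for the published ELMo files.
import torch
from allennlp.modules.elmo import Elmo, batch_to_ids

options_file = "path/to/elmo_options.json"  # placeholder
weight_file = "path/to/elmo_weights.hdf5"   # placeholder

# One output representation is enough for this comparison.
elmo = Elmo(options_file, weight_file, num_output_representations=1, dropout=0)

sentences = [
    ["God", "save", "the", "queen", "!"],
    ["I", "saved", "my", "queen", "!"],
]
character_ids = batch_to_ids(sentences)
embeddings = elmo(character_ids)["elmo_representations"][0]  # (batch, seq_len, dim)

# "queen" is token index 3 in both sentences, but its vector differs by context.
royal_queen = embeddings[0, 3]
chess_queen = embeddings[1, 3]
similarity = torch.cosine_similarity(royal_queen, chess_queen, dim=0)
print(similarity.item())  # noticeably below 1.0: two different "queens"
```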
An ELMo-equipped language engine won't be nearly as good as a human with years of experience parsing language, but even a working knowledge of polysemy is hugely helpful in understanding a language.
Not only that, but taking the whole sentence into account when determining a word's meaning also allows the structure of that sentence to be mapped more easily, automatically labeling clauses and parts of speech.
Systems using the ELMo method saw immediate benefits, improving on even the latest natural language algorithms by as much as 25 percent — a huge gain for this field.
And because it is a better, more context-aware style of learning, but not a fundamentally different one, it can be integrated easily even
into existing commercial systems.
In fact, Microsoft is reportedly already using it with Bing. After all, it's crucial in search to determine intention, which of course requires an accurate reading of the query.
ELMo is open source, too, like all the work from the Allen Institute for AI, so any company with natural language processing needs should
probably check this out.
The paper lays the groundwork for using ELMo in English-language systems, but because its power is derived from what is essentially a close reading of the data it is fed, there's no theoretical reason why it shouldn't be applicable not just to other languages, but to other domains.
In other words, if you feed it a bunch of neuroscience texts, it should be able to tell the difference between "temporal" as it relates to time and as it relates to that region of the brain.
This is just one example of how machine learning and language are rapidly developing around each other; although it's already quite good enough for basic translation, speech-to-text and so on, there's quite a lot more that computers could do via natural language interfaces — if only they knew how.