Computer systems are getting quite good at understanding what people say, but they also have some major weak spots. Among them is the fact that they have trouble with words that have multiple or complex meanings. A new system called ELMo adds this critical context to words, producing better understanding across the board.
To illustrate the problem, think of the word "queen." When you and I are talking and I say that word, you know from context whether I'm talking about Queen Elizabeth, or the chess piece, or the matriarch of a hive, or RuPaul's Drag Race.
This ability of words to have multiple meanings is called polysemy, and really, it's the rule rather than the exception. Which meaning is intended can usually be determined reliably from the phrasing — "God save the queen!" versus "I saved my queen!" — and of course all this informs the topic, the structure of the sentence, whether you're expected to respond, and so on.
Machine learning systems, however, don't really have that level of flexibility. The way they tend to represent words is much simpler: they look at all the different definitions of a word and come up with a sort of average — a complex representation, to be sure, but not one that reflects the word's true complexity. When it's critical that the correct meaning of a word gets through, they can't be relied on.
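To see why that's a problem, here's a toy sketch in Python — with made-up vectors, not ELMo or any real model — showing that a static lookup table hands back the identical vector for "queen" in both example sentences, while even a crude context-aware blend produces two different vectors:

```python
# Toy illustration (not ELMo): why one static vector per word cannot
# separate the senses of "queen". All vectors here are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["god", "save", "the", "queen", "i", "saved", "my"]
static = {w: rng.normal(size=8) for w in vocab}  # one fixed vector per word

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

royal = ["god", "save", "the", "queen"]
chess = ["i", "saved", "my", "queen"]

# Static lookup: "queen" gets the identical vector in both sentences.
print(cosine(static["queen"], static["queen"]))  # 1.0 — senses indistinguishable

# A crude stand-in for a contextual encoder: blend the word's vector with
# the average of its sentence. Real models learn this mixing from data.
def contextual(tokens, word):
    sentence_avg = np.mean([static[t] for t in tokens], axis=0)
    return 0.5 * static[word] + 0.5 * sentence_avg

print(cosine(contextual(royal, "queen"), contextual(chess, "queen")))  # below 1.0
```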
ELMo ("Embeddings from Language Models"), however, lets the system handle polysemy with ease; as evidence of its utility, it was awarded best paper honors at NAACL last week.
At its heart it uses its training data (a huge collection of text) to determine whether a word has multiple meanings and how those different
meanings are signaled in language.
For instance, you could probably tell in my example "queen" sentences above, despite their being very similar, that one was about royalty and the other about a game. That's because the way they are written contains clues that let your own context-detection engine tell you which queen is which.
Informing a system of these differences can be done by manually annotating the text corpus from which it learns — but who wants to go through millions of words making a note of which queen is which?
"We were looking for a method that would significantly reduce the need for human annotation," explained Matthew Peters, lead author of the paper. "The goal was to learn as much as we can from unlabeled data."
In addition, he said, traditional language learning systems "compress all that meaning for a single word into a single vector. So we started by questioning the basic assumption: let's not learn a single vector; let's have an infinite number of vectors, because the meaning is highly dependent on the context."
ELMo learns this information by ingesting the full sentence in which the word appears; it would learn that when a king is mentioned alongside a queen, it's likely royalty or a game, but never a beehive. When it sees pawn, it knows it's chess; jack implies cards; and so on.
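As a concrete sketch of that behavior, the snippet below queries a pretrained ELMo model through the Allen Institute's allennlp package (the 0.x series that shipped the Elmo module). The options and weight file paths are placeholders for the files AI2 published, and the exact similarity score will depend on those weights.

```python
# Hedged sketch: compare ELMo's contextual vectors for "queen" in two
# sentences. File paths are placeholders for the published ELMo files.
import torch
from allennlp.modules.elmo import Elmo, batch_to_ids

options_file = "path/to/elmo_options.json"  # placeholder
weight_file = "path/to/elmo_weights.hdf5"   # placeholder

# One output representation is enough for this comparison.
elmo = Elmo(options_file, weight_file, num_output_representations=1, dropout=0)

sentences = [
    ["God", "save", "the", "queen", "!"],
    ["I", "saved", "my", "queen", "!"],
]
character_ids = batch_to_ids(sentences)
embeddings = elmo(character_ids)["elmo_representations"][0]  # (batch, seq_len, dim)

# "queen" is token index 3 in both sentences, but its vector differs by context.
royal_queen = embeddings[0, 3]
chess_queen = embeddings[1, 3]
similarity = torch.cosine_similarity(royal_queen, chess_queen, dim=0)
print(similarity.item())  # noticeably below 1.0: two different "queens"
```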
An ELMo-equipped language engine won't be nearly as good as a human with years of experience parsing language, but even a working knowledge of polysemy is hugely helpful in understanding a language.
Not only that, but taking the whole sentence into account when determining a word's meaning also allows the structure of that sentence to be mapped more easily, automatically labeling clauses and parts of speech.
Systems using the ELMo method saw immediate benefits, improving on even the latest natural language algorithms by as much as 25 percent — a huge gain for this field.
And because it is a better, more context-aware style of learning, but not a fundamentally different one, it can be integrated easily even
into existing commercial systems.
In fact, Microsoft is reportedly already using it with Bing. After all, it's crucial in search to determine intention, which of course requires an accurate reading of the query.
ELMo is open source, too, like all the work from the Allen Institute for AI, so any company with natural language processing needs should
probably check this out.
The paper lays the groundwork for using ELMo in English-language systems, but because its power is derived from what is essentially a close reading of the data it is fed, there's no theoretical reason why it shouldn't be applicable not just to other languages, but to other domains.
In other words, if you feed it a bunch of neuroscience texts, it should be able to tell the difference between "temporal" as it relates to time and as it relates to that region of the brain.
This is just one example of how machine learning and language are rapidly developing around each other; although it's already quite good enough for basic translation, speech-to-text and so on, there's quite a lot more that computers could do via natural language interfaces — if only they knew how.