The human brain may use next-word prediction to drive language processing.
Autocomplete is a feature used widely in search engines to predict the user’s query and provide suggestions as the user types. In texting applications, this feature is better known as predictive text. Behind these technologies are artificial intelligence models of language that excel at predicting the next word in a string of text. The most recent generation of predictive language models appears capable of learning the underlying meaning of language. Not only can they predict the word that comes next, but they can also perform tasks that require some degree of genuine understanding, such as question answering, document summarisation, and story completion.
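At its simplest, next-word prediction can be illustrated with a toy model that counts which words follow which in a body of text and suggests the most frequent continuations. This sketch (with a made-up miniature corpus) is only an illustration of the idea; real predictive models are vastly larger neural networks:

```python
from collections import Counter, defaultdict

def train_bigram_model(corpus):
    """Count, for each word, which words follow it in the corpus."""
    model = defaultdict(Counter)
    words = corpus.lower().split()
    for current, nxt in zip(words, words[1:]):
        model[current][nxt] += 1
    return model

def predict_next(model, word, k=3):
    """Return the k most frequent continuations of `word`."""
    return [w for w, _ in model[word.lower()].most_common(k)]

# Tiny illustrative corpus
corpus = ("the cat sat on the mat the cat chased the mouse "
          "the dog sat on the rug")
model = train_bigram_model(corpus)
print(predict_next(model, "the"))  # most common words seen after "the"
```

A predictive-text keyboard works on the same principle, but conditions on far richer context than a single preceding word.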
Although these models were not specifically designed to mimic how the human brain processes language, a new study by scientists from the Massachusetts Institute of Technology (MIT) proposes that the underlying function of these models may be more similar to that of language-processing centres in the human brain than initially thought. Computer models that perform well on other types of language tasks do not share this similarity to the human brain, suggesting that our brains may use next-word prediction to drive language processing.
“The better the model is at predicting the next word, the more closely it fits the human brain,” said Nancy Kanwisher, the Walter A. Rosenblith Professor of Cognitive Neuroscience, a member of MIT’s McGovern Institute for Brain Research and Center for Brains, Minds, and Machines (CBMM), and an author of the new study. “It’s amazing that the models fit so well, and it very indirectly suggests that maybe what the human language system is doing is predicting what’s going to happen next.”
New, high-performing next-word prediction models use deep neural networks that contain computational “nodes.” These nodes form connections of varying strength, organised into layers that pass information between one another in prescribed ways. Deep neural networks have been used to create models of vision that can recognise objects as well as the primate brain does. According to research at MIT, the underlying function of such visual object recognition models matches the organisation of the primate visual cortex.
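The layered structure described above can be sketched in a few lines: each “node” computes a weighted sum of its inputs and passes the result through a nonlinearity, and layers feed forward into one another. The weights below are random placeholders, standing in for the connection strengths a real model would learn from data:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(inputs, weights):
    """One layer of 'nodes': weighted sum of inputs, then a nonlinearity."""
    return np.tanh(inputs @ weights)

# Three layers passing information forward; the weight matrices play the
# role of connection strengths (randomly initialised here, not trained).
w1 = rng.normal(size=(8, 16))   # input layer  -> hidden layer 1
w2 = rng.normal(size=(16, 16))  # hidden 1     -> hidden 2
w3 = rng.normal(size=(16, 4))   # hidden 2     -> output

x = rng.normal(size=(1, 8))     # one input pattern
h1 = layer(x, w1)               # activity of first-layer nodes
h2 = layer(h1, w2)              # activity of second-layer nodes
out = layer(h2, w3)             # network output
print(out.shape)                # (1, 4)
```

The intermediate activities `h1` and `h2` are the kind of internal “node activity” that researchers can record from a model and compare against brain measurements.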
In the current study, the MIT team used a similar approach to compare language-processing centres in the human brain with language-processing models. The researchers analysed 43 different language models, including several that are optimised for next-word prediction like GPT-3 (Generative Pre-trained Transformer 3), which, given a prompt, can generate text similar to what a human would produce. Other models were designed to perform different language tasks such as filling in a blank sentence.
Upon presenting each model with a string of words, the researchers measured the activity of the nodes that make up the network and compared these patterns to activity in the human brain. The human datasets included functional magnetic resonance imaging (fMRI) data and intracranial electrocorticographic measurements taken in people undergoing brain surgery for epilepsy. Comparisons were made using data from subjects performing three language tasks: listening to stories, reading sentences one at a time, and reading sentences in which one word is revealed at a time.
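One common way to make such a comparison, in the spirit of the study’s approach, is to fit a linear map from model node activity to brain responses on part of the data and then score how well it predicts held-out brain responses. The sketch below uses random placeholder numbers in place of real model activations and recordings, so it only demonstrates the procedure, not any actual result:

```python
import numpy as np
from numpy.linalg import lstsq

rng = np.random.default_rng(1)

# Hypothetical data: model activations and brain responses for the same
# 100 sentences (random placeholders, not real measurements).
n_sentences, n_units, n_voxels = 100, 50, 20
model_acts = rng.normal(size=(n_sentences, n_units))
brain_resp = rng.normal(size=(n_sentences, n_voxels))

train, test = slice(0, 80), slice(80, 100)

# Fit a linear map from model activity to brain activity on held-in data...
weights, *_ = lstsq(model_acts[train], brain_resp[train], rcond=None)

# ...then score how well it predicts held-out brain responses.
pred = model_acts[test] @ weights
scores = [np.corrcoef(pred[:, v], brain_resp[test][:, v])[0, 1]
          for v in range(n_voxels)]
print(f"mean held-out correlation: {np.mean(scores):.2f}")
```

With real data, a higher held-out correlation indicates that the model’s internal activity patterns track brain activity more closely; with the random placeholders here, the correlation hovers near zero.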
The results of their study revealed that the best-performing next-word prediction models had activity patterns that very closely resembled those seen in the human brain. Further research showed that activity in those same models was also highly correlated with human behavioural measures such as how fast people are able to read the text.
“We found that the models that predict the neural responses well also tend to best predict human behavioural responses, in the form of reading times. And then both of these are explained by the model performance on next-word prediction. This triangle really connects everything together,” explained Martin Schrimpf, the study’s lead author.
Predictive models like GPT-3 are built on a unidirectional (“forward one-way”) predictive transformer, which makes predictions of what is going to come next based on a very long prior context (hundreds of words) rather than just the last few. While scientists have not yet found brain circuits or learning mechanisms that correspond to this type of processing, the new findings are consistent with previous hypotheses that prediction is one of the key functions of language processing, noted Josh Tenenbaum, an author of the study.
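The “forward one-way” character of such models comes from an autoregressive loop: each new word is predicted from everything that came before, and the context only ever grows. The stand-in scoring rule below is a deliberately silly placeholder (a real transformer learns its scores from vast amounts of text), but the loop structure is the same:

```python
def next_word_probs(context):
    """Stand-in for a trained model: scores every candidate word given the
    full prior context. (A real transformer attends over hundreds of prior
    words here; this toy rule just favours words not yet used.)"""
    vocab = ["the", "cat", "sat", "down", "."]
    return {w: 1.0 if w not in context else 0.1 for w in vocab}

def generate(prompt, n_words):
    """Autoregressive loop: each prediction conditions on everything
    generated so far, and the context only ever grows."""
    context = prompt.split()
    for _ in range(n_words):
        probs = next_word_probs(context)
        context.append(max(probs, key=probs.get))  # greedy: pick top word
    return " ".join(context)

print(generate("the cat", 3))  # -> "the cat sat down ."
```

The key contrast with older n-gram predictors is that `next_word_probs` receives the entire context, not a fixed window of the last few words.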
Presently, the researchers are planning to build variants of these language-processing models to analyse how small changes in their architecture affect their performance and their ability to fit human neural data. They are also keen to try combining these high-performing language models with computer models previously developed in Tenenbaum’s lab that can perform other kinds of tasks, such as constructing perceptual representations of the physical world.
“If we’re able to understand what these language models do and how they can connect to models which do things that are more like perceiving and thinking, then that can give us more integrative models of how things work in the brain,” Tenenbaum said. “This could take us toward better artificial intelligence models, as well as giving us better models of how more of the brain works and how general intelligence emerges, than we’ve had in the past.” [APBN]
Source: Schrimpf et al. (2021). The neural architecture of language: Integrative modeling converges on predictive processing. Proceedings of the National Academy of Sciences, 118(45), e2105646118.