Understanding can emerge at scale
Since these models appeared, researchers and philosophers have wondered whether they are merely sophisticated “stochastic parrots” or whether they are genuinely capable of understanding, and whether they will ultimately be more intelligent than we are.
This is an important question: if foundation models – the catch-all term for these systems – can actually understand the way humans do, it could be a step toward superintelligence, or even a form of artificial sentience.
The debate has been murky because people often talk past each other, working from different definitions of “meaning” and “understanding.” But Amazon Web Services (AWS) researchers have proposed non-controversial definitions to help align the discussion. By those definitions, they believe that understanding in foundation models is not only possible but inevitable as the models scale.
Stefano Soatto
“These definitions are useful in the sense that they align with definitions that people have used in mathematical logic, in model theory, in epistemology, in dynamical systems, in linguistics, and so on,” said Stefano Soatto, vice president and distinguished scientist at Amazon Web Services and a co-author of a paper on the subject.
For “meaning,” the researchers use a definition first proposed by the 17th-century philosopher Antoine Arnauld, who defined it as the relationship between a linguistic form (the text on which the models are trained) and the thing or concept to which that form points (something external to language). The word “tree” is not a tree, but it refers to the concept of a tree; the meaning of the word “tree” is its relationship to that concept.
Soatto and fellow applied scientist Matthew Trager explain in a paper published earlier this month that these meanings can be represented either as vectors in an abstract space or as probability distributions over that space. This makes it possible to talk about proximity (which meanings are similar) and, to some extent, implication (what implies what) and causality (what follows what).
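As a toy illustration of the “meanings as vectors” idea (not the authors’ actual method, and with invented numbers), a few expressions can be assigned embedding vectors and “proximity of meaning” measured as cosine similarity:

```python
import numpy as np

# Toy, hand-made "meaning vectors" for a few expressions.
# A real foundation model would produce high-dimensional embeddings;
# these 3-D vectors are invented purely for illustration.
meanings = {
    "a tall tree":   np.array([0.9, 0.1, 0.0]),
    "a large oak":   np.array([0.8, 0.2, 0.1]),
    "the number pi": np.array([0.0, 0.1, 0.95]),
}

def proximity(a, b):
    """Cosine similarity: values near 1.0 mean 'similar meaning' in this toy space."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(proximity(meanings["a tall tree"], meanings["a large oak"]))   # high
print(proximity(meanings["a tall tree"], meanings["the number pi"])) # low
```

The point is only that, once meanings live in a vector space, similarity of meaning becomes a measurable geometric quantity rather than a matter of intuition.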
They define “understanding” in terms of equivalence classes of expressions, which form a representation of that meaning, that is, of the relationship between words and what they refer to. Large foundation models can represent such equivalence classes, either as vectors or as distributions over possible continuations, which lets them reason and operate on meaning without storing every detail, the researchers say.
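One way to picture those equivalence classes is to compare expressions by the distribution of continuations a model would assign them: expressions with (near-)identical continuation distributions land in the same class. The sketch below uses made-up toy distributions and an arbitrary grouping threshold rather than a real model:

```python
# Toy continuation distributions: probability of the next word, given a prompt.
# In this framing, expressions with (near-)identical distributions over
# continuations fall into the same equivalence class -- the same "meaning".
continuations = {
    "the capital of France is": {"Paris": 0.95, "Lyon": 0.03, "big": 0.02},
    "France's capital city is": {"Paris": 0.94, "Lyon": 0.04, "big": 0.02},
    "my favourite dessert is":  {"Paris": 0.01, "cake": 0.60, "pie": 0.39},
}

def total_variation(p, q):
    """Total-variation distance between two next-word distributions."""
    words = set(p) | set(q)
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in words)

# Group expressions whose distributions are within an (arbitrary) threshold.
threshold = 0.1
classes = []
for expr, dist in continuations.items():
    for cls in classes:
        if total_variation(dist, continuations[cls[0]]) < threshold:
            cls.append(expr)
            break
    else:
        classes.append([expr])

print(classes)
# [['the capital of France is', "France's capital city is"], ['my favourite dessert is']]
```

The two paraphrases about France end up in one class; the unrelated prompt ends up in another. A real model would induce these classes from billions of examples rather than from hand-written tables.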
Consider the number pi (π). Pi has an infinite, non-repeating decimal expansion (3.14159265358979…). No one could memorize, or store in their brain, that entire infinite sequence of digits. But we can still understand and work with the concept of pi.
How? By having a representation in our minds that captures the essential properties of pi: that it is the ratio of a circle’s circumference to its diameter, that it begins with 3.14, that it continues indefinitely without repeating, and so on. With this mental representation, we do not need to recall an endless string of digits.
“Our representation is an abstract concept, just like the meaning of a sentence,” Soatto said in a phone call.
More importantly, this representation allows us to answer specific questions and do things with the concept of pi. For example, if I asked you what the millionth digit of pi is, you wouldn’t need to recite a million digits to prove that you understand pi. Your mental representation would let you compute it or look it up, using a limited amount of gray matter and time.
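The pi example can be made concrete in code: a short program is a compact representation of pi that can still produce any digit on demand. Here is a minimal sketch using the mpmath arbitrary-precision library (the choice of library and the number of guard digits are just one reasonable option):

```python
from mpmath import mp

def pi_decimal_digit(n):
    """Return the n-th digit of pi after the decimal point (1-indexed)."""
    mp.dps = n + 10                  # working precision, with a few guard digits
    digits = mp.nstr(+mp.pi, n + 5)  # string like "3.14159..."
    return digits[n + 1]             # index 0 is '3', index 1 is '.'

print(pi_decimal_digit(1))     # '1'
print(pi_decimal_digit(1000))  # the 1,000th decimal digit, no memorization required
```

Asking for the millionth digit works the same way; it just takes longer. The few lines above stand in for an infinite object, which is exactly the sense in which a compact representation can capture something that cannot be stored in full.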
Thus, understanding an abstract concept means forming a mental model or representation that captures the relationship between the language describing a concept and the key properties of that concept. That representation can be reasoned about and interrogated using our limited memory, processing power, and access to data, rather than by trying to store a literally infinite amount of raw data. Constructing a concise but powerful representation, on this view, is what demonstrates true understanding.
With scale, the researchers believe, these models begin to understand
Models trained to predict the next token learn stochastic models of language. But over time they identify the relationships, the meanings, among the data they have seen and, as they scale, develop their own internal languages to describe those relationships. Such internal languages emerge once a model grows to hundreds of billions of parameters, Soatto says.
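As a cartoon of what “trained to predict the next token” means, the sketch below fits a tiny bigram model by counting which token follows which. Real foundation models learn vastly richer, context-dependent versions of the same idea; the corpus here is invented for illustration:

```python
from collections import Counter, defaultdict

# Tiny invented corpus; real models train on trillions of tokens.
corpus = "the tree is tall . the tree is green . the oak is a tree .".split()

# Count next-token frequencies for each token (a bigram "stochastic model").
counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    counts[current][nxt] += 1

def next_token_distribution(token):
    """Empirical probability of each possible next token."""
    total = sum(counts[token].values())
    return {t: c / total for t, c in counts[token].items()}

print(next_token_distribution("tree"))  # e.g. {'is': 0.67, '.': 0.33}
```

A counting model like this only memorizes surface statistics; the researchers’ claim is that, at sufficient scale, next-token predictors go beyond this and build internal representations of the relationships behind the statistics.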
Humans acquire language using only about 1.5 megabytes of data (roughly the capacity of a 1980s floppy disk). “Humans are proof that you don’t need specialized logic hardware or dedicated hardware with strict logic constraints to do math,” Soatto said.
But Trager and Soatto suggest that, at scale, these models must “find their way” around their large “brains,” shedding redundant information so that they become better at discovering meaningful relationships within the data they are trained on.
As this relational discovery improves with scale, the researchers believe, these models begin to understand; in other words, they form representations of these relationships, these meanings, on which they can then operate. Even if today’s models do not yet understand in this sense, the researchers say, they likely will with continued scaling.
There is a concept in biology of a “critical period”, during which an organism’s nervous system is particularly receptive to learning specific skills or traits. This period is marked by high levels of brain plasticity before neuronal connections become more fixed.
According to the researchers, foundation models also go through a critical period during which they must accumulate a large amount of information. The model can then shed some of that accumulated information while improving its ability to identify relationships in the data it encounters at inference time.
Close-up of a squirrel holding a walnut.
It’s not clear when in evolution this capacity appears in biology; it is not a light switch flipping between meaninglessness and understanding. It is a spectrum that emerges with the scale of a brain’s neural circuitry. An amoeba probably doesn’t have it; a squirrel probably does. As machines scale, these researchers believe, understanding of meaning, by their definition, will emerge just as it does in biology.
“Scale plays a key role in this emergent phenomenon,” Soatto said. “Where does it stop? As of now, there is no obvious end in sight.”
The problem, he says, is that we may never know for sure when understanding will emerge.
“With large-scale machine learning models, you can prove that (a) they can represent abstract concepts, such as pi, (b) they currently do not, at least not reliably, and (c) whether and when they do, we cannot know for sure,” he said. “This is no different from humans: when we teach students, how do we know they have ‘understood’ the material? We give an exam. If they fail, we know they did not understand. If they pass, do we know they got it? If we had asked one more question, would they have answered correctly? We won’t know until we ask.”
However, unlike the human mind, these large models are not opaque systems: we can measure them, audit them, and calibrate them. That is a key advantage, and it holds the promise of helping us understand not only how these properties emerge in foundation models, but perhaps also how they emerge in the human mind.