We ask a lot of ourselves as babies. Somehow, we must move from being sensory blobs to mobile, rational, attentive communicators in just a few years. There you are, a baby with no vocabulary, in a room cluttered with toys and stuffed animals. You pick up a Lincoln Log and your caretaker says, “This is a ‘log.’” Eventually, you understand that “log” does not refer strictly to this particular brown plastic cylinder, or to brown plastic cylinders in general, but to brown plastic cylinders that embody the characteristics of felled and stripped tree parts, which are also, of course, “logs.”
There has been much research and heated debate about how babies accomplish this. Some scientists have argued that most of our language acquisition can be explained by associative learning, in which we relate sounds to sensations, much as dogs associate the sound of a bell with food. Others argue that there are features built into the human mind that have shaped the forms of all languages and are crucial to our learning. Still others argue that toddlers build their understanding of new words on top of their understanding of other words.
The debate played out on a recent Sunday morning, as Tammy Kwan and Brenden Lake delivered blackberries from a bowl into the mouth of their twenty-one-month-old daughter, Luna. Luna was dressed in pink leggings and a pink tutu, with a silicone bib around her neck and a soft pink hat on her head. A lightweight GoPro-style camera was attached to the front of it.
“Babooga,” she said, pointing a round finger at the berries. Dr. Kwan gave her the rest, and Dr. Lake looked at the empty bowl, amused. “That’s about $10,” he said. A light on the camera flashed.
For an hour every week for the past 11 months, Dr. Lake, a psychologist at New York University whose research focuses on human and artificial intelligence, has attached a camera to Luna and recorded the world from her point of view as she plays. His goal is to use the videos to train a language model on the same sensory input that a toddler is exposed to: a LunaBot, so to speak. In doing so, he hopes to create better tools for understanding both AI and ourselves. “We see this research as finally making that connection between these two areas of study,” Dr. Lake said. “We can finally put them in dialogue with each other.”
There are many obstacles to using AI models to understand the human mind. After all, the two are radically different. Modern multimodal language models, like OpenAI’s GPT-4 and Google’s Gemini, are built on neural networks with little built-in structure, and they have improved primarily through increased computing power and ever-larger training data sets. Meta’s newest language model, Llama 3, is trained on over ten trillion words; an average five-year-old child is exposed to more than 300,000.
Such models can analyze the pixels in images, but they cannot taste cheese or berries, or feel hunger, which are important kinds of learning experiences for children. Researchers can do their best to turn a child’s entire sensory stream into code, but crucial aspects of the child’s phenomenology will inevitably be missed. “What we see is just the residue of an active learner,” said Michael Frank, a psychologist at Stanford who has tried for years to capture the human experience on camera. His lab currently works with more than 25 children across the country, including Luna, to record their experiences at home and in social settings.
Humans are also not mere receptacles of data, as neural networks are, but intentional animals. Everything we see, every object we touch, every word we hear corresponds to the beliefs and desires we have in the moment. “There is a deep relationship between what you are trying to learn and the data you collect,” said Linda Smith, a psychologist at Indiana University. “These models only predict. They take whatever you put into them and take the next best step.” Although it is possible to mimic human intentionality by structuring training data, which Dr. Smith’s lab recently attempted to do, the most capable AI models, and the companies that make them, have long been geared toward processing more data more efficiently, not toward extracting more meaning from less.
There is also a more conceptual problem, which arises from the fact that the capabilities of AI systems can seem quite human even when they arise in nonhuman ways. Recently, dubious claims of awareness, general intelligence and sentience have emerged from the industry labs of Google and Microsoft following the release of new models. In March, Claude 3, the latest model from the AI research startup Anthropic, sparked debate when, after finding a random sentence about pizza toppings hidden in a long list of unrelated documents, it expressed the suspicion that it was being tested. Such reports often seem more like marketing ploys than objective scientific projects, but they underscore our eagerness to ascribe scientific significance to AI.
But human minds and virtual minds do converge in some ways. Tom Griffiths, a cognitive scientist at Princeton, has suggested that by describing the limits of human intelligence and building models with similar limits, we could arrive at a better understanding of ourselves and at AI that is more interpretable and more efficient. “Human intelligence helps us better understand and model computers, and we can use these models to understand human intelligence,” Dr. Griffiths said. “This is all very new. We are exploring the space of possibilities.”
In February, Dr. Lake and his collaborators created the first AI model based on the experiences of a single child, using videos captured in Dr. Frank’s lab more than a decade ago. The model, published in the journal Science and trained on 60 hours of footage, was able to associate different moments with words. Type in “sand” and the model recalls the moment, 11 years ago, when the boy whose experiences it was trained on went to the beach with his mother. Type in “car” and the model brings up a first-person video of the boy sitting in his booster seat.
The training videos are old and grainy, and the data is quite sparse, but the model’s ability to form a kind of conceptual map of the world suggests that it might be possible to acquire language primarily by association. “We had a reviewer of the paper who said, ‘Before I read this, I would have thought this was impossible,'” said Wai Keen Vong, a researcher at NYU who helped lead the work.
For Dr. Lake and other researchers like him, these interrelated questions (How much more humanlike can we make AI? What makes us human?) point to the most exciting research on the horizon. Chipping away at the first question, by modeling social interactions, intentions and biases, and by collecting complete video streams from a front-facing camera mounted on a one-year-old child, brings us closer to answering the second.
“If the field can get to the point where models are trained solely on the data seen by a single child, and they perform well across a broad set of tasks, that would be a huge scientific achievement,” Dr. Lake said.
At their apartment, Dr. Lake and Dr. Kwan gathered Luna and her older brother, Logan, for a birthday party. The children, crowding in front of the door, put on their socks and shoes. Dr. Lake stopped the recording on Luna’s camera and handed her a pair of fuzzy white mittens with sheep faces on them. “What is this, Luna?” he asked.
“Baa baa,” Luna said.
Dr. Kwan said: “There was a time when she didn’t know the word ‘no,’ and it was just ‘yes’ to everything.” She turned to Luna: “Kisses, do you want kisses?”
“No,” Luna said.
“Oh,” Dr. Lake said, laughing. “I miss the ‘yes’ phase.”