Large language models can do amazing things, like write poetry or generate viable computer programs, even though they are trained to predict the next word in a piece of text.
Such surprising capabilities can make it seem like the models are implicitly learning general truths about the world.
But new research shows that isn't necessarily the case. Researchers found that a popular type of generative AI model can provide turn-by-turn driving directions in New York City with near-perfect accuracy, without having formed an accurate internal map of the city.
Although the model had an uncanny ability to navigate effectively, its performance plummeted when the researchers closed some streets and added detours.
When they dug deeper, the researchers found that the New York maps the model implicitly generated contained many nonexistent streets curving between the grid and connecting faraway intersections.
This could have serious implications for generative AI models deployed in the real world, since a model that seems to perform well in one context might break down if the task or environment changes slightly.
“One hope is that, because LLMs can accomplish all these amazing things in language, maybe we could use these same tools in other parts of science as well. But the question of whether LLMs are learning coherent world models is very important if we want to use these techniques to make new discoveries,” says senior author Ashesh Rambachan, an assistant professor of economics and a principal investigator in the MIT Laboratory for Information and Decision Systems (LIDS).
Rambachan is joined on the paper by lead author Keyon Vafa, a postdoc at Harvard University; Justin Y. Chen, an electrical engineering and computer science (EECS) graduate student at MIT; Jon Kleinberg, Tisch University Professor of Computer Science and Information Science at Cornell University; and Sendhil Mullainathan, an MIT professor in the departments of EECS and economics and a member of LIDS. The research will be presented at the Conference on Neural Information Processing Systems.
New metrics
The researchers focused on a type of generative AI model known as a transformer, which forms the backbone of LLMs such as GPT-4. Transformers are trained on large amounts of language-based data to predict the next token in a sequence, such as the next word in a sentence.
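To make that objective concrete, here is a minimal PyTorch sketch of next-token training on a toy vocabulary, assuming a small causal encoder; it illustrates the general idea only and is not the setup used in the study.

```python
import torch
import torch.nn as nn

# Toy sizes for illustration only; real LLMs are vastly larger.
vocab_size, d_model, seq_len, batch = 100, 64, 16, 8

embed = nn.Embedding(vocab_size, d_model)
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
head = nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (batch, seq_len))  # a batch of sequences
# Causal mask: each position may attend only to earlier positions.
mask = nn.Transformer.generate_square_subsequent_mask(seq_len - 1)

hidden = encoder(embed(tokens[:, :-1]), mask=mask)
logits = head(hidden)  # a score for every possible next token

# The training signal: at each position, predict the token that comes next.
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1)
)
loss.backward()
```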
But if scientists want to determine whether an LLM has formed an accurate model of the world, measuring the accuracy of its predictions is not enough, the researchers say.
For example, the researchers found that a transformer can predict valid moves in a game of Connect 4 nearly every time without understanding any of the rules.
So the team developed two new metrics that can test a transformer's world model. The researchers focused their evaluations on a class of problems called deterministic finite automata (DFAs).
A DFA is a problem with a sequence of states, like the intersections one must traverse to reach a destination, and a concrete way of describing the rules one must follow along the way.
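As a concrete illustration, a tiny DFA for the navigation setting might look like the following Python sketch, where the intersection names and directions are invented for the example rather than taken from the paper's data.

```python
# States are intersections; transitions are streets one may take from them.
transitions = {
    ("A", "east"): "B",   # from intersection A, heading east reaches B
    ("B", "north"): "C",
    ("C", "west"): "D",
}

def run_dfa(start, moves):
    """Follow a sequence of moves; return the final state, or None if any
    move breaks a rule of the DFA (i.e., no such street exists)."""
    state = start
    for move in moves:
        state = transitions.get((state, move))
        if state is None:
            return None
    return state

print(run_dfa("A", ["east", "north"]))  # -> "C": a valid route
print(run_dfa("A", ["north"]))          # -> None: no such street
```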
They chose two problems to formulate as DFAs: navigating the streets of New York City and playing the board game Othello.
“We needed test beds where we know what the world model is. Now, we can rigorously think about what it means to recover that world model,” Vafa explains.
The first metric they developed, called sequence distinction, says a model has formed a coherent world model if it sees two different states, like two different Othello boards, and recognizes how they differ. Sequences, that is, ordered lists of data points, are what transformers use to generate outputs.
The second metric, called sequence compression, says a transformer with a coherent world model should know that two identical states, like two identical Othello boards, have the same sequence of possible next steps.
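The intuition behind the two metrics can be sketched in a few lines of Python. Here `true_state` stands in for the ground-truth DFA and `model_next_moves` for the model's predictions; both are toy stand-ins, and the paper's actual metrics examine longer continuations rather than a single next step.

```python
def true_state(prefix):
    """Toy ground truth: the state is just the parity of moves made."""
    return len(prefix) % 2

def model_next_moves(prefix):
    """Toy 'model': pretend it predicts valid next moves from the parity."""
    return {"a"} if len(prefix) % 2 == 0 else {"b"}

def compression_ok(prefix_a, prefix_b):
    """Same true state => a coherent model allows the same next moves."""
    assert true_state(prefix_a) == true_state(prefix_b)
    return model_next_moves(prefix_a) == model_next_moves(prefix_b)

def distinction_ok(prefix_a, prefix_b):
    """Different true states => a coherent model should tell them apart."""
    assert true_state(prefix_a) != true_state(prefix_b)
    return model_next_moves(prefix_a) != model_next_moves(prefix_b)

print(compression_ok("ab", "ba"))  # True: same state, same predictions
print(distinction_ok("a", "ab"))   # True: different states, distinguished
```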
They used these metrics to test two common classes of transformers. One was trained on data generated from randomly produced sequences, and the other on data generated by following strategies.
Incoherent world models
Surprisingly, the researchers found that transformers that made choices randomly formed more accurate world models, perhaps because they saw a wider variety of potential next steps during training.
“In Othello, if you see two random computers playing rather than championship players, in theory you'd see the full set of possible moves, even the bad moves championship players wouldn't make,” Vafa explains.
Even though the transformers generated accurate directions and valid Othello moves in nearly every instance, the two metrics revealed that only one formed a coherent world model for Othello moves, and none of them performed well at forming coherent world models in the wayfinding example.
The researchers demonstrated the implications of this by adding detours to the map of New York City, which caused all the navigation models to fail.
“I was surprised by how quickly the performance deteriorated as soon as we added a detour. If we close just 1 percent of the possible streets, accuracy immediately plummets from nearly 100 percent to just 67 percent,” Vafa says.
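A stress test of this kind might be sketched as follows, with `model_directions` and `is_valid_route` as toy stand-ins for the model under test and a route checker; the default closure rate mirrors the 1 percent figure Vafa cites.

```python
import random

def model_directions(query, closed):
    """Toy 'model': always proposes the same fixed route."""
    return [("A", "B"), ("B", "C")]

def is_valid_route(route, open_edges):
    """A route is valid only if every street it uses is still open."""
    return all(edge in open_edges for edge in route)

def detour_accuracy(edges, queries, fraction_closed=0.01, seed=0):
    """Close a small fraction of streets, then measure how often the
    model's directions remain valid on the altered map."""
    rng = random.Random(seed)
    n_closed = max(1, int(len(edges) * fraction_closed))
    closed = set(rng.sample(sorted(edges), n_closed))
    open_edges = edges - closed
    hits = sum(is_valid_route(model_directions(q, closed), open_edges)
               for q in queries)
    return hits / len(queries)

streets = {("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")}
print(detour_accuracy(streets, queries=["A->C", "A->D"]))
```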
When the researchers reconstructed the city maps the models had implicitly generated, they looked like an imagined New York City with hundreds of streets crisscrossing, overlaid on top of the grid. The maps often contained random flyovers above other streets or multiple streets with impossible orientations.
These results show that transformers can perform surprisingly well at certain tasks without understanding the rules. If scientists want to build LLMs that capture accurate world models, they need to take a different approach, the researchers say.
“Often, we see these models do impressive things and think they must have understood something about the world. I hope we can convince people that this is a question to think very carefully about, and that we don't have to rely on our own intuitions to answer it,” Rambachan says.
In the future, the researchers hope to tackle a more diverse set of problems, including those where some rules are only partially known. They also want to apply their metrics to real-world scientific problems.
Funding for this research was provided in part by grants from the Harvard Data Science Initiative, a National Science Foundation Graduate Research Fellowship, a Vannevar Bush Faculty Fellowship, a Simons Collaboration grant, and the MacArthur Foundation.