Zencoder has hired a number of search-engine veterans to help it build a tool that can analyze large codebases and figure out what is relevant and what is not. This detailed context reduces hallucinations and improves the quality of code that large language models can produce. “We call it repo grokking,” says Filev.
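Zencoder has not published how its codebase analysis works, but the general idea of picking out only the relevant parts of a repository before prompting a model can be illustrated with a toy sketch. Everything below (the file names, the scoring scheme, the bag-of-words similarity) is an invented, minimal stand-in for what is surely a far richer system.

```python
# A minimal sketch of context selection over a codebase: rank files by a simple
# bag-of-words similarity to the task and keep only the top matches, so the
# model's prompt contains relevant code rather than the whole repository.
# (Hypothetical example; not Zencoder's actual method.)
import math
import re
from collections import Counter

def tokenize(text: str) -> Counter:
    """Split source text into lowercase word tokens and count them."""
    return Counter(re.findall(r"[a-z]{2,}", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def select_context(task: str, repo_files: dict[str, str], k: int = 2) -> list[str]:
    """Return the k files most similar to the task description."""
    task_vec = tokenize(task)
    ranked = sorted(repo_files, key=lambda p: cosine(task_vec, tokenize(repo_files[p])), reverse=True)
    return ranked[:k]

if __name__ == "__main__":
    repo = {
        "billing/invoice.py": "def total_invoice(items): return sum(i.price for i in items)",
        "auth/login.py": "def login(user, password): ...",
        "billing/tax.py": "def apply_tax(total, rate): return total * (1 + rate)",
    }
    # The billing files outrank the unrelated login code for this task.
    print(select_context("add tax handling to invoice totals", repo))
```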
Cosine also thinks context is key, but it draws on that context to create a new kind of data set. The company asked dozens of coders to record what they were doing as they worked through hundreds of different programming tasks. “We asked them to write everything down,” says Pullen: “Why did you open that file? Why did you scroll halfway through it? Why did you close it?” It also asked the coders to annotate finished sections of code, marking up the parts that drew on other pieces of code or on specific documentation.
Cosine then takes all that information and generates a large synthetic data set that maps the typical steps coders take, and the sources of information they draw on, to produce finished code. It uses this data set to train a model on what breadcrumbs it needs to follow to produce a given program, and how to follow them.
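Cosine has not published what such a record looks like, but a single entry in a “process of coding” data set might resemble the sketch below. Every field name, file path, and task here is invented for illustration; the point is only that the data captures the trail of actions and sources, not just the final code.

```python
# A hypothetical sketch of one training record in a process-of-coding data set,
# assuming the kinds of fields described above. All names are invented.
import json

trajectory = {
    "task": "Fix the off-by-one error in pagination",
    "steps": [  # the breadcrumb trail: what the coder did, and why
        {"action": "open_file", "target": "api/pagination.py",
         "reason": "error trace points at the page-slicing helper"},
        {"action": "open_file", "target": "tests/test_pagination.py",
         "reason": "check what behaviour the tests expect"},
        {"action": "edit", "target": "api/pagination.py",
         "reason": "change the slice end index to (page + 1) * size"},
    ],
    "sources_used": ["api/pagination.py", "tests/test_pagination.py"],
    "final_diff": "-    return items[page * size : page * size]\n"
                  "+    return items[page * size : (page + 1) * size]",
}

print(json.dumps(trajectory, indent=2))
```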
San Francisco-based Poolside is also building synthetic data sets that capture the process of coding, but it leans more on a technique called RLCE, or reinforcement learning from code execution. (Cosine uses this too, though to a lesser extent.)
RLCE is analogous to the technique used to make chatbots like ChatGPT into polished conversationalists, known as RLHF (reinforcement learning from human feedback). With RLHF, a model is trained to produce text that is closer to what human testers say they prefer. With RLCE, a model is trained to produce code that is closer to what is expected when it is executed (or run).
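Neither company has detailed its reward setup, but the core of the idea can be sketched simply: instead of a human rating the output, the feedback comes from running the generated code. In the hypothetical example below, the reward is just the fraction of test cases a candidate function passes; real systems are far more elaborate.

```python
# A minimal sketch of the reward signal behind RL from code execution,
# assuming reward = fraction of tests passed. Illustrative only.

def execution_reward(candidate_src: str, func_name: str, tests: list[tuple[tuple, object]]) -> float:
    """Execute generated source, then score it against (args, expected) pairs."""
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)          # run the model's code
        func = namespace[func_name]
    except Exception:
        return 0.0                              # code that doesn't even load earns nothing
    passed = 0
    for args, expected in tests:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass                                # runtime errors count as failures
    return passed / len(tests)

# Example: score two candidate completions for a clamp() function.
tests = [((5, 0, 10), 5), ((-3, 0, 10), 0), ((42, 0, 10), 10)]
good = "def clamp(x, lo, hi):\n    return max(lo, min(x, hi))\n"
bad = "def clamp(x, lo, hi):\n    return x\n"
print(execution_reward(good, "clamp", tests))   # 1.0
print(execution_reward(bad, "clamp", tests))    # ~0.33
```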
Gamifying the system
Both Cosine and Poolside say they were inspired by the approach DeepMind took with its game-playing model AlphaZero. AlphaZero was given the steps it could take (the moves in a game) and then left to play against itself over and over again, figuring out through trial and error which sequences of moves were winners and which were not.
“It explored every move it could and simulated as many games as its compute allowed. That’s how it ended up beating Lee Sedol,” says Pengming Wang, a founding scientist at Poolside, referring to the Korean Go grandmaster whom DeepMind’s AlphaGo defeated in 2016. Before Poolside, Wang worked at Google DeepMind on applications of AlphaZero beyond board games, including FunSearch, a version trained to solve advanced math problems.
When AlphaZero’s approach is transferred to coding, the steps involved in producing a piece of code (the breadcrumb trail) become the available moves in the game, and a correct program wins it. Left to play by itself, a model can improve far faster than a human could. “A human coder tries things one failure at a time,” Kant says. “A model can try 100 things at once.”
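A toy sketch of that game framing: treat candidate lines of code as the “moves,” treat passing the tests as “winning,” and score many move sequences in a single sweep rather than one at a time. The move menu and task below are invented for illustration; production systems search over edits proposed by a model, not a fixed list.

```python
# A toy game over code: "moves" are candidate lines, "winning" means the
# assembled function passes its tests when executed. Hypothetical example.
from itertools import product

MOVES = [
    "    total = 0",
    "    total = 1",
    "    for x in xs:",
    "        total += x",
    "        total *= x",
    "    return total",
]

TESTS = [(([1, 2, 3],), 6), (([],), 0), (([5],), 5)]

def wins(body_lines: list[str]) -> bool:
    """Assemble a program from the chosen moves and check it against the tests."""
    src = "def f(xs):\n" + "\n".join(body_lines) + "\n"
    ns: dict = {}
    try:
        exec(src, ns)
        return all(ns["f"](*args) == expected for args, expected in TESTS)
    except Exception:
        return False

# Unlike a human trying one idea at a time, the search scores every
# four-move sequence in one sweep and keeps the winners.
candidates = [list(seq) for seq in product(MOVES, repeat=4)]
winners = [c for c in candidates if wins(c)]
print(f"tried {len(candidates)} sequences, {len(winners)} win")
print("\n".join(winners[0]))
```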