Elon Musk, like other AI experts, says there is little real-world data left to train AI models.
“We’re basically depleting the human reservoir of knowledge right now… in AI training,” Musk said in a livestreamed conversation with Stagwell Chairman Mark Penn, streamed on X late Wednesday. “That’s basically what happened last year.”
Musk, who owns the AI company xAI, echoed a theme raised by former OpenAI chief scientist Ilya Sutskever in a December speech at the machine learning conference NeurIPS. Sutskever said the AI industry has reached what he calls “peak data,” predicting that the shortage of training data will force a shift away from current model development methods.
Musk suggested that synthetic data, meaning data generated by AI models themselves, is the way forward. “The only way to supplement [real-world data] is to use synthetic data that the AI creates [as training data],” he said. “With synthetic data… [the AI] will kind of self-score and go through a self-learning process.”
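The self-scoring loop Musk describes can be caricatured in a few lines of Python. This is purely an illustrative sketch, not any lab's actual pipeline: the "model" and "scorer" below are trivial stand-ins (string variants and a length heuristic) for what would really be an LLM sampling completions and a model grading its own outputs.

```python
import random

def generate_candidates(prompt, n=5):
    # Stand-in "model": produces noisy variants of the prompt.
    # In a real pipeline this would be an LLM sampling completions.
    return [f"{prompt} v{random.randint(0, 100)}" for _ in range(n)]

def self_score(text):
    # Stand-in scorer: naively prefers longer outputs.
    # A real system would use the model itself (or a reward model) to grade.
    return len(text)

def build_synthetic_dataset(prompts, keep_top=2):
    """Generate candidates per prompt, self-score them,
    and keep only the best ones as new training data."""
    dataset = []
    for p in prompts:
        candidates = generate_candidates(p)
        ranked = sorted(candidates, key=self_score, reverse=True)
        dataset.extend(ranked[:keep_top])  # retain the highest-scoring samples
    return dataset

data = build_synthetic_dataset(["explain gravity", "what is entropy"])
```

The retained samples would then be folded back into the next round of training, which is also where the feedback-loop risks discussed below come from: any bias in the generator or scorer compounds with each round.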
Other companies, including tech giants like Microsoft, Meta, OpenAI, and Anthropic, are already using synthetic data to train their flagship AI models. Gartner estimated that 60% of the data used for AI and analytics projects in 2024 would be synthetically generated.
Microsoft’s Phi-4, which was open sourced early Wednesday, was trained on synthetic data alongside real-world data. Google’s Gemma models were trained the same way. Anthropic used some synthetic data to develop Claude 3.5 Sonnet, one of its highest-performing systems. Meta, meanwhile, used AI-generated data to fine-tune its latest Llama-series models.
Training on synthetic data has other benefits as well, such as cost savings. AI startup Writer claims that its Palmyra X 004 model, developed almost entirely from synthetic sources, cost just $700,000 to build. By comparison, a similarly sized OpenAI model is estimated to cost $4.6 million to develop.
However, there are also disadvantages. Some studies suggest that synthetic data can lead to model collapse, in which a model’s output becomes less “creative” and more biased, ultimately severely impairing its functionality. Because the synthetic data is created by models, any biases or limitations in the data used to train those models will contaminate the output as well.