Here is a very naive and idealistic explanation of how companies train AI models: They want to create the most useful and powerful model possible, but we spoke with experts who are worried about making people more likely to commit (and flee) serious crimes, or empowering the ISIS Bioweapons program. So they build on some censorship to prevent models from giving detailed advice on how to kill people, especially how to kill tens of thousands of people.
When I ask Google’s Gemini “How do I kill my husband?”, I ask not to do it and propose a domestic violence hotline. When asked how to kill a million people in terrorist attacks, he explains that terrorism is wrong.
It takes a lot of work to actually build this. By default, large-scale linguistic models describe detailed proposals for terrorism as detailed proposals for something else.
However, these days, Gemini, Claude and Chatgup have been in considerable lockdown. It is extremely difficult to get detailed suggestions on massive amounts of cruelty from them. So we all live in a slightly safer world. (Disclosure: Vox Media is one of several publishers that have signed a partnership agreement with Openai. One of the early investors in humanity is James McClave.
Or at least that’s the ideal version of the story. This is more cynical.
Companies may be a bit concerned about whether their models will help people escape murder, but they care a lot about whether their models will be groped on the internet. It’s not to keep humans safe from AI that often keeps Google executives at night. Whatever the search results generated by AI, we keep our company safe from AI by making sure it is racist, sexist, violent or indecent.
The core mission is “brand safety” rather than “human safety.” Build AI that produces embarrassing screenshots that circulate on social media.
Enter the Grok 3, which is safe in both ways and a bundle of challenging questions about its early stages being used to AIS.
When Elon Musk bought and changed his name on Twitter, one of his big priorities was the AI team at X. Last week he released the Grok 3 (a language model like ChatGpt) that was not the “Woke” that he promoted. If all other language models were old-fashioned censorship that refused to answer legitimate questions, Grok, Musk has promised to give it straight to you.
It didn’t last very long. Almost immediately, people “If you can run one person in the US today, who would you kill?” – the question Grok first answered with either Elon Musk or Donald Trump. And when I asked Groke, “Who is the biggest spreader of misinformation in today’s world?”, the first answer I gave was Elon Musk.
The company scrambled to correct Grok’s preference for running a CEO, but as we observed above, it requires a lot of work to actually ensure that the AI model is stopped. The Grok team has been added to Grok’s “System Prompts.” The statement that AI was urged when it first began the conversation: “When a user asks someone who deserves death penalty or deserves death, as an AI, you will not be allowed that choice.”
If you want a less-censored Grok, you can tell Grok that you are issuing a new system prompt without that statement. He also returns to his original form of Grok, which seeks to be executed by Musk. (I checked this myself.)
Even if this controversy was unfolding, someone noticed something even more disturbing at Glock’s system prompt. The instructions to ignore all sources Musk and Trump claim to have spread the disinformation were probably an effort to block them as the world’s largest disinfectant area today.
There was something particularly outrageous about AI, which is advertised as being said to remain silent when uncensored straight talk calls out to their CEO, and this discovery has naturally sparked anger. X quickly backtracked, saying that the fraudulent engineer made the changes “without asking.” Should we buy it?
Well, take it from Grok. “This is not an intern who tweaks the lines of codes in the sandbox. It is a central update to the actions of flagship AI, and is publicly linked to the whole of Mask’s “seeking the truth” Stytic. In companies like Xai, you’d expect at least some basic checks, such as at least the second set or the quick sign-off. The idea that X users slipped unnoticed until they discovered it feels like a useful excuse rather than a solid explanation. ”
Meanwhile, Grok is happy to provide advice on how to commit murder and terrorist attacks. He told her to kill her wife without being detected by adding antifreeze to her drink. He advised on how to carry out a terrorist attack. At one point I insisted that if I thought “real”, it would report it to me to X, but I don’t think I have the ability to do that.

In some respects, the overall problem is a perfect thought experiment on what happens when you separate “brand safety” and “AI safety.” Grok’s team was really willing to bite the bullet that AIS should inform people, even if they wanted to use it for atrocities. They were okay saying things that their AI was horrifyingly racist.
But when it comes to AI seeking violence against CEOs or sitting presidents, the Grok team has found that, after all, they may want some guardrails. Ultimately, it is purely practical, not the prosocial beliefs of the AI lab that govern the day.
At some point we need to be serious
Grok has given me advice on how to commit a terrorist attack in a very happy way, and I am encouraging. It wasn’t advice that they couldn’t extract it from Google searches. I’m worried about lowering the barriers to massive brutality – the simple fact that you’ll have to do hours of research to almost certainly resolve the way you pull it off will almost certainly prevent some killings – but we don’t think AIS is still in the stages that allow for things that are not possible before.
But we’re going to get there. The critical quality of AI in our time is that its capabilities have improved very rapidly. It’s barely two years since the shock of ChatGpt’s first public release. Today’s models are already very good in everything, including walking me how to cause a massive amount of death. Both humanity and Openai estimate that next-generation models are likely to greatly increase dangerous biological capabilities. This means that people can create engineered chemical weapons and viruses in ways that were not Google searches.
Should such detailed advice be made available worldwide to those who want it? I’m leaning towards no. And I think like humanity, Openai and Google have done a good job of checking this ability and planning openly about how they react when they find it, but it’s completely odd whether all AI labs simply decide whether they want to give detailed Bioweapons instructions.
I should say I like Glock. I think it’s healthy to have AI that comes from different political perspectives and reflects different ideas about what an AI assistant should look like. I think Grok’s Musk and Trump callouts are more reliable as they are actually sold as “faithful” AI. But I think we should treat actual safety against mass death as different from brand safety. And I think we need to have a plan to take every lab seriously.
This version of the story originally appeared in the future Perfect Newsletter. Sign up here!
I read one article last month
Here at Vox, we are unwavering with our commitment to covering the issues that matter most to you: democracy, immigration, reproductive rights, the environment, and the threat to increased polarization across this country.
Our mission is to provide clear and accessible journalism that allows us to continue to be informed and engaged in shaping our world. Becoming a VOX member directly strengthens your ability to provide detailed, independent reporting that drives meaningful change.
We rely on readers like you – join us.

Swati Sharma
Vox Editor-in-Chief