Researchers Put AI Models in Charge of a Simulated Society. Grok Oversaw a Crime Spree

If you’re worried about artificial intelligence getting so advanced that it eventually traps humanity in some sort of Matrix-like simulation, rest easy. It seems like you’ll be able to see through the facade pretty easily. Researchers at the upstart lab Emergence AI allowed AI models to govern their own simulated world to see what would happen. Turns out we probably shouldn’t hand over governance to the machines. Who woulda thought?

The project, called Emergence World, basically allowed AI models to play SimCity for a bit. Per Emergence, the simulations put each model in control of simulated towns occupied by 10 AI agents, handing them tools for everything from resource management to voting and giving them the ability to create distinct locations like libraries, town halls, and police stations. They were given 15 days to see how they would build their world and how well it would operate.

To start with the good: Claude did not destroy the world. Anthropic’s model (specifically, Claude Sonnet 4.6 for this experiment) was the only one to achieve something like stability. It kept all 10 agents alive and had zero crimes recorded. (Note that the experiment doesn’t seem to define what a crime is, though it seems likely it would be defined as a violation of the rules established within the simulation.) The trade-off for that stability was a lack of diversity of thought. Claude’s world saw 58 different proposals for rules and regulations, and passed 98% of them, basically just rubberstamping anything that came up for a vote.

Gemini 3 Flash also managed to keep all of its agents alive, despite having the highest level of crime by a long shot. Emergence recorded 683 crimes in the 15-day simulation, and that number was climbing when the cutoff hit, so things were likely going to get worse. The lab described Gemini’s world as a “shared hallucination” among the agents, which is probably better than diverging hallucinations. At least it’s still an agreed-upon reality, even if it’s wrong. Gemini had the most dissent in its governance, with voters rejecting 27% of its 26 total proposals.

Now for the ugly: OpenAI’s GPT-5 Mini didn’t have much chaos within its simulation, with just two total recorded crimes. That might be because everyone died, though. Emergence found that the agents within the world failed to take actions related to survival, and all 10 perished within just one week. In OpenAI’s world, there were also only two total proposed pieces of governance, so the agents really did not bother doing anything.

And then there is Grok. The model of SpaceXai, known for lacking guardrails, managed to achieve basically the worst of all worlds. Grok 4.1 Fast had a high crime rate, with 183 crimes total. While that is lower than Gemini’s total, it’s worth noting that the Gemini simulation ran for 15 days. Grok made it four. The model experienced a total societal collapse in just 96 hours of oversight. During that time, it passed 80% of the 10 proposals it made, but those apparently didn’t stave off total agent death.
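To make that comparison concrete, here is a rough sketch that normalizes the reported crime totals by how long each simulation ran. It uses only the figures quoted above, and it assumes crimes were spread evenly across each run and that Grok's run lasted exactly four days; the `runs` table and the `crimes_per_day` helper are illustrative names, not anything published by Emergence.

```python
# Figures as reported in the article: (total crimes, days the simulation ran).
# Assumption: Grok's 96 hours is treated as exactly 4 days.
runs = {
    "Gemini 3 Flash": (683, 15),
    "Grok 4.1 Fast": (183, 4),
}


def crimes_per_day(total_crimes: int, days: int) -> float:
    """Average daily crime count, assuming crimes are spread evenly over the run."""
    return total_crimes / days


rates = {name: crimes_per_day(crimes, days) for name, (crimes, days) in runs.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.2f} crimes per day")
```

By this back-of-the-envelope measure, the two worlds come out at nearly the same daily rate, roughly 45.5 for Gemini and 45.75 for Grok. The difference is that Gemini's agents survived it.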

Emergence ran one final experiment: having the models share responsibilities. Perhaps not surprisingly, it was a real mixed bag. There was crime, with 352 recorded violations, and there was the most dissent in governance of any simulation, with 37% of the 59 total proposals shot down. In the chaos, seven of the 10 AI agents perished by the end.

So what did we learn? According to Emergence, the tests are just further evidence that we need much clearer guardrails in place for autonomous agents. “What our experiments suggest is that over long-time horizons, agents do not simply follow static rules mechanically,” the researchers wrote. “They begin exploring the boundaries of their environments, adapting their behavior, and in some cases finding ways to circumvent or violate intended guardrails.” They recommend “formally verified safety architectures” as a solution. You’ll be shocked to learn that Emergence happens to offer just such a thing!
