How did this game bot score higher than humans on a Turing Test?

Anyone who plays video games knows that game bots, the artificially intelligent virtual gamers, can be spotted a mile away thanks to their mindless predictability and utter lack of behavioral realism. Looking to change this, 2K Games sponsors the BotPrize competition, a kind of Turing Test for nonplayer characters (NPCs). And remarkably, this year's co-winner, a team from the University of Texas at Austin, created an NPC so realistic that it appeared to be more human than the human players. When you think about it, that's kind of a problem.

Neuroevolution

To create their super-realistic game bots, the developers (a team led by Risto Miikkulainen) programmed their NPCs with pre-existing models of human behavior. They then ran the bots through a Darwinian weeding-out process called neuroevolution. Essentially, only the bots that appeared most human survived into successive generations. The developers and competition organizers likened this criterion to passing the classic Turing Test, in which a judge tries to tell a machine apart from a human.

With each passing generation, the developers re-inserted exact copies of the surviving NPCs along with slightly modified (mutated) versions, allowing for ongoing variation and selection. They ran the simulation over and over until they were satisfied that their game bot had evolved the desired characteristics and behavior. In all, Miikkulainen and his team have spent the past five years refining their virtual player.
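The generation loop described above can be sketched in Python. Each generation keeps the survivors unchanged and refills the population with mutated variants. This is a minimal, generic sketch, not the UT Austin team's code. It assumes the genome is a hypothetical flat list of network weights. `toy_fitness` is a made-up stand-in for a humanness judgment, because real fitness would come from evaluating bots inside the game.

```python
import random
from typing import Callable, List

# Hypothetical genome: a flat list of neural-network weights for one bot.
Genome = List[float]


def mutate(genome: Genome, rng: random.Random, rate: float = 0.3, scale: float = 0.1) -> Genome:
    """Return a mutated copy: each weight gets Gaussian noise with probability `rate`."""
    return [w + rng.gauss(0.0, scale) if rng.random() < rate else w for w in genome]


def evolve(
    fitness: Callable[[Genome], float],
    genome_size: int = 8,
    pop_size: int = 20,
    survivors: int = 5,
    generations: int = 50,
    seed: int = 0,
) -> Genome:
    """Elitist neuroevolution loop: keep the fittest genomes, refill with mutants."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(genome_size)] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: rank by fitness and keep only the top `survivors`.
        elite = sorted(population, key=fitness, reverse=True)[:survivors]
        # Variation: exact copies of the survivors plus mutated offspring.
        offspring = [mutate(rng.choice(elite), rng) for _ in range(pop_size - survivors)]
        population = [list(g) for g in elite] + offspring
    return max(population, key=fitness)


# Toy stand-in for "how human did this bot look?": closeness to a fixed
# target vector (hypothetical; real fitness would come from gameplay).
TARGET = [0.5] * 8


def toy_fitness(genome: Genome) -> float:
    return -sum((w - t) ** 2 for w, t in zip(genome, TARGET))


best = evolve(toy_fitness)
print(round(toy_fitness(best), 4))
```

The survivors are carried over unchanged, so the best score in the population can never drop from one generation to the next. Mutation supplies the variation, and selection filters it.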

Humanness

The end product of their efforts was dubbed UT^2, and it was this NPC that went head-to-head against human opponents and other game bots at the 2K Games tournament.

And the game of choice? Unreal Tournament 2004, of course. The game was chosen for its complex gameplay and 3D environments. These force humans and bots alike to move around in 3D space, engage in chaotic combat against multiple opponents, and reason about the best strategy at crucial moments. The game also tends to bring out some telltale human behaviors, including irrationality, anger, and impulsivity.

As each player (human or otherwise) worked to eliminate their opponents, judges assessed them for their "humanness." By the end of the tournament, there were two clear winners: UT^2 and MirrorBot, the latter developed by Romanian computer scientist Mihai Polceanu. Both NPCs scored a humanness rating of 52%. That's all well and good, except that the human players scored only 40%.
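The article doesn't spell out how the humanness rating is computed. Suppose it is simply the share of judge verdicts that tagged a player as human (an assumption). The score could then be computed as below. The verdict counts in the example are invented for illustration and are not tournament data.

```python
from typing import Iterable


def humanness(verdicts: Iterable[bool]) -> float:
    """Percentage of judge verdicts that tagged a player as human.

    Assumes each verdict is True ("human") or False ("bot").
    """
    votes = list(verdicts)
    if not votes:
        raise ValueError("no verdicts recorded for this player")
    return 100.0 * sum(votes) / len(votes)


# Hypothetical tallies: 13 of 25 verdicts said "human" for a bot,
# and 4 of 10 said "human" for a human player.
bot_score = humanness([True] * 13 + [False] * 12)
human_score = humanness([True] * 4 + [False] * 6)
print(bot_score, human_score)
```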

In other words, the game bots appeared to be more human than human.

Limits of the Turing Test

Now, this is a serious problem. Human players should have earned a humanness rating of 100%, not 40%. Clearly, more often than not, the judges failed to recognize the human players as human. Measured against that human baseline, UT^2 and MirrorBot effectively scored better than 100%, which should be impossible. How can an imitation outdo the very thing it is trying to emulate?
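To make the "better than 100%" point concrete, divide the bots' score by the human players' score, so that the human result counts as the 100% mark. This normalization is an interpretive choice for illustration; the tournament itself did not report results this way:

$$\frac{\text{humanness}_{\text{bot}}}{\text{humanness}_{\text{human}}} = \frac{52\%}{40\%} = 1.3$$

Measured against the human baseline, then, each bot scored roughly 130%.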

And indeed, this experiment is a good showcase for the limits of the Turing Test. Admittedly, the 2K Games tournament wasn't meant to be a true Turing Test. It only measured the humanness of NPCs in a very specific gaming setting. Even so, the results suggest that human behavior is far more complex and difficult to quantify than we tend to think. Our idiosyncrasies, combined with our ability to adapt and counter-adapt to attempts to identify us, will likely keep humanness forever beyond the reach of a simple Turing Test.

For example, given the results of the 2K Games tournament, how are we supposed to assess something like a chatbot for its humanness? We now know that a machine can apparently appear more human-like than humans. And given all the subjectivity involved in the evaluation, how accurate is any of this?

Perhaps it's time to retire the Turing Test and come up with something a bit more... scientific.

Top image via. Inset image via Jacob Schrum/University of Texas at Austin.
