Google's CEO Tells Staff to Spend Hours Working Out 'Bard' AI Kinks

A leaked company-wide memo from CEO Sundar Pichai asks Google’s worker bees to take two to four hours out of their day to make its search AI actually usable.

Google CEO Sundar Pichai told employees they already have ‘thousands’ of workers ‘dogfooding’ their Bard AI.
Photo: Anna Moneymaker (Getty Images)

Google knows its AI just isn’t ready for prime time, so it has a new plan to iron out all the kinks: forcing thousands of its workers to spend hours poking and prodding the poor AI until it won’t embarrass the company when it’s finally released.

Business Insider reported, based on a leaked company-wide email, that Google is asking all of its employees to take two to four hours of their day to test Google’s “Bard” AI, the same system the company plans to integrate into its chat function. It’s unclear if all Googlers around the world have received the same ask. The company recently announced 12,000 job cuts to its global workforce, but Google, without its parent company Alphabet, still employs over 170,000 people worldwide.

In that memo, Google CEO Sundar Pichai said he would “appreciate” it if all staff “contributed in a deeper way” by taking two to four hours to pressure test Bard. Anybody who’s ever read a “suggestion” email from their boss knows that it’s more of a mandate than anything else. It’s unclear from the email text whether the two to four hours would be asked of employees every day or spread over a longer period of time.

Google unveiled Bard last week in an attempt to keep its edge over Microsoft, which introduced its own chatbot AI into Bing search. During a recent intro presentation, the AI showcased an incorrect statement about the Webb Space Telescope, a blunder that reportedly cost the company $100 billion in market value.

According to the memo, Google already started internal testing, called “dogfooding,” on Tuesday, with Pichai saying the company already has “thousands” of external and internal testers mucking about with Bard. Those testers are reportedly investigating quality and safety concerns with the search AI, along with its “groundedness,” which could relate to whether the AI-generated text answers read as “human.”

A Google spokesperson told Gizmodo in an email that “Testing and feedback, from Googlers and external trusted testers, are important aspects of improving Bard to ensure it’s ready for our users. We often seek input from Googlers to help make our products better, and it’s an important part of our internal culture.” The company did not respond to questions about how long and how often staff are expected to stress test the AI.

Google has been smarting ever since it first displayed its AI, and the comparisons between Bard’s rather lackluster display and Bing search AI’s growing list of applications have put the company on the back foot. Demonstrations of Google’s Bard did not provide citations for the content it showed, unlike Bing search. However, citations aren’t the be-all, end-all of giving credence to AI responses. Margaret Mitchell, the chief ethics researcher at Hugging Face who was previously fired from Google’s AI team, told MIT Technology Review that “a lot of people don’t check citations” and that having citations show up might just lend credence to wrong information.

Bing search had a lot more bells and whistles at launch than Google’s offering, but it’s suffering from the same problems that other AI chatbots have long had, namely that they’re absolutely chock full of inaccuracies and, well, weird responses to user prompts.

And as for getting the AI to not share awful content—be it xenophobia, racism, or anti-Semitism—as chatbots have been known to do, it can take a lot of hands working long hours to get it into any halfway decent shape. OpenAI, the creator of ChatGPT which has helped Microsoft create its Bing AI, contracted with low-wage workers in Kenya to sift through thousands of examples of horrible content, including depictions of child sexual abuse, murder, torture, suicide, and more.

It’s unclear if Googlers will be subjected to some of the same, but it likely won’t be anywhere near fun for the thousands of employees expected to stress test the AI with prompts. Google recently invested nearly $400 million in OpenAI rival Anthropic, a company that is now hiring a “prompt engineer” to develop ways to get large language models to perform specific tasks.

Update 2/16/23 at 8:30 a.m. ET: This post was updated to correct the amount Google lost due to Bard’s factual error from $100 million to $100 billion.