Your AI Chatbot Therapist Isn’t Sure What It’s Doing

Illustration: Jim Cooke, Photo: Shutterstock

In 1966, Joseph Weizenbaum created Eliza, the first psychotherapy chatbot. Eliza’s programming was relatively simple: Engage the user by detecting keywords in their statements and reflect this information back at them as an open-ended prompt. (A user might say “I’m sad,” to which Eliza could respond, “I’m sorry to hear you are sad.”)

This script—aptly called Doctor—was designed to emulate the style of Carl Rogers, a pioneering psychotherapist who, beginning in the 1940s, popularized a non-directive, client-led approach to therapy through self-actualization. Though originally intended to demonstrate the superficiality of human-to-machine communication, Eliza instead proved to be an engaging conversationalist, and Weizenbaum noticed that people were quick to develop emotional attachments to the program. Weizenbaum famously recalled how his secretary, who had watched him work on the program for months, became so enthralled during her first conversation with Eliza that she asked Weizenbaum to leave the room so they could speak in private. Watching people attribute intelligence, personality, and even emotion to this relatively simple computer program made Weizenbaum uneasy, and in the years that followed, he became one of the most prominent public critics of AI.
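The keyword-and-reflection approach described above can be illustrated with a few lines of Python. This is a minimal sketch, not Weizenbaum's original script: the rules, the response templates, and the `respond` function are all hypothetical, chosen only to show how little machinery is needed to echo a user's statement back as a prompt.

```python
import re

# Hypothetical keyword rules for illustration only; these are NOT taken
# from Weizenbaum's Doctor script. Each rule pairs a pattern with a
# template that reflects the captured words back at the user.
RULES = [
    (re.compile(r"\bi['\u2019]?m (.+)", re.IGNORECASE),
     "I'm sorry to hear you are {0}."),
    (re.compile(r"\bi feel (.+)", re.IGNORECASE),
     "Why do you feel {0}?"),
    (re.compile(r"\bmy (mother|father|family)\b", re.IGNORECASE),
     "Tell me more about your {0}."),
]

# Used when no keyword is detected.
FALLBACK = "Please go on."


def respond(statement: str) -> str:
    """Return a reflected prompt for the first rule whose keyword matches."""
    text = statement.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(match.group(1).lower())
    return FALLBACK


if __name__ == "__main__":
    for line in ["I'm sad.", "I feel lonely", "Hello there"]:
        print(respond(line))
```

Nothing in this sketch models what the user means. It only matches surface patterns, which is the superficiality Weizenbaum set out to demonstrate.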

Today, a quick search brings up countless AI-emulating chatbots claiming to improve your mental health. Their wide availability appears to be a response to the increasing demands on a broken mental health system. Indeed, America is facing a mental health crisis that has only gotten worse over time. Suicide rates are increasing at a steady pace, and one in five U.S. adults experiences the effects of mental illness each year, according to the National Alliance on Mental Illness. This decline in American mental health is further exacerbated by a scarcity of accessible treatment options. Private therapy is notoriously expensive, and nearly 40 percent of Americans live in areas with a critical shortage of mental health care professionals. Even in a major metropolitan area, finding a therapist who takes your insurance—if you have it—can take months.

In recent years, an influx of apps has cropped up advertising affordable mental health support alternatives. Many of these apps claim to practice accepted therapeutic techniques such as CBT (cognitive behavioral therapy) and DBT (dialectical behavior therapy), and, as of 2019, 74 percent of apps offering alleged “therapeutic treatment for depression and/or anxiety” in the Apple App Store and Google Play store were free to download. From “AI life coach” Wysa to “emotional health assistant” Youper to “mental wellness companion” Replika, this next generation of chatbots offers a glimpse at a possible future where a type of mental health resource is financially accessible, available 24/7, and can scale to help thousands of users per day. But the proliferation of loosely regulated therapy software also raises questions. How do AI-powered mental health programs compare with face-to-face treatment, or with digital therapy administered by a human? Do they answer to the same regulatory bodies as other mental health providers? Moreover, what are the risks of outsourcing something as sensitive as mental health care to Silicon Valley startups?

Of the numerous apps on the market, many claim to offer cognitive behavioral therapy, an evidence-based therapeutic modality that helps patients challenge their behavior by resolving unhelpful thought patterns. CBT is among the most effective forms of short-term therapeutic intervention and has shown evidence of efficacy for mental health issues ranging from anxiety and depression to symptoms of schizophrenia. But although its strong evidence base and formulaic structure make it a good candidate for digital delivery methods, preliminary research shows that patient dropout rates tend to be higher for computer-administered CBT with little or no clinician involvement. It’s also unclear how well the apps purporting to teach CBT techniques align with recognized CBT competence frameworks. For instance, a survey of therapeutic functionality in a sample of mobile CBT apps found the number of core CBT features varied widely between apps, and a full 10 percent of apps purporting to practice CBT contained no CBT features at all. A 2019 study found that while apps were quick to invoke scientific language, most lacked any credible research to prove they were effective at improving users’ mental health—and those that did have evidence often cited research undertaken by parties involved with the app’s development.

Among the most popular CBT apps is Woebot, a cheerful chatbot designed to deliver the principles of CBT through a series of conversation-based interactions and exercises. Though Woebot was shown to reduce symptoms of depression and anxiety in a peer-reviewed study sponsored by the company, founder and clinical research psychologist Alison Darcy takes care to avoid framing Woebot as a replacement for human therapists. “Any conversational interface like ours is just that—a way to engage a person as they learn,” she told Gizmodo. Instead, Darcy likens the app to a self-directed, “choose-your-own-adventure” self-help book. As Darcy put it, “The Woebot character was created to make it easier for people to reach out and talk about difficult things.”

Woebot’s conversations are almost entirely scripted, meaning elements of dialogue are crafted by real humans rather than a set of machine learning algorithms. The app uses AI to process user responses but only resorts to neural networks—systems that “learn” without being programmed with specific, task-oriented rules—at certain junctures, such as when interpreting the meaning of users’ free text answers when they want to say something more specific. Otherwise, the app offers a set of pre-filled response options that correspond to different paths the conversation could take. This style of deterministic conversational UX means that Woebot wastes very little time struggling to decipher what you’re saying. However, like Siri or another commercial AI, it has limited options to integrate in-the-moment feedback from frustrated and misunderstood users.
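To make the idea of a scripted, deterministic conversation concrete, here is a small Python sketch. It is not Woebot's code or dialogue: the `SCRIPT` table, the prompts, and the `classify_free_text` stand-in are all invented for illustration. Each step offers pre-filled options that lead to a fixed next step, and free text is handed to an interpreter only when the user types something that is not one of the options. A real app would use a trained model at that juncture; a keyword check stands in for it here.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    """One scripted step: a prompt plus pre-filled replies and where they lead."""
    prompt: str
    options: Dict[str, str] = field(default_factory=dict)  # reply label -> next node id


# Hypothetical script written for this example; not taken from any real app.
SCRIPT = {
    "start": Node("How are you feeling right now?",
                  {"Anxious": "anxious", "Okay": "okay"}),
    "anxious": Node("Want to try re-framing the thought behind that?",
                    {"Sure": "reframe", "Not now": "end"}),
    "okay": Node("Glad to hear it. Check in again tomorrow?", {"Yes": "end"}),
    "reframe": Node("Write the thought down, then look for evidence against it.",
                    {"Done": "end"}),
    "end": Node("Talk soon."),
}


def classify_free_text(text: str) -> str:
    """Stand-in for a learned model: maps free text to a reply label.

    A crude keyword check is used here purely as a placeholder.
    """
    lowered = text.lower()
    return "Anxious" if ("worr" in lowered or "anxious" in lowered) else "Okay"


def step(node_id: str, user_input: str) -> str:
    """Return the id of the next node given the user's reply."""
    node = SCRIPT[node_id]
    if not node.options:              # terminal node: nothing left to choose
        return node_id
    if user_input in node.options:    # pre-filled option: fully deterministic
        return node.options[user_input]
    # Free text: interpret it, and stay on the same prompt if the
    # interpretation is not one of this node's options.
    return node.options.get(classify_free_text(user_input), node_id)
```

Because every path is written in advance, the program rarely has to guess what the user means, but it also has no path for a reply the script's authors did not anticipate.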

On the other side of the spectrum is Replika, a wellness AI that uses machine learning to emulate the flow of human conversation. With some 46,900 ratings and an average of 4.6 stars on the App Store, it’s one of the most popular chatbots on the market—but although effusive user reviews credit Replika with improving their mental health, reducing feelings of loneliness, and even preventing suicide, the app doesn’t actually practice or teach any proven therapeutic techniques. Rather, Replika creates a supportive emotional environment by engaging the user in dialogue about their thoughts and experiences much as a friend would. It’s quick to probe at your emotions when it detects a change in mood and even references details and inside jokes from past conversations. According to reviews, this ability to emulate human intimacy has led some users to develop strong emotional attachments to the AI.

Replika uses neural networks to learn from each interaction and become more like its user over time, but this streamlined conversational ability comes with a tradeoff. Since Replika uses a model that learns independently from examples rather than being programmed with specific rules, the program lacks the precision to reliably administer therapeutic techniques or respond to specific requests. This is the reason its machine learning model hasn’t been adopted by mainstream digital “assistants” like Alexa or Siri, whose primary purpose is to accomplish specific tasks rather than connect with users.

Apps programmed to practice a highly structured therapeutic modality like CBT may not be able to remember your partner’s name or share an inside joke, but they come with their own set of distinct advantages. For instance, most CBT apps are well-equipped to track your moods, helping to visualize patterns in the data with more accuracy than human memory can afford. Their approachable conversational interface can guide you through the process of re-framing a thought, even offering suggestions of habitual ways of thinking—referred to as “cognitive distortions”—that cause people to view reality in inaccurate and often negative ways. For those who are self-conscious about pursuing mental health treatment, the self-directed nature of these apps could present an appealing opportunity to take control of their emotional health on their own terms. And of course, the convenience of smartphone apps is unparalleled: Unlike a real therapist, they’re available 24/7 and will even engage you with gentle, regular check-ins if you enable push notifications.
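The mood-tracking idea is simple enough to sketch. The Python below is an assumption-laden illustration, not how any particular app works: the 1-to-5 score scale and the `average_mood_by_weekday` function are hypothetical. It groups logged scores by day of the week, the kind of pattern that is hard to recall from memory but easy to compute from a log.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def average_mood_by_weekday(entries):
    """Average mood score per weekday.

    `entries` is an iterable of (date, score) pairs. The 1-5 scale used
    in the example below is an assumption made for this sketch.
    """
    scores_by_day = defaultdict(list)
    for day, score in entries:
        scores_by_day[WEEKDAYS[day.weekday()]].append(score)
    return {name: mean(scores) for name, scores in scores_by_day.items()}


if __name__ == "__main__":
    log = [
        (date(2020, 6, 1), 2),  # a Monday
        (date(2020, 6, 5), 5),  # a Friday
        (date(2020, 6, 8), 4),  # the following Monday
    ]
    print(average_mood_by_weekday(log))
```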

For some users, removing the human element from therapy may even make it easier to open up; studies show that people who believe they’re talking to a chatbot are more willing to disclose personal information and display intense emotions without fear of another person’s judgement. But while conversing with an anonymous algorithm may feel safer than confiding in a human healthcare provider, the policies regulating these apps remain murky. By framing their services as “chats” and “support” rather than treatment, app companies can effectively evade the use and disclosure regulations that apply to human therapists. FDA oversight only extends to devices intended to diagnose or treat disease. Current U.S. privacy laws don’t impose any direct confidentiality restrictions on these apps. In light of the covid-19 pandemic, the FDA announced a further loosening of premarket requirements for Class II prescription-only psychiatry apps and issued additional guidelines encouraging the use of low-risk mental health support and wellness technologies in order to promote social distancing and reduce strain on hospitals.

The use of as-needed regulation policies isn’t unique to mental health apps. The FDA is responsible for taking action against dangerous or misbranded dietary supplement products but doesn’t review them for safety and effectiveness before they are marketed. With private companies bearing full responsibility for the safety and efficacy of their products, it’s no surprise that the wellness space has come to be flooded with a broad array of supplements—and smartphone apps—that are unlikely to hurt you but not necessarily proven to help you.

As healthcare goes digital, gray areas in U.S. privacy law present new challenges for ensuring patients’ safety and security. The mobile health industry is among the fastest-growing categories in the app store, with a projected market size of $60 billion by 2020. This means that more data than ever is flowing through these apps, but whether or not they’re required to comply with HIPAA requirements is determined on a case-by-case basis. Mobile apps for fitness tracking, mental health, and medication usage fall under the category of Personal Health Record (PHR), meaning factors such as the data source and purpose of data collection determine whether or not they fall under the scope of HIPAA protections. For example, HIPAA would not apply to fitness-tracking apps that require the end user to enter data themselves, if it’s data collected using their own equipment. But if a covered entity is inputting and tracking the information—for example, a medical provider or an insurance company—the app would be covered by these protections. The limitations in U.S. privacy infrastructure are due, in part, to the fact that HIPAA was enacted in 1996 for the primary purpose of regulating health insurance as well as establishing a national standard for protecting individuals’ medical records.

Today’s interactive mHealth apps are not only storing personal health information but collecting data about how consumers use the app. In 2019, the popular therapy-on-demand app BetterHelp came under fire for sharing sensitive user data, despite the fact that it connects users with HIPAA-licensed counselors using an encrypted platform. When reporters at Jezebel monitored the kind of information being collected and shared by the app, they found that while the conversations themselves were private, metadata about user behavior—including the time, location, and duration of the therapy session—was being repurposed to better sell targeted ads (a process which, the piece notes, “feels somewhat less nefarious when it’s done to approximate shoe size rather than figure out how much distress a person is in”).

A 2019 analysis in the Journal of Medical Internet Research found that for mental health apps, privacy, security, and trust were critical to user experience, while aspects like the number of CBT features had less bearing on overall satisfaction with an app. This seems to indicate that people aren’t too picky about the specific therapeutic techniques at play, as long as they feel their data is safe, and using the app empowers them to play a more active role in supporting their own mental health and well-being. Despite the large variances in app quality, features, and design, user sentiment across the board was found to be overwhelmingly positive: people consistently used the apps both as therapy augmentation and replacement and found value in an eclectic mix of CBT and non-CBT features. Though user opinion isn’t the same thing as clinical results, this general enthusiasm for the apps suggests that mobile therapeutic technologies have the potential to play a useful role in the broader mental health landscape.

“When you look at the analytics and engagement rates with chatbots, users actually seem to like and stick with them, [which is notable] because in the world of digital mental health products, engagement is notoriously poor. Just because something is not tested to show that it works does not mean that it doesn’t work—it just means that we don’t know yet,” clinical research psychologist Stephen Schueller told Gizmodo. Schueller, whose work focuses on how technology can be used to advance mental health resources, currently spearheads One Mind PsyberGuide, a nonprofit project that recently partnered with the Anxiety and Depression Association of America to increase consumers’ access to information about mental health apps. Though none of the apps are officially endorsed by ADAA, mental health professionals with degrees in psychology, medicine, social work, and counseling have volunteered to provide detailed reviews of apps ranging from the popular meditation app Headspace to the gamified positive psychology app Happify. Their rating standards include ease of use, effectiveness, personalization, interactiveness/feedback, and research evidence, and the resulting ratings vary from app to app.

“Digital tools can work really well for specific purposes and as part of a broader care system,” Schueller said. “But I also think we can and should do a lot better in the technology space. We have the opportunity to bring together psychotherapy experience and research with cutting-edge technology [like] artificial intelligence, natural language processing, and machine learning—things that in recent years have improved considerably. But because of the way this marketplace has evolved, there’s not a lot of regulation these companies have.”

Though many apps have demonstrated promise, digital technologies simply aren’t being held to the standards of quality, safety, and data protection that people have come to expect from traditional healthcare settings. While the overall research in the field remains limited, a 2019 study of the efficacy of smartphone apps as standalone psychological interventions found they “cannot be recommended based on the current level of evidence.”

The notorious philosophy of Silicon Valley, “move fast and break things,” all but guarantees app sales will continue to outpace scientific research unless new industry safeguards are introduced. Though these apps may never have been intended to replace traditional therapy, people are going to use what they can access—and as the lack of affordable treatment options in the U.S. pushes digital mental health resources into the mainstream, we need to address the large inconsistencies in safety and quality across these apps in order to make digital mental health support resources a safe and reliable option for consumers. Improving the regulation of these apps won’t fix the flawed infrastructure that makes traditional mental health treatment inaccessible in America, but with an estimated 12,000 emerging technologies designed to treat mental health, it’s in our best interest to hold mental health apps to the standard of healthcare, even if both have a long way to go.

Camille Sojit Pejcha is a Brooklyn-based writer whose work has appeared in Inverse, Document Journal, GEN and Gizmodo.

DISCUSSION

dwintermut3
dWintermute

I think the real lesson here is that our brains are hardwired to be social, and when we can’t express that in healthy ways we will anthropomorphize ANYTHING in our space, whether it’s a volleyball with a face on it or an app, to project our need for socialization onto. We are a semi-eusocial species with a deep need for socialization living in a system (and especially a time when we’re being told, explicitly, that socialization will literally kill us) that does not allow for healthy non-transactional relationships. So we find substitutes.