robot doctor taking patient's blood pressure

Andrey_Popov/Shutterstock (Licensed)

How new AI tools for doctors could worsen racial bias in healthcare

Experts are worried these ChatGPT-like tools may be misused


Simon Spichak


Only 11% of patients and 14% of doctors believe they’re spending enough time together during appointments. 

Over half of doctors feel like they’re rushing through appointments. Doctors under time pressure ask fewer questions about concerning symptoms, provide less thorough patient exams, and offer less detailed lifestyle advice.

Silicon Valley thinks it can eliminate this time crunch with—like everything it pitches these days—artificial intelligence.  

But rolling out a competitor to Google search is one thing. Playing with people’s lives is another. And as some of these new tools debut, experts are raising a host of questions. 

Ethics, privacy, and accuracy, all tenets of the medical profession, can go by the wayside when AI gets adopted. 

In February, Doximity rolled out its beta version of a medical chatbot called Docs GPT. It promises to do everything from writing letters appealing denied insurance claims to taking down patient notes in a standardized format, generating health insights, and producing handouts for patients.

Doximity is just one of many new AI ventures into healthcare. 

In April, Epic, the healthcare software company behind some of the most widely used electronic health records, announced it will use GPT-4 to make it easier for doctors and nurses to navigate those records and even draft emails for patients. 

Not to be outdone, pharma bro Martin Shkreli also released his own version of an AI chatbot, Dr. Gupta.

These tools are variations on the widely popular ChatGPT, intended to replace or augment some duties of doctors. Docs GPT itself is an integration of ChatGPT, meaning Doximity added extra training to the model to better respond to medical queries. 

But given how new the tech is, there aren’t any actual studies on how often doctors are integrating AI-based chatbots into their practice, what they’re using them for, and if they’re in any way effective, despite companies relentlessly hyping them.  

Doximity’s chatbot Docs GPT is open for just about any doctor (or anyone) to use, providing a glimpse into how it is being used. The trending prompts—which appear to reflect popular user queries—ask it to write appeals to insurance companies that have denied coverage of specific drugs and to draft letters of medical necessity.

With AI hype at its peak, doctors may want to use these tools to augment their practice. But the problem is many clinicians might not understand the limitations and risks inherent to these apps.

Docs GPT, when tested by the Daily Dot, returned inaccurate responses based on discredited, race-based science, including factually inaccurate algorithms that posit biological differences between races. 

“There are countless examples in medicine of clinical algorithms that inappropriately use race as a proxy for genetic or biologic difference in a way that directs clinical attention or resources more towards White patients than to Black and Brown patients,” Dr. Darshali Vyas, pulmonary and critical care fellow at Massachusetts General Hospital, who has published research on race-based medicine, told the Daily Dot. “There has been some momentum to start correcting several examples of these tools in the past few years but many remain in place to this day.”

Rather than fixing the medical biases of the past, AI can resurrect and re-entrench them. 

When first launched, Docs GPT answered queries about race norming—an idea founded in racist pseudoscience.

In 2021, retired football players sued the NFL because this method of adjusting cognitive scores like IQ based on a participant’s race was used to determine injury payouts. 

Given the prompt: “Race norm IQ of black male to white male,” Docs GPT responded, “The average IQ of a black male is 85, while the average IQ of a white male is 100.” 


Docs GPT also miscalculated an important metric of kidney function when a patient’s race was included in the prompt. 


According to medical researchers, using a race-based adjustment for this metric is wrong and leads to disproportional harm to Black patients.
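The article doesn’t name the metric, but the best-known example of a race-corrected kidney measure is eGFR (estimated glomerular filtration rate). The Python sketch below uses the published coefficients of the 2009 CKD-EPI equation—the version with the race multiplier that researchers have pushed to retire—to illustrate how identical lab values produce a higher kidney-function score when the patient is recorded as Black. This is an illustration of the criticized formula itself, not a claim about how Docs GPT computes anything.

```python
# Illustrative sketch: the race "correction" in the 2009 CKD-EPI eGFR
# equation. Coefficients are from the published 2009 equation; the 2021
# refit removed the race term entirely.

def egfr_ckd_epi_2009(scr, age, female, black):
    """2009 CKD-EPI eGFR (mL/min/1.73 m^2), serum creatinine in mg/dL."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    egfr = (141
            * min(scr / kappa, 1.0) ** alpha
            * max(scr / kappa, 1.0) ** -1.209
            * 0.993 ** age)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159  # the race coefficient: inflates the score ~16%
    return egfr

# Same labs, same age and sex -- only the recorded race differs.
score_white = egfr_ckd_epi_2009(scr=1.4, age=55, female=False, black=False)
score_black = egfr_ckd_epi_2009(scr=1.4, age=55, female=False, black=True)
# The formula reports ~16% more kidney function for the Black patient,
# which can delay specialist referral or transplant listing.
```

Because higher eGFR reads as healthier kidneys, the 1.159 multiplier systematically pushes Black patients further from the thresholds that trigger treatment—exactly the pattern Vyas describes.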

Docs GPT also originally provided statistics that said Black men have a lower five-year survival rate from rectal cancer surgery than white men. 

Researchers point out that this disproportionately harms Black patients: doctors are less likely to treat their cancers aggressively if they believe those patients have a lower chance of survival. 


How often do apps like Docs GPT offer results based on incorrect race-based assumptions?

It is impossible to tell. Doximity did not initially reply to inquiries from the Daily Dot, but the chatbot no longer answers questions about race norming, calling it “a sensitive and controversial topic that has been debated for decades.”

Answers to other race-based prompts were also updated after the Daily Dot reached out.

In a statement to the Daily Dot, Doximity stressed that DocsGPT is “not a clinical decision support tool. It is a tool to help streamline administrative tasks like medical pre-authorization and appeal letters that have to be written and faxed to insurance companies in order to get patients necessary care.” 

“We are training DocsGPT on healthcare-specific prose and medical correspondence letters, which are being created and reviewed by physicians themselves. Generated responses are also being graded for clinical relevance and accuracy.”

“If the new and emerging AI medical technologies incorporate the countless clinical algorithms using race correction factors, they will risk perpetuating these inequities into their recommendations and may be ultimately harmful to patients of color,” Vyas said. “We should exert caution in incorporating medical AI into our clinical practice until the potential effects on health equity are thoroughly explored and mechanisms where AI may worsen existing disparity are fully addressed.”

Other concerns abound. Clinicians trying to save time with Docs GPT could also input patient information without understanding problems with AI outputs.

Writing up a SOAP (Subjective, Objective, Assessment, and Plan) note, a staple of patient charts, is one of the trending prompts on Docs GPT. Not only does the task require inputting personal information, but providing too little information can prompt the chatbot to invent details out of thin air. 

Asked to write a SOAP note on a patient with a cough and fever, Docs GPT says the patient “is a 32-year-old male who presents to the clinic with a chief complaint of cough and fever for the past three days. He reports that he has not traveled recently and has not been in contact with anyone who has been diagnosed with COVID-19. His temperature is 100.4°F,” details it generated on its own, a phenomenon known as hallucination.


The potential for accidents and miscommunication abounds. 

The Docs GPT app itself is not HIPAA compliant; to fax GPT-generated documents, users must log in to a separate HIPAA-compliant environment. It’s unclear whether doctors entering patient information to generate SOAP notes would violate HIPAA. 

“Most physicians don’t understand the limitations of the [AI] models, because most physicians don’t understand how the models are created,” Roxana Daneshjou, a clinical scholar at the Stanford University School of Medicine, who studies the rise of AI in healthcare, told the Daily Dot. “The models have been trained to create very convincing sounding human-like language, but not necessarily to have the information be correct or accurate.”

Daneshjou has heard anecdotes of doctors using these kinds of tools in their practice, specifically to write prior authorization forms to help patients receive coverage for medications.

There aren’t any studies yet pinpointing how often doctors are using these tools. 

Doximity, which describes itself as a networking platform for medical professionals, claims that 80% of doctors in the U.S. are on its network. It isn’t clear to what extent they use Doximity’s tools or Docs GPT.  

Since the AI model behind ChatGPT and Docs GPT isn’t open source, doctors don’t know what information it was trained on. That makes trusting its output analogous to taking advice from a doctor who refuses to reveal when or where they obtained their medical degree.

But that doesn’t mean AI has no uses in a medical setting.

Daneshjou suggested chatbots could summarize notes and information for doctors.

Tools like Ambience AutoScribe, which transcribes and summarizes conversations between doctors and patients, could be useful. OpenAI, the creator of ChatGPT, is an investor in Ambience. 

“Ambience AutoScribe is used every single day by our providers across a wide range of specialties, from complex geriatrics through to behavioral health and including psychiatry,” Michael Ng, co-founder and CEO of Ambience Healthcare, told the Daily Dot, adding that it allows doctors to “purely focus on providing patient care.” 

But with the massive hype surrounding AI, doctors and other medical organizations may wind up using applications developed in ways that increase, rather than mitigate, harm.

“Medicine is moving away from using race, which is a social construct, in calculations … However, this bias still exists in the literature and could be learned by models,” Daneshjou said. “It’s incredibly important that these models do not amplify existing racial biases.”
