FRESH

Hacker News

Home

Ontario auditors find doctors' AI note takers routinely blow basic facts

303 points by sohkamyung

by rainsford

7 subcomments

I have generally moved from bearish to bullish on the future of current AI technology, but the continued inaccuracy with basic facts all while the models significantly improve continues to give me significant pause.
As an example, creating recipes with Claude Opus based on flavor profiles and preferences feels magical, right up until the point at which it can't accurately convert between tablespoons and teaspoons. It's like the point in the movie where a character is acting nearly right but something is a bit off and then it turns out they're a zombie and going to try to eat your brain. This note taking example feels similar. It nearly works in some pretty impressive ways and then fails at the important details in a way that something able to do the things AI can allegedly do really shouldn't.
It's these failures that make me more and more convinced that while current generation AI can do some pretty cool things if you manage it right, we're not actually on the right track to achieve real intelligence. The persistence of these incredibly basic failure modes even as models advance makes it fairly obvious that continued advancement isn't going to actually address those problems.

by zOneLetter

5 subcomments

Anecdotally, we use an LLM note-taker at work for meetings. I had to intervene recently because our CIO was VERY angry at our vendor for something they promised to do and never did. He wasn't at the meeting where the "promise" was made. I was. They never promised anything, and the discussion was significantly more nuanced than what the LLM wrote in the detailed summary.
In other cases, I have seen it miss the mark when the discussion is not very linear. For example, if I am going back and forth with the SOC team about their response to a recent alert/incident. It'll get the gist of it right, but if you're relying on it for accuracy, holy hell does it miss the mark.
I can see the LLM take great notes for that initial nurse visit when you're at the hospital: summarize your main issue, weight, height, recent changes, etc. I would not trust it when it comes to a detailed and technical back-and-forth with the doctor. I would think for compliance reasons hospitals would not want to alter the records and only go by transcripts, but what do I know...

by Groxx

1 subcomments

Yep. It happened to me just recently.
Diagnosed with Runner's Knee.
AI summary said I was diagnosed with osteoporosis, and had hip pain and walking difficulty, though literally none of that was ever said or implied.
CHECK YOUR TRANSCRIPTS. Always, but especially with LLM transcribers, which fairly frequently include common symptoms which don't exist, or claim a diagnosis which is common and fits a few details but not others. Get them fixed, it can very strongly affect your care and costs later if it's wrong.
Anecdotally, I'd say that outside of a couple very simple and very common things, about 50% of the "AI" summaries I've had have been wrong somewhere. Usually claiming I have symptoms that don't exist, occasionally much more serious and major fabrications like this time.
LLMs are NOT normal speech to text software, and they shouldn't be treated like one. They'll often insert entire sentences that never occurred. In some contexts that might be fine, but definitely not in medical records.

by Hobadee

4 subcomments

The AI note taker we use at work records the meeting as well, and each note it takes about the meeting has a timestamp link that takes you directly there in the recording so you can check it yourself. While I'm sure a solution like this is more complicated in a HIPPAA environment, something like this is critical for things as important as healthcare.

by daveisfera

0 subcomment

But how accurate are humans? I just picked up a print out of medical history for the last 5 years and it was thick enough to be a book. There's no way a human is reading all of that and doing anything meaningful with it. Let an AI tool crunch on it and it will definitely get things wrong or jump to conclusions that aren't there, but it's quick and I can push back on those and then move to the correct answer far quicker than any meeting with a nurse or doctor will show any results. We need to focus on how to use these tools and push back on the parts that seem out of place or wrong, so we can do more rather than point out what's not perfect.

by natali_gray

1 subcomments

Ooof. As a Canadian, I'm excited for AI opening up time for doctors (and hopefully lighting a load on the healthcare system), but this is scary. We're not there yet. Perhaps AI training for doctors is in the future? They already have online doctor visits on a healthcare-owned iPad in some condo complexes. It cuts around redtape of having to schedule an appointment with your GP. So, I think we're thinking in the right direction of innovating, but of course, this will take time. I feel like AI got launched too early sometimes.

by Insanity

0 subcomment

I’m in Toronto, my doctor always asks me if they can use the AI note taker, which I accept. At the end of the consultation she goes over the notes and corrects it, often complaining to me about having to talk more to the computer than to me.
She is a great doctor and thankfully does this due diligence. But it gives me the impression this is forced on doctors without even them wanting this.

by aryehof

1 subcomments

Anyone taking part in a meeting these days should state out loud …
“Notice: Any comments made by <name> or on behalf of <organization> that are interpreted by AI in this meeting, may not be accurate.”
I do this in every meeting.

by mquander

1 subcomments

The linked report seems almost useless -- it doesn't say anything about an error rate or a sample size, so it's a mystery whether 9 out of 20 systems “fabricated information and made suggestions to patients' treatment plans” ten out of ten times, or one out of a thousand times.
If we just postulate that the systems have a high error rate, I wonder why they are being adopted. They seem extremely easy to test, so I don't see why doctors or hospitals or governments should be getting tricked into buying them if they suck.

by dmix

0 subcomment

> They specifically address the AI Scribe program, the Ontario Ministry of Health initiated for physicians, nurse practitioners, and other healthcare professionals across the broader health sector.
makes me wonder what quality software the ministry would push (probably mostly qualifications like SOC).
This is apparently this list of approved vendors
https://www.supplyontario.ca/vor/software/tender-20123-artif...

by ceejayoz

6 subcomments

> 60% of evaluated AI Scribe systems mixed up prescribed drugs in patient notes, auditors say
Not mentioned, as far as I can see: the comparative human mistake rate.
Having seen a lot of medical records, 60% sounds about normal lol.

by Ekaros

0 subcomment

How do these LLM summarizations work? Do you feed the raw wave data to model and it translate it?
Or do they use traditional voice recognition algorithms to do that part and then just "fix" the result to look plausible? Which with good quality output might not be much, but with bad can be absolutely everything.
If it is later seems to me that issues will absolutely happen.

by jeisc

0 subcomment

AI is awfully inexact and insists on being right about it

by nothinkjustai

1 subcomments

People will eventually figure out LLMs have no capacity for intent and are fundamentally unreliable for tasks such as summarization, note taking etc.

by LAC-Tech

2 subcomments

Can someone who is a more AI heavy user explain what is going on?
I would expect an "AI Note Taker" to faithfully transcribe the entire conversation. With the same quality I see in a lot of automated video subtitles.. ie they use the wrong word a lot but it's easy to tell what they mean by context.
Are these tools instead immediately summarising the whole thing, and that summary is the artifact? Because that is a beyond insane way to treat human communication.

by jqpabc123

1 subcomments

And once again, we have an example of how AI is a liability issue waiting to happen.

by uejfiweun

0 subcomment

I don't get why you would have an LLM interpret things for you. Like honestly, you replace the software in this example with simple transcription software, the issues disappear.

by samcooper

0 subcomment

[flagged]