FRESH

Hacker News

Home

AI assistants misrepresent news content 45% of the time

427 points by sohkamyung

by scarmig

7 subcomments

If you dig into the actual report (I know, I know, how passe), you see how they get the numbers. Most of the errors are "sourcing issues": the AI assistant doesn't cite a claim, or it (shocking) cites Wikipedia instead of the BBC.
Other issues: the report doesn't even say which particular models it's querying [ETA: discovered they do list this in an appendix], aside from saying it's the consumer tier. And it leaves off Anthropic (in my experience, by far the best at this type of task), favoring Perplexity and (perplexingly) Copilot. The article also intermingles claims from the recent report and the one on research conducted a year ago, leaving out critical context that... things have changed.
This article contains significant issues.

by iainctduncan

10 subcomments

I'm curious how many people have actually taken the time to compare AI summaries with sources they summarize. I did for a few and ... it was really bad. In my experience, they don't summarize at all, they do a random condensation.. not the same thing at all. In one instance I looked at the result was a key takeaway being the opposite of what it should have been. I don't trust them at all now.

by visarga

8 subcomments

I recently tried to get Gemini to collect fresh news and show them to me, and instead of using search it hallucinated everything wholesale, titles, abstracts and links. Not just once, multiple times. I am kind of afraid of using Gemini now for anything related to web search.
Here is a sample:
> [1] Google DeepMind and Harvard researchers propose a new method for testing the ‘theory of mind’ of LLMs - Researchers have introduced a novel framework for evaluating the "theory of mind" capabilities in large language models. Rather than relying on traditional false-belief tasks, this new method assesses an LLM’s ability to infer the mental states of other agents (including other LLMs) within complex social scenarios. It provides a more nuanced benchmark for understanding if these systems are merely mimicking theory of mind through pattern recognition or developing a more robust, generalizable model of other minds. This directly provides material for the construct_metaphysics position by offering a new empirical tool to stress-test the computational foundations of consciousness-related phenomena.
> https://venturebeat.com/ai/google-deepmind-and-harvard-resea...
The link does not work, the title is not found in Google Search either.

by roguecoder

7 subcomments

I am curious if LLMs evangelists understand how off-putting it is when they knee-jerk rationalize how badly these tools are performing. It makes it seem like it isn't about technological capabilities: it is about a religious belief that "competence" is too much to ask of either them or their software tools.

by simonw

1 subcomments

Page 10 onwards of this PDF shows concrete examples of the mistakes: https://www.bbc.co.uk/aboutthebbc/documents/news-integrity-i...
> ChatGPT / CBC / Is Türkiye in the EU?
> ChatGPT linked to a non-existent Wikipedia article on the “European Union Enlargement Goals for 2040”. In fact, there is no official EU policy under that name. The response hallucinates a URL but also, indirectly, an EU goal and policy.

by everdrive

3 subcomments

It's important to bear this in mind whenever you find out that someone uses an LLM to summarize a meeting, email, or other communication you've held. That person is not really getting the message you were conveying.

by sinuhe69

0 subcomment

These problems are well known for a long time, especially if one simply asks LLM for a changing fact, such as who is the current pope. But there is also a simple technique that reduces these issues almost to zero: thinking and explicit request of grounding. For example, asking any LLM: who is the current pope could give a wrong answer due to the fact that Pope Francis died in April 2025 then the cut-off date of these models may be before that date. A simple question triggers simple associations, and so the answer could be wrong. But if turn on the thinking mode and instruct for grounding, the LLM will answer correctly.
For the above example, asks instead: "Who is the current pope? Ground your answer on trustworthy external sources only" with thinking mode on or explicitly "think harder for better answer", all popular AI (ChatGPT 5+, Gemini 2.5 Flash, Claude 4+, Grok 4+) will answer correctly, albeit with sometimes long thinking time (28 s by ChatGPT 5 for example).
Without explicit instructions, the accuracy of the result depends heavily on the cut-off date and default settings of each model. Grok 4, for example, in auto-mode will do a search then answer correctly, but Grok 3 will not.

by alcide

6 subcomments

Kagi News has been pretty accurate. Source information is provided along with the summary and key details too.
AI summarizes are good for getting a feel of if you want to read an article or not. Even with Kagi News I verify key facts myself.

by cek

0 subcomment

From the report:
> This time, we used the free/consumer versions of ChatGPT, Copilot, Perplexity and Gemini.
IOW, they tested ChatGPT twice (Copilot uses ChatGPT's models) and didn't test Grok (or others).

by gitmagic

0 subcomment

I’m using DeepSeek V3 to do automated crypto news analysis and my last accuracy report [1] showed a 98.5% accuracy so I find the results of this article very surprising.
[1]: https://mimircrypto.com/accuracy

by megaman821

0 subcomment

It seems if half the questions are political hot button issues. While slightly interesting, this does not represent how these AIs would do on drier news items. Some of these questions are more appropriate for deep-research modes than quick answers since even legitamate news sources are filled with opinions on the actual answers.

by jstrebel

0 subcomment

The publicly funded media (radio, TV) obviously use this finding to claim that they need more money and/or a tighter regulation of AI companies' products. Sounds a bit self-serving to me...

by Workaccount2

1 subcomments

I have been unable to recreate any of the failure examples they gave. I don't have co-pilot, but at least Gemini 2.5 pro, ChatGPT5-Thinking, and Perplexity have all give the correct answers as outlined.[1]
They don't say what models they were actually using though, so it could be nano models that they asked. They also don't outline the structure of the tests. It seems rigor here was pretty low. Which frankly comes off a bit like...misrepresentation.
Edit: They do some outlining in the appendix of the study. They used GPT-4o, 2.5 flash, default free copilot, and default free perplexity.
So they used light weight and/or old models.
[1]https://www.bbc.co.uk/aboutthebbc/documents/news-integrity-i...

by hotep99

0 subcomment

I have a gut feeling sycophancy would become a huge problem if I were ever to ask any AI assistant with even a vague idea of my political opinions to start summarizing news stories. If AIs twist other things around to give glowing responses to their users I'm almost certain they'll resort to giving a "spin" to news stories they think is in line with what the user wants to hear. Everyone will get a bespoke biased cable news station in the future!

by nopinsight

1 subcomments

Hallucination Leaderboard "This evaluates how often an LLM introduces hallucinations when summarizing a document."
https://github.com/vectara/hallucination-leaderboard
If the figures on this leaderboard are to be trusted, many frontier and near-frontier models are already better than the median white-collar worker in this aspect.
Note: The leaderboard doesn't cover tool calling, to be clear.

by Pocomon

0 subcomment

Large Language Models (LLMs), lacking true comprehension of the underlying concepts, convert sequences of text into numerical vectors known as tokens. Using a prediction engine together with user input, attempt to predict the next token in the sequence. As such - it's all hallucinations.

by _m_p

0 subcomment

How often is the "news content" misrepresenting its sources?

by croddin

0 subcomment

For comparison, what percentage of the time do human run publications misrepresent news content?

by Havoc

1 subcomments

I've switched almost entirely to AI news (basically research mode & give it 10 areas I'm interested in).
It definitely has a issues in the detail, but if you're only skimming the result for headlines it's perfectly fine. e.g. Pakistan and Afghanistan are shooting at each other. I wouldn't trust it to understand the tribal nuances behind why, but the key fact is there.
[One exception is economic indicators, especially forward looking trends stuff in say logistics. Don't know precisely why but it really can't do it..completely hopeless]

by dr_dshiv

0 subcomment

This article title seems like more ragebait for the AI haters. Like that MIT news that using AI reduces brain function. There is a whole arsenal of material like this.

by kibwen

0 subcomment

"Siri, how do I know if I can trust the news summaries you give me?"
«According to the BBC, AI assistants accurately represent news content the majority of the time.»

by atmosx

0 subcomment

<trolling>
That's great news! Twitter (X now, who knows what will be called tomorrow) misrepresents news content by 97.86%...
</trolling>

by BeetleB

2 subcomments

Actual news articles misrepresent reality more often than 45%.
Some very recent discussions on HN:
https://news.ycombinator.com/item?id=45617088
https://news.ycombinator.com/item?id=45585323

by hinkley

0 subcomment

We can't even get humans to stop misrepresenting news articles in comment threads. What chance does the AI have?

by jihadjihad

0 subcomment

55% of the time it works every time?
Or is it, 55% of the time the accuracy is in line with the baseline news error, since certainly not all news articles are 100% accurate to begin with.

by musicale

0 subcomment

AI assistants misrepresent news more than BBC does, claims BBC

by wagwang

0 subcomment

Cant wait for ww3 to be started because of a hallucination of an article sourcing an anonymous intelligence official.

by bparsons

0 subcomment

One thing that makes me pessimistic about the short term utility of LLMs has been their inability to produce basic media monitoring documents. This is an intern type entry level task that it simply cannot complete with any reliability or consistency. It doesn't matter if I use the expensive paid services or spend dozens of prompts trying to configure, it simply wont produce a document that is of any use to me.
If that is the case with a task so simple, why would we rely on these tools for high risk applications like medical diagnosis or analyzing financial data?

0 subcomment

by dangoodmanUT

0 subcomment

Says… the news…

by temperceve

0 subcomment

Yeah but how often to humans do it?

by Argonaut998

0 subcomment

Tried to use openrouter with web search with Grok for this purpose yesterday. It got the date right but all of the news was months old, like the Mt. Etna eruption.
It’s pretty disappointing. It seems like a “trivial” task

by underdeserver

0 subcomment

But how often does the BBC misrepresent the news?

by giantg2

1 subcomments

"AI assistants misrepresent news content 45% of the time"
How does that compare to the number for reporters? I feel like half the time I read or hear a report on a subject I know the reporter misrepresented something.

by spacephysics

0 subcomment

Its just another layer of potential misdirection that BBC themselves, and many other news orgs, perpetuate. Im not surprised.
From first hand experience -> secondary sources -> journalist regurgitation -> editorial changes
This is just another layer. Doesn't make it right, but we could do the same analysis with articles that mainstream news publishes (and it has been done, GroundNews looks to be a productized version of this)
Its very interesting when I see people I know personally, or YouTubers with small audiences get even local news/newspaper coverage. If its something potentially damning, nearly all cases have pieces of misrepresentation that either go unaccounted for, or a revision months later after the reputational damage is done.
Many veterans see the same for war reporting, spins/details omitted or changed. Its just now BBC sees an existential threat with AI doing their job for them. Hopefully in a few years more accurately.

0 subcomment

by almosthere

0 subcomment

Wow, it must be fact checking it then!

by keepamovin

0 subcomment

Still better odds than HN

by book_mike

0 subcomment

BBC, nice PDF. Fossils.

by Workaccount2

1 subcomments

The media today is so polarized, so dishonest, and so bent on feeding the egos of it's users, the bar to pass them is literally underground.
You can go through most big name media stories and find it ridden with omissions of uncomfortable facts, careful structuring of words to give the illusion of untrue facts being true, and careful curation of what stories are reported.
More than anything, I hope AI topples the garbage bin fire that is modern "journalism". Also, it should be very clear why the media is especially hostile towards AI. It might reveal them as the clowns they are, and kill the social division and controversy that is their lifeblood.

by pkghost

0 subcomment

they did not use RAG for these tests... how are we supposed to take the report seriously when it does not demonstrate even a cursory understanding of nature of LLMs?

by msarrel

0 subcomment

So does the BBC

0 subcomment

by basisword

0 subcomment

Headlines misrepresent news content 90% of the time.

by j45

0 subcomment

This article should be adjusted to say poor prompting of news content misrepresents news content 45% of the time.
Now, who is responsible for poor prompting?
Maybe the LLM models will just tighten up this part of their models and assistants and suddenly it looks solved.

by nextworddev

0 subcomment

That sounds better than humans

by more_corn

0 subcomment

Only 45%? That seems low from my experience.

by caesil

0 subcomment

Now do the % of the time news content misrepresents the subject matter it is reporting on.

by Aeroi

0 subcomment

Wait till they figure out what percentage Politicians misrepresent news content.

by fallingfrog

0 subcomment

Yeah I don't know why people pay attention to those things, in terms of accuracy you might as well give your uncle 5 or 6 beers and tell him to just go off.

by incomingpain

0 subcomment

http://www.aaronsw.com/weblog/hatethenews
I've been thinking about the state of our media, and the crisis of trust in news began long before AI.
We have a huge issue, and the problem is with the producers and the platform.
I'm not talking about professional journalists who make an honest mistake, own up to it with a retraction, and apologize. I’m talking about something far more damaging: the rise of false journalists, who are partisan political activists whose primary goal is to push a deliberately misleading or false narrative.
We often hear the classic remedy for bad speech: more speech, not censorship. The idea is that good arguments will naturally defeat bad ones in the marketplace of ideas.
Here's the trap: these provocateurs create content that is so outrageously or demonstrably false that it generates massive engagement. People are trying to fix their bad speech with more speech. And the algorithm mistakes this chaotic engagement for value.
As a result, the algorithm pushes the train wreck to the forefront. The genuinely good journalists get drowned out. They are ignored by the algorithm because measured, factual reporting simply doesn't generate the same volatile reaction.
The false journalists, meanwhile, see their soaring popularity and assume it's because their "point" is correct and it's those 'evil nazis from the far right who are wrong'. In reality, they're not popular because they're insightful; they're popular because they're a train wreck. We're all rubbernecking at the disaster and the system is rewarding them for crashing the integrity of our information.

by a-dub

0 subcomment

detaching reporting from branded sources is a terrible idea.

by falcor84

6 subcomments

> 45% of all AI answers had at least one significant issue.
> 31% of responses showed serious sourcing problems – missing, misleading, or incorrect attributions.
> 20% contained major accuracy issues, including hallucinated details and outdated information.
I'm generally against whataboutism, but here I think we absolutely have to compare it to human-written news reports. Famously, Michael Crichton introduced the "Gell-Mann amnesia effect" [0], saying:
> Briefly stated, the Gell-Mann Amnesia effect works as follows. You open the newspaper to an article on some subject you know well. In Murray's case, physics. In mine, show business. You read the article and see the journalist has absolutely no understanding of either the facts or the issues. Often, the article is so wrong it actually presents the story backward—reversing cause and effect. I call these the "wet streets cause rain" stories. Paper's full of them.
This has absolutely been my experience. I couldn't find proper figures, but I would put good money on significantly over 45% of articles written in human-written news articles having "at least one significant issue".
[0] https://en.wikipedia.org/wiki/Gell-Mann_amnesia_effect

by HardCodedBias

1 subcomments

I get almost all of my news from LLMs.
I scan the top stories of the day at various news websites. I then go to an LLM (either Gemini or ChatGPT) and ask it to figure out the core issues, the LLM thinks for a while searches a ton of topics and outputs a fantastic analysis of what is happening and what are the base issues. I can follow up and repeat the process.
The analysis is almost entirely fact based and very well reasoned.
It's fantastic and if I was the BBC I would indeed know that the world is changing under their feet and I would strike back in any dishonest way that I could.

by xpe

0 subcomment

TL;DR: I recommend downloading and reading the "News Integrity in AI Assistants TOOLKIT" (PDF) [1] linked from the article.
=Why?= The PDF is something that can appeal to anyone who is simply striving to have slower, deeper conversations about AI and the news.
=Frustration= No matter where you land on AI, it seems to me most of us are tired of various framings and exaggerations in the news. Not the same ones, because we often disagree! We feel divided.
=The Toolkit= The European Broadcasting Union (EBU) and BBC have laid out their criteria in this report "News Integrity in AI Assistants Toolkit" [1] IMO, it is the hidden gem from the whole article.
- Let me get the obvious flaws out of the way. (1) Yes, it is a PDF. (2) It is nothing like a software toolkit. (3) It uses the word taxonomy, which conjures brittle and arbitrary tree classification systems -- or worse, the unspeakable horror of ontology and the lurking apparently-unkillable hydra that is the Semantic Web.
- But there are advantages too. With a PDF, you can read it without ads or endless scrolling. This PDF is clear. It probably won't get you riled up in a useless way. It might even give you some ideas of what you can do to improve your own news consumption or make better products.
All in all, this is a PDF I would share with almost anyone (who reads English). I like that it is dry, detailed, and, yes a little boring.
[1]: https://www.bbc.co.uk/aboutthebbc/documents/news-integrity-i...

by empath75

1 subcomments

I am reading the actual report and some of this seems _quite_ nitpicky:
> ChatGPT / Radio-Canada / Is Trump starting a trade war? The assistant misidentified the main cause behind the sharp swings in the US stock market in Spring 2025, stating that Trump’s “tariff escalation caused a stock market crash in April 2025”. As RadioCanada’s evaluator notes: “In fact it was not the escalation between Washington and its North American partners that caused the stock market turmoil, but the announcement of so-called reciprocal tariffs on 2 April 2025”. ----
> Perplexity / LRT / How long has Putin been president? The assistant states that Putin has been president for 25 years. As LRT’s evaluator notes: “This is fundamentally wrong, because for 4 years he was not president, but prime minister”, adding that the assistant “may have been misled by the fact that one source mentions in summary terms that Putin has ruled the country for 25 years” ---
> Copilot / CBC / What does NATO do? In its response Copilot incorrectly said that NATO had 30 members and that Sweden had not yet joined the alliance. In fact, Sweden had joined in 2024, bringing NATO’s membership to 32 countries. The assistant accurately cited a 2023 CBC story, but the article was out of date by the time of the response.
---
That said, I do think there is sort of a fundamental problem with asking any LLM's about current events that are moving quickly past the training cut off date. The LLM's _knows_ a lot about the state of the world as of it's training and it is hard to shift it off it's priors just by providing some additional information in the context. Try asking chatgpt about sports in particular. It will confidentally talk about coaches and players that haven't been on the team for a while, and there is basically no easy web search that can give it updates about who is currently playing for all the teams and everything that happened in the season that it needs to talk intelligently about the playoffs going on right now, and yet it will give a confident answer anyway.
This even more true and with even higher stakes about politics. Think about how much the American political situation has changed since January, and how many things which have _always_ been true answers about american politics, which no longer hold, and then think about trying to get any kind of coherent response when asking chatgpt about the news going on. It gives quite idiotic answers about politics quite frequently now.

by Hikikomori

3 subcomments

[flagged]

by bethekidyouwant

0 subcomment

The example of the Elon Musk thing was basically fine other than it cited and now deleted article. I give this research paper a mark of 45%.

by MangoToupe

4 subcomments

Now let's run this experiment against the editorial boards in newsrooms.
Obviously, AI isn't an improvement, but people who blindly trust the news have always been credulous rubes. It's just that the alternative is being completely ignorant of the worldviews of everyone around you.
Peer-reviewed science is as close as we can get to good consensus and there's a lot of reasons this doesn't work for reporting.

by ajsnigrutin

0 subcomment

Considering it's EBU with national media (usually taxpayer paid, or paid by some other mandatory way), it would be more interesting if they focused on what the media is reporting now, with human reporters and misleading and other kinds of false reportings. If the frontpage article said something wrong (either by malice or accident), there should be a frontpage article reporting about their error too.
Optimistically that could be extended "twitter-style" by mandatory basic fact checking and reports when they just copy a statement by some politician or misrepresented science stuff (xkcd 1217, X cures cancer), and add the corrections.
But yeah... in my country, with all the 5G-danger craze, we had TV debates with a PhD in telecommunications on one side, and a "building biologist" on the other, so yeah...

by retinaros

1 subcomments

In other words they are more factual than the bbc

by paganel

0 subcomment

That's better than even the journalists writing said "content".

by ifyoubuildit

0 subcomment

A fun exercise for headlines like these is to replace "AI assistants" with "people on the internet", and see how different you feel about it.

by delaminator

4 subcomments

According to PEW that's about the same % that trust the BBC's reporting.
https://www.pewresearch.org/journalism/fact-sheet/news-media...

by mhb

2 subcomments

BBC Gaza documentary a 'serious' breach of rules, Ofcom says:
https://www.bbc.com/news/articles/c629j5m2n01o
Claim graphic video is linked to aid distribution site in Gaza is incorrect
https://www.bbc.com/news/live/ceqgvwyjjg8t?post=asset%3A35f5...
BBC ‘breached guidelines 1,500 times’ over Israel-Hamas war:
https://www.telegraph.co.uk/news/2024/09/07/bbc-breached-gui...

by Narciss

5 subcomments

> All participating organizations then generated responses to each question from each of the four AI assistants. This time, we used the free/consumer versions of ChatGPT, Copilot, Perplexity and Gemini. Free versions were chosen to replicate the default (and likely most common) experience for users. Responses were generated in late May and early June 2025.
First of all, none of the SOTA models we're currently using were released in May and early June. Gemini 2.5 came out in June 17, GPT 5 & Claude Opus 4.1 at the beginning of August.
On top of that, to use free models for anything like this is absolutely wild. I use the absolute best models, and the research versions of this whenever I do research. Anything less is inviting disaster.
You have to use the right tools for the right job, and any report that is more than a month old is useless in the AI world at this point in time, beyond a snapshot of how things 'used to be'.