The dataset is massive: lots of papers analyzed, but only the Abstract section of each.
One way you could look at it is: some authors used an LLM to create the abstract from the paper contents. "Hey, there are a lot of new books with AI covers."
Another way is: there could be a correlation between LLM usage in the abstract and LLM usage during the writing of the paper itself. "Hey, I wonder whether this book with an AI cover was also written by an AI. It should be investigated."
The paper was written by four Chinese authors who had previously collaborated. It was one of the worst papers I've ever seen. It went over the same ground as previous studies, so in that sense there was nothing new, but more bizarrely it did things like state well-known equations incorrectly (most notably one of Maxwell's equations) and drift off onto total tangents about how one might calculate certain things in an idealised situation that had no relevance to the method the authors actually used. My assumption is that they generated the methods section with generative AI and did not give it even a cursory check.
But the worst part is that I recommended rejecting it outright, and rather than doing so the journal just passed it on to its slightly less prestigious sister journal.
From the article I saw that they're using "excess words" as an indicator; is that a reliable method?
Also, is it possible that it's just autocorrect adding "excess words" while fixing grammar? If that's the case, should it still count as "use of AI"?
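For context on what "excess words" seems to mean in practice: as I understand it, the idea is to compare how often a word appears in abstracts from a given year against a baseline extrapolated from pre-LLM years, and flag words whose observed frequency far exceeds that expectation. Here is a minimal sketch of that kind of counting; the thresholds, the naive extrapolation, and all function names are my own illustration, not the paper's actual code:

```python
from collections import Counter

def word_frequencies(abstracts):
    """Fraction of abstracts in which each word appears at least once."""
    counts = Counter()
    for text in abstracts:
        counts.update(set(text.lower().split()))
    n = len(abstracts)
    return {w: c / n for w, c in counts.items()}

def excess_words(pre_llm_years, test_year, min_ratio=2.0, min_gap=0.001):
    """
    pre_llm_years: list of abstract lists (e.g. 2021, 2022) used to
    extrapolate an expected frequency for each word.
    test_year: abstracts from the year under test (e.g. 2024).
    Returns words whose observed frequency exceeds the extrapolated
    expectation by the given ratio or absolute gap.
    """
    observed = word_frequencies(test_year)
    yearly = [word_frequencies(year) for year in pre_llm_years]
    flagged = {}
    for word, p in observed.items():
        history = [freqs.get(word, 0.0) for freqs in yearly]
        # naive linear extrapolation of the pre-LLM trend
        if len(history) >= 2:
            expected = history[-1] + (history[-1] - history[0]) / (len(history) - 1)
        else:
            expected = history[-1] if history else 0.0
        expected = max(expected, 1e-6)  # avoid division by zero
        if p / expected >= min_ratio or p - expected >= min_gap:
            flagged[word] = (p, expected)
    return flagged
```

If that kind of frequency spike comes from autocorrect or a grammar tool rather than a full LLM, counting like this obviously can't tell the difference, which is exactly what I'm asking about.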
" while our approach can detect unexpected lexical changes, it cannot separate different causes behind those changes, like multiple emerging topics or multiple emerging writing style changes. For example, our approach cannot distinguish word frequency increase due to direct LLM usage from word frequency increase due to people adopting LLM-preferred words and borrowing them for their own writing. For spoken language, there is emerging evidence for such influence of LLMs on human language usage (32). However, we hypothesize that this effect is much smaller and much slower. "
Since "unexpected" can be left out as there is no "expected" baseline, we are left with "detected lexical changes, cause unknown" followed by a purely speculative and unproven hypothesis.
That of course does not stop them from claiming: "In conclusion, our work showed that the effect of LLM usage on scientific writing is truly unprecedented".
This is slop. Even if you agree with the suspicion of widespread use of LLMs in academic writing (I do, and anecdotally suspect even their upper bound is a big underestimate), they did not show such an effect at all. They ran a projection over a dataset and hypothesized, but did not investigate a cause.
But hey, let not the absence of information restrain us from pushing the agenda: "Our analysis can inform the necessary debate around LLM policies providing a measurement method for LLM usage". In other words: here is what we did not do, but you can cite us as if we did, because nobody reads anything beyond the abstract and the conclusion.
As a bonus kicker: "We hope that future work will meticulously delve into tracking LLM usage more accurately and assess which policy changes are crucial to tackle the intricate challenges posed by the rise of LLMs in scientific publishing." I'm sure that sentence alone would get flagged as LLM slop, even by their own sloppy methodology.