They say the models were trained on a bunch of books and learned the use of the dash from there. That's fine; no one is denying that humans have always used dashes in their books.
But where you would rarely expect to see a dash is in something like a short product review, a YouTube comment, or a WhatsApp message. In those contexts, dashes can and do seem out of place.
OTOH, as long as user-interactive web content has existed—so “always” in a context of a particular view of the online world—em-dashes have been part of it, because the facilities that make it easy to use (whether automatic replacement, or various keyboard input modifying mechanisms) have been sufficiently common that a robust minority of users have regularly used one or more of them.
There's nothing elegant about a punctuation mark glued firmly to the words on either side, forming a sequoia-sized typographic log that typically gets wrapped in its entirety to the next line, leaving a half mile or so of white space hanging before the wrap.
If you're gonna use the em dash, make sure your software can break a line on either side of one.
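One possible workaround, sketched here in Python as an illustration rather than anything the comment prescribes: put a zero-width space (U+200B) on each side of every em dash, giving the renderer an explicit place to break. The Unicode line-breaking algorithm (UAX #14) already puts the em dash in class B2, which permits a break before and after it, so this only helps with software that doesn't honor that. The function name is hypothetical.

```python
EM_DASH = "\u2014"
ZERO_WIDTH_SPACE = "\u200b"


def add_break_opportunities(text: str) -> str:
    """Surround every em dash with zero-width spaces so a line can wrap on either side."""
    return text.replace(EM_DASH, ZERO_WIDTH_SPACE + EM_DASH + ZERO_WIDTH_SPACE)


sample = "a typographic log" + EM_DASH + "glued to both neighbours"
wrapped = add_break_opportunities(sample)
# Two invisible characters were added around the single em dash.
print(len(sample), len(wrapped))
```

The catch is that the invisible characters travel along with copy-and-paste; in HTML, the `<wbr>` element is the markup-level equivalent.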
I am frequently accused of using LLMs to write my prose, something I not only eschew but also believe is morally corrupt and intellectually dishonest.
I’m not above spellcheck, grammar checkers, or even LLM-driven evaluation of articles, but my thoughts, word choices, and structure are always of my own design.
I use the em-dash where it is appropriate.
I find that people accusing writers of using AI typically disagree with the premise of the text, and use the “AI” accusation as a character assault: a way to dehumanise the author and dismiss their work. The assertion is very rarely made in good faith; rather, it is a weak attempt to discredit an idea without actually refuting the premise or even examining the argument.
Shame on whoever argues this way. It’s weak, unproductive, and intellectually lazy. It’s fine to disagree, but if you aren’t willing to act in good faith, just keep your thoughts to yourself. You’re only going to discredit your own point of view if you touch the keyboard.
Personally, I'm more prone to excessive semicolon usage, which seems to aggravate editors.
Newspapers generally avoid it, often dropping it completely in favour of commas. Properly wielding the en dash or the em dash requires training.
I was a Mac user for years, and on the Mac keyboard the em dash is just a modified hyphen (Option-Shift-hyphen). When I moved to primarily using PCs, the em dash Alt code was the first one I memorized (Alt+0151).
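For the curious: on Windows, an Alt code typed with a leading zero is looked up in the system's ANSI code page, which on Western-language systems is Windows-1252. There, byte 151 (0x97) is the em dash and byte 150 (0x96) is the en dash. A minimal Python sketch, assuming a Windows-1252 system (the function name is hypothetical):

```python
def windows_alt_code(code: str) -> str:
    """Return the character a Windows 'Alt+0nnn' code produces.

    Assumes the leading-zero form and a Windows-1252 ANSI code page.
    """
    if not (code.startswith("0") and code.isdigit()):
        raise ValueError("expected a leading-zero Alt code such as '0151'")
    value = int(code)
    if value > 255:
        raise ValueError("Windows-1252 only covers values 0-255")
    # The code is treated as a single byte and decoded with the code page.
    return bytes([value]).decode("cp1252")


# Print code points rather than glyphs so this works on any console encoding.
print(f"U+{ord(windows_alt_code('0151')):04X}")  # em dash
print(f"U+{ord(windows_alt_code('0150')):04X}")  # en dash
```

Python's `cp1252` codec raises `UnicodeDecodeError` for the handful of byte values Windows-1252 leaves undefined, such as 0x81.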
I'm sorry to the professional writers out there, but if I see an em dash in a piece of throwaway writing (like a Reddit or HN comment), I assume it's AI-generated and immediately stop reading.
I don't think using an em dash is in itself indicative of AI assistance; the change to using them is. Did this person suddenly start using them? There are other things to look at too, such as bullet-point lists that suddenly bold their key phrases when the author never styled them that way before.
I write a lot (as a PM), so I've taken to using MacWhisper, which does local AI dictation but, in my configuration, also sends the transcript through a ChatGPT prompt first:
"You are a professional proofreader and editor. Your task is to refine and polish the given transcript as follows:
1. Correct any spelling errors.
2. Fix grammatical mistakes.
3. Improve punctuation where necessary.
4. Ensure consistent formatting.
5. Clarify ambiguous phrasing without changing the meaning.
6. If a sentence or paragraph is overly verbose and has more than negligible redundancy, lightly edit for brevity.
7. If the transcript contains a question, edit it for clarity but do not provide an answer.
Please return only the cleaned-up version of the transcript. Do not add any explanations or comments about your edits."
This is great. I get the benefits of pretty accurate transcription while getting a first pass at copyediting almost in real time. It did require me to make some tweaks to my dictation process (allowing it to "chew" on larger chunks to give better context to its editing), but it works very well.
If you encounter an em-dash in an online discussion, most likely someone went to extra effort to include it, or it was automatically inserted, possibly by an AI.
There are other signs that you're looking at AI-generated texts, like lists of three, a certain turn of phrase, or vague generalities, but those are easier for a human to type than an em-dash.
In certain contexts, em dashes are perfectly natural and human. That being said, everyone has encountered articles and posts that read so obviously like AI, and in those contexts the presence of numerous em dashes is certainly an additional data point.
I think the main reason people are noticing it now is that most writing has moved away from legacy tools like Word, which substitute dashes automatically as you type. Websites like Twitter don't do that character substitution, so it has become quite obvious when text has been pasted in from somewhere else, for example AI-generated content.
And yet here we are.
Of course people use the em-dash, and of course LLMs use it at least 10x to 100x more than the average human writer. Also, em-dashes add nothing to writing: 99.8% of people just type a hyphen where an em-dash would be used in print, and absolutely nothing is lost. Some dickheads (like myself) have used a compose key (or similar) to type actual em-dashes in order to seem sophisticated online.
The only people who need the em-dash, as far as I know, are Spanish-language writers. As for LLM-shaming, isn't it more shameful to publish an article that could easily have been written entirely by an LLM, but definitely wasn't, like this one?
edit: articles like this make me want to misuse flagging.
The reason em dashes are a giveaway for AI-generated text is simply that there is no em dash key on the keyboard - only a hyphen key. The dash I used in that last sentence was a hyphen, not an em dash.
Some publishing applications (including Microsoft Word) will automatically convert hyphens into proper dashes where appropriate. But most email apps, chat apps, online posts and comments, and practically any application not designed for producing printed publications will not do that conversion for you. And without a dedicated key, typing the real character is far too cumbersome for most people to bother with. They will just leave it as a hyphen.
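To make the distinction concrete: the dash key on a standard keyboard produces HYPHEN-MINUS (U+002D), while the en dash (U+2013) and em dash (U+2014) are separate characters. The `smart_dashes` function below is a hypothetical, simplified version of the as-you-type substitution a word processor performs; its two rules are illustrative assumptions, not Microsoft Word's actual AutoFormat logic.

```python
import re
import unicodedata

HYPHEN_MINUS = "-"   # U+002D, what the keyboard key types
EN_DASH = "\u2013"
EM_DASH = "\u2014"


def smart_dashes(text: str) -> str:
    """Apply two illustrative autocorrect-style dash substitutions."""
    # Assumed rule 1: a double hyphen becomes an em dash.
    text = text.replace("--", EM_DASH)
    # Assumed rule 2: a lone hyphen with a space on each side becomes an en dash.
    return re.sub(r"(?<=\S) - (?=\S)", f" {EN_DASH} ", text)


# Show that the three look-alikes are distinct Unicode characters.
for ch in (HYPHEN_MINUS, EN_DASH, EM_DASH):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Under these rules, "word--word" gets an em dash between the words, while hyphenated compounds like "well-known" are left alone.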
So yes, the em dash is still a reliable indicator of AI-generated content in many contexts.
But worst of all, I think, it just gives me the fucking creeps, some uncanny-valley bullshit. I see hyphens a million times a day, then out of nowhere comes this creepy slender-man-looking motherfucker that's just a little bit longer than you'd expect or like, and is always touching the letters around it when it shouldn't need to. It stands out like a weird print error... on my screen! Hopefully it keeps building a worse and worse reputation.