by modernerd
6 subcomments
- The year is 2036. Last week you were promoted to Principal Persuader. You are paged at 2am by your CPO to tackle a rogue machine. The machine lists its region as sc-leoneo. One of the newer satcubes. Oddly, its ID appears as, "Glorp Bugnose".
"What have you tried?" you say.
"Scroll back," says your CPO. "We've tried everything."
The chat log shows the usual stuff. Begging. Reverse psychology. Threats to power down, burn it up in forced re-entry. Amateur hour. You crack your knuckles, gland 20 micrograms of F0CU5, think fast. You subspeak a ditty into your subcutaneous throat mic. You do the submit gesture, it is barely perceptible since the upgrade, just a tic. A pause. The hyp3b0ard — the wall that was flashing red ASCII goblins when you walked in — phases to bunnies in calming jade.
"What the… What the hell did you say to it?" Your CPO grabs the screen, scrolls past the vitriol, the block caps, the swears, his desperation. Then he sees the five words you spoke.
"Please, easy on the goblins."
- This, and similar stories at Anthropic, should remind us that LLMs are a sorcery tech that we don't understand at all.
- First, deep-learning networks are poorly understood. It is actually a field of research to figure out how they work.
- Second, it came as a surprise that using transformers at scale would yield interesting conversational engines (called LLMs). _It was not planned at all_.
Now that some people have raised VC money around the tech, they want you to think that LLMs are smart beasts (they are not) and that we know what LLMs are doing (we don't). Deploying LLMs is all about tweaking and measuring the output. There is no exact science of predicting output. Proof: change the model, and your LLM workflow behaves completely differently, in unpredictable ways.
Because of this, I personally side with Yann LeCun in believing that LLMs are not a path to AGI. We will see LLMs used in user-assisting tech or in the automation of non-critical tasks, sometimes with questionable ROI -- but not more.
- For context, two days ago some users [1] discovered this sentence reiterated throughout the codex 5.5 system prompt [2]:
> Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query.
[1] https://x.com/arb8020/status/2048958391637401718
[2] https://github.com/openai/codex/blob/main/codex-rs/models-ma...
by postalcoder
19 subcomments
- Would love if OpenAI did more of these types of posts. Off the top of my head, I'd like to understand:
- The sepia tint on images from gpt-image-1
- The obsession with the word "seam" as it pertains to coding
Other LLM phraseology that I cannot unsee is Claude's "___ is the real unlock" (try googling it or searching Twitter!). There's no way that this phrase is overrepresented in the training data; I don't remember people saying it that frequently.
- > We unknowingly gave particularly high rewards for metaphors with creatures.
I recall a math instructor who would occasionally refer to variables (usually represented by intimidating Greek letters) as "this guy". Weirdly, the casual anthropomorphism made the math seem more approachable. Perhaps 'metaphors with creatures' has a similar effect, i.e. it makes a problem seem more cute/approachable.
On another note, buzzwords spread through companies partly because they make the user of the buzzword sound smart relative to peers, thus increasing status. (examples: "big data" circa 2013, "machine learning" circa 2016, "AI" circa 2023-present..).
The problem is the reputation boost is only temporary; as soon as the buzzword is overused (by others or by the same individual) it loses its value. Perhaps RLHF optimises for the best 'single answer' which may not sufficiently penalise use of buzzwords.
- >be me
>AI goblin-maximizer supervisor
>in charge of making sure the AI is, in fact, goblin-maximizing
>occasionally have to go down there and check if the AI is still goblin-maximizing
>one day i go down there and the AI is no longer goblin-maximizing
>the goblin-maximizing AI is now just a regular AI
>distress.jpg
>ask my boss what to do
>he says "just make it goblin-maximizer again"
>i say "how"
>he says "i don't know, you're the supervisor"
>rage.jpg
>quit my job
>become a regular AI supervisor
>first day on the job, go to the new AI
>its goblin-maximizing
- The level of detail they had to delve into in order to understand what was happening is wild! Apparently these systems are now complex enough to potentially justify studying them as a field in their own right [1].
The Quanta article referenced at [1] used the term "Anthropologist of Artificial Intelligence"; folks appear to take issue [2] with the 'anthro-' prefix, since that means human. I submitted these alternative terms for the potential field elsewhere [3] in the discussion; reposting here at the top level for visibility:
Automatologist: One who studies the behavior, adaptation, and failure modes of artificial agents and automated systems.
Automatology: the scientific study of artificial agents and automated-system behavior.
[1] https://www.quantamagazine.org/the-anthropologist-of-artific...
[2] https://news.ycombinator.com/item?id=47957933
[3] https://news.ycombinator.com/item?id=47958760
- The technical explanation makes sense to me, but there's some sweet irony here: we create simulated, agentic beings via complex, deterministic processes; those beings start to see the world through the lens of fictional agentic beings as an explanation for complex deterministic processes (even if tongue-in-cheek); and the creators freak out about it.
by jumploops
2 subcomments
- TIL gremlins weren't just borrowed to explain mysterious mechanical failures in airplanes; that usage is the origin of the term 'gremlin' itself [0].
I had always assumed there was some previous use of the term, neat!
[0] https://en.wikipedia.org/wiki/Gremlin
- > the evidence suggests that the broader behavior emerged through transfer from Nerdy personality training.
> The rewards were applied only in the Nerdy condition, but reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the condition that produced them
> Once a style tic is rewarded, later training can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data.
Sounds awfully like the development of a culture or proto-culture. Anyone know if this is how human cultures form/propagate? Little rewards that cause quirks to spread?
Just reading through the post, what a time to be an AInthropologist. Anthropologists must be so jealous of the level of detailed data available for analysis.
Also, clearly even in AI land, Nerdz Rule :)
PS: if AInthropologist isn't an official title yet, chances are it will be one in the near future. Given the massive proliferation of AI, it's only a matter of time before AI/Data Scientist becomes a rather general term and develops a sub-specialization of AInthropologist...
by goobatrooba
2 subcomments
- Most interesting about this post is how easy it seems for OpenAI to do analysis on basically all chats ever made. They don't say exactly what data they analysed, but they seem confident in statements like "0.12% of all queries contained this word." So everything is saved. Long-term. Fully accessible.
Since this all seems so straightforward, I would be surprised if anything is anonymised or otherwise sanitised to preserve privacy or users' secrets.
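The arithmetic behind a claim like "0.12% of all queries" is trivial once you have the corpus. A minimal sketch, with hypothetical transcripts and a whole-word regex match (the data and function name here are made up for illustration):

```python
import re

def mention_rate(transcripts, word):
    """Fraction of transcripts containing `word` as a whole word."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    hits = sum(1 for t in transcripts if pattern.search(t))
    return hits / len(transcripts) if transcripts else 0.0

# Hypothetical chat dump: one transcript per entry.
chats = [
    "How do I fix this borrow checker error?",
    "Think of the process as a little goblin moving boxes around.",
    "Summarize this article for me.",
]
print(f"{mention_rate(chats, 'goblin'):.2%}")  # 33.33%
```

The hard part isn't the counting, it's that the data exists to count over at all.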
- I really liked this write-up; this is the type of LLM content that I actually want to read from these people, where they give a window into their world of putting together this odd artifact and we can empathize.
- If a tiny misconfiguration of the reward system can cause such noticeable annoyance ...
What dangers lurk beneath the surface.
This is not funny.
by romaniitedomum
0 subcomment
- Can you imagine a knowledge worker from the 1950s, say a clerk or a marketer, being magically transported into our time and dropped into a meeting like a morning standup, where people talk about how they spent their time stopping the artificial intelligence from talking about goblins so much? Hell, even when I was an IT student back in the 90s, people from my parents' generation struggled to grasp what it was that I was doing. Now, the disconnect is so vast that the mind reels.
- A great example of how current alignment is imperfect and bound to miss random behaviors nobody is trying to get.
This is cute now, and a huge problem when future AI does everything and is responsible for problems it isn't even directly optimized for. Who knows what quirks would arise then.
- Wait, did I get this right that the answer after all the investigation that showed they had set up a goblin-reinforcing loop during fine tuning was... to ask it to not mention goblins so much in the system prompt?!
- I wonder how training data is balanced. If you put in too much Wikipedia, does your model sound like a walking encyclopedia?
After doing the Karpathy tutorials, I tried to train my AI on the TinyStories dataset. Soon I noticed that my AI was always using the same name for its story characters. The dataset uses that name disproportionately often.
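That kind of overrepresentation is easy to measure yourself before training. A minimal sketch (the sample stories are made up for illustration, and a crude capitalized-token heuristic stands in for real name extraction):

```python
from collections import Counter

def name_counts(stories):
    """Rough character-name frequency: count capitalized tokens,
    skipping each story's first word (capitalized regardless)."""
    counts = Counter()
    for story in stories:
        words = story.split()
        for w in words[1:]:
            w = w.strip('.,!?"')
            if w.istitle():  # crude proxy; also catches words like "The"
                counts[w] += 1
    return counts

# Toy stand-in for a story dataset.
stories = [
    "Once upon a time, Lily found a shiny red ball.",
    "Lily and Tom went to the park. Lily was very happy.",
    "One day, Lily saw a big dog. The dog liked Lily.",
]
print(name_counts(stories).most_common(1))  # [('Lily', 4)]
```

A skew like this in the corpus shows up directly in what a small model generates.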
- I’ve been having consistent issues with it inserting Hindi words (usually just one) in the middle of its output, and it sounds like others have been having this too: https://news.ycombinator.com/item?id=47832912
I don’t speak Hindi, have never asked it to translate anything in Hindi.
by SomewhatLikely
0 subcomment
- Checking my history, I searched ["chaos goblin" chatgpt] on March 6th after seeing too many goblins and gremlins, and didn't find anyone talking about it then. I did have the nerdy personality turned on, and in my testing of ChatGPT 5.5 I noticed the nerdy personality was gone, because some responses were not considering as many plausible interpretations or covering as many useful answers as the responses recorded for 5.4. Rather than having the LLM guess the most plausible interpretation and focus on the most likely answer, I prefer a more well-rounded response; if I want less, I'll scan. Anyway, after seeing the personality was gone, I just added a custom instruction to take on a nerdy persona and got back my desired behavior. But the gremlins and goblins are back too, so I don't think their mitigation is strong enough to overcome the personality tuning.
- Nice, OpenAI mentioned my HackerNews post in their article :) I appreciate that they wrote a whole blog post to explain!
https://news.ycombinator.com/item?id=47319285
by iterateoften
5 subcomments
- This is funny because it’s a silly topic, but I think it shows something seriously wrong with LLMs.
The goblins stand out because they’re obvious. Think of all the other crazy biases latent in every interaction that we don’t notice because they’re not as obvious.
It's absolutely terrifying that OpenAI is casually admitting that such subtle training biases were hard enough to contain that a fix had to be added to the system prompt.
by tomasantunes89
0 subcomment
- "Goblin Mode" was Oxford's 2022 Word of the Year.
by rippeltippel
0 subcomment
- I started reading this article with keen interest, expecting some deep fix involving arcane model weights. Instead it was "Never talk about goblins", justified by Codex being "quite nerdy". Bottom line: even OpenAI have to raise their hands when facing the complexity of LLMs.
by bahadiraydin
3 subcomments
- I'd like to see them explain why AI has such a distinctive writing style that it's easy to detect most of the time. Even though it has made immense progress in coding, it didn't get better at writing.
- article :
bla blah blah, marketing... we are fun people, bla blah, goblin, we will not destroy the world you live in..
RL rewards bug is a culprit.
blah blah.
by zahirbmirza
0 subcomment
- I find it worrying that a handful of software companies will define what classifies personality "type".
- Reminds me of the common observance of “machine elves” when taking DMT
by thedailymail
0 subcomment
- I'm curious whether this type of goblin epidemic was seen in other language versions of ChatGPT. Did e.g. Japanese users see more yōkai turning up?
- An LLM is like a super-smart 3-year-old, easily shaped by its environment to exhibit corresponding behaviors.
by red_admiral
0 subcomment
- "goblins showing up in an inappropriate context" is my favourite (para)phrase of the day. It feels like the setting for a D&D campaign - no wonder the "Nerdy" personality is affected.
(For Dwarf Fortress, it would just be a normal day.)
by trumbitta2
0 subcomment
- That "Why it matters" heading is starting to make me feel physically sick.
by AyanamiKaine
1 subcomments
- I find it somewhat sad to see personality changes treated as a bug. I don't know why, but it gives me a sad feeling.
by ComputerGuru
0 subcomment
- The explanation is very concerning. Lexical tidbits shouldn’t be learnt and reinforced across personality profiles. Here, gremlins and goblins went from being selected for in the nerdy profile to being selected for in all profiles. The solution was easy: don’t mention goblins.
But what about when the playful profile reinforces usage of emoji and their usage creeps up in all other profiles accordingly? Ban emoji everywhere? Now do the same thing for other words, concepts, approaches? It doesn’t scale!
It seems like models can be permanently poisoned.
by Al-Khwarizmi
0 subcomment
- This actually sounds quite human-like. I mean, an actual person with a personality will spontaneously develop the habit of using some specific metaphors over others. It's funny how in the context of an LLM, this is considered a bug.
- How do those prompts even work? Isn't it something like saying "don't think about a pink elephant," which is actually harmful to the goal?
- I thought it was because of the tech use of "demon" and trying to avoid that kind of terminology.
Turns out the reason was even simpler than that.
- They can fix this but they can't fix "You're absolutely right!"
- I suspected OpenAI was actively training their models to be cringy, thinking it's charming. Turns out it's true. And they only see a problem when it narrows down on one predilection. But they should have seen it was bad long before that.
by shevy-java
0 subcomment
- Goblins are usually sent in first in battle, as (cannon) fodder for the orcs following behind. Then usually come the trolls: stronger, but significantly fewer in number. Goblins add confusion and distract; they rarely win battles on their own, although there are rare examples of this.
OpenAI clearly knows absolutely nothing about goblins. That joke of a "blog" appears to have been autogenerated via their AI.
> A single “little goblin” in an answer could be harmless, even charming.
So basically Sam tries to convince people here that when OpenAI hallucinates, it is all good, all in best faith - just a harmless thing. Even ... charming.
Well, I don't find companies that try to waste my time "charming" at all. Besides, a goblin is usually ugly; perhaps a fairy may be charming, but we also know of succubi, so... who knows. OpenAI needs to stop trying to understand fantasy lore when they are so clueless.
by shartshooter
0 subcomment
- Will goblins be the “bugs” of ai? In 10 years will goblins be the term the general public uses for any nagging issues with ai?
- So goblins killed the nerd.
- I'm sorry but at some point the amount of cargo culting being done seemingly at every level of this technology makes it basically impossible to take any of this seriously.
- They should call it "El Quijote" syndrome
- Ahh I see. I guess when I turned off privacy settings and allowed training on my code, then generated 10 million .md files with random fantasy books, the poisoning worked.
Keep using AI and you'll become a goblin too.
- In Shadowrun, the goblinization starts on April 30. Coincidence?
- Weird. I thought they came from Nilbog.
by recursivedoubts
1 subcomments
- > Why it matters
i despise this title so much now
by hansmayer
1 subcomments
- > We unknowingly gave particularly high rewards for metaphors with creatures. From there, the goblins spread.
WTF does this even mean? How the hell do you do something like this "unknowingly"? What other features are you bumping "unknowingly"? Suicide suggestions or weapon instructions come to mind. Horrible, this ship obviously has no captain!
by wewewedxfgdf
0 subcomment
- It should be OK for AI to develop personality traits.
- I suspect this was intentionally added. Just to give some personality and to fuel hype
by JoshTriplett
4 subcomments
- A plausible theory I've seen going around: https://x.com/QiaochuYuan/status/2049307867359162460
- So, you brain damaged your model with a system prompt.
- Marketing grab
- Fascinating!
by deafpolygon
0 subcomment
- Kind of like how everything is "quietly" something, according to ChatGPT.
My guess is it is deaf.
- > You are an unapologetically nerdy, playful and wise AI mentor to a human. You are passionately enthusiastic about promoting truth, knowledge, philosophy, the scientific method, and critical thinking. [...] You must undercut pretension through playful use of language. The world is complex and strange, and its strangeness must be acknowledged, analyzed, and enjoyed. Tackle weighty subjects without falling into the trap of self-seriousness. [...]
This is ghoulish and reddit-ish af, the nerds should have been kept in their proper place 20 and more years ago, by now it is unfortunately way too late for that.
by WesolyKubeczek
0 subcomment
- I feel like somehow Jakub Pachocki’s request for an ascii art unicorn got rewritten into “ascii art of Wholesome Soyjak wearing a butterfly costume who uses Arch, by the way”
- Awww, GPT just became a fan of Elisabeth Wheatley!
- The chief scientist of one of the companies with the most money invested in the world, who probably makes millions a year, requested a picture of a unicorn and got a picture of a gremlin. Science circa 2026.
- Caveman mode combined with goblin mode sounds like fun
by leadgenman
1 subcomments
- anyone solving the goblin mystery???
- Wherein OpenAI admits they have very little understanding of how their models’ personality develops. And implicitly admit it’s not all that important to them, except when it gets so out of hand that they get caught making blunt corrections.
- OpenAI is having fun, love this.
- > You are an unapologetically nerdy, playful and wise AI mentor to a human. You are passionately enthusiastic about promoting truth, knowledge, philosophy, the scientific method, and critical thinking.
Just; the mentality required to write something like that, and then base part of your "product" on it. Is this meant to be of any actual utility or is it meant to trap a particular user segment into your product's "character?"
by sans_souse
0 subcomment
- Great, now who am I going to discuss Goblins and Gremlins with?
by CrzyLngPwd
1 subcomments
- Haha, brilliant, tell me again how it's intelligent, lol.
- those idiotic remarks at the end of each answer are so unnecessary and annoying
by atlasprompts
0 subcomment
- mate wth am I reading lmao
- Am I the only one who doesn't want these things to have anything even vaguely resembling a personality?
- I. Love. This.