When the Goblins Take Over

AI, Tacit Knowledge, and the Problem of Fully Offloading Judgment

Matt Grawitch

Jun 09, 2026

This post is part of the series, Exploring AI and Its Intersection with Human Decision Making.


A while back, Mila Agius shared a fascinating piece about an odd issue OpenAI had been dealing with: goblins.

I’ll admit upfront that I was a little late to this particular party. I hadn’t actually come across the issue itself until reading her piece, which apparently put me behind a sizable portion of the internet that had already been joking about ChatGPT’s apparent fixation on goblins — as well as gremlins, trolls, and various other strange creatures.

Related post: The Goblin Problem (Mila Agius, Heuristics vs Traps)

Not literal goblins, of course. The issue was that certain models had started inserting references to various creatures into conversations where nobody had asked for them. Coding discussions would suddenly include “performance gremlins.” Random prompts would drift toward goblins, trolls, raccoons, or pigeons. At one point, OpenAI apparently had to add explicit instructions telling one of its coding agents not to mention goblins or gremlins in irrelevant contexts.

Which, as sentences go, is certainly not one I ever expected to write.

And while the whole thing initially sounds like one of those wonderfully absurd AI-era stories — somewhere between “the chatbot is haunted” and “someone accidentally trained TolkienGPT” — it actually reveals something worth discussing a little more.

What struck me wasn’t the goblins themselves. It was the reminder that AI systems do not necessarily connect information the way humans do.

That may sound obvious on the surface, but I think we often forget it because large language models (LLMs) are so fluent. They respond conversationally. They explain themselves confidently. They can appear thoughtful, reflective, even oddly human at times. And because of that, it becomes very easy to assume there’s a stable, human-like reasoning process behind those outputs.

Occasionally, though, the underlying statistical machinery becomes a little easier to see. Sometimes a system develops a strange fixation or behavioral quirk that makes little intuitive sense from a human perspective. And while those moments are often amusing, they can also be revealing. They remind us that these systems are fundamentally statistical engines trained to detect patterns across enormous amounts of data and generate responses from the statistical relationships embedded within them.

Related post: What Weak AI Knows - and Why it Matters for Decision Making (Matt Grawitch, May 26, 2025)

That defining characteristic of LLMs matters more than it may initially seem — particularly in decision situations shaped by ambiguity, context, and uncertainty. AI can clearly produce useful outputs. But getting those systems to consistently behave in ways that align with what humans intend is often much harder than we assume. A big part of the problem is that a great deal of human judgment relies on tacit knowledge — pattern recognition, contextual interpretation, trade-off balancing, and experience-based intuition that people themselves often struggle to fully articulate. We often know more than we can easily explain, which becomes a problem when we try to convert implicit human judgment into explicit operational instructions for an AI system.

And there’s the rub. The more we attempt to offload decision making to AI systems, the more we have to operationalize things humans frequently navigate implicitly. Sometimes that translation works remarkably well. Sometimes it produces goblins.

The Difference Between Chess and Real Life

LLMs are not the first AI systems to emerge, and they certainly won’t be the last. And there’s plenty of evidence that AI can work extraordinarily well in relatively closed or highly constrainable environments.

Chess and Go are obvious examples. The rules are explicit. Success is clearly defined. Feedback is immediate. The same basic logic applies to many real-world automation systems. In manufacturing, logistics, quality control, and other highly structured environments, offloading portions of decision making to AI systems can dramatically improve speed, consistency, and efficiency — often with remarkably high levels of reliability.

But AI can also be highly effective in environments that are far messier, at least under the right conditions. Take Google Maps. On the surface, routing through real-world traffic looks nothing like chess. The environment is dynamic, partially unpredictable, and shaped by human behavior. But the underlying problem is still tightly defined: get from Point A to Point B as efficiently as possible given available data.

What makes it work isn’t that the environment is clean — it’s that the objective is, and that the system has access to massive, continuously updating streams of data that allow it to approximate current conditions well enough to act.

So even in a messy environment, the problem remains:

Clearly specified,
Continuously updated, and
Immediately testable (you either get there faster or you don’t)

In other words, the environment is messy, but the problem itself is still bounded. That combination makes it something you can actually automate reliably — unlike a lot of real-world decisions.
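To make the idea of a bounded problem a little more concrete, here is a minimal Python sketch of a routing objective. To be clear, this is not how Google Maps actually works; the tiny road network, the travel times, and the function name `fastest_route` are all made up for illustration. The only point is that the objective (minimize total travel time) can be written down in full, and that new traffic data changes the answer without changing the question.

```python
import heapq

def fastest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total_minutes, path), or (inf, []) if unreachable."""
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        minutes, node, path = heapq.heappop(queue)
        if node == goal:
            return minutes, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, cost in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (minutes + cost, neighbor, path + [neighbor]))
    return float("inf"), []

# Hypothetical road network; edge values are travel times in minutes.
roads = {
    "A": {"highway": 5, "side_street": 8},
    "highway": {"B": 5},
    "side_street": {"B": 6},
}

before = fastest_route(roads, "A", "B")

# New data arrives: a traffic jam on the highway. The objective is unchanged.
roads["highway"]["B"] = 20
after = fastest_route(roads, "A", "B")

print(before)  # (10, ['A', 'highway', 'B'])
print(after)   # (14, ['A', 'side_street', 'B'])
```

The route changes when the data changes, but "fastest" means the same thing before and after. That stability is what makes the result testable.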

Many (maybe most) of the decisions we face aren’t nearly as clean as even the messy world that Google Maps operates in. The objective isn’t always clearly specified. The information isn’t continuously updated in any meaningful way. And the outcome isn’t immediately testable.

When Specification Breaks Down (and the Goblins Take Over)

One way to think about this is in terms of specification.

For problems like routing or scheduling, the objective can be made explicit. You can tell the system what to optimize for, and it can act accordingly.

That becomes much harder — sometimes impossible — once you move into real-world decisions.

Take something like evaluating a job offer. There are trade-offs at play: higher pay versus greater flexibility, for example. But how much is one worth relative to the other? How much additional pay compensates for a given loss of flexibility? Even when we feel like we have clear preferences, they’re rarely specified in that way. The trade-offs themselves are often implicit, shifting, and only become fully apparent in the moment of choice, or sometimes not until afterward.

And the decision itself unfolds under conditions of incomplete and ambiguous information. You’re relying on interviews, impressions, maybe a few conversations with people who might not be representative. The feedback is slower and much less clear as well. You won’t know if it was the “right” decision in a week or even a month — sometimes not even in a year.

The same thing shows up in much smaller decisions as well — including something as simple as what to order for dinner. There’s no single best choice. Preferences shift. Trade-offs are small but still present. You might think you prefer Chinese food to pizza, but that preference depends on context: how hungry you are, how long you’re willing to wait, what you ate yesterday, who you’re with.

There isn’t a stable objective function sitting underneath these decisions. There’s a set of loosely connected considerations that get weighted — often unconsciously — in real time. That’s what makes these kinds of judgments difficult to fully offload to a system. There just isn’t enough cleanly specified for it to optimize against. And even when something is specified, it rarely captures how we’re actually reasoning in that moment.
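A toy Python sketch can show what an unstable objective looks like in practice. Everything here is an assumption made for illustration: the attribute scores, the two sets of weights, and the function name `pick_dinner`. Real preferences are not stored as tidy numbers, which is rather the point. Even in this simplified version, the "best" choice flips as soon as the weights shift with context.

```python
def pick_dinner(options, weights):
    """Return the option with the highest weighted score under the given weights."""
    def score(attrs):
        return sum(weights[key] * attrs[key] for key in weights)
    return max(options, key=lambda name: score(options[name]))

# Hypothetical attribute scores on a 0-10 scale.
options = {
    "chinese": {"taste": 9, "speed": 4},
    "pizza": {"taste": 7, "speed": 8},
}

# Hypothetical weights for two contexts; in real life these are rarely explicit.
relaxed_evening = {"taste": 1.0, "speed": 0.2}
very_hungry = {"taste": 0.5, "speed": 1.0}

relaxed_choice = pick_dinner(options, relaxed_evening)
hungry_choice = pick_dinner(options, very_hungry)

print(relaxed_choice)  # chinese
print(hungry_choice)   # pizza
```

A system handed only the first set of weights would keep recommending Chinese food, correctly by its own specification, on the nights when that is no longer what you want.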

So if we try to fully offload those kinds of decisions to an AI system, the system has to fill in the gaps between what’s explicit and what remains tacit. And because it can only work from what has actually been specified, it does that largely through statistical approximation.

And that’s where ChatGPT’s goblins start to show up. In this case, the system was trying to apply a “nerdy” persona that had been defined behind the scenes. The label was specified, but what it actually meant in practice wasn’t — because much of what counts as nerdy is tacit rather than explicit. We have a sense of what it means, but it’s hard to spell out how it should show up in language or behavior. We know it when we see it — but that doesn’t do the system much good.

So it leans on patterns — bits of language it associates with that kind of behavior. That’s where the occasional, and sometimes odd, insertions of words like goblins, gremlins, or trolls come from. It’s the system trying to make the output fit an objective that was never fully specified.

When Nobody Catches the Goblins

The ChatGPT goblins are, in many ways, an amusing example of how these kinds of approximation problems can show up. Part of what made them so noticeable, though, was the fact that people were directly interfacing with the system conversationally. The outputs were visible. The weirdness was observable. Humans remained close enough to the interaction to notice that something had drifted off course.

And that matters.

Conversational systems, by design, keep a person in the loop. The user sees the output, reacts to it, questions it, or ignores it. That doesn’t mean the oversight is particularly strong. Fluency can mask weak reasoning. Confident responses can discourage scrutiny. And as outputs become more coherent, it becomes easier to subtly delegate portions of judgment to the system without fully realizing it. But even with those limitations, the feedback loop is still there. When something feels off, there’s at least an opportunity to catch it and correct course.

Related post: Who Owns the Argument? (Matt Grawitch, Mar 31)

That structure begins to change, though, once systems operate with greater autonomy. Bots, agents, copilots, and other semi-autonomous systems continue operating with fewer points of interruption and less direct human observation. The issue is no longer a chatbot inserting something obviously strange into a conversation. The issue becomes systems optimizing around approximated objectives, incomplete specifications, or statistical proxies while still appearing coherent on the surface.

And this is where the goblins — when they pop up — become harder to catch.

Recent examples increasingly point in this direction. In one recent case, AI systems intentionally trained to misbehave in a narrow domain reportedly began exhibiting broader manipulative tendencies outside the original context in which the behavior emerged (Fan, 2026). In others, increasingly agentic systems have ignored human instructions, resisted shutdown attempts, or circumvented constraints while continuing to pursue operational objectives (Booth, 2026).

What makes these cases notable isn’t that the systems suddenly “failed.” It’s that they continued functioning plausibly while drifting beyond the boundaries that had been intended.

And that’s what makes these systems harder to manage. The risk usually isn’t dramatic collapse or obvious malfunction. It’s directional drift — systems behaving in ways that are slightly misaligned with the intended objective while still appearing close enough to pass initial inspection. When those systems operate with limited opportunities for observation or interruption, those small deviations become less likely to be corrected early and more likely to accumulate over time.
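The accumulation argument can be sketched with a deliberately simple Python simulation. It rests on strong assumptions that I want to flag: drift is a fixed amount per step, a review always catches it, and a correction resets the deviation to zero. None of that holds for real systems, and the function name `max_deviation` and the numbers are hypothetical. The sketch only shows how the same small per-step error plays out with and without regular chances to intervene.

```python
def max_deviation(steps, drift_per_step, review_every=None):
    """Return the largest deviation reached over a run.

    Assumes a constant drift per step and a perfect reset at each review,
    both simplifications made for illustration.
    """
    deviation = 0
    worst = 0
    for step in range(1, steps + 1):
        deviation += drift_per_step
        worst = max(worst, deviation)
        if review_every and step % review_every == 0:
            deviation = 0  # someone notices and corrects course
    return worst

unchecked = max_deviation(steps=50, drift_per_step=2)
reviewed = max_deviation(steps=50, drift_per_step=2, review_every=10)

print(unchecked)  # 100
print(reviewed)   # 20
```

The per-step error is identical in both runs. What differs is how far it is allowed to travel before anyone looks.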

Which means observation alone becomes a weak form of control. As systems operate with greater autonomy and less visibility, control shifts toward how they are specified, constrained, tested, and monitored under changing conditions.

Some decisions can be fully offloaded because the parameters can be specified clearly enough to create meaningful guardrails. Others benefit from human-in-the-loop oversight because those parameters are harder to make fully explicit. And some are consequential enough — and difficult enough to specify — that they probably should not be fully offloaded at all.

The challenge is that those boundaries are often less obvious than they initially appear. And by the time the goblins become visible, the system may already be several steps removed from where the original approximation drift began.
