The Judgment Problem Behind AI

Matt Grawitch

May 26

Why the challenge isn’t using AI, but deciding what to do with what it gives you

12 Comments

Gail Brown

May 26

MANY Thanks Matt & Ruv - so glad I follow you BOTH! 👍

I’m looking at these concepts for students / children - who (from memory?) have similar misconceptions about their own knowledge…

So you’ve introduced two levels where I had one. I thought that having prior knowledge of a topic might help younger students; that was (WAS?) my first level, and my starting point for designing some instruction.

What you introduced was whether learners understand their own personal “confidence level”, and I think many younger (& maybe older) people/students have more confidence than their actual knowledge warrants?

Hope this makes sense? Thanks again! 👍❤️❤️

Matt Grawitch

May 27

I think that absolutely makes sense. One of the things our study really points toward is that there are actually multiple judgments happening at once.

One level is the learner trying to understand the material itself. But another level is the learner trying to judge whether they understand the material well enough to trust their own conclusions.

And those don’t always line up very well. People can feel highly confident while missing important gaps in understanding — which is something we see not only with AI use, but with learning and decision making more broadly.

Prior knowledge still matters because it gives people a better frame of reference for evaluating new information. But confidence can sometimes operate independently of actual knowledge, especially in areas where people lack strong feedback about whether they’re right or wrong.

So I think your instinct about instructional design is still correct — prior knowledge helps. The complication is that learners also need calibration: some ability to judge when their understanding is solid versus when it only feels solid.

And honestly, adults aren’t always much better at this than kids.

Ruv Draba

May 27

Gail, I tripped over your comment here while chatting to @Matt Grawitch. I’m so glad to see you reading him. His work is a great fit for you.

My read as a practitioner in the corporate advice game is that we want better information to help us make better decisions, but it’s a staged walk-up to make sense of everything we get, to decide it’s good enough to act on, and then bring along the people we need as our supports.

So I think there are potentially four levels operating here:

What can I find out that relates to my concern? (Information)

How do I make sense of it? (Interpretation)

How reliable is it for what I mean to do? (Judgement)

Who will believe me, support me and why? (Respect)

If you want to move a group, or even just be respected for your own decisions, then all four operate.

Currently misleading is what I’ve lately called the ‘hero operator’ narrative from AI companies: one hero with an AI can be instantly well-informed with the smartest, most current interpretation, and swing a whole group with their fluent, autogenerated slide decks. That isn’t what happens in practice. My latest article illustrates why.

For a child, I think the ‘AI as authority’ narrative will clang early because their friends will get different, confident interpretations from the AI using very similar prompts. That’s distinct from Google Search, which gives you similar information customised by your search history that you then pick from and interpret yourself. (Though now Google is giving readers AI interpretations too, and pitching them as summaries.)

I have no idea the right age at which to teach how to navigate these layers, but I think that awareness emerges quite young, that teens are acutely conscious of it, and that it already pulls them in multiple directions. I don’t know how the next generation of teens will handle this when they already have social media pressure to navigate too, but it’s definitely one to face.

I can also attest that by the time I encounter workers in their twenties and older, they’re already struggling with it both individually and socially. The rise of online conspiracy theories suggests that we’re struggling with it institutionally too (Matt will have more on this).

I hope that might help.

Xiaoqing Wang

Jun 3

I came across this piece right after finishing my book The Judgment Behind AI, and it immediately caught my attention.

What I find especially valuable here is the focus on calibration: the problem is not simply whether people use AI or ignore it, but how they decide whether an AI recommendation is worth trusting.

That is very close to the problem I was trying to name in the book.

The difference, I think, is one of framing.

This piece looks at the judgment problem through trust, confidence, recommendation use, and calibration.

My book looks at it through a broader “judgment structure” lens: AI does not remove human judgment. It exposes and amplifies the judgment structure already behind the user.

So when someone has clear judgment, AI can become leverage.

When someone has confused judgment, AI may only help them produce more confusion faster.

In that sense, I see this article as touching the same underlying problem from a research and decision-making angle, while my book approaches it from a practical, structural, and operational angle.

Really glad to have found this.

Matt Grawitch

Jun 3

Thanks, Xiaoqing. I think there's a lot of overlap between what you're describing and what I was trying to get at here.

One thing that stood out in the data was that the AI itself wasn't really driving outcomes directly. People were filtering recommendations through their beliefs about the system and through their beliefs about their own judgment. The same recommendation could be accepted, rejected, or ignored depending on how those filters operated.

That's one reason I ended up focusing on calibration. The challenge isn't simply whether people trust AI. It's whether trust in the AI and trust in their own judgment are aligned with reality in a given situation.

Your point about AI exposing and amplifying existing judgment structures resonates with me. The way I might frame it is that AI doesn't remove the need for judgment. If anything, it creates additional judgment calls about when to rely on the system, when to question it, and when to question ourselves.

I suspect we're looking at much the same underlying phenomenon from different angles.

Xiaoqing Wang

Jun 3

Thanks, Matt. I think that's a very helpful distinction.

What stood out to me in your article is that calibration itself may already be revealing something deeper about the person's judgment structure.

Two people can receive the same recommendation and have access to the same information, yet arrive at very different decisions because they interpret both the AI and themselves differently.

From that perspective, calibration may be one observable expression of a broader judgment system operating underneath.

I also agree with your point that AI creates additional judgment calls rather than eliminating them. In many ways, the question shifts from “What should I do?” to “When should I trust this recommendation, and when should I challenge it?”

Really enjoyed the article and the discussion. It gave me another way to think about the same underlying problem.

Ruv Draba

May 26

Glad to see this one come out in Substack form too, Matt. I have taken a business-operational approach to a related question, which should also be out this week.

Matt Grawitch

May 26

Thanks, @Ruv Draba. I decided to make this a 1-2 sort of post set so that I could dive more deeply into some of the nuance. I look forward to your post coming out.

Ruv Draba

May 27

This dovetailed so well onto your Psychology Today article that (from memory only) I initially thought it was a distillation.

It wasn’t: it was an extension, and a valuable one, Matt.

It feels like there’s a cavern here around information, authority and consensus. It turns up all the time in my work because I can help a client discover what’s robustly accurate, but that won’t be effective until they build consensus on it, and consensus is built institutionally.

Consensus-building work can be slower than the research and analysis, and often begins before I’ve even fully mapped the problem.

AI analysis won’t simplify that, but I predict a lot of burned fingers for the ‘hero operators’ who think it will.

Matt Grawitch

May 27

If it’s of interest to you, we’re currently collecting data for a Study 3. It uses the same task, but with a few more direct measurements of some of the specific decisions made along the way to the final set of recommendations. Pending the results there, I’m working on an iteration of the task that is a bit more real-world.

Ruv Draba

May 27

Matt, I’ll gladly stay in the loop. I’m not qualified to critique your methodology, but would be glad to read preprints or finals. If you want to talk about real-world applications, I’m there for it too.

Matt Grawitch

May 28

No, I was just letting you know that we've got more in the works. Looking forward to sharing what we learn there on Substack (and probably PT too).