We kept asking LLMs to rate things on 1-10 scales and getting inconsistent results. Turns out they're much better at arguing positions than assigning numbers, which makes sense given their training data. The courtroom structure (prosecution, defense, jury, judge) gave us adversarial checks we couldn't get from a single prompt. Curious if anyone has experimented with other domain-specific frameworks to scaffold LLM reasoning.
by test6554
2 subcomments
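A minimal sketch of the courtroom-style orchestration described above, assuming a placeholder complete() function standing in for any chat-completion call; the role prompts and simple majority vote are illustrative guesses, not the authors' actual pipeline:

    # Illustrative courtroom scaffold: prosecution and defense argue a charge,
    # jurors vote, and a judge writes the ruling. complete() is a stand-in for any LLM call.
    def complete(prompt: str) -> str:
        raise NotImplementedError("wire this to your LLM client of choice")

    def courtroom_verdict(charge: str, evidence: str, n_jurors: int = 5) -> bool:
        prosecution = complete(
            "You are the prosecution. Argue that this charge is true.\n"
            f"Charge: {charge}\nEvidence:\n{evidence}"
        )
        defense = complete(
            "You are the defense. Rebut the prosecution and argue the charge is false.\n"
            f"Charge: {charge}\nEvidence:\n{evidence}\nProsecution said:\n{prosecution}"
        )
        guilty_votes = 0
        for _ in range(n_jurors):
            vote = complete(
                "You are a juror. Reply with exactly GUILTY or NOT GUILTY.\n"
                f"Charge: {charge}\nProsecution:\n{prosecution}\nDefense:\n{defense}"
            )
            guilty_votes += "NOT GUILTY" not in vote.upper()
        ruling = complete(
            f"You are the judge. The jury voted {guilty_votes}/{n_jurors} guilty. "
            "Write a brief ruling explaining the outcome."
        )
        print(ruling)
        return guilty_votes * 2 > n_jurors

The point of the adversarial pairing is that no single call both argues and decides; each claim gets an explicit rebuttal before any verdict is reached.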
Defence attorney: "Judge, I object"
Judge: "On what grounds?"
Defence attorney: "On whichever grounds you find most compelling"
Judge: "I have sustained your objection based on speculation..."
by pu_pe
1 subcomment
Every time I see some complex orchestration like this, I feel that the authors should have compared it to simpler alternatives. One of the metrics they report is that human review finds the system right 83% of the time. How much of that performance would they get by just having a single reasoning "judge" decide, without all the other procedure?
by nader24
0 subcomments
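For comparison, the judge-only baseline this comment asks about collapses to a single call (reusing the hypothetical complete() helper from the sketch above); measuring this against the full courtroom is exactly the ablation being requested:

    # Judge-only baseline: one reasoning call, one structured verdict.
    def judge_only(charge: str, evidence: str) -> bool:
        ruling = complete(
            "You are a judge. Reason step by step about whether the charge is "
            "supported by the evidence, then finish with 'VERDICT: GUILTY' or "
            "'VERDICT: NOT GUILTY'.\n"
            f"Charge: {charge}\nEvidence:\n{evidence}"
        )
        return "VERDICT: GUILTY" in ruling.upper()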
This is a fascinating architecture, but I’m wondering about the cost and latency profile per PR. Running a Prosecutor, Defense, 5 Jurors, and a Judge for every merged PR seems like a massive token overhead compared to a standard RAG check.
by jpollock
0 subcomments
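A rough back-of-envelope on that overhead; the token figure is an assumed illustration, not a number from the post:

    # 1 prosecutor + 1 defense + 5 jurors + 1 judge = 8 LLM calls per merged PR,
    # versus roughly one call for a plain retrieval-plus-check pass.
    courtroom_calls = 1 + 1 + 5 + 1                     # 8 calls
    baseline_calls = 1
    assumed_tokens_per_call = 3_000                     # illustrative guess only
    print(courtroom_calls * assumed_tokens_per_call)    # ~24,000 tokens per PR
    print(baseline_calls * assumed_tokens_per_call)     # ~3,000 tokens per PR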
Is the LLM an expensive way to solve this? Would a more specialized predictive model be better? Then the LLM summarizes the PR and the model predicts the likelihood of needing to update the doc?
Does using an LLM help avoid the cost of training a more specific model?
by unixhero
0 subcomments
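A sketch of the two-stage setup this comment describes: one LLM call condenses the PR, then a small trained model predicts whether docs need updating. Every name here is hypothetical, and TF-IDF plus logistic regression is just one cheap stand-in for the kind of specialized model the comment suggests:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    def summarize(pr_diff: str) -> str:
        """Placeholder for one LLM call that turns a raw diff into a short summary."""
        raise NotImplementedError

    # Train once on labelled history: (summary of a past PR, did the docs change with it?)
    def train_doc_update_model(past_summaries: list[str], docs_changed: list[int]):
        model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
        model.fit(past_summaries, docs_changed)
        return model

    # At review time: one cheap LLM summary call, then a near-free prediction.
    def needs_doc_update(model, pr_diff: str, threshold: float = 0.5) -> bool:
        probability = model.predict_proba([summarize(pr_diff)])[0][1]
        return probability >= threshold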
Excuse my ignorance:
Is this not exactly what you can ask ChatGPT to assist with?
by emsign
3 subcomments
An LLM does not understand what "user harm" is. This doesn't work.