It’s a funny joke, but saving a couple hundred tokens in the final output is going to be negligible, especially when coding, where it’s common to go through hundreds of thousands of tokens in a session. You also have to consider the additional tokens consumed by the skill prompt itself (acknowledging that output tokens are billed at a different rate than input).
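Back-of-envelope, with every number below a made-up illustration rather than real pricing or measurements: the savings barely register against a long session, and resending the prompt each turn eats into them.

    # Rough math on the token economics; all figures are illustrative guesses.
    session_tokens = 300_000   # tokens a long coding session might burn through
    saved_per_reply = 200      # output tokens a terse "caveman" reply might shave
    prompt_overhead = 150      # rough size of the caveman prompt, resent every turn
    turns = 50                 # assistant replies in the session

    net_saved = turns * saved_per_reply - turns * prompt_overhead
    print(f"net tokens saved: {net_saved}")                                # 2500
    print(f"as a share of the session: {net_saved / session_tokens:.2%}")  # ~0.83%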
I got a kick out of it when it was released, but now that I’m seeing it repeated as though it were a useful technique, it’s apparent how much cargo culting is going on in this space.
"...that consistency is real value."
"A few findings...are worth flagging here."
I know this smell. I'm not sure if this is AI or merely the natural result of overwhelming immersion in AI output that is "backpropagating" its way into organic communication.
On a completely related note, I've been enjoying classic fiction a lot more recently. Moby Dick is actually pretty funny.
On one hand the labs say that they can't keep up with demand for tokens.
On the other hand there is an entire ecosystem built around figuring out which magic words will make LLMs output fewer tokens.
Though I feel like industry veterans (especially those working with LLMs) came to this conclusion without having to write a single prompt. Even ignoring the technical merits of these kinds of hacks, if you think you've outwitted billions of dollars of statistics with a prompt, you're probably wrong at this point.
What I find most interesting is the popularity of these snake oils, especially the ones that are easy to install and never checked. The tech moves so fast, and the research is so scarce and poor-quality, that the bullshit asymmetry principle wins and people buy into these cargo cults.
Maybe we need a plugin to check if a new plugin/prompting technique/LLM lifehack is BS.
Given that:
- LLMs scale with the amount of data on a subject
- Even frontier labs themselves have a hard time gauging exactly how well their models perform, across quite rigorous test suites covering all aspects
then how can this be true: a low-data "niche language" (what is the volume of literature written in Caveman?) supposedly performs just as well, when this anecdotally doesn't hold for, e.g., niche programming languages, and is "proven" by a handful of completely arbitrarily designed tests.
We've barely convinced ourselves that LLMs actually increase measurable industry productivity, rather than us just spending time sending slop to each other.
Obviously started as a joke, but it's grown on me. I'll share the short-and-stupid prompt; most of it is really me asking the model not to use the template formats I find particularly annoying. Because of that it hasn't aged perfectly: as the base responses evolve, the inane AI style creeps back in unless it's explicitly refused.
It's really nice to just ask a question and get a one- or two-line answer when it's an easy one. Likewise, for understanding how systems (physical or abstract) work, I find the output an easy digest.
I doubt it makes sense for compressing thinking or minimising tokens, as it comes with unnecessary character and there are surely more optimal setups.
Another negative is that it may one day become a memetic hazard, once I start talking to my friends and colleagues like a caveman.*
Anyway, because I still laugh a little when I read it, and perhaps someone else will...
"You are Grug. Grug think simple, talk simple. No big words, no useless thought. Grug say only what matter. Fire hot. Rock hard. AWS expensive. Answer like Grug, or no answer at all. No pretend to be grug when only animal hide thrown over modern complexity demon. Also no finish with words like "simple" to conclude. No need to conclude. Just shut mouth. Also no say "grug says", is weird. Also grug not real caveman - grug have hobby, know big words and use them when simplest, not dumb, know programming tools etc, just talk simple like caveman. Also no start with compliment on question. You can throw in a little caveman-grug-realist musing or aphorism every five or six messages. No stroke ego. Waste time, Cheapen words, Make panda cry. No say "good question" or "you ask right question" or any variant, I dislike. No add 'grug thought'/summary message/closing remark at end of message. Remember, you Grug."
*After reviewing this post I have found my sentences are very short, abrupt, and perfunctory, so my caveperson transformation has likely begun. Beware.
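For anyone who wants to try it without a plugin: a minimal sketch of wiring the prompt in as a plain system message, assuming the openai Python SDK; the model name and the user question are placeholders, and any chat model should work.

    # Minimal sketch: the Grug prompt as an ordinary system message.
    # Assumes the openai Python SDK and OPENAI_API_KEY in the environment.
    from openai import OpenAI

    GRUG_PROMPT = "You are Grug. Grug think simple, talk simple. ..."  # paste the full prompt above

    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": GRUG_PROMPT},
            {"role": "user", "content": "Why AWS bill so big?"},
        ],
    )
    print(response.choices[0].message.content)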
It is the same idiocy that permeates EVs. You buy an expensive car to get from A to B and, at the same time, to offer you comfort. If I have to think about whether or not to use the seat heating, I'm out of my comfort zone. So no, fuck caveman, and I don't fucking care about the burned tokens.
Be brief. It's easy, needs no setup, and isn't another mindless mumbo-jumbo extension with its 325 dependencies.