Hacker News

Codex-maxxing

96 points by dnw

by skiing_crawling

7 subcomments

Is this LLM psychosis? So much tending and conversing with the matmuls, but what was the outcome? Are people who get this into it more successful somehow? It reminds me of people who take drugs and get "revelations" but then are not particularly overrepresented among successful people, for all of their deep insights.

by devinprater

3 subcomments

I'm trying to get out of this. I'm a blind person. No one in tech goes to bed thinking about us, as it were. So, as a non-programmer, vibe coding accessibility fixes was an outlet to the daily million papercuts of using operating systems built by people who cannot understand me.
Well, I have barely anything to show for months of this. I made Termux more accessible on Android, made a MUD client for Emacs, fixed up some Emacspeak stuff because it's been abandoned going on 3 years now, and Emacs packages wait for no one, and tried adding Grade 2 Braille entry support to BRLTTY. That failed because depression sucks and who would even use this vibe coded junk anyway.
The more open nature of Android made it rather easier. How far behind in features TalkBack is compared to VoiceOver, besides AI image description, made it feel like trying to heal a broken arm with pain pills. So I'm trying to tell myself that I can't fix everything, and that it's not my fault if other people, and companies, choose to not consider accessibility. I mean I can't help Google if they choose to not be helped.
Ah well, Global Accessibility Awareness Day is this Thursday. Maybe Apple will finally announce LLM image descriptions, and hopefully my iPhone 16 will be good enough for them because I can't afford to upgrade in this economy.

by 4k0hz

1 subcomments

The author of this post works at OpenAI on the Codex team.

by grebc

2 subcomments

All the AI stuff lately is just like Unix Porn reddit but posted to places where the people don’t care about it.

by isodev

2 subcomments

Anything with “maxxing” in the name is most likely not good for you

by lionkor

1 subcomments

I hope I never have to work with people like this. Actual nightmare fuel to live your entire life through LLMs. "I trained Claude to love my wife so I can focus on prompting" vibes.

by bilekas

0 subcomment

> Thariq has a very good post about preferring HTML over Markdown as an output format. I think that instinct is right
I bet you do, working at OpenAI you get paid for more token use.

by mohsen1

2 subcomments

in tsz (https://tsz.dev) I am Codex-Maxxing with this:
Give each Codex an AgentName and ask them to mark their PRs/issues/comments with it. Have one or two "managers" that manage PRs and overall project direction. I write the project directions and make long-lasting issues. Each Codex session has an almost unachievable `/goal`, but they are asked to achieve the goal by landing changes in `main` via PRs.
I am running about 14 Codex sessions on 4 machines right now for about two weeks since OpenAI 10x'ed my 20x account and I simply can not run out of tokens fast enough.
Side note: I have multiple Claude accounts too, but the new Claude Code `/goal` command is seriously broken. It takes long pauses between iterations and sometimes stops prematurely.

by andai

0 subcomment

Hey they added heartbeats? Wasn't that the killer feature OpenClaw had? I haven't used it in a while, but imagine they'd be pretty similar now.
Main difference would be just in how they're used (general purpose assistant vs "coding assistant") but the actual capabilities seem to be identical.

by manuisin

0 subcomment

Codex lags when chats become too long. Barely takes a day before loading certain chats freezes the UI and causes all sorts of issues

by parf02

4 subcomments

Most people I know underutilize voice mode. Such a game changer for making brain dumps the LLM can just gobble up

by bilekas

0 subcomment

> Last week I tried to migrate the Python Rich library into Rust. Because the original project already had a large unit test suite, I could set a goal like: migrate Rich into Rust, but it must pass all the unit tests from the original library.
At what point do we stop calling this development? It's nothing even close to the process of development or engineering. "I tried to migrate X". No you didn't, you tried to ask an LLM and hoped for the best.
I mean, honestly, at what point would you bother? There's no learning happening, there's no creativity happening, just talking to a literal text generator to request your refund while you go for a shower. A novelty, maybe even convenient, but absolutely not development.

by nubg

0 subcomment

> When I come back to Slack, replies are often already sitting in drafts.
He must be a pleasure to work with

by esperent

3 subcomments

> Every 30 minutes, check Slack and Gmail for unanswered messages that need my attention...
> When I come back to Slack, replies are often already sitting in drafts. I still decide what gets sent, but the expensive part of gathering context is done.
This just feels so dystopian to me. I hope that I never work with you or someone else doing this.
I personally do use LLMs for work messaging but I'm extremely careful to state clearly like "here's a draft for that quotation request that Claude wrote:" or something like that. I would never present that as my own words.

by ninininino

0 subcomment

"Every 30 minutes, check Slack and Gmail for unanswered messages that need my attention.
Help me prioritize what matters most.
If someone asks me a question, research the answer as deeply as you can and draft a reply for me, but do not send it."
This is a very dangerous road to go down. You may feel like you are getting more done but end up living your life on autopilot, without any introspection or applying your own taste.

by syl5x

0 subcomment

inb4 "I got prompt injected and they stole my stuff". Now real talk, there are some viable usages of codex here but nothing novel; it's the same "old" "MEMORY, VAULT, BG TASKS" that everyone is doing.
And about voice mode, I thought it was a good idea but I seriously don't know how you guys use it. My thoughts whenever I use voice are "aaaaaaaaahhhhhh, uhmmm" and then I cancel it so that I can type and organize my thoughts. I don't really think those "brain dumps" are useful when you are thinking out loud like "We should really do X oh wait but actually Y is in the way and we have to take into consideration Z, but wait Y was actually done" and so on. When it turns out that your assumptions are wrong, it becomes a mess. I am in favor of the LLM working with facts and always verifying them. To me this post is basically selling the Codex app and that's it, nothing new inside.

by ivanbelenky

0 subcomment

Something is happening with `codex`. At tamarillo.ai we did a [little experiment](https://research.tamarillo.ai/coding-harness-inspection/) with 400K repos that have AI harnesses configured, and some very interesting behavior is observed:
- growing fast as fuck
- overrepresentation in starred repos (even though stars mean less these days, it is definitely something to look at)
- overrepresentation in `rust`
- in terms of aliveness, codex is first

by m3ch4m4n

0 subcomment

lol why is this on the front page of hacker news?

by mwilcox

0 subcomment

Slop

by tommy29tmar

0 subcomment

[flagged]

by xiaosong001

0 subcomment

[flagged]

by armada1122

0 subcomment

The diff-as-review point is the one I keep coming back to.
The cost of memory-as-files isn't writing them. It's that the agent will cheerfully claim it updated something and not actually do it, or write a one-line stub that satisfies the spec but loses the original signal. Without a verification layer, the vault accumulates plausible-looking entries that quietly drift from reality.
What ended up working for me was treating the agent's self-reported summary as a wish, not a fact. A separate process diffs the actual file system against the claimed changes and flags mismatches.
After a few cycles, the agent gets calibrated and stops claiming things that don't survive a file check. That has the side benefit of making the diff review itself much higher signal: most of what shows up is real.
The split I'd make early is per-agent instructions vs. cross-thread shared notes.
They sound like the same artifact, but “what this agent should always do” and “what sibling work just learned” age very differently. Mixing them means the wisdom gets stale together.
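
The verification layer described in this comment (diffing the actual file system against the agent's claimed changes) could look roughly like the sketch below. It is only an illustration under stated assumptions: the commenter does not share an implementation, and the names `snapshot`, `changed_paths`, and `check_claims` are hypothetical, as is the idea that the agent's summary has already been parsed into a set of relative paths.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to a SHA-256 digest of its bytes."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_paths(before: dict[str, str], after: dict[str, str]) -> set[str]:
    """Paths that were added, removed, or modified between two snapshots."""
    return {
        path
        for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    }


def check_claims(
    before: dict[str, str], after: dict[str, str], claimed: set[str]
) -> dict[str, set[str]]:
    """Compare what the agent says it changed with what actually changed.

    `claimed` is assumed to be the set of relative paths extracted from the
    agent's self-reported summary (how that extraction happens is out of scope).
    """
    actual = changed_paths(before, after)
    return {
        "claimed_but_unchanged": claimed - actual,  # the "wish, not a fact" cases
        "changed_but_unclaimed": actual - claimed,  # silent edits worth reviewing
    }
```

One way to use it: take `snapshot(vault)` before the agent runs, take another afterwards, and pass both to `check_claims` along with the paths the agent reported. Hashing file contents rather than comparing modification times means a file that was rewritten with identical bytes does not count as a change, though a one-line stub that replaces real content would still need a human or a second check to catch.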