Hacker News

What it feels like to work with Mythos

359 points by swolpers

by eithed

What I find fascinating is that there is so little substance in this article about the quality of the produced code and the medium. Is the code documented and tested? Is it understandable and extendable? Is it secure? What language, framework, and database were used? The author mentions judgement and taste; well, is the code tasteful? Will the model re-architect the entire thing if I ask it to add new functionality, spending another 9.5h in tokens? I assume that the research part is domain knowledge (how different types of travel translate into time) and making it presentable; how did the author verify this?
These questions aren't even about AI: if I were to give money to a human agency and were given something they tell me works, I would ask the same questions. If I did not know how to evaluate it, I would hire people who do. With LLMs, the verification part is what bothers me the most.

by JumpCrisscross

Anecdote: I fed Fable some models I’ve been hand verifying (basically, I sketch out a scenario for Opus to model, it builds it, I ask it to show me the math, I correct it, we iterate like this, then I double check its code to make sure the math matches the model logic). Fable found almost every error I found, and then had some interesting suggestions for additional variables.
It also burned through my usage quota like a late-90s Hummer.

by olafmol

This little line from the article scares me: "but a software engineer would iron out the remaining potential bugs that I could not find quickly"
Every sw dev knows this is a very dangerous, and unrealistic, assumption.

by anonzzzies

Been working on my pet project today with Fable; it seems pretty solid but not too far removed from 4.8: the same hallucinating, the same type of bugs, the same focus in large projects on just doing what you ask while ignoring whatever that may touch/break/influence. It runs tests at the beginning, but once the context fills up it's just 'will run later', and it never does in the end unless you tell it to (using some assorted swear words). I will keep using it, but it's incremental as far as I'm seeing, not the OMG OMG OMG Mythos is here!

by ecocentrik

Reading the first few paragraphs of what he calls "the most sophisticated academic social science paper I have yet seen from an AI" does not impress as much as I hoped.
"Posterior beliefs about market demand are purely reference-dependent: holding dollars raised constant, they track only performance relative to the founder’s self-chosen goal—jumping half a standard deviation at the threshold, responding steeply for the first ten points past it, and flattening thereafter"
Humans generally don't verbalize data this way. The summary document is also very fluffy.

by gopalv

> It worked for nine and a half hours.
> Again, it wasn’t perfect. As an expert, I was able to spot some errors and omissions (some as a result of the design I had asked for) that I had the AI correct
That's the bit that stuck out to me: that's longer than I would expect to work on a problem in a day, or even to go back and fix the output of something that has a core reward loop of hours.
My customers are currently clamoring for me to push my agent response times down from 85 seconds to below the 20s mark.
At the same time, it is very dissonant to see the industry heading towards hour+ long workflows with an agent.

by mohsen1

I have been using it for less than an hour, so take this with a grain of salt; I'm excited about the new tech.
In a project like mine (https://github.com/tsz-org/tsz) I am constantly frustrated that models were not doing enough research and were not taking into account other situations. Again and again models would produce code that would fix one thing and break 2 other tests that were "unrelated".
With Fable it seems like tasks are taking much longer (I have not seen a pull request from Fable sessions yet), but reading the transcripts of those sessions I can see how it is doing the right thing by leaving no stone unturned.
As the article says, it's hard to communicate this "feeling" about models because it is very project-specific, but I thought I'd share.

by nstart

Desperate to know what the prompt for the poem is. The idea of it felt familiar so I went down the rabbit hole and found: 14 years ago, a poem on reddit [https://www.reddit.com/r/RedditDayOf/comments/tjjw2/may_12_a...] . Nowhere near the length of the one the author shared but the same idea.
> This is from "The Cyberiad", a collection of science-fiction fairy tales by Polish author Stanislaw Lem ... In one of the stories, a robot constructor named Trurl creates a machine that writes poetry. A jealous rival named Klapaucian challenges the machine to compose "...a poem about a haircut! But lofty, noble, tragic, timeless, full of love, treachery, retribution, quiet heroism and in the face of certain doom! Six lines, cleverly rhymed, and every word beginning with the letter s!!"
And the computer responds with:
"Seduced, shaggy Samson snored.
She scissored short. Sorely shorn,
Soon shackled slave, Samson sighed.
Silently scheming,
Sightlessly seeking
Some savage, spectacular suicide"
The author had to be referencing this moment in their challenge to Fable/Mythos. I'm curious to know what their exact prompt was.

by selfawareMammal

What are people working on that they see such a substantial difference between Mythos and Opus? I'd say I'm working on advanced stuff, and more often than not DeepSeek is more than enough. Why is everybody a genius in here?

by economistbob

And therein is the problem most perfectly expressed. He prompted that all the data should be real and validated and then simply trusted that it was. That was for a data driven project. People will do that for countless things, even critical things.

by theturtletalks

This is what he built:
https://isochronic-passage-chart.netlify.app/
Doesn’t work too well on mobile but looks interesting

by asdK120

Mollick runs the Generative AI Lab at Wharton, with all the corporate sponsors.
He is a professor but sadly also an AI shill. He should switch to advertising washing powder.

by michaelteter

As a software engineer and solution provider, I do not feel threatened by this.
I do not fear that management will get tools like Mythos and then not need people like me. Most of the value I provide is in translating what the management/client _thinks_ they need into what is the real problem and solution.
That's not an insult to them, it's just pointing out that they see only their problem, and they imagine what would be the solution. They then ask for that solution. Quite often, what they want built isn't what they need. And I've seen so many problems, from so many domains and scenarios, that I can usually recognize the core need and propose (and build or direct building of) a solution which resolves that need AND has an eye toward the likely future needs.
Mythos may do an excellent job providing a high quality result based on what is asked of it. But the result will only be as good as the quality, clarity, and presentation of the request.
If I hire a home builder to build me a custom home, that builder is going to ask me a thousand questions - questions I had never even thought of. Mythos isn't going to ask all those questions - it's going to make the best choices it can without the consultant's level of interaction. And the buyer will get what they get. Sure, the buyer can then say, "oh, I don't want any hallways - just connected spaces." Then the house gets demolished and rebuilt to the new, clearer spec. Repeat, repeat, repeat. Maybe eventually the buyer gets what they really want. More likely they give up before reaching that point, and they go and hire a real builder.
I'll sum it up like this: You can get great results with minimal effort if you don't really care too much about the details. But if you don't care much about the details, then your need probably wasn't very significant.

by shadytrees

The Balatro game that Fable spit out (Flipside) https://play-flipside.netlify.app/ is buggy but fun. Fable also fixed one of my personal pet peeves. Unlike Balatro, it comes with a calculator to preview the score!

by thepasch

What it feels like to work with Fable:
> Switched to Opus 4.8: Fable 5 has safety measures that flag messages on most cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Send feedback or learn more.

by neaden

Man, that poem it made is terrible. Like just incredibly bad. Sure it's neat that software can make an incredibly bad poem but there is enough bad poetry in the world that we don't need it.

by wxw

I am… underwhelmed by the artifacts in the post.
I don’t see why working longer is a pro. The results don’t seem much better than you’d get from putting Opus in a long loop.

by clhodapp

So the author gave the model single-sentence prompts like "Balatro, but for the game of coin flips", plus generic encouragement like "make it better" three times, and ended up with Netlify-hostable web games each time? That is hard for me to believe.
It's likely that at least some amount of additional context was provided to the model to enable it to reliably create the desired form factor. This introduces the caveat that the author probably views some amount of context as being trivial / beneath the level of mentioning. But then the question becomes where they draw the line.

by fractorial

I’ve been falling back to Opus 4.6 since 4.7 and 4.8. I recently found success in using Opus 4.6 for cheap orchestration and reasoning while Opus 4.8 High/Max agents do the work.
I made serious progress towards repairing a proof for a conjecture that was published 10 days ago but kept running into a wall with one of the Lemmas.
I threw Fable 5 Max at it with the same subagent set up and in an hour it claimed to have disproved a core theorem of the paper.
The Lean construction looks correct, but I still need to verify it rigorously. This is certainly not something Opus 4.6 Max could do and it’s likely something Opus 4.8 Max could do with more delicate orchestration and time. However, the “one-shot” Fable 5 did give me pause.

by Ameo

Most depressing thing I've read in weeks, and that's a high bar. Hooray to humanity for creating the thing which has destroyed all the value of being good at creating things.

by pu_pe

The isochrone maps are quite beautiful [1], and go beyond the scope and refinement of some earlier human attempts I could find [2][3][4].
[1] https://isochronic-passage-chart.netlify.app/
[2] https://mapitout.welcome-to-nl.nl/
[3] https://commutetimemap.com/
[4] https://andrewding.ca/flightisochrones/

by mieubrisse

Having a good prompt-engineering skill is the highest-leverage thing IMO, so I burnt 2 Max 20x usage windows to help Fable help me refactor mine. With its partnership we:
- Went deep on "what types of guidance even are there? what does giving good guidance mean?"
- Sampled my existing Claude guidance (CLAUDE.md, skills, hooks, etc.) and broke their guidance into "atoms"
- Categorized them by clustering, the same way the Big Five was derived
- Generated a new candidate
- Then used independent agents to compare it against my existing corpus assuming that the new one would be worse
Working with it felt like working with a supersmart entity capable of generating very plausible-sounding but not-necessarily-true statements. The outcome certainly felt like an alien artifact, like nothing I'd make myself.
Only time'll tell if it holds up, but it sure had some interesting ideas.

by xavierforge

No question the capability jump is real, but in my experience it correlates with shortcut-taking. Fable 5 (and Opus 4.8 before it) hallucinates more than any Claude model I’ve used. The most common failure mode is asking it to modify existing code and watching it skip reading the original file, reconstruct that section from imagination, and then apply edits on top of its own invention, even with full context provided.
Maybe my prompts are too vague, but it’s worth noting that every example in the post is a greenfield build, and vague prompting seems to hold up fine when there are no existing constraints to respect.

by mjamesaustin

The snake game is legit very fun. Once I got the ability to pick up the apples and plant apple trees, I was sold.

by SupremumLimit

I took a brief look at the code for one of the projects (https://github.com/emollick/concord/) he breathlessly praises and says "a software engineer would iron out the remaining potential bugs that I could not find quickly". The code looks like an unmaintainable mess.
Other commenters have pointed out that his isochrone map contains a lot of nonsense as well.
So the most charitable interpretation here is that this is a case of Gell-Mann amnesia.

by dmzxnico

Probably just a model that was trained on high-quality codebases and tuned to find security vulnerabilities and bugs by being "smart" enough to actually test the code itself, manually going through the app or website. That already feels easy for Fable, so Mythos is just a better version.

by recursivedoubts

would it be possible for mythos to make the space bar scroll the pages on your website properly?

by mawadev

Isn't it weird that we started to gauge the quality of a model by checking the vibe of the vibe coding?

by jgilias

Cool. But.
Most of the “impressive” stuff is not “the model” but “the harness”. Spinning up the subagents and teams of lower models, letting them explore, do adversarial coding. It’s all in the harness. Granted, Mythos might be better at that orchestration, but it’s still the harness.
Second is the prompting. The author is an expert in what they’re doing and prompts the system in a way that yields useful results. I see too many people believing that if an expert can achieve those results in a domain they’re familiar with, then they, as non-experts, will be able to as well. And that’s a fallacy that Mythos doesn’t change.

by kgeist

Judging by the benchmarks on Artificial Analysis, "a very real leap over every model" is 2-3 points over competitors (say, 62 for Fable 5 vs. 59 for ChatGPT 5.5 xhigh for coding).

by lominming

My main issue with many of these tests and reviews is that most of the results focus on testing the harness (in this case, likely Claude Code) rather than evaluating the model’s inherent performance.

by 382hi

I think Qwen 3.7-Plus is better at reasoning than Mythos, and I've used both for quite a while.

by vb-8448

Nice, but I'm really curious about how many tokens have been used.
There is only one hint: 475k tokens in the screenshot when OP asked the model to fix some behaviour, but it would be fascinating to know the total token count.

by Aperocky

> This is a map that shows the distance you can travel in a given length of time, and the first one was created in 1881 showing travel times from London.
The first item in the article, the first thing it showed, was wrong, though.
In 1881 it was 100% faster to go from London to New York than to Volgograd, or to any of the Russian hinterland colored green, or Turkey, or Egypt.

by ElijahLynn

Loved the article!
And I'm excited to try it, but also have a fear that I will like it too much and then won't have access to it in 2 weeks... But maybe I will and maybe it will be worth it and I'll just pay a bunch of extra for it and it'll be great!
I think the article could be improved by actually sharing more feelings. I clicked on the article for feelings but I didn't see that many feelings described.

by kleiba2

> So I asked Fable to solve the problem, first generating a complex 19 page design document and then executing it.
> It worked for nine and a half hours.
And how much did that cost?

by root_axis

I just can't stand this type of fawning language.

by philipswood

> It also created a 10-page epic rhyming poem about a haircut where every word starts with the letter s
Wow

by ComplexSystems

Who can afford to use this damn thing though? They're pricing everyone out of the market with stuff like this.

by catigula

>Ethan Mollick
Just an FYI this guy is an AI hype-beast. Some of his tweets are truly out there.

by steve1977

> it is indicative of AI solving a hard problem involving research, math, visual development, taste, judgement, complex coding, and more.
Is it a hard problem or is it just labor intensive?

by brockVond2021

on the places I've checked, mostly Paris to places in Ireland or Britain, the times are off by an order of magnitude
looks nice but deeply flawed
classic LLM output

by ElijahLynn

> The work has shifted from process to outcome. I no longer steer; I commission.

by PaulHoule

My wife likes to say "feelings aren't facts"

by ThejaCH

What it feels like to work with Mythos? Feels like I'm poor.

by LogicFailsMe

I'm using Fable this afternoon and it's definitely a step up from Opus 4.8, finding and fixing things Opus 4.8 was blind to even perceiving. The next 13 days are going to be fun IMO. And Opus 4.8 was less annoying than Opus 4.7 FWIW.
Edit: A couple hours in and I just got my first gaslighting attempt from the model. Good times!

by the_doctah

More Mythos Marketing.

by 12345hn6789

The coin flip game does not work. I tossed 2 coins and it broke after that. You cannot progress forward.
Not a great start for "a generational leap in model effectiveness"

by philipwhiuk

Given that token counts are easily available, not saying how much any of his examples cost is lunacy.

by queenkjuul

not only is the site completely unusable on mobile ootb, but when i enable desktop mode on Android, my taps are detected in the wrong spot--clicking Chicago registers as Saskatoon.
At first i thought its routing was just completely botched.
The text overflow on the legend is pretty funny considering how well the other graphics turned out
(Edit: referring to the map app)

by zb3

Was the condition of being granted early access to this castrated model writing a post praising it?

by zuzululu

> First, how good is Fable? In experiment after experiment I conducted, it outperformed basically every other public model I have used by a considerable margin.
What makes me excited is that GPT 5.6 (it's actually GPT 6) is going to be crazy.

by younglunaman

>What it feels like to work with Mythos >Looks Inside >So I did this with fable...
What?

by honeycrispy

Reading it, I can't help but feel he's being paid to write this. Or maybe he hopes to be paid. The language he uses makes him sound like he's fawning over the lost days of his childhood. Pardon me for being skeptical, but a trillion dollar company running a net-loss is hoping to IPO, and needs to sway public opinion by any means necessary. I would imagine that no dirty marketing scheme is off of the table, even from the self-proclaimed "good guys".

by nickphx

oh look, more overhyped drivel from a non-technical person.

by Andy_Donner

[flagged]

by andrewvu0203

[flagged]

by aryehof

[flagged]

by ath3nd

[dead]

by et-al

[flagged]

by pbgcp2026

So, Ethan Mollick has just broken an NDA he signed. Typical. Out of everyone participating in Project Glasswing it was, of course, the Uni to f*k it up.