FRESH

Hacker News

Home

Building SQLite with a small swarm

105 points by kyars

by comex

6 subcomments

If it works, then it’s impressive. Does it work? Looking at test.sh, the oracle tests (the ones compared against SQLite) seem to consist in their entity of three trivial SELECT statements. SQLite has tens of thousands of tests; it should be possible to port some of those over to get a better idea of how functional this codebase is.
Edit: I looked over some of the code.
It's not good. It's certainly not anywhere near SQLite's quality, performance, or codebase size. Many elements are the most basic thing that could possibly work, or else missing entirely. To name some examples:
- Absolutely no concurrency.
- The B-tree implementation has a line "// TODO: Free old overflow pages if any."
- When the pager adds a page to the free list, it does a linear search through the entire free list (which can get arbitrarily large) just to make sure the page isn't in the list already.
- "//! The current planner scope is intentionally small: - recognize single-table `WHERE` predicates that can use an index - choose between full table scan and index-driven lookup."
- The pager calls clone() on large buffers, which is needlessly inefficient, kind of a newbie Rust mistake.
However…
It does seem like a codebase that would basically work. At a large scale, it has the necessary components and the architecture isn't insane. I'm sure there are bugs, but I think the AI could iron out the bugs, given some more time spent working on testing. And at that point, I think it could be perfectly suitable as an embedded database for some application as long as you don't have complex needs.
In practice, there is little reason not to just reach for actual SQLite, which is much more sophisticated. But I can think of one possible reason: SQLite has been known to have memory safety vulnerabilities, whereas this codebase is written in Rust with no unsafe code. It might eat your data, but it won't corrupt memory.
That is impressive enough for now, I think.

by gmerc

5 subcomments

Why do people fall for this. We're compressing knowledge, including the source code of SQLite into storage, then retrieve and shift it along latents at tremendous cost in a while loop, basically brute forcing a franken version of the original.

by bob1029

2 subcomments

> 84 / 154 commits (54.5%) were lock/claim/stale-lock/release coordination.
Parallelism over one code base is clearly not very useful.
I don't understand why going as fast as possible is the goal. We should be trying to be as correct as possible. The whole point is that these agents can run while we sleep. Convergence is non linear. You want every step to be in the right direction. Think of it more as a series of crystalline database transactions that must unroll in perfect order than a big pile of rocks that needs to be moved from a to b.

by simonw

1 subcomments

"Implements + tests against sqlite3 as oracle"
That's the real unlock in my opinion. It's effectively an automated reverse engineering of how SQLite behaves, which is something agents are really good at.
I did a similar but smaller project a couple of weeks ago to build a Python library that could parse a SQLite SELECT query into an AST - same trick, I ran the SQLite C code as an oracle for how those ASTs should work: https://github.com/simonw/sqlite-ast
Question: you mention the OpenAI and Anthropic Pro plans, was the total cost of this project in the order of $40 ($20 for OpenAI and $20 for Anthropic)? What did you pay for Gemini?

by khazhoux

4 subcomments

I'm a heavy Cursor user (not yet on Claude) and I see a big disconnect between my own experience and posts like this.
* After a long vibe-coding session, I have to spend an inordinate amount of time cleaning up what Cursor generated. Any given page of code will be just fine on its own, but the overall design (unless I'm extremely specific in what I tell Cursor to do) will invariably be a mess of scattered control, grafted-on logic, and just overall poor design. This is despite me using Plan mode extensively, and instructing it to not create duplicate code, etc.
* I keep seeing metrics of 10s and 100s of thousands of LOC (sometimes even millions), without the authors ever recognizing that a gigantic LOC is probably indicative of terrible heisenbuggy code. I'd find it much more convincing if this post said it generated a 3K SQLite implementation, and not 19K.
Wondering if I'm just lagging in my prompting skills or what. To be clear, I'm very bullish on AI coding, but I do feel people are getting just a bit ahead of themselves in how they report success.

by comrade1234

1 subcomments

What's the point of building something that already exists in open source. It's just going to use code that already exists. There's probably dozens of examples written by humans that it can pull from.

by delegate

1 subcomments

Great work! Obviously the goal of this is not to replace sqlite, but to show that agents can do this today. That said, I'm a lot more curious about the Harness part ( Bootstrap_Prompt, Agent_Prompt, etc) then I am in what the agents have accomplished. Eg, how can I repeat this myself ? I couldn't find that in the repo...

by losteric

1 subcomments

This blog post doesn't say anything about your experience.
How well does the resulting code perform? What are the trade-offs/limitations/benefits compared to SQLite? What problems does it solve?
Why did you use this process? this mixture of models? Why is this a good setup?

by randomifcpfan

1 subcomments

Interesting to compare this to the in-progress project https://github.com/Dicklesworthstone/frankensqlite
Which aims to match SQLite quality and provide new features (free encryption, multiple simultaneous writers, and bitflip resistance.)

by samrus

1 subcomments

I cant quite tell if the tests that passed were sqlites own famously thorough test suite, or your own.
If its sqlites suite then its great the models managed to get there, but one issue (without trying to be too pessimistic) is that the models had the test suite there to validate against. Sqlites devs famously spend more of their time making the tests than building the functionalities. If we can get AI that reliably defines the functionality of such programs by building the test suite over years of trial and error, then we'll have what people are saying

by mrmrcoleman

2 subcomments

Take a look at SQLite’s test coverage. It’s impressive: https://sqlite.org/testing.html
590x the application code

by cadamsdotcom

0 subcomment

If anyone is looking for ideas for these projects - it’d be great to be able to run macos applications on linux…
Someone could have a swarm of agents build “wine for macos apps”.

by small_model

1 subcomments

Would be better to choose a small subset of functionality and get that working as well as sqlite (or better) Then iterate that way. Context size is too small to work on such a large system.

by rco8786

2 subcomments

Not a single comment about whether it actually works or not?

by scirob

2 subcomments

Did they pass all unit tests in the end ?

by MagicMoonlight

1 subcomments

Why would you need 6 different models running across three providers? Just have a single one running, then you avoid all this nonsense around locking.
And this is ultimately pointless, because it’s just a shitter SQLite. It’s nothing new. If you’re going to build something big like this, there needs to be a real business case
You could already slop out a replica of SQLite if you wanted. But you don’t, because of the effort it would take to test and maintain it.

by marxisttemp

1 subcomments

Who cares?

by k33n

1 subcomments

> There isn’t a great way to record token usage since each platform uses a different format, so I don’t have a grasp on which agent pulled the most weight
lol

by diimdeep

1 subcomments

Why do you think that it is a good idea to make it public ? It is obviously half hallucinated mostly broken unusable piece of low effort (on human part), with as much value as blurry image generated with stable diffusion that people now widely consider bad taste and slop.

0 subcomment

by tonetheman

0 subcomment

[dead]