by rich_sasha
4 subcomments
- I concur that sqlite is quite amazing. That said, I was a heavy user and have grown some skepticism as well:
- it is not that hard to lock the db. Usually killing the process that caused the deadlock solves the issue - but you need to identify it / monitor for it. And yes, it happens with WAL too
- but when it does happen, it is quite scary. Simply, anything that touches your DB suddenly stops working - can't read, can't write.
- in some cases, WAL does not checkpoint. This leads to drastic growth in the size of the WAL file and, down the line, catastrophic slowdown of queries - things that take 10ms suddenly take 10 seconds. In my particular case, no tweaking of SQLite params fixed it. I had to monitor for it and periodically force the WAL file to be rolled into the main DB (a rough sketch of that follows at the end of this comment).
- all of this gets harder on Windows, where e.g. you cannot just 'lsof' a file.
- the performance stats change somewhat for the worse in the cloud on drives that look local but actually aren't. Of course that is not sqlite's fault, but the blazing fast performance doesn't apply to all commonly encountered environments that look like real local drives.
I'm not dissing SQLite, I use it despite these shortcomings. Equally, I'm happy to reach for something like Postgres, which, well, hasn't burned me yet.
EDIT I should add that despite all this I never managed to corrupt the DB, or break any of the SQL promises - never messed up indices, never saw broken ACID compliance etc. And that's a massive endorsement, on reflection.
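Not the commenter's actual code, but a minimal sketch of that kind of periodic forced checkpoint, assuming Python's built-in sqlite3 module; the path and size threshold are hypothetical:

```python
import os
import sqlite3

WAL_SIZE_LIMIT = 64 * 1024 * 1024  # hypothetical 64 MiB threshold

def checkpoint_if_needed(db_path: str) -> None:
    # Only act when the -wal file has grown past the threshold.
    wal_path = db_path + "-wal"
    if not os.path.exists(wal_path) or os.path.getsize(wal_path) < WAL_SIZE_LIMIT:
        return
    conn = sqlite3.connect(db_path)
    try:
        # TRUNCATE blocks until the WAL is copied into the main DB file,
        # then resets the WAL to zero bytes. Returns (busy, log, checkpointed).
        busy, log_frames, ckpt_frames = conn.execute(
            "PRAGMA wal_checkpoint(TRUNCATE)"
        ).fetchone()
        if busy:
            print("checkpoint blocked: long-running readers or writers still active")
    finally:
        conn.close()
```

`wal_checkpoint(TRUNCATE)` reports busy when a long-running reader or writer is still pinning the WAL, which is the failure mode that keeps the file growing.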
by weitendorf
1 subcomment
- I've been working on a hybrid protobuf ORM/generic CRUD server based on sqlite
(code at https://github.com/accretional/collector - forgive the documentation. I'm working on a container-based agent project and also trialling using agents heavily to write the individual features. It's working pretty well but the agents have been very zealous at documenting things lol).
This is my first real project using sqlite and we've hit some similarly cool benchmarks:
* 5-15ms downtime to backup a live sqlite db with a realistic amount of data for a crud db
* Capable of properly queueing hundreds of read/write operations when temporarily unavailable due to a backup
* e2e latency of basically 1ms for CRUD operations, including proto SerDe
* WAL lets us do continuous, streaming, chunked backups!
Previously I'd only worked with Postgres and Spanner. I absolutely love sqlite so far - would still use Spanner for some tasks with an infinite budget but once we get Collector to implement partitions I don't think I would ever use Postgres again.
by matthewaveryusa
7 subcomments
- The only caveat being this assumes all your data can fit on a single machine, and all your processing can fit on one machine. You can get a u-24tb1.112xlarge with 448 vcores, 24TB RAM for $255/hour and attach 64TB of EBS -- that's a lot of runway.
- > Hopefully, this post helps illustrate the unreasonable effectiveness of SQLite as well as the challenges you can run in with Amdahl's law and network databases like postgres.
No, it does not. The article first says that normally you would run the application and the database on separate servers, and then starts measuring the performance of a locally embedded database. If you have to keep the initial requirement for your software, then SQLite is completely out of the equation. If you can change the requirement, then you can achieve similar performance by tuning a local PGSQL instance -- and then it also becomes an evaluation of features and not just raw throughput. I'm not saying SQLite is not an option either, but this article seems confusing in that it compares two different problems/solutions.
by jaketoronto
2 subcomments
- Are you limiting your # of connections to Postgres to 8? Is this unnecessarily throttling your throughput? This seems like quite the bottleneck... connection pools are good when your app is overwhelming your db, but in this case you really should be trying to put more load on Postgres. I'm concerned that this whole experiment is tainted by this choke point. I would love to see this tested again with a much larger connection pool - a sketch follows after this comment. (Also, it might be nice for you to share what the CPU and thread usage on the Postgres side was during this.)
(Side note: I'm a big fan of sqlite.. but I've run into lots of performance bottlenecks caused by arbitrarily low connection pools and the like)
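Not from the article, but a hypothetical sketch of what a wider pool looks like, using psycopg_pool as a stand-in client (the article's actual stack isn't specified here; the DSN and sizes are placeholders):

```python
from psycopg_pool import ConnectionPool

# Placeholder DSN; max_size should stay below Postgres max_connections.
pool = ConnectionPool(
    "postgresql://app:secret@db.example.internal:5432/bench",
    min_size=16,
    max_size=64,
)

with pool.connection() as conn:
    conn.execute("SELECT 1")
```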
by tomconnors
2 subcomments
- Cool stuff as usual, Anders. One of the nice things about running a networked DB is that it makes redeploying the application a bit simpler. You can spin up a new EC2 instance or whatever and, once it's online, kill the old one. That gets you zero or close-to-zero downtime. If the DB is on the same instance, replacing that instance requires loading the DB onto the new one, which seems more error prone than just restarting the app in place on the original instance - but in my experience restarting in place typically incurs downtime or some complicated handoff logic. Have you had to deal with anything like that running sqlite in prod?
- The author is setting PRAGMA synchronous="normal", meaning fsync is not issued as part of every write transaction, only eventually (at checkpoint time in WAL mode). In order to make the comparison fair it should be set to "full".
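For reference, a minimal sketch of the FULL setting (Python sqlite3, hypothetical path); in WAL mode, NORMAL syncs only at checkpoints while FULL syncs the WAL on every commit:

```python
import sqlite3

conn = sqlite3.connect("app.db")  # hypothetical path
conn.execute("PRAGMA journal_mode=WAL")
# FULL: fsync before every COMMIT returns, at the cost of write throughput.
conn.execute("PRAGMA synchronous=FULL")
```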
- How does SQLite handle HA setups? The minimum I want is reliable automatic failover in reasonable time for a user-facing service. Ideally an active-active setup.
by alexwennerberg
1 subcomment
- Does anyone have examples of organizations that have leveraged SQLite and written about their experience? I've read a lot of theory and benchmarks about it lately and it seems extremely impressive, but I'm wondering if anyone has written about pushing it to its limits "in production"
by cryptonector
0 subcomments
- My problem with SQLite3 is the poverty of types and the dynamic typing. I really want the wealth of types that PG brings. But I really like the SQLite3 implementation better than the PG implementation (granted, I'm talking about the core of the RDBMS, not anything to do with networking since SQLite3 lacks networking).
by ducktective
1 subcomment
- Does anyone have rough numbers (max daily users etc) on viability of SQLite vs PostgreSQL for a typical user-facing webapp or e-commerce application?
I know that due to some recent update, SQLite can support concurrent reads but still only a single writer. In which cases would this be a problem? (A sketch of the usual mitigations follows after this comment.)
Some recommend it's better to start with postgres anyway if you have any remote thoughts of scaling in mind....
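A minimal sketch of the usual mitigations for the single-writer limit, not from the article (the path and table are made up): WAL lets readers run while the one writer commits, and busy_timeout makes a second writer wait instead of failing immediately with SQLITE_BUSY.

```python
import sqlite3

conn = sqlite3.connect("shop.db", timeout=5.0)  # hypothetical path
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA busy_timeout=5000")  # wait up to 5s for the write lock

with conn:  # short write transaction; other writers queue behind it
    conn.execute(
        "UPDATE inventory SET stock = stock - 1 WHERE sku = ?", ("ABC-123",)
    )
```

It tends to become a problem when individual write transactions are long (bulk imports, application code held inside the transaction) or when sustained write volume exceeds what a single serialized writer can absorb.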
by NorwegianDude
1 subcomment
- This is very misleading. The safe defaults for SQLite are changed, so commits are not actually written to disk. Running SQLite like this will cause data loss on an OS crash or power loss.
- This is great until you encounter a customer with a hard RPO requirement of 0. SQLite has a few replication options, but I would never trust this in high stakes domains over PGSQL/MSSQL/DB2/Oracle/etc.
by cluckindan
1 subcomment
- The first example uses a pattern where application code is run in the middle of a db transaction. That is almost certainly an anti-pattern, unless the computation is very lightweight - and even in that case it could probably be done in SQL.
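A hedged illustration of the difference (Python sqlite3; the table and the slow function are made-up stand-ins, not the article's code):

```python
import sqlite3
import time

conn = sqlite3.connect("app.db")

def slow_computation(value: int) -> int:
    time.sleep(0.1)  # stand-in for expensive application code
    return value * 2

# Anti-pattern: the write lock is held while application code runs.
#   with conn:
#       (value,) = conn.execute("SELECT value FROM items WHERE id = 1").fetchone()
#       result = slow_computation(value)   # every other writer blocks here
#       conn.execute("UPDATE items SET value = ? WHERE id = 1", (result,))

# Keeping the transaction short: compute first, then do the brief write.
(value,) = conn.execute("SELECT value FROM items WHERE id = 1").fetchone()
result = slow_computation(value)
with conn:
    conn.execute("UPDATE items SET value = ? WHERE id = 1", (result,))
```

If the read and the write must be atomic, the short transaction can re-check the row before updating rather than holding the write lock through the computation.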
by SJC_Hacker
0 subcomments
- SQLite is cool. You can create an in-memory DB and it's super fast. Better to stick with read-only though.
With a few lines of code you can even copy an arbitrary DB from disk to memory.
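Roughly those few lines, assuming Python's sqlite3 backup API (Python 3.7+; the path is hypothetical):

```python
import sqlite3

disk = sqlite3.connect("catalog.db")      # hypothetical on-disk DB
mem = sqlite3.connect(":memory:")
disk.backup(mem)                          # copy the whole DB into memory
disk.close()

# Queries now hit RAM only; keep it read-only, as the comment suggests,
# since changes to the in-memory copy are lost when the process exits.
print(mem.execute("SELECT count(*) FROM sqlite_master").fetchone())
```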
by pmdulaney
1 subcomment
- Prefer "surprising" to "unreasonable".
- I've been getting some good results from sqlite-vec on my current project. I need to look at geospatial extensions next.
by andersmurphy
0 subcomments
- (author here) Thank you for the fantastic discussion so far. I've added an Epilogue to the article that covers:
- Pragma synchronous "FULL"
- Giving PG a bigger connection pool
- SQLite DBs do not reliably survive Syncthing across multiple devices. Until that's true, I am always seeking a DB design that does.
by shadowgovt
0 subcomments
- Good. I feel like people keep discovering this principle that "When the engine does far, far fewer things, it's faster," but I certainly won't bash on people learning this, however they get there.
by westurner
1 subcomment
- That's a helpful TPS Report.
TIL `SAVEPOINT` can occur inside a BEGIN ... END SQLite transaction, and that this works for tuning batch size on a particular node under a given load (a sketch follows at the end of this comment).
Is there a solution for SQLite WAL corruption?
From https://news.ycombinator.com/item?id=45133444 :
> "PSA: SQLite WAL checksums fail silently and may lose data" https://news.ycombinator.com/item?id=44672902
> sqlite-parquet-vtable, [...]
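A minimal sketch of that SAVEPOINT-per-batch pattern (Python sqlite3; path, table, and batch size are hypothetical):

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode so the
# explicit BEGIN/COMMIT below are the only transaction boundaries.
conn = sqlite3.connect("batch.db", isolation_level=None)  # hypothetical path
conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, payload TEXT)")

rows = [(f"row-{i}",) for i in range(10_000)]
BATCH = 500  # tune per node and load, as the comment suggests

conn.execute("BEGIN")
for start in range(0, len(rows), BATCH):
    conn.execute("SAVEPOINT batch")  # nested unit inside the outer transaction
    try:
        conn.executemany("INSERT INTO events (payload) VALUES (?)",
                         rows[start:start + BATCH])
        conn.execute("RELEASE batch")       # keep this batch, stay in the outer tx
    except sqlite3.Error:
        conn.execute("ROLLBACK TO batch")   # undo only this batch
        conn.execute("RELEASE batch")
conn.execute("COMMIT")
```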
- > If you [..] do not fully understand the nuances and limitations of SQLite DO NOT USE IT.
So, what are the limitations compared to Postgres?
- Previously, I had always used Postgres for the database and Rust or NodeJS for my backend. For my new website (https://limereader.com/), I used Swift for the backend, SQLite for the database, and Vapor as the web server in the Swift app, and I am self-hosting the site on an old Mac mini.
An SQLite-related issue I ran into had to do with accessing the database from multiple threads. The solution turned out to be simple: for multi-threaded use, SQLite needed to be initialized with the `SQLITE_OPEN_FULLMUTEX` flag. Since then, the website has been running seamlessly for about 3 weeks now.
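`SQLITE_OPEN_FULLMUTEX` is a flag to the C-level sqlite3_open_v2() that selects serialized threading mode. A rough Python-side analog (an illustration only, not the commenter's Swift code; the path and table are made up) is to share one connection across threads and serialize access yourself:

```python
import sqlite3
import threading

# check_same_thread=False allows sharing the connection between threads;
# the lock plays the role SQLITE_OPEN_FULLMUTEX's mutex plays in C.
conn = sqlite3.connect("site.db", check_same_thread=False)  # hypothetical path
write_lock = threading.Lock()

def record_visit(path: str) -> None:
    with write_lock:
        with conn:  # short write transaction
            conn.execute("INSERT INTO visits (path) VALUES (?)", (path,))
```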
- I've used SQLite as the content storage engine for years with TxtAI. It works great. Also plenty of good add-ons for it such as sqlite-vec for storing vectors. It can take you pretty far and maybe it's all you need in many circumstances.
by CyberDildonics
0 subcomments
- Those numbers are just called a database.
by koakuma-chan
0 subcomments
- Alternative title: SQLite is All You Need
by badmonster
3 subcomments
- The real insight here is recognizing when network latency is your bottleneck. For many workloads, even a mediocre local database beats a great remote one. The question isn't "which database is best" but "does my architecture need to cross network boundaries at all?"
- The HN SQLite worship posts have gotten out of hand. What's next, a post on how appending to files is faster than Kafka?
It's great that some people have workloads that this is a fit for. What's more common is the use case managed databases like RDS etc. solve for: you have some quantity of data you want to always be there, be available over a network for whatever app(s) need it, and you want backups, upgrades, access control etc. solved for you.
I love SQLite and reach for it for hobby projects, but as a product for general business apps it is quite niche. It has the qualities that make for any popular product on HN: a great getting-started experience and a complex maintenance and operational experience.
by Tractor8626
2 subcomments
- SQLite is very cool. But what was the point of slowing Postgres down?
> But, wait our transactions are not serialisable, which they need to be if we want consistent transaction processing
You either don't know what serializable does, or you are trying to mislead the reader. There is zero reason to use serializable here.
> Let's say you have 5ms latency between your app server and your database.
5ms latency is unrealistic, unless you use wifi or your database is in another datacenter.
> I'm talking about transactions per second, specifically interactive transactions that are common when building web applications
No, they are not common at all. You probably invented them just to make PG look bad.