- When someone says "stream data over the Internet," my automatic reaction is "open a TCP connection."
Adding a database, multiple components, and Kubernetes to the equation seems like massive overengineering.
What value does S2 provide that simple TCP sockets do not?
Is this for like "making your own Twitch" or something, where streams have to scale to thousands-to-millions of consumers?
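For what it's worth, the gap between a raw socket and a stream service is durability and replay. A TCP connection is just an ordered byte pipe between two live endpoints; a minimal sketch (using `socketpair` as a stand-in for a real connection):

```python
import socket

# TCP gives you an ordered byte pipe between two live endpoints --
# nothing more. socketpair() stands in for a real network connection.
producer, consumer = socket.socketpair()

producer.sendall(b"event-1\nevent-2\n")
producer.close()

# The consumer must be connected *while* the bytes flow; once read,
# the data is gone. No retention, no replay, no second reader.
chunks = []
while chunk := consumer.recv(1024):
    chunks.append(chunk)
consumer.close()

received = b"".join(chunks)
```

A stream service is what you reach for when consumers can disconnect and resume from an offset, or when many readers need the same history.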
- This is kind of what I've been working on: building tenancy on top of SQLite CDC to make a simple, replayable SQLite for Marmot (https://github.com/maxpert/marmot). I think we have real synergy here; I'll drop by your Discord.
- Shoutout to CodesInChaos for suggesting that instead of a mere emulator, we should have an actually durable open source implementation – that is what we ended up building with s2-lite! https://news.ycombinator.com/item?id=42487592
And it has the durability of object storage rather than just local disk. SlateDB also lets you use the local FS; we'll experiment with plumbing through the full range of options – right now it's just in-memory or an S3-compatible bucket.
> So I'd try to share as much of the frontend code (e.g. the gRPC and REST handlers) as possible between these.
Right on, this is indeed the case. The OpenAPI spec is also now generated from the REST handlers in s2-lite. We are getting rid of gRPC; s2-lite only supports the REST API (plus a gRPC-like session protocol over HTTP/2: https://s2.dev/docs/api/records/overview#s2s-spec).
- Neat! Having literally everything backed by object storage is The Dream, so this makes a lot of sense. To compare this with the available options (other than Kafka or Redis streams): I can imagine taking the items you're writing to a stream, batching them, and writing them into some sort of S3-backed data lake – something like Delta Lake – and then querying them with DuckDB or whatever your OLAP SQL engine is. Or you could develop your own S3 schema that just saves these items to batched objects as they come in. So part of what S2 saves you from is having to write your own acknowledgement system/protocol for batching these items, and the corresponding read ("consume") queries? Cool!
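The "roll your own acknowledgement system" part the comment alludes to might look something like this toy sketch – buffer records, flush a batch object whose key sorts by stream offset, and only acknowledge records once the put succeeds (all names here are illustrative, and a dict stands in for the bucket):

```python
import json
from typing import Callable

class Batcher:
    """Toy append path: buffer records, flush a batch object to an
    object store, then acknowledge each record with its offset."""

    def __init__(self, put_object: Callable[[str, bytes], None],
                 max_batch: int = 3):
        self.put_object = put_object   # e.g. a wrapper over s3.put_object
        self.max_batch = max_batch
        self.buffer = []               # (record, ack_callback) pairs
        self.next_offset = 0           # stream offset of first buffered record

    def append(self, record: dict, on_ack: Callable[[int], None]):
        self.buffer.append((record, on_ack))
        if len(self.buffer) >= self.max_batch:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        base = self.next_offset
        key = f"batch-{base:012d}.jsonl"  # keys sort by stream offset
        body = "\n".join(json.dumps(r) for r, _ in self.buffer).encode()
        self.put_object(key, body)        # durable only once this returns
        for i, (_, on_ack) in enumerate(self.buffer):
            on_ack(base + i)              # ack with the assigned offset
        self.next_offset = base + len(self.buffer)
        self.buffer = []

# In-memory stand-in for the S3 bucket:
store = {}
acks = []
b = Batcher(lambda k, v: store.__setitem__(k, v), max_batch=2)
b.append({"msg": "a"}, acks.append)
b.append({"msg": "b"}, acks.append)  # second append triggers a flush
```

A reader ("consume") would then list keys from a start offset and stream the batches back – which is exactly the plumbing a hosted stream API takes off your hands.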
- Love this. Elegant and powerful. Stateful streams are surprisingly difficult to DIY, and as everything becomes a stream of tokens, this is a super useful tool to have in the toolbox.
- Can this be used as an embedded lib instead of as a separate binary behind an API?
And am I understanding correctly that if I pointed two running instances of s2-lite at the same place in S3, there would be problems, since SlateDB is single-writer?
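My understanding is that SlateDB handles this with writer fencing: a new writer bumps an epoch in the manifest (via a conditional write, which S3 now supports), and any writer holding a stale epoch fails rather than corrupting state. A toy sketch of that idea, with all names illustrative and nothing here being SlateDB's actual code:

```python
class Manifest:
    """In-memory stand-in for a CAS-capable manifest object."""
    def __init__(self):
        self.epoch = 0

    def compare_and_bump(self, expected: int) -> bool:
        # Models a conditional put: succeeds only if nobody raced us.
        if self.epoch == expected:
            self.epoch = expected + 1
            return True
        return False

class Writer:
    def __init__(self, manifest: Manifest):
        self.manifest = manifest
        self.epoch = None

    def open(self):
        seen = self.manifest.epoch
        assert self.manifest.compare_and_bump(seen)
        self.epoch = seen + 1  # this writer now holds the highest epoch

    def write(self):
        # A stale writer detects that it has been fenced off.
        if self.manifest.epoch != self.epoch:
            raise RuntimeError("fenced: a newer writer took over")

m = Manifest()
w1 = Writer(m); w1.open()
w2 = Writer(m); w2.open()  # second instance bumps the epoch
w2.write()                 # fine
fenced = False
try:
    w1.write()             # first instance is now fenced
except RuntimeError:
    fenced = True
```

So two instances wouldn't silently clobber each other; the older one should fail once the newer one takes over.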
- It would be useful to have the SlateDB WAL go to Valkey or somewhere else to reduce S3 PUT costs and latency.
- As someone who worked at AWS, I laugh at anyone who really believes that S3 is "bottomless".
Also, there don't seem to be many use cases that want this nowadays; those that do already use Kafka.