- Historically, I believe bcache offered a better design than dm-cache. I wonder if that has changed at all?
That said, for this use case I would be very concerned about coherency issues from putting any cache in front of the actual distributed filesystem. (Unless this is the only node doing writes, I guess?)
- > For e-commerce workloads, the performance benefit of write-back mode isn’t worth the data integrity risk. Our customers depend on transactional consistency, and write-through mode ensures every write operation is safely committed to our replicated Ceph storage before the application considers it complete.
Unless the writer always blindly overwrites entire files at once (never read-then-write), consistency requires consistent reads AND writes. Even then, potential ordering issues creep in. It would be really interesting to hear how they deal with it.
by miladyincontrol
0 subcomments
- I just use fs-cache for networked storage caching. Good enough for Red Hat, good enough for me.
Unsure how the performance compares, but I like that it works transparently with little more than a mount flag to activate, works fine in containers, and, when managed with cachefilesd, can scale dynamically per the configured quotas.
For local disks though? bcache
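For reference, enabling fs-cache for an NFS mount is roughly the following (a sketch; the server name, export, and paths are hypothetical):

```shell
# Start the cachefilesd daemon that provides the on-disk backing store for fs-cache
sudo systemctl enable --now cachefilesd

# The fsc mount option opts this NFS mount into fs-cache
sudo mount -t nfs -o fsc fileserver:/export /mnt/data
```

The quotas mentioned above live in /etc/cachefilesd.conf as the brun/bcull/bstop free-space percentages that control when culling starts and stops.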
by 0xbadcafebee
1 subcomment
- This is good timing; I was just looking at a use case where we need more IOPS, and the only immediate solutions involve allocating way more high-performance disks or network storage. The problem with a cache is that the dataset is large with random access, so cache hits might be infrequent. But I had a theory that you could still make an impact on performance and lower your storage performance requirements. I may try this out, but it is block-level, so it's a bit intrusive.
Another option I haven't tried is tmpfs with an overlay. Initial access is RAM, falling back to the underlying slower storage. Since I'm mostly doing reads, that should be fine; writes can go to the slower disk mount. No block storage changes needed.
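A minimal sketch of that overlay idea, with hypothetical mount points. One caveat: with overlayfs, new writes land in the tmpfs upper layer (and vanish on reboot), while reads of untouched files fall through to the slow lower mount:

```shell
# tmpfs holds the overlay's upper and work dirs (both must be on the same filesystem)
sudo mount -t tmpfs -o size=8G tmpfs /mnt/fastupper
sudo mkdir -p /mnt/fastupper/upper /mnt/fastupper/work

# Merged view: reads miss through to /mnt/slowdisk; writes go to the tmpfs upper dir
sudo mount -t overlay overlay \
    -o lowerdir=/mnt/slowdisk,upperdir=/mnt/fastupper/upper,workdir=/mnt/fastupper/work \
    /mnt/merged
```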
- dm-cache writeback mode is both amazing and terrifying. It reorders writes, so if the cache device fails you don't just lose data; you've probably corrupted the entire backing disk.
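For anyone driving dm-cache through lvmcache: writethrough is the default, and the mode can be set explicitly at creation time. A sketch with assumed VG/LV/device names:

```shell
# Assumes volume group vg0, slow origin LV vg0/slow, and an SSD PV already added to vg0
sudo lvcreate --type cache --cachemode writethrough \
    -L 100G -n fastcache vg0/slow /dev/nvme0n1

# Detach the cache cleanly (dirty blocks are flushed first) before removing the SSD
sudo lvconvert --splitcache vg0/slow
```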
- I remember seeing another strategy where a remote block device was (lazily?) mirrored to a local SSD. The mirror was configured such that reads from the local device were preferred and writes would go to both devices. I think this was done by someone on GCP.
Does this ring any bells? I’ve searched for this a time or two and can’t find it again.
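It sounds like it may have been md RAID1 with the remote device marked write-mostly, which makes the kernel prefer the other member for reads while still mirroring writes to both. A sketch with hypothetical device names (a local NVMe plus a network block device):

```shell
# Device names are assumptions: /dev/nvme0n1 = local SSD, /dev/nbd0 = remote block device
sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 \
    --bitmap=internal /dev/nvme0n1 --write-mostly /dev/nbd0

# Optionally let writes to the slow member lag (requires the write-intent bitmap above)
# by also passing --write-behind=256 at creation time
```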
- Why is two-thirds of their I/O crossing AZ boundaries for a read-heavy application? This application seems like it’s not well architected for AWS and puts them at availability risk in the event of a zonal impairment. It looks like they’re using Ceph instead of EBS, and it’s not clear why.
- I was looking into SSD caching recently and decided to go with Open-CAS instead, which should be more performant (didn't test it personally): https://github.com/Open-CAS/open-cas-linux/issues/1221
It's maintained by Intel and Huawei and the devs were very responsive.
by AtlasBarfed
2 subcomments
- "When deploying infrastructure across multiple AWS availability zones (AZs), bandwidth costs can become a significant operational expense"
An expense that, in the age of 100 Gbit networking, exists entirely because AWS can get away with charging the suckers, um, customers for it
- Hmm, I have a few questions:
1. How is the cache invalidated to avoid reading stale data?
2. If the multi-AZ setup is for high availability, then I guess the only traffic between zones should be replication from the active zone to the standby zones; in such a setup a read cache doesn't make much sense.