FRESH

Hacker News

Case study: recovery of a corrupted 12 TB multi-device pool

116 points by salt4034

by yjftsjthsd-h

3 subcomments

> This is not a bug report. [...] The goal is constructive, not a complaint.
Er, I appreciate trying to be constructive, but in what possible situation is it not a bug that a power cycle can lose the pool? And if it's not technically a "bug" because BTRFS officially specifies that it can fail like that, why is that not in big bold text at the start of any docs on it? 'Cuz that's kind of a big deal for users to know.
EDIT: From the longer write-up:
> Initial damage. A hard power cycle interrupted a commit at generation 18958 to 18959. Both DUP copies of several metadata blocks were written with inconsistent parent and child generations.
Did the author disable safety mechanisms for that to happen? I'm coming from being more familiar with ZFS, but I would have expected BTRFS to also use a CoW model where it wasn't possible to have multiple inconsistent metadata blocks in a way that didn't just revert you to the last fully-good commit. If it does that by default but there's a way to disable that protection in the name of improving performance, that would significantly change my view of this whole thing.

by harshreality

2 subcomments

by throwaway270925

1 subcomments

> A hard power cycle on a 3 device pool (data single, metadata DUP, DM-SMR disks) left the extent tree and free space tree in a state that no native repair path could resolve.
As a ZFS wrangler by day:
People in this thread seem to happily shit on btrfs here but this seems to be very much not like a sane, resilient configuration no matter the FS. Just something to keep in mind.

by Retr0id

4 subcomments

This is obviously LLM output, but perhaps LLM output that corresponds to a real scenario. It's plausible that Claude was able to autonomously recover a corrupted fs, but I would not trust its "insights" by default. I'd love to see a btrfs dev's take on this!

by stinkbeetle

2 subcomments

> Case study: recovery of a severely corrupted 12 TB multi-device pool, plus constructive gap analysis and reference tool set #1107
Please don't be btrfs please don't be btrfs please don't be btrfs...

by jamesnorden

1 subcomments

People swear btrfs is "safe" now, but I've personally been bitten by data corruption more than once, so I stay away from it now.

0 subcomment

by c-c-c-c-c

0 subcomment

by londons_explore

0 subcomment

Btrfs allows migration from ext4 with a rather good rollback strategy...
Post-migration, a complete disk image of the original ext4 disk will exist within the new filesystem, using no additional disk space due to the magic of copy-on-write.
Why isn't the repair process the same? Fix the filesystem to get everything online asap, and leave a complete disk image of the old damaged filesystem so other recovery processes can be tried if necessary.

0 subcomment

by duskdozer

1 subcomments

Welp. Guess I need to figure out another fs to use for a few drives in a nonraid pool I haven't gotten around to setting up yet. I forget why zfs seemed out. xfs?

by phoronixrly

1 subcomments

To theal author: did you continue using btrfs after this ordeal? An FS that will not eat (all) your data upon a hard powercycle only at the cost of 14 custom C tools is a hard pass from me no matter how many distros try to push it down my throat as 'production-ready'...
Also, impressive work!

by lnx01

0 subcomment

by devnotes77

0 subcomment

by weiyong1024

0 subcomment

by blae

2 subcomments

oh great here comes all the zfs fanboys to shit on btrfs again with made up stories of corruption