Er, I appreciate the attempt to be constructive, but in what possible situation is it not a bug that a power cycle can lose the pool? And if it's not technically a "bug" because BTRFS officially specifies that it can fail like that, why is that not in big bold text at the start of any docs on it? 'Cuz that's kind of a big deal for users to know.
EDIT: From the longer write-up:
> Initial damage. A hard power cycle interrupted a commit at generation 18958 to 18959. Both DUP copies of several metadata blocks were written with inconsistent parent and child generations.
Did the author disable safety mechanisms for that to happen? I'm more familiar with ZFS, but I would have expected BTRFS to also use a CoW model where it wasn't possible to have multiple inconsistent metadata blocks in a way that didn't just revert you to the last fully-good commit. If it does that by default but there's a way to disable that protection in the name of performance, that would significantly change my view of this whole thing.
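For anyone curious what "inconsistent generations" looks like in practice, stock btrfs-progs can show the committed transaction generation and check for parent/child mismatches read-only. A sketch; the device path is illustrative, and the filesystem should be unmounted:

```shell
# Dump the superblock; the "generation" field is the last fully
# committed transaction number for this filesystem.
btrfs inspect-internal dump-super /dev/sdX | grep -i generation

# Walk the metadata trees without modifying anything and report
# problems such as parent blocks pointing at children with a
# generation the parent doesn't expect.
btrfs check --readonly /dev/sdX
```

`--readonly` is also the default mode for `btrfs check`; it's spelled out here to make clear nothing is being repaired.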
Changing the metadata profile to at least raid1 (raid1, raid1c3, raid1c4) is a good idea, especially for anyone using raid5 or raid6 for a btrfs array against recommendations (raid1c3 is more appropriate for raid6). That would make it very difficult for metadata to get corrupted, which is the lion's share of the higher-impact problems with raid5/6 btrfs.
check current profiles: btrfs fi df <mountpoint>
convert metadata: btrfs balance start -mconvert=raid1c3,soft <mountpoint>
(make sure it's -mconvert, where m is for metadata, not -dconvert, which would switch profiles for data and mess up your array)

As a ZFS wrangler by day:
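Putting the steps above together, the whole conversion looks roughly like this (a sketch; the mountpoint is illustrative):

```shell
# 1. See which profile each block group type currently uses
#    (Data, Metadata, System lines each name their profile).
btrfs fi df /mnt/pool

# 2. Convert metadata to raid1c3. The "soft" filter skips chunks
#    that already have the target profile, so an interrupted and
#    restarted balance doesn't redo finished work.
btrfs balance start -mconvert=raid1c3,soft /mnt/pool

# 3. Watch progress from another shell; re-run step 1 afterwards
#    to confirm the Metadata line now shows RAID1C3.
btrfs balance status /mnt/pool
```

The balance rewrites every metadata chunk, so expect it to take a while on a large array, but the filesystem stays online throughout.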
People in this thread seem happy to shit on btrfs here, but this is very much not a sane, resilient configuration no matter the FS. Just something to keep in mind.
Please don't be btrfs please don't be btrfs please don't be btrfs...
Post-migration, a complete disk image of the original ext4 disk will exist within the new filesystem, using no additional disk space due to the magic of copy-on-write.
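For context, the stock conversion flow that produces that in-place disk image looks roughly like this (a sketch assuming the in-tree btrfs-convert tool; device paths are illustrative):

```shell
# Convert an unmounted ext4 filesystem in place. The original ext4
# metadata and data are preserved inside the new btrfs filesystem.
btrfs-convert /dev/sdX

# After mounting, the old filesystem lives in the ext4_saved
# subvolume as a single "image" file whose extents are shared with
# the live files via CoW, hence no extra space used.
mount /dev/sdX /mnt
ls -l /mnt/ext4_saved/image

# To roll back to the original ext4 filesystem, unmount first; this
# only works while the ext4_saved subvolume still exists.
umount /mnt
btrfs-convert -r /dev/sdX
```

Deleting the ext4_saved subvolume is what finally frees the space and forfeits the rollback option.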
Why isn't the repair process the same? Fix the filesystem to get everything online asap, and leave a complete disk image of the old damaged filesystem so other recovery processes can be tried if necessary.
Also, impressive work!