Manufacturers have been playing this game with DWPD/TBW numbers too --- by reducing the retention spec, they can advertise a drive as having higher endurance with the exact same flash. But if you compare the numbers over the years, it's clear that NAND flash has gotten significantly worse; the only thing that has gone up, multiplicatively, is capacity, while endurance and retention have both gone down by a few orders of magnitude.
For a long time, 10 years after 100K cycles was the gold standard of SLC flash.
Now we are down to several months after less than 1K cycles for QLC.
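For reference, the two endurance figures are tied together only by capacity and the warranty period (illustrative numbers, not from any particular datasheet):

    # TBW = DWPD x capacity (TB) x 365 x warranty years
    # e.g. a hypothetical 1.92 TB drive rated for 1 DWPD over a 5-year warranty:
    echo "1 * 1.92 * 365 * 5" | bc    # ~3504 TBW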
Like, does an SSD do some sort of refresh on power-on, or every N hours, or do you have to access the specific block, or...? What if you interrupt the process, e.g. an NVMe drive in an external case that you only plug in once a month for a few minutes to use it as a huge flash drive - is that a problem?
What about the unused space, is a 4 TB drive used to transport 1 GB of stuff going to suffer anything from the unused space decaying?
It's all very unclear what any of this means in practice and how a user is supposed to manage it.
The more interesting thing to note from those standards is that the required retention period differs between the "Client" and "Enterprise" categories.
The Enterprise category only has a power-off retention requirement of 3 months.
The Client category has a power-off retention requirement of 1 year.
Of course there are two sides to every story...
The Enterprise category standard assumes power-on active use of 24 hours/day, while the Client category is only intended for 8 hours/day.
As with many things in tech... it's up to the user to pick which side they compromise on.
[1]https://files.futurememorystorage.com/proceedings/2011/20110...
We treated NVMe drives like digital stone tablets. A year later, we tried to restore a critical snapshot and checksums failed everywhere. We now have a policy to power-cycle our cold storage drives every 6 months just to refresh the charge traps.
It's terrifying how ephemeral "permanent" storage actually is. Tape is annoying to manage, but at least it doesn't leak electrons just sitting on a shelf.
The theory is that operating system files, which rarely change, are written once and almost never re-written. So the charge begins to decay over time, and while the blocks might not become unreadable, reads from them require additional error correction, which reduces performance.
There have been a significant number of (anecdotal) reports that a full rewrite of the drive, which does put wear on the cells, greatly increases overall performance. I haven't personally experienced this yet, but I do think an "every other year" refresh of data on SSDs makes sense.
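If you do want to do such a refresh, one way to force every cell to be reprogrammed is badblocks' non-destructive read-write mode, which reads each block, writes test patterns, and then restores the original contents. A minimal sketch (drive unmounted, backups in place, /dev/sdX is a placeholder):

    # -n: non-destructive read-write test, -s: show progress, -v: verbose
    sudo badblocks -nsv /dev/sdX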
My desktop computer is generally powered except when there is a power failure, but among the million+ files on its SSD there are certainly some that I do not read or write for years.
Does the SSD controller automatically look for used blocks that need to have their charge refreshed and do so, or do I need to periodically do something like "find / -type f -print0 | xargs -0 cat > /dev/null" to make sure every file gets read occasionally?
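If you'd rather not rely on the controller, a host-side read pass over the raw block device (instead of cat-ing every file) touches every LBA, including filesystem metadata. A minimal sketch, assuming the device node is /dev/nvme0n1:

    # one sequential pass over the whole device; sectors with weakened charge at least
    # get run through the controller's error correction, and hard read errors get reported
    sudo dd if=/dev/nvme0n1 of=/dev/null bs=16M iflag=direct status=progress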
This article just seems to link to a series of other xda articles with no primary source. I wouldn't ever trust any single piece of hardware to store my data forever, but this feels like clickbait. At one point they even state "...but you shouldn't really worry about it..."
do I just plug it in and leave the computer on for a few minutes? does it need to stay on for hours?
do I need to run a special command or TRIM it?
Should I pop them in an old server? Is there an appliance that just supplies power? Is there a self-hosted thing I can use to monitor disks that I have zero day-to-day use for and don't want connected to anything, but want to keep "live"?
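One low-effort option along the "old server" route, assuming SATA drives and smartmontools, is to let the drives test themselves on a schedule; a sketch:

    # start the drive's extended self-test, which reads the full surface internally
    sudo smartctl -t long /dev/sdX
    # hours later: check the self-test log and overall health
    sudo smartctl -l selftest /dev/sdX
    sudo smartctl -H /dev/sdX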
There was a guy on reddit who took about 20 cheap USB flash drives and checked one every 6 months. I think after 3 years nothing had gone bad yet.
I've copied OS ISO images to USB flash drives and I know they sat for at least 2 years unused. Then I used it to install the OS and it worked perfectly fine with no errors reported.
I still have 3 copies of all data and 1 of those copies is offsite but this scare about SSDs losing data is something that I've never actually seen.
If not, that feels like a substantial hole in the market. Non-flash durable storage tends to be annoying or impractical for day-to-day use. I want to be able to find a 25 year old SD card hiding in some crevice and unearth an unintentional time capsule, much like how one can pick up 20+ year old MiniDiscs and play back the last thing their former owners recorded to them perfectly.
The difference between slc and mlc is just that mlc has four different program voltages instead of two, so reading back the data you have to distinguish between charge levels that are closer together. Same basic cell design. Honestly I can’t quite believe mlc works at all, let alone qlc. I do wonder why there’s no way to operate qlc as if it were mlc, other than the manufacturer not wanting to allow it.
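A toy illustration of just the arithmetic behind that (ignoring every real-world effect):

    # levels per cell double with each extra bit, so the spacing between adjacent
    # charge states shrinks to roughly 1/(levels-1) of the usable voltage window
    for bits in 1 2 3 4; do
        levels=$((2 ** bits))
        echo "$bits bit(s)/cell: $levels levels, spacing ~1/$((levels - 1)) of the window"
    done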
Um. Backups seem like exactly why I might have data on an unpowered SSD.
I use HDDs right now because they're cheaper, but that might not be true some day. Also, I would expect someone less technically inclined than I am to just use whatever they have lying around, which may well be an SSD.
It will trigger reads in random areas of the flash and try to correct any errors found.
Without it, the same issue as in the original article will happen (even if the device is powered on): areas of the NAND that have not been read for a long time will accumulate more and more errors, eventually becoming unrecoverable.
How many people have a device that they may only power up every few years, like something used only on vacation? In fact, I have a device that I only use on rare occasions these days (an arcade machine), and I now suspect I'll have to reinstall it since it's been 2 or 3 years since I last used it.
This is a pretty big deal that they don't put on the box.
Flash storage is apparently cheaper (especially for smaller production runs) and/or higher density these days, so these cartridges just use that and make it appear ROM-like via a controller.
ZFS, as one of these filesystem-specific parity-RAID implementations, also auto-repairs corrupted data whenever it is read, and the scrub utility provides an additional tool for recognizing and correcting such issues proactively.
This applies to both HDDs and SSDs. So, a good option for just about any archival use case.
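With ZFS, for example, the scrub is a one-liner (pool name "tank" is a placeholder); it walks every allocated block, verifies checksums, and repairs from redundancy where it can:

    zpool scrub tank
    zpool status -v tank    # shows scrub progress plus any repaired or unrecoverable errors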
so it's as if the data... rusts, a little bit at a time
# read the whole device to force every block through the controller's error correction,
# then dump the SMART data to see whether anything went wrong
dd if="$1" of=/dev/null iflag=direct bs=16M status=progress
smartctl -a "$1"
If someone wants to properly study SSD data retention they could encrypt the drive using plain dm-crypt, fill the encrypted volume with zeroes, and check at some later point whether there are any non-zero blocks. This is an accessible way (no programming involved) to write random data to the SSD and "save" it without actually saving the entire thing - just the key. It also ensures maximum variance in the charge levels of all the cells, and prevents the SSD from potentially playing tricks such as compression (see the sketch below).

Really? I could have sworn that primary storage was the one place they weren't going to replace HDDs. Aren't they more of a thing for cache?
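A rough sketch of the dm-crypt approach described above - the device name and passphrase are placeholders, and this of course destroys whatever is on the drive:

    # open the raw SSD as a plain dm-crypt mapping (any throwaway passphrase works,
    # as long as you remember it for the re-check)
    cryptsetup open --type plain --cipher aes-xts-plain64 --key-size 512 /dev/sdX retention_test
    # fill the encrypted volume with zeroes; the SSD sees incompressible random data
    dd if=/dev/zero of=/dev/mapper/retention_test bs=16M status=progress
    sync
    cryptsetup close retention_test

    # ...months later: reopen with the same passphrase and look for flipped bits
    cryptsetup open --type plain --cipher aes-xts-plain64 --key-size 512 /dev/sdX retention_test
    cmp -n "$(blockdev --getsize64 /dev/sdX)" /dev/zero /dev/mapper/retention_test \
        && echo "no corruption detected"
    cryptsetup close retention_test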
I've aged and busied myself beyond keeping track of this stuff anymore. I'm going to buy a packable NAS in the next couple months and be done with it. Hopefully ZFS since apparently that's the bee's knees and I won't have to think about RAIDs anymore.
I religiously rotate my offline SSDs and HDDs (I store backups on both): something like four at home (offline onsite) and two (one SSD, one HDD) in a safe at the bank (offline offsite).
Every week or so I rsync to the offline disks at home (a bit more advanced than plain rsync: I wrap rsync in a script that detects potential bitrot using a combination of an rsync "dry-run" and known-good cryptographic checksums before doing the actual rsync [1]), and then every month or so I rotate by swapping the SSD and HDD at the bank with those at home.
Maybe I should add to the process, for SSDs, once every six months:
... $ dd if=/dev/sda | xxhsum
I could easily automate that in my backup'ing script by adding a file lastknowddtoxxhash.txt containing the date of the last full dd to xxhsum, verifying that, and then asking, if an SSD is detected (I take it on an HDD it doesn't matter), whether a full read to hash should be done. Note that I'm already using random sampling on files containing checksums in their name, so I'm already verifying x% of the files anyway. So I'd probably be detecting a fading SSD quite easily.
Additionally I've also got a server with ZFS in a mirror, so this, too, helps keep a good copy of the data.
FWIW I still have most of the personal files from my MS-DOS days so I must be doing something correctly when it comes to backing up data.
But yeah: adding a "dd to xxhsum" of the entire disks once every six months in my backup'ing script seems like a nice little addition. Heck, I may go hack that feature now.
[1] otherwise rsync shall happily trash good files with bitrotten ones
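A rough sketch of that six-monthly "dd to xxhsum" check - the state-file location and the threshold are arbitrary placeholders, not anything from the script above:

    #!/bin/bash
    # full-device read hashed with xxhsum, but only if the last run is old enough
    disk="$1"                                         # e.g. /dev/sda
    state="$HOME/.last_full_read_$(basename "$disk")"
    six_months=$((182 * 24 * 3600))

    last=$(cat "$state" 2>/dev/null || echo 0)
    now=$(date +%s)

    if (( now - last >= six_months )); then
        dd if="$disk" bs=16M status=progress | xxhsum
        date +%s > "$state"
    fi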
This is somewhat confused writing. Consumer SSDs usually do not have a data retention spec; even in this very detailed Micron datasheet you won't find one: https://advdownload.advantech.com/productfile/PIS/96FD25-S2T... Meanwhile, the data retention spec for enterprise SSDs applies at the end of their rated life, which is usually a DWPD/TBW intensity you won't reach in actual use anyway - that's where numbers like "3 months @ 50 °C" or whatever come from.
In practice, SSDs don't tend to lose data over realistic time frames. Don't hope for a "guaranteed by design" spec on that though; some pieces of silicon are more equal than others.