Friday, July 01

Geek

To Worry Or Not To Worry

Or, Much Ado About Random Write Endurance

Intel's 320-series 300GB SSD has a quoted 4KB random write endurance - that is, the minimum total volume of data you can write to it in individual 4KB randomly located blocks before it begins to fail - of 30TB.

30TB may sound like a lot.  The primary MySQL server at my day job does 2.5TB of writes per day (and it's only one of several database servers).  MySQL writes tend to be random-ish, so at first glance you might expect that drive to burn out in 12 days under those conditions.  For that reason (and the fact that the database is rather larger than 300GB), we don't use a 320-series SSD; we use a RAID-50 array of 20 enterprise drives, each with about 60x the quoted write endurance.  Based on the quoted numbers and measured load, we should be good for at least 10 years.
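For anyone who wants to plug in their own numbers, the 12-day figure is just the quoted endurance divided by the daily write volume.  A minimal sketch, assuming every byte written counts in full against the quoted 4KB random write endurance (the function name is mine, not anything from Intel):

```python
# Naive lifetime estimate from the manufacturer's quoted endurance.
# Figures are the ones quoted in the post; units are decimal terabytes.
QUOTED_ENDURANCE_TB = 30.0  # Intel 320-series 300GB, 4KB random writes
DAILY_WRITES_TB = 2.5       # measured load on the primary MySQL server


def naive_lifetime_days(endurance_tb: float, daily_writes_tb: float) -> float:
    """Days until the quoted endurance is used up.

    Assumes (pessimistically) that all writes are worst-case random writes.
    """
    return endurance_tb / daily_writes_tb


days = naive_lifetime_days(QUOTED_ENDURANCE_TB, DAILY_WRITES_TB)
print(f"Naive lifetime: {days:.0f} days")
```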

The question is, though, what is the real-world longevity of SSDs under heavy random write conditions?  I've been very conservative about SSD deployment - for mee.nu I've used the more expensive enterprise SLC drives as well (and RAID-5 at that) even though our write activity is a couple of orders of magnitude lower.  The only MLC drives I've deployed in a production environment have been in applications where reads are random and writes are sequential - some of the Cassandra and Xapian databases at my day job fit this description.

However, this paper, presented at last year's Hot Storage conference, suggests that things might not be nearly so bad.  The authors examine a model of flash cell burnout, and note that if cells are given time to rest between write/erase cycles, their endurance can be expected to increase significantly.

How significantly?  Let's take our 300GB SSD and hit it with 2.5TB of data a day.  Let's assume a worst-case scenario on two aspects - all of that is individual 4KB random writes, and there's no write-combining done by the OS or RAID controller.  Let's assume a best-case scenario on the other aspects - write amplification is 1.0 (that is, no blocks need to be moved to allow for the updates) and wear-levelling is perfect across the drive (all blocks are updated evenly).  (All of these assumptions are completely implausible, but the idea is that they'll kind of balance out until I can get more precise data.)

That means that every block on the drive is updated every three hours.  A little less than three hours, but near enough.  That paper suggests that with a 10,000-second recovery period between write/erase cycles - also a little less than three hours - the write endurance of MLC cells can be expected to be 90 times the worst-case figure the manufacturers cite.
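The "a little less than three hours" figure falls out of the drive capacity and the daily write volume.  A rough sketch under the same implausible assumptions as above (perfect wear-levelling, no extra block moves, decimal units); the function name is just for illustration:

```python
# How often is each block rewritten if writes are spread perfectly evenly?
DRIVE_CAPACITY_GB = 300.0
DAILY_WRITES_GB = 2500.0  # 2.5TB per day, decimal units
SECONDS_PER_DAY = 86400.0


def overwrite_interval_seconds(capacity_gb: float, daily_writes_gb: float) -> float:
    """Average time between rewrites of any given block, in seconds."""
    full_drive_writes_per_day = daily_writes_gb / capacity_gb
    return SECONDS_PER_DAY / full_drive_writes_per_day


interval = overwrite_interval_seconds(DRIVE_CAPACITY_GB, DAILY_WRITES_GB)
print(f"Rest period per block: {interval:.0f} s ({interval / 3600:.2f} hours)")
```

That works out to roughly 10,400 seconds, so each block rests slightly longer than the 10,000-second recovery period used in the paper.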

That is, rather than two weeks, the drive would last for three years.  And then drop dead all at once given our rather unreasonable scenario.
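The jump from about two weeks to about three years is simply the naive lifetime multiplied by the paper's 90x figure.  A quick sketch, assuming the simulated gain applies uniformly to every cell (which is exactly the part nobody has measured yet):

```python
# Scale the naive lifetime by the simulated endurance gain from cell recovery.
NAIVE_LIFETIME_DAYS = 12.0  # 30TB quoted endurance / 2.5TB per day
RECOVERY_GAIN = 90.0        # simulated gain for a 10,000-second rest period
DAYS_PER_YEAR = 365.0


def extended_lifetime_years(naive_days: float, gain: float) -> float:
    """Lifetime in years if every cell gets the full simulated gain."""
    return naive_days * gain / DAYS_PER_YEAR


years = extended_lifetime_years(NAIVE_LIFETIME_DAYS, RECOVERY_GAIN)
print(f"Extended lifetime: {years:.2f} years")
```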

Which is a completely different picture from what the manufacturer's worst-case numbers might suggest.  And with a RAID controller with battery-backed write-back cache, the number of writes that actually hit the SSD can be significantly less.

The problem is, this is a simulation.  It's a very careful simulation based on the known physical properties of the semiconductor materials used in flash fabrication, but it's still a simulation.  I'm hoping I can get a couple of SSDs solely for the purpose of killing them, because I haven't seen anyone else publish good data on that.

The reason all this matters is that where a 300GB Intel MLC drive costs $600, 300GB of Intel SLC enterprise SSD storage comes to five drives totalling $4000.  The point may become somewhat moot when Intel's 710 MLC-HET drives launch.  The HET, I would guess, stands for something like high-endurance technology; these drives are based on cheaper MLC flash but optimised for reliability rather than capacity.  They will likely (based on reports in the trade press) cost twice as much as the regular MLC drives, but offer 20 to 40 times the endurance - nearly as good as SLC.  If the price and endurance turn out that way, then there will be 3x less reason to risk your data on a statistical model and a consumer drive.
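To put the price gap in per-gigabyte terms, here is a small sketch using the prices above.  The MLC-HET price is not a real price; it is the trade-press guess of twice the regular MLC price, and is labelled as an assumption in the code:

```python
# Price comparison for 300GB of storage, using the figures from the post.
CAPACITY_GB = 300.0
MLC_PRICE_USD = 600.0       # one consumer MLC drive
SLC_PRICE_USD = 4000.0      # five enterprise SLC drives
HET_PRICE_MULTIPLIER = 2.0  # ASSUMPTION: trade-press guess, not a list price


def price_per_gb(price_usd: float, capacity_gb: float) -> float:
    """Cost in dollars per gigabyte."""
    return price_usd / capacity_gb


het_price_usd = MLC_PRICE_USD * HET_PRICE_MULTIPLIER
options = [
    ("Consumer MLC", MLC_PRICE_USD),
    ("MLC-HET (guess)", het_price_usd),
    ("Enterprise SLC", SLC_PRICE_USD),
]
for name, price in options:
    per_gb = price_per_gb(price, CAPACITY_GB)
    premium = price / MLC_PRICE_USD
    print(f"{name}: ${per_gb:.2f}/GB, {premium:.1f}x the consumer MLC price")
```

On those numbers the premium for a high-endurance drive would drop from a bit under 7x to about 2x.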

Another thing: Intel's 320 series (unlike the earlier M-series) implements internal full-chip parity in the spare area, so even if one of the flash chips dies completely, the drive will continue operation unaffected.

Posted by: Pixy Misa at 03:56 AM

1 Hi , 
Great article. I will introduce myself first , my name is Dan Porat and I am working for Anobit. We create SSDs based on MSP(tm) technology combined with MLC NANDS. We also create many more products , but let's focus on the SSDs for now.
I would like to purely pour our numbers to the usage case you mentioned above.
What do we bring to the table? under the following assumptions:
1. 100% entropy - i.e totally random data.
2. 100% random access - i.e. data is totally scattered around the drive.
3. 10 times a day drives own capacity (i.e 4TB per day)
The drive will endure 5 Years.
That is , before even mentioning it's formidable performance numbers (32K/24K R/W).
Given your usage model , I think it is a fair solution.
Along drive's life , that makes 7.3PB written to the drive.
Regarding the trade of SLC VS MLC , well , I think some applications would endure using SLC for various reasons  , so there will be place also for the intel/samsung/stec SLC drives for quite some time.
Thanks

Posted by: Dan Porat at Friday, July 01 2011 07:59 PM (zX6Y0)

2 So, how do you make a determination of Dan of anobits spams or not? I think trying to find if the same form comment is left on other blogs discussing SSDs may be one option.

Posted by: Pete Zaitcev at Saturday, July 02 2011 07:42 AM (9KseV)

3 If it's spam, then whoever wrote the spambot deserves the ACM medal.

Posted by: Pixy Misa at Saturday, July 02 2011 08:03 AM (PiXy!)

4 The company has sites in the US and in Israel. It's possible that English is not his first language -- and not all ESL's are as good at it as you are, Pete.

Posted by: Steven Den Beste at Saturday, July 02 2011 10:16 AM (+rSRq)

5 Guys ,  English is far from being my first language. I will gladly rephrase or erase the comment above if it hurts the discussion.
If any of you have any question regarding our technology , feel free.
Dan Porat


Posted by: Dan Porat at Saturday, July 02 2011 11:15 PM (zX6Y0)

6 Hi Dan.  No, it's fine!  It's just that compliments like "great article" are unfortunately mostly seen in spam.

I took a look at your company and products - very interesting work you're doing.  I wish you well.

Posted by: Pixy Misa at Sunday, July 03 2011 01:55 AM (PiXy!)

7 I would just like to add that the high-endurance MLC NANDs are showing under all kinds of names and tags. SandForce has DuraWrite. STEC has cellcare and S.A.F.E. Intel has HET.
Definitely the arena of High-endurance MLC NANDs will be extremely interesting over the coming year.
Thanks
Dan Porat

Posted by: Dan Porat at Monday, July 04 2011 07:03 PM (zX6Y0)
