Putting Theory to Practice: Understanding the SSD Performance Degradation Problem

Let’s look at the problem in the real world. You, I, and our best friend have decided to start making SSDs. We buy up some NAND flash and build a controller. The table below summarizes our drive’s characteristics:

  Our Hypothetical SSD
  Page Size    4KB
  Block Size   5 Pages (20KB)
  Drive Size   1 Block (20KB)
  Read Speed   2KB/s
  Write Speed  1KB/s

Through impressive marketing and your incredibly good looks we sell a drive. Our customer first goes to save a 4KB text file to his brand new SSD. The request comes down to our controller, which finds that all pages are empty, and allocates the first page to this text file.


Our SSD. The yellow boxes are empty pages.

The user then goes and saves an 8KB JPEG. The request, once again, comes down to our controller, and fills the next two pages with the image.


The picture is 8KB and thus occupies two pages, which are thankfully empty.

The OS reports that 60% of our drive is now full, which it is. Three of the five pages are occupied with data and the remaining two are empty.

Now let’s say that the user goes back and deletes the original text file. This request never reaches our controller; as far as our controller is concerned, we still have three valid pages and two empty ones.

For our final write, the user wants to save a 12KB JPEG, which requires three 4KB pages to store. The OS knows that the first LBA, the one allocated to the 4KB text file, can be overwritten; so it tells our controller to overwrite that LBA and to store the remaining 8KB of the image in our last two available LBAs.

Now we have a problem once these requests reach our SSD controller: we’ve got three pages’ worth of write requests incoming, but only two pages free. Remember, the OS thinks we have 12KB free, but on the drive only 8KB is actually free; the other 4KB is occupied by an invalid page. We need to erase that page in order to complete the write request.


Uh oh, problem. We don't have enough empty pages.
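The mismatch between what the OS reports and what the drive actually has free can be sketched with a toy page-state model. This is purely illustrative; the names and states are hypothetical, not any real controller's bookkeeping:

```python
PAGE_SIZE_KB = 4

# After the text file (1 page), the JPEG (2 pages), and the OS-side delete.
# The delete never reached the drive, so the controller still sees that page
# as holding data -- it is merely invalid, not empty.
os_view = ["free", "used", "used", "free", "free"]
ssd_view = ["invalid", "valid", "valid", "empty", "empty"]

os_free_kb = os_view.count("free") * PAGE_SIZE_KB      # OS thinks 12KB is free
ssd_free_kb = ssd_view.count("empty") * PAGE_SIZE_KB   # only 8KB is truly writable

print(os_free_kb, ssd_free_kb)
```

The 4KB gap between the two numbers is exactly the invalid page the controller must now deal with.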

Remember back to Flash 101: even though we only need to erase a single page, we can’t. Pages can’t be erased individually; only whole blocks can. So we have to erase all of our data just to get rid of the one invalid page, then write it all back again.

To do so, we first read the entire block back into memory somewhere. If we’ve got a good controller, we’ll read it into an on-die cache (steps 1 and 2 below); if not, hopefully there’s some off-die memory we can use as a scratch pad. With the block read, we can modify it, removing the invalid page and replacing it with good data (steps 3 and 4). But so far we’ve only done that in memory; now we need to write it to flash. Since all of our data is in memory, we can erase the entire block in flash and write the new block in its place (step 5).
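The five steps above can be sketched in code. A minimal, hypothetical model (no real firmware does it this literally):

```python
PAGE_KB = 4

# One block of five pages: the text file, two JPEG pages, two empty pages.
# None marks an empty page; "text" is invalid from the drive's perspective.
block = ["text", "jpeg_a", "jpeg_b", None, None]
incoming = ["new_a", "new_b", "new_c"]  # the 12KB JPEG, three pages

# Steps 1-2: read every written page of the block into the cache.
cache = [p for p in block if p is not None]
kb_read = len(cache) * PAGE_KB  # 12KB of data read

# Steps 3-4: in memory, drop the invalid page and merge in the new data.
cache.remove("text")
cache += incoming  # five pages again: jpeg_a, jpeg_b, new_a, new_b, new_c

# Step 5: erase the block in flash and write the merged block back.
kb_written = len(cache) * PAGE_KB  # a full 20KB block

print(kb_read, kb_written)
```

Note that the host asked for a 12KB write, yet the flash saw a 12KB read plus a 20KB program.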

Now let’s think about what just happened. As far as the OS is concerned, it asked to write 12KB of data and that data got written. Our SSD controller knows what really transpired, however: to write that 12KB of data, we first had to read 12KB and then write an entire block, or 20KB.

Our SSD is quite slow: it can only write at 1KB/s and read at 2KB/s. Writing 12KB should have taken 12 seconds, but since we had to read 12KB (6 seconds at 2KB/s) and then write 20KB (20 seconds at 1KB/s), the whole operation took 26 seconds.

To the end user it would look like our write speed dropped from 1KB/s to 0.46KB/s, since it took us 26 seconds to write 12KB.
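The arithmetic, spelled out; the 20/12 ratio is what later came to be called write amplification:

```python
READ_KBPS, WRITE_KBPS = 2, 1  # our hypothetical drive's specs

naive_s = 12 / WRITE_KBPS                    # 12s if the pages were empty
actual_s = 12 / READ_KBPS + 20 / WRITE_KBPS  # 6s read + 20s write = 26s
effective_kbps = 12 / actual_s               # what the user experiences
write_amplification = 20 / 12                # flash KB written per host KB

print(actual_s, round(effective_kbps, 2), round(write_amplification, 2))
```

So the user sees roughly 0.46KB/s instead of 1KB/s, and the flash absorbs about 1.67KB of writes for every 1KB the host sends.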

Are things starting to make sense now? This is why the Intel X25-M and other SSDs get slower the more you use them, and it’s also why write speeds drop the most while read speeds stay about the same. When writing to an empty page the SSD can write very quickly, but when writing to a page that already has data in it, there’s additional overhead to deal with, and that overhead reduces write speed.

Comments

  • OCedHrt - Wednesday, March 18, 2009 - link

    Excellent article. One of the best I've seen.
  • cliffa3 - Wednesday, March 18, 2009 - link

    I can tell a ton of work went into that, and all the history/details are greatly appreciated. I've been checking every week or so throughout February to see if it had been posted, but well worth the wait. As great as SSDs are, I can understand you not wanting to be near one for a while (-: Thanks for all the hard work...especially from the consumer standpoint. And kudos to OCZ for stepping up the way they did...that's (unfortunately) unheard of. Glad to see your no-compromise / report the facts no matter what attitude winning for the consumer. I'm glad at least one manufacturer was able to see (eventually) your intent wasn't to create a commotion, but to just plainly say what needed to be said.
  • sngbrdb - Wednesday, March 18, 2009 - link

    An extremely (as always) informative article; comprehensive and no angle missed. Good stuff!

    From an enthusiast's perspective, OCZ gained 10 levels of trust as a result of Ryan Peterson's response and handling of the Vertex' firmware. Ryan accepted the harsh reality expressed to him from an outside reviewer, risked marketability to rely on Anand's expertise (Anand is *absolutely* correct that 230MB/s is worthless if it comes with stuttering write latency), and resolved the problem in record time.

    This is the rare kind of responsiveness and attitude that translate directly into sales (I'm on my way to price the Vertex now).
  • tshen83 - Wednesday, March 18, 2009 - link

    BUT, still based on Windows Vista.

    I am going to drill this into reviewer's head -> NTFS isn't designed for SSDs.

    There are three problems for properly reviewing SSDs today:

    FileSystem, RAID controller, and SSD controller.

    Each of them can compensate for the SSDs, the question is which one SHOULD be responsible for optimizing random IOs.

    It is very clear that Intel's SSDs have implemented all the nitty gritty stuff like copy on write onto the SSD controller itself. So the OS or FileSystem shouldn't be responsible for performance degradation, however the same cannot be said for other SSDs.

    I am sure results would be different if this were conducted on Solaris/OpenSolaris ZFS with an Adaptec 5405 (IOP348-based RAID card). Not to pump Solaris and ZFS, but it is the primary reason why IBM wants to buy SUN: it is the only file system on the market that can properly operate SSDs, and do so without RAID controllers.

    If Anand really wants to stick with Windows, I think benchmarking on the Windows 7 beta would be a slightly better option than Vista. Windows has made so many optimizations for rotational hard disks that it actually makes SSDs perform worse.

    The Vertex random write 4K IOPS benchmark doesn't look right at 2.6MB/sec; that is hardly 650 IOPS. It should be much higher. It could be the ICH10R controller though.
  • hyc - Wednesday, March 18, 2009 - link

    I'd expect IBM's JFS to be pretty efficient on an SSD as well. Anything that appends and avoids overwriting existing sectors will perform better here.

    Stepping back a bit, I still have a perfectly usable Dothan-based laptop with IDE. Any chance of getting an in-depth review on recent Transcend 128GB IDE SSDs? My new laptop is running fine with a G.Skill Titan 256GB SSD, but when I fire up the older laptop it's unbearable, even with that 7200rpm Hitachi 100GB drive inside.

    By the way, I paid under $2/GB for the 256GB G.Skill Titan; for the work I do with it on Linux it performs fine most of the time. (Just make sure to maximize use of the FS cache.) I don't see the value proposition for the OCZ Vertex or Summit.
  • tshen83 - Wednesday, March 18, 2009 - link

    The random write 4K benchmark isn't right for the Vertex and other SSDs because of the test procedure:

    "The write test was performed over an 8GB range on the drive, while the read test was performed across the whole drive."

    It partially disables any write optimization algorithms on the Vertex. Intel wasn't affected as much.

    Anand, your first article pumping the X25-M literally screwed Samsung's SSD manufacturers big time: they lost hundreds of millions of dollars because of your blatant pumping. Yes, the random write was a big problem, but so was testing on a Windows OS with NTFS and an integrated SATA controller like ICH9/10, with no RAM cache and an obvious lack of IO optimizations for SSDs.

    Please redo the review with a proper OS, ie Windows 7 beta or OpenSolaris.

  • Proteusza - Thursday, March 19, 2009 - link

    Yeah, who in their right mind uses Windows and integrated SATA controllers? Oh wait, nearly everyone.

    Since it's pretty obvious that you either work for Samsung or one of their partners, I think it's laughable that you think this cost them hundreds of millions in sales. How big is the SSD market exactly, and how many potential buyers visit this site? Not enough to cause such an impact, if you ask me.

    And the fact remains - had you guys done what OCZ did, and optimized for real world use even if it cost you e-peen in the way of benchmarks, you would have been fine. It's only because you thought you could cheat and swindle consumers that you got a bad rep from Anand. Run an honest business, and your customers will thank you. I know that if I ever considered an SSD, I would buy either Intel or an OCZ Vertex, nothing else. You know why? Because they do what they say on the tin. You complain that the X25-M got a glowing review? Make a product as good as it and then Anand will sing your praises, but don't be upset when he tells it like it is.
  • tshen83 - Thursday, March 19, 2009 - link

    Nearly everyone uses Windows and integrated SATA controllers. It still does not negate the fact that neither were optimized for SSD random IO patterns.

    No, I don't work for Samsung or its partners. It didn't cost them hundreds of millions in sales, but it did cost them hundreds of millions in inventory markdowns. Just look at the free fall in prices of JMicron and original Samsung based SSDs over the past few months, multiply by the inventory, and that's the loss I was mentioning.

    I am not saying that the Intel X25-M is a bad drive. It is good, but there is no reason to use crippled OS file systems and a crippled SATA controller to show off the X25-M's internal copy-on-write features. When Windows 7 comes out of beta (soon), it will be the OS the majority of people use, and I am just looking forward six months, when the SSD adoption rate will improve. As for Solaris ZFS, you don't need it if you aren't mentally capable of understanding its elegance. (Most people won't, and that is OK.)
  • strikeback03 - Thursday, March 19, 2009 - link

    If they had also tested with Solaris/ZFS and reported that the drives worked well there, but 99.x% of users can't take advantage of that, would you have been happier? They may work perfectly well in that scenario, but it is meaningless to most users. Working properly in Vista and OS X is currently a requirement for selling to general consumers. Windows 7 was not even available in beta at the time of the last test; I expect they will test with it once it launches, but for now, with the OS/FS most users are likely to use, most of the available SSDs fail.

    Also, your economic analysis assumes they would have been able to sell all their inventory at the inflated prices they wanted to. Whether or not they received a negative review from sites like Anandtech, word would have gotten out from early adopters that they had problems. Also, they would have moved fewer units at those prices.
  • tshen83 - Thursday, March 19, 2009 - link

    I couldn't really care less if they did review SSDs on ZFS. I am using it right now and it kicks ass. The next version of OS X will have ZFS, so I guess Apple agrees that ZFS is the way to go here.

    Vista is one of the crappiest OSes Microsoft has put out in recent memory, maybe besides the Windows ME release. Just look at Vista adoption rates, and you will see why.

    You still don't understand my argument. My argument was that either the file system, the RAID controller, or the SSD controller must implement copy-on-write (basically, if you have to erase a block to write to it, you are screwed). ZFS implements it in the file system. Adaptec 5 series or any Intel IOP RAID cards also help SSD performance greatly. If you don't use those two, then the SSD controller must implement it (the X25-M is in this category). You only need one of the three to properly handle SSDs to get greatly improved performance. Anandtech's review obviously skips file system optimization by picking Vista, and RAID controller optimization by picking ICH10R. What is left is the poor SSD controller that needs to virtualize the logical space, thus making the review entirely biased toward the X25-M for a good reason.

    It is sad that this is supposedly a review of the Vertex units that OCZ sent to Anand, but it seems to me that it just turned out to be another article defending the X25-M. I know the X25-M is a good SSD, but that does not explain why Anand should cripple the OS and controller so much, then test the SSDs with a strange IO queue depth of 3 and, during the random write IOPS test, cap the write space to an 8GB confinement. Those settings greatly exaggerate the X25-M's internal implementation advantages.

    My economic analysis was based on SSD spot prices published on dramexchange.com. Since the release of Anandtech's X25-M review, all Samsung/JMicron MLC drives (Core, Core V2, SuperTalent, etc.) have been reduced to a spot price of 2 dollars per GB, down from the typical 4-5 dollars per GB they used to command, to clear inventories. The markdown can be as high as 200+ dollars per drive; multiply that by the inventory that major vendors held and you get hundreds of millions of dollars of aggregate damage sustained by the group of Samsung/JMicron partners.
