Impact of ext4’s discard option on my SSD

Solid-state drives (SSDs) are seen by many as the future of mass storage. They are famous for their high performance. Seek times are extremely low, since there is no head that needs to move into position and then wait for the spinning disk to come around to where it needs to read or write. Sequential throughput is higher as well: my 2.5″ OCZ Vertex LE (100 GB), for example, is rated at 235 MB/s sustained write speed and read speeds of up to 270 MB/s.

There is a caveat though – quoting Wikipedia:

In SSDs, a write operation can be done on the page-level, but due to hardware limitations, erase commands always affect entire blocks. As a result, writing data to SSD media is very fast as long as empty pages can be used, but slows down considerably once previously written pages need to be overwritten. Since an erase of the cells in the page is needed before it can be written again, but only entire blocks can be erased, an overwrite will initiate a read-erase-modify-write cycle: the contents of the entire block have to be stored in cache before it is effectively erased on the flash medium, then the overwritten page is modified in the cache so the cached block is up to date, and only then is the entire block (with updated page) written to the flash medium. This phenomenon is known as write amplification.
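To put a number on the cycle described in the quote, here is a toy calculation. The page size (4 KiB) and block size (128 pages) are assumptions picked for illustration, not the actual geometry of my drive, and the model ignores everything a real SSD controller does to soften the effect.

```python
# Toy model of the read-erase-modify-write cycle from the quote above.
# Both constants are illustrative assumptions, not values of a real drive.
PAGE_SIZE_KIB = 4       # assumed size of one flash page
PAGES_PER_BLOCK = 128   # assumed number of pages per erase block


def physical_write_kib(pages_changed, pages_per_block=PAGES_PER_BLOCK,
                       page_size_kib=PAGE_SIZE_KIB):
    """KiB written to flash when pages of one fully used block are overwritten.

    Only whole blocks can be erased, so the complete block is written back
    no matter how few of its pages actually changed.
    """
    if not 1 <= pages_changed <= pages_per_block:
        raise ValueError("pages_changed must be between 1 and pages_per_block")
    return pages_per_block * page_size_kib


def write_amplification(pages_changed, pages_per_block=PAGES_PER_BLOCK,
                        page_size_kib=PAGE_SIZE_KIB):
    """Ratio of data physically written to data the host asked to write."""
    logical_kib = pages_changed * page_size_kib
    physical_kib = physical_write_kib(pages_changed, pages_per_block, page_size_kib)
    return physical_kib / logical_kib


print(write_amplification(1))    # one page overwritten in a full block
print(write_amplification(128))  # the whole block overwritten
```

Under these assumptions, overwriting a single page costs as much as writing the whole block, while overwriting the whole block costs nothing extra. Writing into pages that are already erased avoids the cycle entirely, which is why trimmed free space matters.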

So, SSDs are fast at writing, but only when their free space is neatly trimmed. The only component in your software stack that knows which parts of your SSD should be trimmed is your file system. That is why ext4 (my current file system of choice) has a mount option called “discard”. When this option is active, space that is freed up in the file system is reported to the SSD immediately, and the SSD then does the trimming right away. This makes the next write to that part of the SSD as fast as expected.

Obviously, trimming takes time, but how much exactly? To find out, I measured the time to unpack and then delete the kernel sources (36706 files amounting to 493 MB, which is what I call a big bunch of small files), with a sync after each step. The throughput given in parentheses is the 493 MB divided by the combined unpack and sync time. I did this three times with and three times without the “discard” option, and took the average of the three tries in each case:

Without “discard” option:

  • Unpack: 1.21s
  • Sync: 1.66s (= 172 MB/s)
  • Delete: 0.47s
  • Sync: 0.17s

With “discard” option:

  • Unpack: 1.18s
  • Sync: 1.62s (= 176 MB/s)
  • Delete: 0.48s
  • Sync: 40.41s

So, with “discard” on, deleting a big bunch of small files is 64 times slower on my SSD: delete plus sync takes 40.89 s instead of 0.64 s. For those ~40 seconds any I/O is really slow, so that’s pretty much the time to get a fresh cup of coffee, or to waste watching the mass storage activity LED.
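For anyone who wants to repeat the measurement, the procedure can be sketched roughly as follows. This is not the exact method I used, just a minimal Python approximation of it. The function names and the “unpacked” directory name are made up for the sketch, and the tarball and working directory are whatever you pass in. The working directory has to live on the file system you want to test.

```python
import os
import shutil
import tarfile
import time


def timed(action):
    """Run action() and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    action()
    return time.perf_counter() - start


def benchmark(tarball, workdir):
    """Time unpack, sync, delete, sync for one tarball inside workdir."""
    target = os.path.join(workdir, "unpacked")  # hypothetical scratch directory

    def unpack():
        with tarfile.open(tarball) as archive:
            archive.extractall(target)

    results = {}
    results["unpack"] = timed(unpack)
    results["sync after unpack"] = timed(os.sync)  # flush the written data
    results["delete"] = timed(lambda: shutil.rmtree(target))
    results["sync after delete"] = timed(os.sync)  # discards would show up here
    return results


def average(runs):
    """Average each step over several benchmark() results."""
    return {step: sum(run[step] for run in runs) / len(runs) for step in runs[0]}
```

Calling benchmark() three times and passing the list of results to average() gives one set of numbers per mount configuration. Remount the file system with and without “discard” between the two series.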

Don’t enable the “discard” option if you have a similar SSD. A much better way to keep your free space neatly trimmed for good write speeds is to trigger a complete walk over the file system’s free space and tell the SSD to trim all of it at once. Of course, you would do that at times when you don’t actually want to use the system (e.g. in a nightly cron job, or with a script that gets launched during system shutdown). This can be done with the ‘fstrim’ command (which comes with util-linux); it takes around six minutes for my 95 GB file system, which is currently 60% full.
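A small wrapper that a nightly cron job could call might look like the sketch below. The function names are invented for this example. The only external thing it relies on is the fstrim binary itself, called with its -v switch and the mount point as argument. It needs to run as root, and the mount point is up to you.

```python
import shutil
import subprocess


def build_fstrim_command(mount_point, verbose=True):
    """Return the argument list for trimming the free space of one file system."""
    command = ["fstrim"]
    if verbose:
        command.append("-v")  # let fstrim report how much it trimmed
    command.append(mount_point)
    return command


def trim(mount_point):
    """Run fstrim on mount_point and return whatever it printed.

    Raises RuntimeError if fstrim is not installed, and
    subprocess.CalledProcessError if fstrim itself fails
    (for example when not run as root).
    """
    if shutil.which("fstrim") is None:
        raise RuntimeError("fstrim not found; it is part of util-linux")
    result = subprocess.run(build_fstrim_command(mount_point),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Calling trim("/") from a script that cron starts at night keeps the long trimming run away from the hours when you actually use the machine.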

Update (2011-07-08): I forgot some details that may be interesting:

  • Kernel version:
  • SSD firmware version: 1.32
  • CPU: AMD Phenom II X4 965