Impact of ext4’s discard option on my SSD

Solid-state drives (SSDs) are seen by many as the future of mass storage. They are famous for their high performance: extremely low seek times, since there is no head that needs to move into position and then wait for the spinning disk to come around to where it needs to read or write, but also higher throughput for sequential data. My 2.5″ OCZ Vertex LE (100 GB), for example, is rated at 235 MB/s sustained write speed and read speeds of up to 270 MB/s.

There is a caveat though – quoting Wikipedia:

In SSDs, a write operation can be done on the page-level, but due to hardware limitations, erase commands always affect entire blocks. As a result, writing data to SSD media is very fast as long as empty pages can be used, but slows down considerably once previously written pages need to be overwritten. Since an erase of the cells in the page is needed before it can be written again, but only entire blocks can be erased, an overwrite will initiate a read-erase-modify-write cycle: the contents of the entire block have to be stored in cache before it is effectively erased on the flash medium, then the overwritten page is modified in the cache so the cached block is up to date, and only then is the entire block (with updated page) written to the flash medium. This phenomenon is known as write amplification.

So, SSDs are fast at writing, but only when their free space is neatly trimmed. The only component in your software stack that knows which parts of your SSD should be trimmed is your file system. That is why ext4 (my current file system of choice) has a mount option called “discard”. When this option is active, space that is freed up in the file system is reported to the SSD immediately, and the SSD does the trimming right away. This makes the next write to that part of the SSD as fast as expected. Obviously, trimming takes time, but how much exactly? To find out, I measured the time it takes to unpack and then delete the kernel sources (36706 files amounting to 493 MB, which is what I call a big bunch of small files), each followed by a sync. I did this three times with and three times without the “discard” option and took the average of each set of three runs. The MB/s figures are the 493 MB divided by the combined unpack and sync time:

Without “discard” option:

  • Unpack: 1.21s
  • Sync: 1.66s (= 172 MB/s)
  • Delete: 0.47s
  • Sync: 0.17s

With “discard” option:

  • Unpack: 1.18s
  • Sync: 1.62s (= 176 MB/s)
  • Delete: 0.48s
  • Sync: 40.41s

So, with “discard” on, deleting a big bunch of small files is 64 times slower on my SSD (40.89 s instead of 0.64 s for the delete plus the following sync). For those ~40 seconds any I/O is really slow, so that’s pretty much the time when you get a fresh cup of coffee, or waste time watching the mass storage activity LED.
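The measurement above can be sketched roughly as follows. This is not the exact script I used, just a minimal reconstruction of the four timed steps. It assumes bash, GNU date (for nanosecond timestamps) and tar; time_step and bench are names made up for this example, and the tarball and directory are whatever you pass in.

```shell
#!/bin/bash
# Sketch: time unpack / sync / delete / sync on the file system under test.
# Assumes GNU date (%N) and tar; function names are made up for this example.

# time_step LABEL CMD...: run CMD, then print "LABEL: <seconds>s" (wall clock).
time_step() {
    local label=$1; shift
    local start end ms
    start=$(date +%s%N)
    "$@"
    end=$(date +%s%N)
    ms=$(( (end - start) / 1000000 ))
    printf '%s: %d.%03ds\n' "$label" $(( ms / 1000 )) $(( ms % 1000 ))
}

# bench TARBALL DIR: unpack TARBALL into DIR/src, sync, delete DIR/src, sync.
bench() {
    local tarball=$1 dir=$2
    mkdir -p "$dir/src"
    time_step "Unpack" tar -xf "$tarball" -C "$dir/src"
    time_step "Sync"   sync
    time_step "Delete" rm -rf "$dir/src"
    time_step "Sync"   sync
}
```

Run it as bench /path/to/kernel-sources.tar /path/on/the/ssd, once with the file system mounted with “discard” and once without, and repeat each a few times to average out noise.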

Conclusion
Don’t enable the “discard” option if you have a similar SSD. A much better way to keep your free space neatly trimmed for good write speeds is to trigger a complete walk over the file system’s free space and tell the SSD to trim all of it at once. Of course, you would do that at times when you don’t actually want to use the system (e.g. in a nightly cron job, or with a script that gets launched during system shutdown). This can be done with the ‘fstrim’ command (which comes with util-linux); it takes around six minutes for my 95 GB file system, which is currently 60% full.
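A nightly job along those lines could look like the sketch below. It is only an illustration: trim_all and the FSTRIM variable are names invented here (the variable exists so the function can be dry-run with a stand-in command), and the mount points are up to you.

```shell
#!/bin/sh
# Sketch of a nightly trim job, meant to be run as root (e.g. from cron).
# FSTRIM is overridable only so the function can be dry-run without root.
FSTRIM="${FSTRIM:-fstrim}"

# trim_all MOUNTPOINT...: trim the free space of each mounted file system.
# Returns non-zero if any fstrim call failed.
trim_all() {
    status=0
    for mountpoint in "$@"; do
        # -v makes fstrim report how much space it trimmed.
        "$FSTRIM" -v "$mountpoint" || status=1
    done
    return "$status"
}

# Example call (commented out on purpose; list your own SSD mount points):
# trim_all /
```

With the last line uncommented, the script can be dropped into a daily cron directory or called from a crontab entry that runs at night.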

Update (2011-07-08): I forgot some details that may be interesting:

  • Kernel version: 2.6.39.2
  • SSD firmware version: 1.32
  • CPU: AMD Phenom II X4 965

My phone’s bash.profile

Here is some automation that I put into my bash.profile. It’s all done with aliases, since with regular shell scripts I would first have to remount /mnt/sdcard without ‘noexec’. This way, I just need to open a ConnectBot “Local” connection, type the alias, and press enter(*).

(*) For this to work, the ‘post login automation’ entry of the ConnectBot “Local” profile needs to have
bash --rcfile /sdcard/bash.profile
in it.

Here is what I have in there:

alias bb="busybox"
alias top="bb top"
alias df="bb df"

alias ll="ls -l"
alias n="su -c \"netstat -ntupl\""

alias backupdata="su -c \"rsync -rP --delete --numeric-ids --chmod=u+rwX --exclude Music /sdcard/ pat@192.168.0.2:/data/pat/g2sd/\""

alias postflash="echo \"Mounting /system read-write\" && su -c \"mount -o rw,remount /system\" && echo \"Copying modified keyboard layout files...\" && su -c \"cp /sdcard/vision-keypad-wwe.kcm.bin /system/usr/keychars/\" && su -c \"cp /sdcard/vision-keypad-wwe.kl /system/usr/keylayout/\" && echo \"Deleting awful camera click sound...\" && su -c \"rm /system/media/audio/ui/camera_click.ogg\" && sync && echo \"All done. Please reboot now.\""

clear
uptime
echo

The first couple of aliases should be self-explanatory.
backupdata, as the name suggests, copies my SD card’s content to my home server.
postflash is for use after flashing a new ROM (usually a CyanogenMod nightly build). It puts my modified keyboard layout into place and deletes the terrible sound file that gets played when I take a photo with the phone’s camera.
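Since the profile is read by bash rather than executed, shell functions should work under the same ‘noexec’ restriction as aliases. As a more readable alternative to the one-line alias, postflash could be sketched as a function. This is untested on the phone itself; as_root is a helper name made up for this example, everything else is taken from the alias above.

```shell
# Hypothetical helper (not in the profile above): run one command line as root.
as_root() {
    su -c "$1"
}

# Same steps as the postflash alias, one per line; stops at the first failure.
postflash() {
    echo "Mounting /system read-write" &&
    as_root "mount -o rw,remount /system" &&
    echo "Copying modified keyboard layout files..." &&
    as_root "cp /sdcard/vision-keypad-wwe.kcm.bin /system/usr/keychars/" &&
    as_root "cp /sdcard/vision-keypad-wwe.kl /system/usr/keylayout/" &&
    echo "Deleting awful camera click sound..." &&
    as_root "rm /system/media/audio/ui/camera_click.ogg" &&
    sync &&
    echo "All done. Please reboot now."
}
```

If both the alias and the function are defined, the alias wins in an interactive shell, so only one of them should be kept.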

emerge output ends up as attachment.bin when sent with nail / Heirloom mailx

I really like the command line mailer Heirloom mailx (formerly nail), and now there is even a current version in portage again (still under the name mail-client/nail, but that doesn’t matter), so that’s even better. I use it on all servers, since it’s just convenient – it can handle attached files, UTF-8 etc. without any problems.

But there was one problem that had bothered me for months. It involved my check_updates.sh script, which basically just calls /usr/bin/emerge -upvDN --nospinner world for the host and all virtual servers, and then sends the output to me.

The problem: emerge’s output always ended up as ‘attachment.bin’, attached to the (otherwise empty) mail, although I piped it into mail -s "Updates for $DATE" root, where it should come out as the mail body. I knew that Heirloom mailx does this as soon as it doesn’t “like” one of the characters in the input, but I couldn’t think of a reason why it would do so with emerge’s supposedly plain-ASCII output.

Today I had enough of it, and fired up hexdump to investigate said ‘attachment.bin’, using the following command:
hexdump -e '1/1 "%03d \n"' attachment.bin | sort -u
It prints a sorted list of the unique decimal values of all bytes occurring in ‘attachment.bin’. I expected to find something above 127, but the highest value was 122 (“z”). I then checked the top of the list and, to my surprise, found 008 (backspace) there. After I removed the backspaces by piping emerge’s output through tr -d '\010' (decimal 8 is octal 10, and tr expects octal escapes), Heirloom mailx no longer put the text into ‘attachment.bin’. It now appears in the mail’s body, where it belongs.
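The two steps above, wrapped into small shell functions as a sketch. Both function names are made up for this example, and byte_values uses od instead of hexdump, simply because od is part of POSIX; the result should be the same list, just without the zero padding.

```shell
#!/bin/sh
# strip_backspaces: remove every backspace byte (octal 010) from stdin.
strip_backspaces() {
    tr -d '\010'
}

# byte_values: print the distinct byte values found on stdin, in decimal,
# one per line, sorted numerically. Uses od rather than hexdump.
byte_values() {
    od -An -v -tu1 | tr -s ' ' '\n' | grep -v '^$' | sort -n -u
}
```

In check_updates.sh, the filter goes between the two existing commands: emerge’s output is piped through strip_backspaces and from there into mail -s "Updates for $DATE" root. Running byte_values < attachment.bin on an old attachment shows the 8 near the top of the list.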

By the way, those backspaces (when interpreted) change
Calculating dependencies ... done!
to
Calculating dependencies... done!
… so removing them is not a big loss. I’d like to know though, why they are there in the first place, even though the output doesn’t go to a TTY.