The industry quietly achieved an important milestone last week with the releases of MariaDB 5.5.31 and Percona Server 5.5.31-30.3. These two popular MySQL forks became the first enterprise applications to ship with a flag enabling Atomic Writes to replace standard writes.
Why does this matter? Over the years, developers of applications and file systems have written thousands of lines of application-specific journaling code to work around the absence of Atomic Writes. To illustrate, let’s look at a simple example from the file system world. When data is appended to a file, two kinds of updates need to be made to disk: (1) writing the appended data itself, and (2) updating the metadata (inode), which keeps track of where a file’s data blocks live. But what happens if only one of these two updates makes it to disk before a system crash? Data will be lost, the file will be corrupted, or both.
The Birth of Journaling
Hence the birth of various forms of journaling. Through journaling, the file system first writes all the updates to a special scratch pad on disk. If the system crashes before the updates are fully written to the scratch pad, no worries. The partially written journal entry is just ignored, and life is good. Otherwise, once the system acknowledges that the journal entry has made it safely to disk, the file system proceeds to make the updates to the file’s actual data and metadata. If the system crashes during this second round of updates, no worries, as the journal entry contains the instructions for how to fix the partially updated file.
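The journaling sequence described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the code of any real file system: `ToyDisk`, `journaled_update`, `recover`, and the `Crash` exception are hypothetical names invented for illustration, the "disk" lives in memory, and a simple length prefix stands in for the checksum or commit record a real journal would use to spot a partially written entry.

```python
import json


class Crash(Exception):
    """Raised to simulate power loss in the middle of a sequence of writes."""


class ToyDisk:
    """In-memory stand-in for a disk: named blocks plus a journal area."""

    def __init__(self):
        self.blocks = {}         # block name -> contents (the "home" locations)
        self.journal = b""       # scratch pad; empty means no pending entry
        self.writes_left = None  # when set, crash once this many block writes are done

    def write_block(self, name, value):
        if self.writes_left is not None:
            if self.writes_left == 0:
                raise Crash()
            self.writes_left -= 1
        self.blocks[name] = value


def journaled_update(disk, updates, torn_journal=False):
    """Apply every update in `updates` (block name -> value) via the journal."""
    payload = json.dumps(updates, sort_keys=True).encode()
    entry = len(payload).to_bytes(4, "big") + payload
    if torn_journal:
        # Simulate a crash partway through writing the journal entry.
        disk.journal = entry[: len(entry) // 2]
        raise Crash()
    disk.journal = entry                 # first write: the scratch pad
    for name, value in updates.items():  # second write: the real locations
        disk.write_block(name, value)
    disk.journal = b""                   # entry is no longer needed


def recover(disk):
    """After a crash, replay a complete journal entry or ignore a torn one."""
    entry = disk.journal
    complete = len(entry) >= 4 and int.from_bytes(entry[:4], "big") == len(entry) - 4
    if complete:
        for name, value in json.loads(entry[4:]).items():
            disk.blocks[name] = value
    disk.journal = b""
```

Note that every value is written twice, once to `disk.journal` and once to `disk.blocks`. That duplication is what buys crash safety in this scheme.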
And so this is the way things have been done for decades, because disks don’t have extra intelligence. But what is the cost of writing everything twice? With exhaustible NAND, how many Program/Erase cycles are consumed by writing things to NAND twice? And what is the impact on application latency and throughput?
Enter Atomic Writes
Nearing a final vote in the international SCSI standards working group (INCITS T10), Atomic Writes will soon provide the broader industry with a simple yet powerful new write interface to storage. Returning to the example above, by replacing several write calls with an atomic write, both the data updates and the metadata updates will be sent to disk in a single all-or-nothing envelope. The cost of ensuring that either all of the updates are correctly made to disk or none of them are is minimal, as today’s smart flash cards already have the ingredients to implement this within their log-structured flash translation layers.
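To see why a log-structured translation layer makes the all-or-nothing envelope cheap, consider the following toy sketch. It is only an illustration under loud assumptions: `ToyFlashLog` and its methods are hypothetical names, and the code does not reflect the T10 proposal, any vendor's firmware, or the interface MariaDB and Percona Server actually call. The idea it shows is that new data is appended to a log and only becomes visible once a single commit record follows it, so each value is written once.

```python
class ToyFlashLog:
    """Toy log-structured translation layer: an append-only log plus a map."""

    def __init__(self):
        self.log = []  # append-only records, the way flash is written
        self.map = {}  # logical block -> index of its current record in the log

    def atomic_write(self, updates, crash_after=None):
        """Append every update, then one commit record that makes them visible.

        `crash_after` simulates power loss after that many data records.
        Returns True if the group was committed, False if the "crash" hit first.
        """
        start = len(self.log)
        for i, (block, value) in enumerate(updates.items()):
            if crash_after is not None and i == crash_after:
                return False  # no commit record: the whole group stays invisible
            self.log.append(("data", block, value))
        self.log.append(("commit", start, len(updates)))
        self.recover()
        return True

    def recover(self):
        """Rebuild the map from the log, honoring only committed groups."""
        self.map = {}
        for record in self.log:
            if record[0] == "commit":
                _, start, count = record
                for idx in range(start, start + count):
                    self.map[self.log[idx][1]] = idx

    def read(self, block):
        idx = self.map.get(block)
        return None if idx is None else self.log[idx][2]
```

In this model, a crash in the middle of `atomic_write` leaves orphaned records in the log, but no reader ever sees them, and no second copy of the data was needed to get that guarantee.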
Flash devices have always been smarter than disks. The time has come for applications to exploit that intelligence.