fsync(2) on FreeBSD vs Linux
Even with our modern technology, hard-disk operations tend to be the slowest and most painful part of any modern system. As such, modern operations implement buffering mechanism. In this schema, when an application calls write(2), rather than immediately performing physical disk operations, the operating stores data in a kernel buffer. When the buffer exceeds a certain amount or the when an application falls the fsync(2) system call, the kernel begins writing to the disk.
This scheme is significantly faster, perhaps most demonstrably by the massive performance differential between the GNU vs BSD yes(1), as initially noted here. Note: FreeBSD’s yes(2) has now reached parity with GNU.
So far so good. But what happens when a disk write operation fails? This could be due to a hardware or network failure, but ultimately it is not the fault of the operating system. However, the operating must properly handle the failure.
On Linux, when an application’s fsync(2) call fails, the kernel returns a disk error. However, it then clears the buffer and properly sets the buffer as “dirty” (EIO flag). When the application issues another fsync(2) and the disk succeeds, the kernel clears the error bit, and reports a successful write to the application. As such the previously failed data never hit the disk and, if discarded by the application, the data was lost.
On FreeBSD, when an application’s fsync(2) call fails, the kernel also returns an error. Similar to Linux, it also reports the error to the application. But unlike with Linux, it maintains the “dirty” bit, thus not re-writing over the kernel buffer, until the page buffer is cleared, even if the successive fsync(2) is successful. This way, the page data is not lost.
This is another example of the superiority of FreeBSD over Linux. FreeBSD can better survive a disk failure, while Linux’s implementation is fundamentally broken. In the past I have experienced Linux’s ext4 fail into read-only mode to prevent disk corruption. While that might be a fall-back mechanism, it is not a long-term solution. Instead, userland applications have to keep track of whether the kernel was successful or not. Depending on your perspective, this is a stack violation.
Additionally, any long-term solution to change the behavior of the operating system would mean all user-land applications would potentially break. Linus Torvalds has notoriously stated:
Breaking user programs simply isn't acceptable
In fact, he’s repeated this policy in more colorful language here. So you’re stuck with bad behavior.
Now consider if you want to build an operating system that will run for potentially a hundred years and produce zero errors or catch errors and properly perform exception handling. Go with FreeBSD.
Its worth noting that Illumos (Solaris) properly implements fsync(2), whereas OpenBSD and NetBSD also failed on this issue and I fully anticipate them to fix the problem.