Rob Landley (landley) wrote,
Venting about NFS, part eleventy jillion and one.

The design of NFS is utterly horrible. It's a "stateless filesystem server", which is a contradiction in terms because the entire point of a filesystem is to record state.

What they were trying for was scalability, so the server didn't have to keep track of clients. (This meant the clients had to keep lots of extra data about the server, much of it guesswork.)

Anyway, the base idea was "remote procedure calls", which just wrapped the system calls used to access files and executed them on the remote server. (NFSv1, which never made it out of Sun, was basically that.) Then they tried piling on additional crap to make this actually work, without ever removing that base assumption that they're transposing system call contexts instead of coming up with an actual coherent _protocol_.

NFS performance is terrible, so to try to get performance they added disk caching. The clients cache data locally, for reads, writes, and directory information, but NFS clients have no idea what other clients are doing. How do they deal with "cache coherency"? They don't. There's no guarantee your cached data is still accurate, _ever_. They try to substitute file locking instead, but really using NFS you shouldn't have two clients mess with the same file. (So how do you implement O_CREAT|O_EXCL? Hahahahah. Try this: Unix allows deleting files that are still open, but NFS can't wrap its little head around that idea, so it renames the file instead of deleting it for as long as it's held open. So if you open a file, delete the directory entry, and then delete everything in that directory... your "rm -rf dir" _fails_. Welcome to NFS. Oh, and dentries get cached but just updating the timestamp on a file isn't necessarily considered worth writing back to the server, so calling "touch" on a file may get discarded. Keep that in mind if you're ever tempted to use "make" on an NFS mount.)

NFS performance was so bad that they considered it vital to implement the NFS server in the kernel purely for speed reasons. Let's back up and think of web servers: way back when, people thought people needed those in the kernel too, and they implemented khttpd and the tux webserver and so on as kernel modules. But then Zeus came along and Apache got better, and suddenly you had 10,000 clients beating on a web server at once and it pretty much survived, and people realized that putting a webserver in the kernel was a really bad idea, so they yanked it out again. But knfsd remains, and makes khttpd look simple and well-designed and really VITALLY IMPORTANT in comparison.

NFS has three different major versions: v2, v3, and v4. NFSv2 is based on UDP (so a bunch of disassociated datagrams instead of actual streams). v3 can do udp or tcp. v4 is its own little world, and implemented as a separate device driver in Linux. (There's also a 4.1 but I can't bring myself to care.) v3 actually made NFS suck slightly less, so most people upgraded. v4 made the protocol significantly more complicated without actually fixing most of the fundamental design problems, so a lot of people haven't upgraded yet, although some have and the rest make noises about doing so for the same reason people moved from Windows 2000 to Windows XP to Vista to Windows 7. "Because it's there", pretty much, and the old one was insecure and no longer necessarily quite maintained. (Yes, I am calling NFS the Windows of network filesystems. Samba actually manages to suck _less_ in measurable ways. We have Sun to blame for NFS, not Microsoft. There's a reason Sun went out of business.)

NFS is built on top of Sun RPC, formatted in XDR (a binary protocol Sun made up), and then the actual RPC transactions bounce between multiple different servers (these days mostly running on the same machine). There's a server to LOOK UP THE OTHER SERVERS. You know how half of sendmail is devoted to this archaic DEC email system that had to do with non-internet proprietary minicomputer LANs? NFS name resolution and mapping user IDs to names was based on sun "yellow pages", which is thankfully dead in a scar-tissue-and-spackle-over-the-hole sort of way.

And the caching. The caching.

Bear in mind closely that I did not see any actual visual horror at the end. To say that a mental shock was the cause of what I inferred - that last straw which sent me racing out of the lonely Akeley farmhouse and through the wild domed hills of Vermont in a commandeered motor at night - is to ignore the plainest facts of my final experience. Notwithstanding the deep things I saw and heard, and the admitted vividness the impression produced on me by these things, I cannot prove even now whether I was right or wrong in my hideous inference. For after all Akeley's disappearance establishes nothing.

It's evil. Luckily it looks like P9 will cut off its air supply for new deployments as soon as the remaining bits make it into the appropriate trees and ship. Unluckily, it's Cobol-class technology. Existing deployments have spent millions making steam-powered clockwork punch card technology suitable for their purposes, and the very idea of replacing all that investment with a cheap ee-leck-trah-nic contraption that dies if you spill your coffee on it is ludicrous, ludicrous I tell you, and they won't stop throwing good money after bad until the old hands retire. (SCSI FOREVER, because whatever new thing gets invented they'll call it SCSI and force it to act like SCSI, and that's why the guys with butterfly nets and straitjackets had to take the cdrecord maintainer away for insisting on assigning controller/target/lun/slice addresses to ATA and USB devices.)

Right, I feel better now.
Tags: dullboy