Rob Landley (landley) wrote,

Design issues... ok, ranting about NFS some more.

So at the design level, the problem is... Ok, the fundamental design problem is "NFS". I need to rant about that for a bit before moving on. I've been reading through the NFS code (on and off) for weeks now, which is an unsettling experience because I STILL don't understand what its designers were smoking.

The NFS design involves the actual transactions between server and client happening in terms of Sun Microsystems Remote Procedure Calls. This is a horrible thing that Sun offered as a standard back in the 1980s (see RFCs 1014 and 1057) and nobody else was ever insane enough to use for anything, but Sun built NFS on it anyway. NFS beat out various superior technologies (basically everything else) because Sun gave it away for free. (That's right, they open sourced it back in the '80s. Sun started out much less crazy than it became.) The thing you could get for free beat the thing you couldn't get for free, and now it's a legacy technology we're stuck with.

This means that if you're trying to track the network transactions, looking at fs/nfs in the Linux source code isn't even half the story: you have to look at net/sunrpc and include/linux/sunrpc too. And the problem _there_ is that there's not just an NFS filesystem (client) in the kernel, there's also an NFS _server_ in there, and the RPC code is incestuously shared by both. So figuring out what bits the mount code actually uses pretty much requires understanding how large chunks of the server work just so you can ELIMINATE those parts. (I've already ranted about how khttpd and the tux web server proved fairly conclusively that putting a server in kernel space is a BAD IDEA. Well, putting a server and client for the same thing in the kernel and then trying to make them share code is a worse idea.)

As for there being three different versions of the NFS protocol (v2, v3, and v4), the v2 and v3 implementations more or less share code, for a definition of "share" that involves a lot of interleaving, but at least it's in more or less the same place. The v4 implementation does not: it has its own .c files and huge #ifdef blocks in the headers and such. But it's not cleanly separated either, so you can't IGNORE it while looking at just v3. The problem is really that it's several times the size of the v2/v3 implementation combined. It's this huge, bloated, complicated THING. Nobody actually seems to use the v4 code out in the wild, but you still have to eliminate a lot of spurious noise it throws up via grep. (For example, is linux/rxrpc.h relevant? The file is copyright 2007, so probably not. And no, grepping for "v4" to try to eliminate stuff doesn't help, both because of #ifdef stanzas in files that do involve v3 and because IPv4 vs IPv6 provides a lot of hits.)
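To make the grep noise concrete, here's a rough sketch (in Python rather than grep, just so it's self-contained) of why a bare "v4" search is so noisy and how a slightly narrower pattern trims the IPv4 hits. The sample lines and the NFS_V4 pattern are made up for illustration, not pulled from the kernel tree, and the pattern is only a guess at what v4-related identifiers tend to look like.

```python
import re

# Made-up sample lines for illustration; NOT copied from the kernel source.
SAMPLE_LINES = [
    "#ifdef CONFIG_NFS_V4",
    "static int nfs4_proc_lookup(void);",
    "/* handle IPv4 and IPv6 addresses */",
    "case AF_INET: /* ipv4 */",
    "struct nfs_v3_ops;",
]

# The naive search: anything containing "v4", case-insensitively.
NAIVE = re.compile(r"v4", re.IGNORECASE)

# A narrower guess: "nfs4", "nfsv4", "nfs_v4" and friends, but not "IPv4".
NFS_V4 = re.compile(r"nfs_?v?4", re.IGNORECASE)


def matching(pattern, lines):
    """Return the lines in which the pattern occurs anywhere."""
    return [line for line in lines if pattern.search(line)]


naive_hits = matching(NAIVE, SAMPLE_LINES)
nfs_hits = matching(NFS_V4, SAMPLE_LINES)

if __name__ == "__main__":
    print("naive 'v4' hits:")
    for line in naive_hits:
        print("   ", line)
    print("narrower NFS v4 hits:")
    for line in nfs_hits:
        print("   ", line)
```

Note what this does and doesn't buy you: the narrower pattern drops the IPv4 lines and picks up "nfs4" names the bare "v4" search misses, but it does nothing about the #ifdef stanzas sitting in the middle of files that also carry v3 code. Those still have to be read by hand.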

There are implementation flaws on top of this. (Tons! Strange incestuous caching, scar tissue from changes over the years that never got cleaned up after, insane reimplementation of things that the VFS layer does for other filesystems but THIS one goes out of its way to do by hand in slightly different ways...) But the _design_ flaws make it hard to plow through to start with.

Right, try again next time.
Tags: dullboy