March 14th, 2011

Wrestling with the curse of NFS.

So, with two patches, mounting NFS (read-only) from within a container works fine... but as soon as I do it, the host context can't route to that address anymore. The host is an entirely different network context, and I'm just using wget. I can still PING the address from the host, and the container keeps working fine, but attempts to actually connect to it fail with "no route to host".

It... how... It's completely unrelated. It's not the same container, it's not the same network interface, the routing tables still look fine, AND I CAN PING. HOW DID THAT BREAK?

So now I've got to rip apart the network stack and see what it's doing. Figure out what exactly is going _wrong_, and backtrack from there to identify which of the 8 gazillion strange and inadvisable things NFS is doing triggers this behavior.

I despise NFS.


If I do a wget from the container, the host can reach the address again. Doing an NFS mount horks the host's network context, and opening a normal network socket from the container's userspace fixes it.

This is epic levels of weird here. 100% reproducible. It happened in 2.6.37 and it's happening in current -git (darn near 2.6.38)...

Weeeeird... The funny thing is, I'm ssh'ed into the box over the interface that's broken, and that session keeps working fine while things are "broken", but I can't make _new_ network connections while it's screwed up, either incoming or outgoing (I can't ssh a fresh session into the box). It's opening sockets, not sending packets, that's failing... Hmm...
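The distinction above (a fresh connection failing while existing sessions and ping keep working) can be checked without wget. Below is a minimal sketch, not anything from the original debugging session and written in Python purely for brevity. It opens a brand-new TCP connection the way wget would and reports the resulting errno, where `EHOSTUNREACH` is the "No route to host" case. The function name `probe_connect` and the command-line usage are illustrative assumptions.

```python
import errno
import os
import socket
import sys


def probe_connect(host, port, timeout=3.0):
    """Open a fresh TCP connection to (host, port), like wget would.

    Returns 0 on success, otherwise the errno from the failed connect().
    errno.EHOSTUNREACH corresponds to the "No route to host" message.
    """
    try:
        # create_connection() creates a new socket and connects it.
        with socket.create_connection((host, port), timeout=timeout):
            return 0
    except socket.timeout:
        # Timeouts may carry no errno, so report ETIMEDOUT explicitly.
        return errno.ETIMEDOUT
    except OSError as e:
        return e.errno if e.errno is not None else -1


if __name__ == "__main__":
    # Hypothetical usage: python3 probe.py <address> <port>
    target_host, target_port = sys.argv[1], int(sys.argv[2])
    err = probe_connect(target_host, target_port)
    if err == 0:
        print("connect ok")
    elif err > 0:
        print("connect failed:", os.strerror(err))
    else:
        print("connect failed: unknown error")
```

Running it from the host against the address wget was trying, once before and once after the NFS mount, would show whether the failure is specifically in setting up a new connection rather than in routing packets in general.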