Rob Landley (landley) wrote,

Drinking the kool-aid involves drinking the water.

The amount of time it took my digestive system and the ambient Russian bacteria to really get to know each other seems to have been about three days. (Not feeling at all well.)

Fun with network routing in kvm and containers. There are several different ways to do it, and none of them quite work yet.

There are three levels of OS here: my laptop (physical hardware, running Ubuntu), a KVM instance (hosted on the laptop, running Debian sid plus the bfields-git kernel from the NFS maintainer), and the containers (hosted by the KVM's Debian sid, running a simple busybox filesystem). So laptop running KVM running containers. The advantage of this is I can trivially reboot the KVM to update the kernel without losing the working context on my laptop (and the 8 gazillion Firefox tabs, open terminals, mail client...)

I'm trying to make NFS mounts work inside a container, meaning I need a test environment where the container's network is different from the host (KVM) network. In theory I should be able to get it where "wget" works but "mount -t nfs" does not, and then I can use that to test kernel changes to fix it. (The userspace networking is aware of multiple net contexts, but the in-kernel NFS stuff is using a single static context for everything. That's what I need to change.)
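
One way to confirm that a container really has a network context separate from the KVM host is to compare network-namespace identities. A minimal Python sketch, assuming a newer kernel than the one in this post: the /proc/<pid>/ns/net entries only appeared in Linux 3.x, and they take the `net:[inode]` symlink form from 3.8 on. The container PID would be the init process lxc started; the function names are hypothetical.

```python
import os
import re

# A namespace link target looks like "net:[4026531992]": type, then inode number.
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a link target such as 'net:[4026531992]' into ('net', 4026531992)."""
    m = _NS_LINK.match(target)
    if m is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return m.group(1), int(m.group(2))


def net_ns_id(pid="self"):
    """Inode number of the network namespace that pid (default: this process) is in."""
    kind, inode = parse_ns_link(os.readlink(f"/proc/{pid}/ns/net"))
    if kind != "net":
        raise ValueError(f"unexpected namespace type {kind!r}")
    return inode


def has_own_netns(container_pid):
    """True if container_pid lives in a different network namespace than we do."""
    return net_ns_id(container_pid) != net_ns_id()
```

Run from the KVM host, `has_own_netns(pid)` for the container's init should come back True; two processes share a network namespace exactly when the inode numbers match.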

My first attempt was to set up kvm with the normal "-net user" stack, which hooks the virtual KVM e1000 device to a virtual 10.0.2.x lan with a virtual masquerading gateway. I added a port forwarding line (kvm -redir) to make the laptop's "127.0.0.1:9876" connect through to the KVM's "10.0.2.15:22" so I could ssh through the virtual masquerading gateway into the KVM system, and have multiple terminal sessions where cut and paste works. Then inside KVM, I used the macvlan stuff to route ethernet packets to the container.
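
To make the layering concrete: from the laptop's side, a -redir style forward behaves like a TCP relay. It accepts a connection on 127.0.0.1:9876, opens a second connection to the guest's port 22, and copies bytes between the two. No Ethernet header from the loopback side survives the trip, so the frame that eventually reaches the guest is synthesized by whatever stack makes the second connection (slirp, in QEMU's case). The sketch below is a generic threaded relay over host sockets, not QEMU's slirp code; `relay_one` and `_pump` are hypothetical names.

```python
import socket
import threading


def _pump(src, dst):
    """Copy bytes src -> dst until src reaches EOF, then half-close dst."""
    try:
        while True:
            data = src.recv(65536)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass  # a reset on either side just ends this direction
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def relay_one(listener, target):
    """Accept one client on listener and relay it to target, a (host, port) pair."""
    client, _addr = listener.accept()
    with client, socket.create_connection(target) as upstream:
        back = threading.Thread(target=_pump, args=(upstream, client), daemon=True)
        back.start()
        _pump(client, upstream)  # client -> target in this thread
        back.join()              # target -> client in the helper thread
```

Usage would look like `relay_one(socket.create_server(("127.0.0.1", 9876)), ("10.0.2.15", 22))`, except that in this setup 10.0.2.15 only exists inside slirp's virtual lan, which is why QEMU has to do the forwarding itself.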

Unfortunately, the -redir thing apparently doesn't set the destination mac address of the packets forwarded from loopback. Meaning as soon as I launched the container and the macvlan subsystem activated and started routing packets to the kvm's host interface or the container's virtual interface based on mac address... my ssh session froze. Inconvenient, that.
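
The failure comes down to demultiplexing on the destination MAC address. Below is a deliberately simplified Python model of that decision, not the kernel's macvlan code (the real driver also has several modes and more checks); the interface names eth0 and mv0 are hypothetical.

```python
def parse_mac(text):
    """Convert '52:54:00:12:34:56' into 6 raw bytes."""
    raw = bytes(int(part, 16) for part in text.split(":"))
    if len(raw) != 6:
        raise ValueError(f"bad MAC address: {text!r}")
    return raw


def macvlan_deliver(frame, macvlans, lower="eth0"):
    """Simplified macvlan receive path: choose recipients by destination MAC.

    macvlans maps each macvlan's MAC (bytes) to its interface name; unicast
    frames matching none of them stay with the lower (host) device.
    """
    if len(frame) < 14:
        raise ValueError("shorter than an Ethernet header")
    dst = frame[:6]
    if dst[0] & 0x01:
        # Group bit set (broadcast/multicast): every interface gets a copy.
        return [lower, *sorted(macvlans.values())]
    return [macvlans.get(dst, lower)]
```

In this model, a frame is handed to exactly one interface based on its first six bytes, so if the frames carrying the ssh session are addressed to the container's MAC instead of the KVM host's, the host interface never sees them.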

So next I tried ethernet bridging, which basically had the same problem. It's filtering by mac address and qemu's -redir apparently isn't setting the mac address (or isn't doing it right, somehow -- note that the loopback address it's forwarding from doesn't _have_ a mac address. It's just the TCP/IP level not the ethernet level).
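
A learning bridge depends on MAC addresses in the same way, with one extra twist: it decides which port a MAC lives on by watching source addresses. A toy Python model of that forwarding database (port names are hypothetical; the kernel bridge adds ageing, STP and much more):

```python
class LearningBridge:
    """Toy learning bridge: learns source MACs per port, forwards on destination MAC."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.fdb = {}  # forwarding database: MAC (bytes) -> port it was last seen on

    def receive(self, in_port, frame):
        """Learn from one incoming frame and return the ports it should go out of."""
        if len(frame) < 14:
            raise ValueError("shorter than an Ethernet header")
        dst, src = frame[0:6], frame[6:12]
        if not src[0] & 0x01:
            self.fdb[src] = in_port  # only unicast source addresses are learned
        out = self.fdb.get(dst)
        if dst[0] & 0x01 or out is None:
            # Broadcast/multicast or not-yet-learned destination: flood.
            return [p for p in self.ports if p != in_port]
        if out == in_port:
            return []  # destination lives on the ingress side: filter the frame
        return [out]
```

If the bridge has learned a MAC on the wrong port, or frames arrive addressed to a MAC it associates with some other port, they are delivered there and nowhere else.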

Next up, I created two network devices and had them both hooked up to the -net user masquerading lan, so the first one DHCP'd 10.0.2.15 and the second 10.0.2.16. I then told the lxc-create stuff I wanted to move eth1 into the container. This turns out to be a bit incomplete: the interface is still called eth1 (with no eth0) inside the container, and there's no obvious lxc syntax to rename it. (I can probably do so by running "ip link set eth1 name eth0" inside the container, although I have no idea if the busybox version supports this yet.) Worse, when I kill the container the eth1 interface vanishes from the system. In the kernel, it gets unregistered instead of reinserted in the host context, and although you could normally get it back by removing and reinserting the e1000 module, A) this would screw up the other network interface, which is also using the e1000 driver, and B) my drivers are statically linked in (so I can easily swap out kernels for testing without having to loopback mount my Debian sid ext3 image and copy modules into it each time), so there's no module to remove.
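
With full iproute2, the move and the rename are each a single `ip link set` command; whether busybox's ip applet accepts the `name` keyword is the open question above. A sketch that spells out the sequence as data, assuming iproute2 syntax; `move_and_rename` is a hypothetical helper, and the PID is whatever init process lxc started for the container. In the setup above lxc already performs the first step.

```python
def move_and_rename(dev, container_pid, new_name="eth0"):
    """Return (where, argv) steps that hand dev to a container and rename it there.

    'host' steps run in the KVM host's namespace, 'container' steps inside the container.
    """
    return [
        ("host", ["ip", "link", "set", "dev", dev, "netns", str(container_pid)]),
        # A link has to be down before the kernel will rename it.
        ("container", ["ip", "link", "set", "dev", dev, "down"]),
        ("container", ["ip", "link", "set", "dev", dev, "name", new_name]),
        ("container", ["ip", "link", "set", "dev", new_name, "up"]),
    ]
```

Each argv could be passed to `subprocess.run(argv, check=True)` from a shell in the matching namespace.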

Worse, having two kvm network devices both hooked up to the same virtual gateway means they have the same view of the network and the laptop host can't distinguish them, so I can't use routing rules to block eth0's access to the NFS server because as far as the laptop's concerned it's all just userspace connections originating from the same KVM process. Oh, and for some reason when I do a wget from within the container, the ssh sessions freeze for a few minutes anyway. (They unfreeze again later, but I have no idea what's going on here and don't want to go down the rathole of trying to debug another tangent just now. I think it's eth0/eth1 entanglement within the same -net user and thus most likely a KVM bug not a containers bug, but that's just a guess and even if I did fix that the previous problem is still there.)

So, on to the next approach: creating tun/tap devices for KVM to plug its eth0 and eth1 into. The problem here is that tun/tap's userspace control stuff was created for the User Mode Linux project, which has been badly obsoleted by KVM and is half dead now. What documentation there was for tunctl was a web page on the User Mode Linux SourceForge site, and although the Debian guys created a tunctl man page based on it, they just blindly copied what was in the HTML page. (Note that the URL the man page gives for the original HTML HOWTO page is a 404, because the entire UML website moved into an "old" subdirectory.)
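
Under the hood, tunctl is a thin wrapper around a few ioctls on /dev/net/tun. A Python sketch of roughly what `tunctl -u landley` does, with request numbers and flag values taken from <linux/if_tun.h>; the function names are hypothetical, and actually calling `make_persistent_tap` needs CAP_NET_ADMIN.

```python
import fcntl
import os
import struct

# Values from <linux/if_tun.h>.
TUNSETIFF = 0x400454CA      # _IOW('T', 202, int)
TUNSETPERSIST = 0x400454CB  # _IOW('T', 203, int)
TUNSETOWNER = 0x400454CC    # _IOW('T', 204, int)
IFF_TUN = 0x0001
IFF_TAP = 0x0002
IFF_NO_PI = 0x1000


def pack_ifreq(name, flags):
    """Build a 40-byte struct ifreq: 16-byte name, 16-bit flags, zero padding."""
    raw = name.encode()
    if len(raw) >= 16:
        raise ValueError("interface name must be shorter than 16 bytes")
    return struct.pack("16sH22x", raw, flags)


def make_persistent_tap(name, uid):
    """Create a tap device that outlives this process and is owned by uid."""
    fd = os.open("/dev/net/tun", os.O_RDWR)
    try:
        res = fcntl.ioctl(fd, TUNSETIFF, pack_ifreq(name, IFF_TAP | IFF_NO_PI))
        fcntl.ioctl(fd, TUNSETOWNER, uid)  # owner may attach without privileges
        fcntl.ioctl(fd, TUNSETPERSIST, 1)  # keep the device after close()
        return res[:16].rstrip(b"\0").decode()  # the kernel writes back the real name
    finally:
        os.close(fd)
```

Setting the owner is the piece that matters for running KVM as a normal user: once the device is persistent and owned by that uid, an unprivileged process can open /dev/net/tun and attach to it by name.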

I can't even find a clear reference explaining what "tun" and "tap" stand for, or the difference between them. (One is the user-side connection to a process via /dev/net and the other is the network interface plugged into the system, but I have to stop and think which is which and have no idea why they're named differently.)
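
For comparison with the guess above, the kernel's tuntap documentation draws the line differently: both kinds are created through the same /dev/net/tun character device, and both show up as ordinary network interfaces. The difference is what the userspace side reads and writes. A tun device carries bare IP packets (layer 3), while a tap device carries whole Ethernet frames (layer 2), which is why an emulated e1000 needs a tap. The names are usually expanded as "network tunnel" and "network tap". The mode is chosen by the IFF_TUN or IFF_TAP flag at creation time, and on kernels that expose it, /sys/class/net/<dev>/tun_flags reports those flags as a hex number. A small Python sketch that decodes that value (constants from <linux/if_tun.h>; function names hypothetical):

```python
IFF_TUN = 0x0001
IFF_TAP = 0x0002
IFF_NO_PI = 0x1000


def describe_tun_flags(text):
    """Interpret the hex string found in /sys/class/net/<dev>/tun_flags."""
    flags = int(text.strip(), 16)
    if flags & IFF_TAP:
        mode, carries = "tap", "Ethernet frames (layer 2)"
    elif flags & IFF_TUN:
        mode, carries = "tun", "IP packets (layer 3)"
    else:
        raise ValueError(f"neither IFF_TUN nor IFF_TAP set: {text!r}")
    # Without IFF_NO_PI, each packet read or written is prefixed by a 4-byte info header.
    return {"mode": mode, "carries": carries, "packet_info_header": not flags & IFF_NO_PI}


def read_tun_flags(dev):
    """Decode the tun_flags attribute of an existing tun/tap interface."""
    with open(f"/sys/class/net/{dev}/tun_flags") as f:
        return describe_tun_flags(f.read())
```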

I _think_ what you do is keep the first (eth0) device the way it is (-net nic,model=e1000 -net user -redir tcp:9876::22), create a tap device on the laptop (tunctl -u landley), plug KVM into it by appending "-net nic,model=e1000 -net tap" to the command line... and then figure out why kvm keeps complaining "could not configure /dev/net/tun: no virtual network emulation". I've got a /dev/net/tun and it's world read/writeable. Presumably if the module wasn't loaded the device wouldn't be there, unless udev is being crazy again...
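
A sketch of that command line in Python list form, assuming the older -net syntax this post uses (newer QEMU spells it -netdev/-device with hostfwd=). Two details here are assumptions worth checking, not established fixes. In that syntax each NIC is wired to whatever backends share its vlan=N, and anything without vlan= lands on vlan 0, so the tap half gets vlan=1 to keep it off the slirp hub. And without ifname=, QEMU asks the kernel for a fresh tap%d device, which normally requires CAP_NET_ADMIN, whereas naming the device tunctl already created for the user avoids that. The disk image argument and tap0 are placeholders, and other options the real command line needs (memory, -kernel, and so on) are left out.

```python
def kvm_argv(disk, tap="tap0", ssh_port=9876):
    """Build a KVM command line with two e1000 NICs on separate qemu 'vlans'."""
    return [
        "kvm", disk,
        # eth0: vlan 0, user-mode (slirp) NAT, host port forwarded to guest ssh
        "-net", "nic,vlan=0,model=e1000",
        "-net", "user,vlan=0",
        "-redir", f"tcp:{ssh_port}::22",
        # eth1: vlan 1, attached to a tap device created beforehand (tunctl -u ...);
        # script=no/downscript=no skip the default /etc/qemu-ifup and /etc/qemu-ifdown
        "-net", "nic,vlan=1,model=e1000",
        "-net", f"tap,vlan=1,ifname={tap},script=no,downscript=no",
    ]
```

The list can be handed straight to `subprocess.run`, which avoids shell quoting around the comma-separated option strings.
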
Tags: dullboy