Rob Landley (landley) wrote,

I need more test cases.

I've either run into a weird subtle bug in the kernel, or a weird subtle bug in kvm, and I can't tell which it is.

When I set up the "two meanings for the same IP" routing, mounting NFS inside the container (via tun/tap eth1) makes that same address return "no route to host" outside the container (via -net user eth0). The two should be orthogonal, but something is interfering.

I can reproduce it with a kvm "-net user" interface and a tun/tap interface, but I can't reproduce it with two tun/tap interfaces attached to kvm. I can reproduce it with nfs access in the emulated kernel, but not from userspace.

Except those two statements contradict each other. KVM doesn't know anything about userspace vs kernel space in the emulated kernel, and the emulated kernel doesn't know which virtual implementation sits behind the emulated e1000 hardware interface.

It's hard to debug a problem with the -net user interface because I can't ping or tracepath through it, so when it fails to connect I dunno _why_. That's why I switched eth0 to be another tun/tap interface and tried to replicate the bug there, but so far I can't. Except if the bug _is_ in qemu's -net user, I should be able to reproduce it from userspace in the emulated kernel. KVM has no way of knowing whether packets come from userspace or from kernel space inside the emulated system.
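Since ping and tracepath don't get through -net user, one rough way to see _why_ a connection fails from userspace inside the emulated system is to attempt a plain TCP connect and report the errno instead of just "it failed". A minimal sketch, assuming Python 3 is available in the guest; `probe` is a hypothetical helper name, and the host/port are whatever service you're trying to reach (port 2049 is the standard NFS port). This only exercises the userspace socket path, so it says nothing directly about what the in-kernel NFS client sees.

```python
import errno
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connect and report why it failed, if it did.

    Returns "ok" on success, "timeout" if nothing answered in time,
    otherwise the symbolic errno name, e.g. "EHOSTUNREACH" (which
    Linux reports as "No route to host") or "ECONNREFUSED".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.timeout:
        # Must come before OSError: socket.timeout is an OSError subclass.
        return "timeout"
    except OSError as exc:
        # errno.errorcode maps numeric errno values to their symbolic names.
        return errno.errorcode.get(exc.errno, str(exc))


if __name__ == "__main__":
    import sys

    # Usage: python3 probe.py <host> <port>
    print(probe(sys.argv[1], int(sys.argv[2])))
```

Running it against the problem address before and after the NFS mount would at least distinguish "no route to host" from a refused connection or a silent timeout.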

Grrr. The worst kind of debugging issue is "I changed something irrelevant and the problem went away". THAT'S NOT HOW DEBUGGING WORKS. You find out what was wrong and fix it, or it resurfaces to bite you again later.

Hmmm, maybe I can upgrade qemu (switch from kvm to qemu and build from source via the current qemu-git repository) and see if _that_ makes the user+tap problem go away. If upgrading qemu fixes it, that's reasonably strong evidence it was a bug in qemu, and if so it's orthogonal to the NFS patch (and probably already fixed upstream; Ubuntu's kvm is a bit old, and Ubuntu has a history of breaking qemu anyway)...
Tags: dullboy
