
I need more test cases. - The Conversation Pit
Rob Landley


I need more test cases. [Mar. 24th, 2011|09:20 am]
[mood | frustrated]

I've either run into a weird subtle bug in the kernel, or a weird subtle bug in kvm, and I can't tell which it is.

When I set up the "two meanings for the same IP" routing, mounting NFS inside the container (via tun/tap eth1) makes connections to that address fail with "no route to host" outside the container (via -net user eth0). The two should be orthogonal, but something's interfering.

I can reproduce it with a kvm "-net user" interface and a tun/tap interface, but I can't reproduce it with two tun/tap interfaces attached to kvm. I can reproduce it with nfs access in the emulated kernel, but not from userspace.

Except those previous two statements conflict. KVM doesn't know anything about userspace vs kernel space in the emulated kernel, and the kernel doesn't know about differing virtual implementations behind the e1000 emulated hardware interface.

It's hard to debug a problem with the -net user interface because I can't ping or tracepath through it, so when it's failing to connect I dunno _why_. Which is why I switched eth0 to be another tun/tap interface and tried to replicate the bug there, but so far I can't. Except if the bug _is_ in qemu's -net user, I should be able to reproduce it from userspace in the emulated kernel. KVM has no way of knowing if packets come from userspace or from kernel space inside the emulated system.
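
Since ping and tracepath are no help here, one way to at least see _why_ a userspace connection fails is to attempt a plain TCP connect and print the errno name. This is only a rough sketch, not what I actually ran: the `probe` helper is made up for illustration, and the host and port on the command line are whatever address and service you're testing (2049 is the usual NFS-over-TCP port, but that's an assumption about the setup).

```python
import errno
import socket
import sys


def probe(host, port, timeout=3.0):
    """Try one TCP connect and return "ok", "timeout", or an errno name.

    Hypothetical helper for illustration; it only reports what the
    kernel told the socket layer, e.g. "EHOSTUNREACH" for
    "no route to host" or "ECONNREFUSED" for a closed port.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        return "ok"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timeout"
    except OSError as e:
        # Map the numeric errno to its symbolic name where possible.
        return errno.errorcode.get(e.errno, str(e.errno))
    finally:
        s.close()


if __name__ == "__main__":
    # Usage: probe.py HOST [PORT]; the port defaults to 2049 (assumed NFS).
    host = sys.argv[1]
    port = int(sys.argv[2]) if len(sys.argv) > 2 else 2049
    print(host, port, probe(host, port))
```

Running that from userspace inside the emulated system before and after the NFS mount would show whether userspace ever sees the same "no route to host" the kernel NFS client does, or whether it really is kernel-only.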

Grrr. The worst kind of debugging issue is "I changed something irrelevant and the problem went away". THAT'S NOT HOW DEBUGGING WORKS. You find out what was wrong and fix it, or it resurfaces to bite you again later.

Hmmm, maybe I can upgrade qemu (switch from kvm to qemu and build from source via current qemu-git repository) and see if _that_ makes the user+tap problem go away. If upgrading qemu fixes it, that's reasonably strong evidence it was a bug in qemu, in which case it's orthogonal to the NFS patch (and probably fixed upstream already: Ubuntu's kvm is a bit old, and Ubuntu has a history of breaking qemu)...