The Conversation Pit
Rob Landley


More fun with containers. [Dec. 22nd, 2010|04:54 am]
[Current Location |Russian Federation, Долгопрудный]
[mood |jetlagged]
[music |Would be nice.]

Hanging out in Russia (the land of truly enormous spoons) with the engineers at Parallels' main office in Moscow, trying to absorb giant info-dumps and get over jetlag at the same time. They have an energy drink called "Adrenaline Rush" here, which is not bad.

Calling up google.com redirects to google.ru here, with no obvious way to make it not do that. (The non-obvious way to make it not do that turns out to be to visit google.com/en instead of google.com.)

Buying a local sim card for one's phone is apparently the same category of ordeal it is in the US, only with more paperwork and a language barrier. Decided to hold off until _next_ trip for that.

I wonder how you preview to see if you got the lj-cut tag right?

There are several different Linux containers projects, but the one that made it into the vanilla kernel is loosely based on Google's cgroups stuff. The openvz one is the most extensive feature-wise, but its userspace control knobs (the vzctl package) are based on system calls and ioctls that are only in the openvz kernel, not in vanilla. The new project, started to add controls for what's in vanilla, is the LXC package, which is full of rough edges and coded in the worst IBM "infrastructure in search of a user" style. (Maybe I'm spoiled by embedded development, where unused code is ruthlessly removed. Writing code you don't actually need and aren't currently using, just in case somebody someday might, is considered a BAD thing there.)

So I'm using the LXC package to set up a containers test environment, which is kind of painful. LXC's development history seems to consist largely of patches like this which literally do nothing but make the code bigger and more convoluted. They insist on needing capabilities but their scripts check to make sure they're running as root. You have to read through a huge amount of code to find the parts that are actually _doing_ anything, rather than just wrappers passing data around between themselves and checking that the previous wrapper layer agreed with the current wrapper layer about the format of said data. So when the code does something wrong, finding the relevant bits is the kind of expedition you bring Indiana Jones along for.

Anyway, I built a containers-enabled kernel, built a debian sid chroot and made an 8 gig ext3 image out of it, launched the two of them under kvm, fetched the lxc source and built it inside the kvm system. Then I built a defconfig busybox 1.18.1, and used lxc's busybox "template" (which is the term they invented for a chroot creation script... did I mention these guys love making up new terminology for no apparent reason?) to create a busybox container:

lxc-create -n 12345 -f doc/examples/lxc-macvlan.conf -t busybox

Using a statically linked busybox on the theory that I'm very familiar with it and it's probably the simplest root filesystem you can do, thus a good place to start.

And I launched said container:

lxc-start -n 12345

And it SORT of worked. Except that the shell prompt I got echoes all the characters I type back at me, but only listens to at most two characters typed per second. If you don't put a half second delay between characters you type, they get dropped.

Another fun detail is that this only works the first time you start a container (after creating it). The second time, it fails with:

Failed to symlink /dev/pts/ptmx -> /dev/ptmx file exists.

And as soon as that happens, the host kernel starts spamming the console with "unregister_netdevice: waiting for eth0 to become free. Usage count = 1" and this will continue forever until you kill kvm and restart the Debian Sid host.

Let's recap: I have my laptop (running Ubuntu) running a KVM with a guest system based on Debian Sid, and inside the Debian system I'm playing around with LXC and containers to run a busybox root filesystem. That's two virtualization layers for three systems total: Ubuntu, Sid, Busybox. The second time I try to launch the busybox system, the Sid system gets confused and starts printk() spam to the KVM console that I can't stop until I restart KVM.

Hmmm, the unregister_netdevice thing is fixed by using the vanilla kernel instead of the nfs -git branch. So that's one bug down, a half-dozen to go, and then I can start paying attention to the part all this is supposedly just a test environment for....


From: landley
2010-12-22 11:57 am (UTC)
It needed CONFIG_DEVPTS_MULTIPLE_INSTANCES enabled in the kernel. When that wasn't there, lxc failed with some strange error messages. (The ptmx symlink thing was the error recovery path from failing to mount devpts. Not a lot of testing on the error recovery paths, it seems.)
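
For anyone hitting the same wall, one way to check for the option before launching anything is to grep the kernel config. This is a minimal sketch, not anything from the lxc package: `missing_kconfig` is a hypothetical helper name, and where the config file lives (often /boot/config-$(uname -r), or /proc/config.gz if the kernel exposes it) is an assumption about your system.

```shell
# Hypothetical helper: print each kernel option that is NOT enabled
# in the given config file.
# Usage: missing_kconfig CONFIG_FILE OPTION...
missing_kconfig() {
    config_file=$1
    shift
    for opt in "$@"; do
        # An option counts as enabled if it is set to y (built in) or m (module).
        if ! grep -Eq "^${opt}=(y|m)\$" "$config_file"; then
            echo "$opt"
        fi
    done
}
```

Something like `missing_kconfig "/boot/config-$(uname -r)" CONFIG_DEVPTS_MULTIPLE_INSTANCES` prints the option name if it is off or absent, and nothing if it is enabled.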
From: (Anonymous)
2010-12-22 12:46 pm (UTC)


try www.google.com/ncr
stands for no country redirect.
From: landley
2010-12-22 01:25 pm (UTC)

Re: google

Oh, cool! Thanks.

That seems to have set some kind of cookie that's persistently not bouncing me to google.ru. Exactly what I wanted.
From: landley
2010-12-22 01:23 pm (UTC)
Added -redir tcp:9876::22 to the kvm command line and fired up dropbear inside kvm so I could ssh into the kvm system from the ubuntu host, thus getting multiple terminals and not constantly having to restart kvm when the tty became unusable.

Found out that launching lxc-start from the ssh session meant everything I typed was completely ignored. This implies to me that lxc-start is reading from the _host_ /dev/tty, not the one inside the container.

But there is a fix! Run lxc-start in a terminal you don't care about, and then run the "lxc-console -n 12345" command from another terminal, and THEN you get a usable terminal inside the container!

(Did I mention that this LXC package thing is gigantically brittle?)
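
The workaround above, as a sketch. The container name is the one from the post; `run` and `container_console` are hypothetical names, and starting the container detached with `-d` is a substitution for "a terminal you don't care about", not something tested in this setup. By default the wrapper only prints the commands, so nothing happens until you ask for it.

```shell
# Print a command instead of running it, unless RUN_FOR_REAL=1 is set.
run() {
    if [ "${RUN_FOR_REAL:-0}" = 1 ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Hypothetical helper: start a container in the background, then attach
# a console to it from the current terminal.
# Usage: container_console NAME
container_console() {
    name=$1
    run lxc-start -n "$name" -d     # assumption: -d detaches, so no terminal is tied up
    run lxc-console -n "$name"      # the step that actually gives a usable terminal
}
```

Running `container_console 12345` shows the two commands; `RUN_FOR_REAL=1 container_console 12345` executes them.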
From: landley
2010-12-22 02:11 pm (UTC)
The instant I launch a container, the KVM ethernet stops working. The macvlan stuff is screwing up kvm -redir somehow. Possibly the packets aren't being properly labeled with the virtual ethernet address of the KVM eth0 device, and are thus being discarded when the macvlan driver tries to figure out whether they're intended for the KVM host's eth0 or the container target's eth0.

This means my ssh sessions stop working.

So I ran lxc-destroy on my macvlan container and rebuilt it using the veth template instead, and that refused to launch because the kernel KVM is running didn't have CONFIG_VETH set. Then it failed because I didn't have CONFIG_BRIDGE enabled. And I'd heard previous mention of ebtables (which doesn't show up at all unless you enable CONFIG_NETFILTER_ADVANCED, and has over a dozen sub-options, all of which I switched on).

I then had to run "brctl addbr br0" on the kvm system to create a bridge device.

So now it launches, and my ssh sessions don't stop working... and dhcp in the container doesn't work either. The container's virtual network device and the host's network device aren't connected enough for dhcp to work. But doing "brctl addif br0 eth0" immediately froze my ssh session. (Doing brctl delif br0 eth0 from the kvm console got it back.)

Hmmm... Maybe I need to read linuxjournal's article on ethernet bridging...
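
A guess at what's going on, as a sketch rather than a tested fix: once eth0 is a port of br0, the address normally has to live on the bridge instead of on eth0, which would explain the ssh session freezing the moment eth0 was added. The helper below only prints the commands so they can be read first. `bridge_plan` is a hypothetical name, using `dhclient` to get a lease on the bridge is an assumption about what's installed in the Sid system, and whether kvm's user-mode networking is happy with any of this is untested.

```shell
# Hypothetical helper: print (not run) the commands for moving a NIC's
# network configuration onto a bridge.
# Usage: bridge_plan BRIDGE NIC
bridge_plan() {
    bridge=$1
    nic=$2
    # Order matters: enslave the NIC, drop its old address, bring the
    # bridge up, then ask for a lease on the bridge itself.
    cat <<EOF
brctl addbr $bridge
brctl addif $bridge $nic
ip addr flush dev $nic
ip link set dev $bridge up
dhclient $bridge
EOF
}
```

`bridge_plan br0 eth0` prints the five commands. They would need to be run from the kvm console rather than over ssh, since the connection drops partway through, and the first line is redundant (and will complain) if br0 already exists.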
From: k001
2010-12-22 04:25 pm (UTC)
> I wonder how you preview to see if you got the lj-cut tag right?

AFAIK -- no way :)

> is loosely based on Google's cgroups stuff

Well, you mix apples and oranges here.

CGroups is just a mechanism for grouping processes together and applying various resource controllers to these groups. In short, CGroups are there for resource management, and (same as with user beancounters) this stuff can be used with or without containers. Google, for example, uses cgroups but doesn't use containers (AFAIK).

And containers, on the other hand, can live without any resource management (and therefore without CGroups). They would step on each other's toes, but that's a different story.
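
To make the "grouping processes" part concrete: every process, in a container or not, has its cgroup membership listed in /proc/PID/cgroup, one line per hierarchy in the form hierarchy-ID:controller-list:path. A small sketch that reformats those lines follows; `cgroup_membership` is a hypothetical helper name, and the exact controllers and paths you see depend entirely on how the system is set up.

```shell
# Hypothetical helper: read hierarchy-ID:controller-list:path lines on
# stdin and print "controllers -> path" for each one.
cgroup_membership() {
    while IFS=: read -r id controllers path; do
        # An empty controller list means no controller is attached to
        # that hierarchy.
        echo "${controllers:-(none)} -> $path"
    done
}
```

`cgroup_membership < /proc/self/cgroup` shows which groups the current shell belongs to, with no container involved anywhere.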

From: cathyr19355
2010-12-23 02:39 am (UTC)
I wonder how you preview to see if you got the lj-cut tag right?

k001 is right; you can't. The best you can do is post it, then delete it if you've screwed it up and try again (make sure you've saved the text someplace first so you can paste it into a new post). I usually look up the directions on how to do an lj-cut before I try one, or I use an old post where I got it right as a model.