Rob Landley (landley) wrote,

The learning curve is vertical. I have crampons.

So I spent a couple weeks learning about containers until I actually got them to work, and now I'm learning about the implementation of NFS. I plan to post at length about NFS (possibly later today), but first I'm trying to finish out the container stuff by documenting it and scripting its reproduction sequence. (After all, you don't really understand something until you try to teach it. And it's not science unless you can reproduce it.)

Unfortunately, "got it to work once" is not the same as "understand it" or "reliably reproduce it", especially when dealing with a package as brittle and overdesigned as LXC.

I'd very much like to be _done_ with the LXC package and move on, but it's kind of a horrible overdesigned mess full of assumptions.

The lxc-create command will let you skip -f (the device configuration file, basically specifying what network devices to insert into your container), but if you do, it gives you a full screen of text about how it assumes you know what you're doing, and then waits for you to press a key. I.e. they go out of their way to handhold you and screw up scripting. (It's not often you see software actually be patronizing to its users. Not open source, anyway.)

The way you launch a process in a container is lxc-start (to run /sbin/init), or lxc-execute (to run an arbitrary command). Once a container's been started, lxc-console will give you another shell prompt in it (after prompting you to log in).

The problem is, this doesn't actually work. The /dev/console and /dev/tty emulation is incredibly buggy. Both lxc-start and lxc-execute die with an I/O error the first time you hit a key. It has very specific assumptions about which /dev/tty devices get opened in what order, and if the code doesn't open them (init=/bin/sh using the existing stdin/stdout/stderr) or if one of them gets opened twice (having init be a shell script that execs /bin/sh < /dev/tty1), you get the I/O error.

The only way I've ever gotten a stable shell prompt via lxc is to lxc-start in one console (which is afterwards unusable due to a different console bug), and then run lxc-console in a second one. The second one gives you a usable shell prompt. That's the one and only magic procedure that's worked so far (as told to me by one of the Russian engineers at Parallels), and even that's really really brittle. (Note that lxc-execute and lxc-start compete for the same ecological niche. If you try to lxc-execute in a started container it dies with "Device or resource busy - unable to remove previous cgroup".)
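For the scripting side, here's a rough sketch of that sequence as command lines. It only builds the commands, it doesn't run them. The container name and config filename are made up for illustration, and I'm assuming the usual -n NAME flag on all three tools plus -t TEMPLATE on lxc-create; check those against your lxc version.

```python
import shlex


def container_commands(name, config_path, template=None):
    """Build (but do not run) the create/start/console command lines.

    name and config_path are hypothetical values supplied by the caller.
    Passing -f up front avoids the "press a key" prompt described above.
    """
    create = ["lxc-create", "-n", name, "-f", config_path]
    if template:
        # Assumption: -t selects a template such as "busybox".
        create += ["-t", template]
    return {
        "create": create,
        # Run this in the first console; that console is sacrificed.
        "first_console": ["lxc-start", "-n", name],
        # Run this in a second console to get the usable prompt.
        "second_console": ["lxc-console", "-n", name],
    }


if __name__ == "__main__":
    cmds = container_commands("demo", "demo.conf", template="busybox")
    for step, argv in cmds.items():
        print(step + ":", shlex.join(argv))
```

The two consoles still have to be two real terminals; nothing here works around the console bugs.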

By "brittle" I mean that to make the lxc-start+lxc-console thing work, you have to build busybox from source. The lxc busybox "template" assumes sbin/init should point to busybox, but the prebuilt Debian "busybox" package isn't configured to include init. The "busybox-static" package does, which allows lxc-start to humor the insane assumptions of the console code, but then when the lxc-console command tries to run login it dies with "getty: tty1: can't exec /bin/login" which is just _weird_. (Works fine if I build busybox from source, but not using the busybox-static from Debian sid. I don't know why, and strace won't drill through lxc-console.)

So if you want to use their busybox template, you have to rebuild busybox from source. (Or try to replace it with a /bin/sh wrapper script, which doesn't work as explained above. Something init is doing is magic, and the console handling code expects that magic.)
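A quick way to find out up front whether a given busybox binary has init compiled in is to ask it for its applet list. This is a minimal sketch, assuming the binary supports `busybox --list` (reasonably recent versions do; older ones may not). The function names are mine, not anything from lxc or busybox.

```python
import subprocess


def has_applet(applet_list, name):
    """True if name appears as a whole word in busybox's applet listing."""
    return name in applet_list.split()


def busybox_has_applet(busybox_path, name):
    """Run `<busybox_path> --list` and look for the named applet.

    Assumption: this busybox build understands --list and prints one
    applet name per line.
    """
    result = subprocess.run(
        [busybox_path, "--list"],
        capture_output=True, text=True, check=True,
    )
    return has_applet(result.stdout, name)
```

Something like `busybox_has_applet("/bin/busybox", "init")` would tell you whether the template's sbin/init symlink has anything to land on. It says nothing about why login fails with busybox-static.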

Sigh. The thing about writing documentation is if you do it right, the end result looks easy and obvious. What you don't show is the 90 different ways you tried that _didn't_ work to come up with the one that _did_. I've been resisting digging into the LXC source to try to fix this because I'm _supposed_ to be focusing on NFS and trying to close down this topic rather than open a new tangent...

Oh well, document how to build busybox from source, I suppose. And how to hack the resulting image so /etc/passwd accepts no password for root, instead of being impossible for lxc-console to log into. That is itself disgusting, because where the image lives depends on how lxc was installed: the default install path when you build lxc from source is /usr/local/var/lib/lxc, and the path in the lxc Debian sid package is /var/lib/lxc. I guess I document the second, since my instructions don't tell you to build lxc from source...
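The passwd hack itself is just emptying the second colon-separated field of root's entry. A minimal sketch, assuming the image uses a plain /etc/passwd in the standard seven-field format; the function names are made up, and the caller has to supply the path to the passwd file inside the container's root filesystem, since that depends on which of the two install prefixes you ended up with.

```python
from pathlib import Path


def blank_root_password(passwd_text):
    """Return passwd_text with the password field of root's entry emptied."""
    out = []
    for line in passwd_text.splitlines(keepends=True):
        fields = line.split(":")
        # name:password:uid:gid:gecos:home:shell, so field 1 is the password.
        if len(fields) >= 7 and fields[0] == "root":
            fields[1] = ""
            line = ":".join(fields)
        out.append(line)
    return "".join(out)


def patch_passwd_file(passwd_path):
    """Rewrite the given passwd file in place (hypothetical helper)."""
    path = Path(passwd_path)
    path.write_text(blank_root_password(path.read_text()))
```

Only the root line changes; every other entry passes through untouched.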

(If I understood this better, it would probably be easier to write a tool from scratch that didn't suck than try to fix LXC. The design assumptions are big iron all the way, not unixy in the slightest...)
Tags: dullboy