Rob Landley (landley) wrote,

Making CIFS work in a container, part 2.

In fs/cifs/connect.c the function ipv4_connect() calls sock_create_kern(). That's defined in net/socket.c as:

int sock_create_kern(int family, int type, int protocol, struct socket **res)
{
        return __sock_create(&init_net, family, type, protocol, res, 1);
}


And there's the init_net we need to replace.

The __sock_create() function is EXPORT_SYMBOL()ed, and it gets called from a few other places (such as net/sunrpc/xprtsock.c and net/sunrpc/svcsock.c). That's not nearly as many places as sock_create_kern() is called from, but since the init_net and the trailing 1 are the only things this wrapper provides (i.e. it should probably be an inline), calling __sock_create() directly shouldn't be too bad.

Note that sock_create_kern() is defined right under the userspace equivalent:

int sock_create(int family, int type, int protocol, struct socket **res)
{
        return __sock_create(current->nsproxy->net_ns, family, type, protocol, res, 0);
}
EXPORT_SYMBOL(sock_create);


And _that_ is querying current->nsproxy->net_ns to supply the "net" argument. "current" is:

./asm-generic/current.h:#define get_current() (current_thread_info()->task)
./asm-generic/current.h:#define current get_current()


Which drills down to the network namespace belonging to the current task.
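To make the difference between the two wrappers concrete, here is a small userspace model, not kernel code. Every model_ name is a stand-in invented for illustration (model_sock_create_full() plays the role of __sock_create(), model_current plays the role of "current"), and the real functions take more arguments than shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for struct net, struct task_struct and struct socket. */
struct model_net { const char *name; };
struct model_task { struct model_net *net_ns; };
struct model_socket { struct model_net *net; int kern; };

static struct model_net model_init_net = { "init" };  /* plays init_net */
static struct model_task *model_current;              /* plays "current" */

/* Plays __sock_create(): the caller chooses the namespace explicitly. */
static int model_sock_create_full(struct model_net *net,
                                  struct model_socket *res, int kern)
{
    assert(net != NULL && res != NULL);
    res->net = net;
    res->kern = kern;
    return 0;
}

/* Plays sock_create_kern(): always the initial namespace, kern = 1. */
static int model_sock_create_kern(struct model_socket *res)
{
    return model_sock_create_full(&model_init_net, res, 1);
}

/* Plays sock_create(): the calling task's namespace, kern = 0. */
static int model_sock_create(struct model_socket *res)
{
    return model_sock_create_full(model_current->net_ns, res, 0);
}
```

In this model, a task inside a container that goes through model_sock_create_kern() gets a socket in the "init" namespace, which is the problem. Calling model_sock_create_full() with the container's namespace and kern set to 1 gives a kernel socket in the right place.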

However, reconnects mean we can't blindly substitute one for the other. Although the initial mount has to happen on behalf of a specific process (and thus in process context), if the connection breaks due to a server reboot or masquerading timeout or something, the cifs code will dial out to the server again and reconnect (see cifs_reconnect() in connect.c). This is a _Windows_ network filesystem after all: repeated inexplicable total failure is normal and expected, and it just retries.

Those reconnects aren't in the same process context (possibly not in any process context). So we need to cache the network namespace during the initial mount, which means we need to do some kind of reference count.

(Note: cifs_find_tcp_session() might also be impacted, needing to match a net context in order to consider it a match.)

So, net/core/net_namespace.c has get_net(), and I can call that on current->nsproxy->net_ns from cifs_mount(). (Assuming that cifs_mount() always has to be called from process context, which seems likely.) Then call put_net() from umount. Except it has to go in a structure, and TCP_Server_Info is where the IP address currently lives so that's probably where to put the net context.

In cifs_mount(), TCP_Server_Info comes from cifs_get_tcp_session(), which calls cifs_find_tcp_session(), which increments the srv_count field of TCP_Server_Info if it finds an existing one. If there isn't an existing one, cifs_get_tcp_session() falls through to code that allocates a new one. So TCP_Server_Info needs to grow a ->net field, cifs_find_tcp_session() needs to compare that field to get a match, and cifs_get_tcp_session() needs to assign get_net(current->nsproxy->net_ns) to the net field. Then modify ipv4_connect() and ipv6_connect() to use __sock_create() instead of sock_create_kern().
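The bookkeeping described above can be sketched as a userspace model. This is not the cifs code: the model_ types and functions are invented stand-ins (model_server for TCP_Server_Info, model_get_net()/model_put_net() for get_net()/put_net()), the list handling is simplified, and there is no locking. It only shows the shape of the idea: match on address plus namespace, take a namespace reference when a new session is created, and drop it when the last user goes away.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct model_net {
    int refcount;               /* stand-in for the namespace refcount */
};

struct model_server {           /* stand-in for TCP_Server_Info */
    uint32_t addr;              /* server IPv4 address */
    struct model_net *net;      /* namespace cached at mount time */
    int srv_count;              /* mounts sharing this session */
    struct model_server *next;
};

static struct model_server *server_list;

static struct model_net *model_get_net(struct model_net *net)
{
    net->refcount++;
    return net;
}

static void model_put_net(struct model_net *net)
{
    assert(net->refcount > 0);
    net->refcount--;
}

/* Match on address AND namespace: the same IP seen from another
 * container may be a different machine entirely. */
static struct model_server *model_find_tcp_session(uint32_t addr,
                                                   struct model_net *net)
{
    struct model_server *s;

    for (s = server_list; s; s = s->next) {
        if (s->addr == addr && s->net == net) {
            s->srv_count++;
            return s;
        }
    }
    return NULL;
}

/* caller_net plays the role of current->nsproxy->net_ns at mount time. */
static struct model_server *model_get_tcp_session(uint32_t addr,
                                                  struct model_net *caller_net)
{
    struct model_server *s = model_find_tcp_session(addr, caller_net);

    if (s)
        return s;
    s = calloc(1, sizeof(*s));
    if (!s)
        return NULL;
    s->addr = addr;
    s->net = model_get_net(caller_net);   /* pin the namespace */
    s->srv_count = 1;
    s->next = server_list;
    server_list = s;
    return s;
}

/* Drop one user; the last one unlinks the session and releases the
 * namespace reference (the "put_net() from umount" step). */
static void model_put_tcp_session(struct model_server *srv)
{
    struct model_server **pp;

    if (--srv->srv_count > 0)
        return;
    for (pp = &server_list; *pp; pp = &(*pp)->next) {
        if (*pp == srv) {
            *pp = srv->next;
            break;
        }
    }
    model_put_net(srv->net);
    free(srv);
}
```

Two mounts of the same address from the same namespace share one model_server, while a mount of that address from a different namespace gets its own. A reconnect would then use srv->net rather than whatever task happens to be running.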

Grrr... I want to add an #ifdef CONFIG_NET_NS around the addition of the net field to TCP_Server_Info (embedded developer, bloat bad!), but adding #ifdefs to kernel code is considered bad form. Ok, so I need to add a couple of static inline functions to the header to guard access to this field. (Fiddliest bit of the patch.)
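Roughly what those guard functions might look like, as a sketch that compiles in userspace. The accessor names cifs_net_ns() and cifs_set_net_ns() are hypothetical, struct net and init_net here are local stand-ins for the kernel's, and TCP_Server_Info_model is a cut-down placeholder for the real structure. The #ifdef lives in the header once, so the callers stay free of it.

```c
#include <assert.h>
#include <stddef.h>

struct net { int id; };        /* stand-in for the kernel's struct net */
static struct net init_net;    /* stand-in for the kernel's init_net */

struct TCP_Server_Info_model { /* placeholder for TCP_Server_Info */
    int srv_count;
#ifdef CONFIG_NET_NS
    struct net *net;           /* only costs space when namespaces exist */
#endif
};

#ifdef CONFIG_NET_NS
/* Namespaces enabled: read and write the cached field. */
static inline struct net *cifs_net_ns(struct TCP_Server_Info_model *srv)
{
    assert(srv != NULL);
    return srv->net;
}

static inline void cifs_set_net_ns(struct TCP_Server_Info_model *srv,
                                   struct net *net)
{
    assert(srv != NULL);
    srv->net = net;
}
#else
/* Namespaces disabled: there is only init_net, and nothing to store. */
static inline struct net *cifs_net_ns(struct TCP_Server_Info_model *srv)
{
    assert(srv != NULL);
    return &init_net;
}

static inline void cifs_set_net_ns(struct TCP_Server_Info_model *srv,
                                   struct net *net)
{
    assert(srv != NULL);
    (void)net;
}
#endif
```

With this shape, the connect code can always ask cifs_net_ns(server) for the namespace to hand to __sock_create(), and on a kernel built without CONFIG_NET_NS the structure doesn't grow at all.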

But hey, I have a patch now! No idea if it builds or works, and my laptop battery's almost out, but the first pass 'tis written! (And there was much rejoicing.)
Tags: dullboy