Thursday, October 18, 2012

Linux kernel file I/O

I had a colleague tell me today that file I/O in the Linux kernel is contrived. Let's examine...

One thing worth noting is that you'd want to avoid opening files by name within the kernel. The reason is that you don't want to be in the business of emulating the results of the open syscall, and dealing with the resulting bugs and security holes. The basic idea is to open the file in user space, and pass the file descriptor to your kernel driver:
gfp_t old_gfp;
struct file *f = fget(fd_from_user_space);
if (!f)
  return -EBADF;

/*
 * Sanity checks. Check f->f_mapping->host->i_mode for
 * inode type, etc.
 */

/*
 * Keep page cache allocations for this file from recursing
 * into FS or I/O operations.
 */
old_gfp = mapping_gfp_mask(f->f_mapping);
mapping_set_gfp_mask(f->f_mapping, old_gfp & ~(__GFP_IO|__GFP_FS));
...
/* do stuff with file */
...
/* Restore the original mask and drop our reference. */
mapping_set_gfp_mask(f->f_mapping, old_gfp);
fput(f);
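For completeness, the user-space half of that handoff might look something like the sketch below. The `MYDRV_IOC_SET_FD` request number, the helper name `pass_fd_to_driver`, and the convention of passing a pointer to the descriptor as the ioctl argument are all made up for illustration; a real driver defines its own interface.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical ioctl request; a real driver defines its own number. */
#define MYDRV_IOC_SET_FD _IOW('M', 1, int)

/*
 * Open `path` in user space, so the usual open(2) permission checks
 * apply, then hand the descriptor to the driver behind `dev`.
 * Returns 0 on success, -1 on failure.
 */
int pass_fd_to_driver(const char *dev, const char *path)
{
	int ret = -1;
	int dev_fd, file_fd;

	dev_fd = open(dev, O_RDWR);
	if (dev_fd < 0)
		return -1;

	file_fd = open(path, O_RDONLY);
	if (file_fd >= 0) {
		/* Assumption: the driver copies this int and calls fget() on it. */
		ret = ioctl(dev_fd, MYDRV_IOC_SET_FD, &file_fd) < 0 ? -1 : 0;
		/* If the driver holds its own reference via fget(), closing here is safe. */
		close(file_fd);
	}

	close(dev_fd);
	return ret;
}
```

You'd call it as `pass_fd_to_driver("/dev/mydrv", "/path/to/file")`, where the device node name is again just a placeholder.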
Once you have a struct file you can do I/O. Here's basic reading:
u8 *buf = ...; /* kernel buffer */
const int len = ...; /* buffer length */
struct file *f = ...;
loff_t pos = ...;    /* pos within f */

ssize_t bytes_read;
mm_segment_t old_fs = get_fs();

/*
 * f_op->read expects a user space buffer, so lift the address limit
 * to let the access checks accept a kernel pointer.
 */
set_fs(get_ds());
bytes_read = vfs_read(f, (char __user *)buf, len, &pos);
set_fs(old_fs);
All synchronous, of course. Writing is the same, except that you call vfs_write instead. What if you want to perform transformations on the data as it's being read in? Then you need to use the splice API, something like the following. You might be wondering what the point of doing it this way is, since the example below only copies into a kernel buffer: the actor is the hook where a transformation would go. If you were instead reading into a (user) buffer described by a scatter-gather structure (like a BIO), things would be slightly more involved (see drivers/block/loop.c for a good example).
struct actor_data {
  u8 *buf;
  unsigned offset;
};

static int
splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf, struct splice_desc *sd)
{
  struct actor_data *p = sd->u.data;
  u8 *from = kmap_atomic(buf->page, KM_USER0) + buf->offset;
  u8 *to = p->buf + p->offset;

  /* Not doing anything special here. */
  memcpy(to, from, sd->len);

  kunmap_atomic(from, KM_USER0);
  cond_resched();

  p->offset += sd->len;
  return sd->len;
}

static int
direct_splice_actor(struct pipe_inode_info *pipe, struct splice_desc *sd)
{
  return __splice_from_pipe(pipe, sd, splice_actor);
}

struct actor_data cookie;
struct splice_desc sd;

int status;
struct file *file = ...;
u8 *buf = ...;    /* kernel buffer */
size_t len = ...; /* buffer length */
loff_t pos = ...; /* pos within file */

cookie.buf = buf;
cookie.offset = 0;

sd.len = 0;
sd.total_len = len;
sd.flags = 0;
sd.pos = pos;
sd.u.data = &cookie;

status = splice_direct_to_actor(file, &sd, direct_splice_actor);
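The actor contract is easier to play with outside the kernel. Here's a rough user-space model of it, not the kernel API: `actor_model`, `xor_actor` and `feed_chunks` are hypothetical names, the XOR is just a stand-in for a real transformation, and `feed_chunks` only imitates the way the splice machinery hands the actor one chunk at a time.

```c
#include <stddef.h>

/* Mirrors struct actor_data above, plus a capacity for bounds checks. */
struct actor_model {
	unsigned char *buf;
	size_t offset;   /* bytes written so far */
	size_t capacity; /* total size of buf */
};

/*
 * Hypothetical per-chunk callback: XORs each byte with 0x5a while
 * copying it into the destination. Returns the number of bytes consumed.
 */
long xor_actor(const unsigned char *chunk, size_t len, void *cookie)
{
	struct actor_model *p = cookie;
	size_t room = p->capacity - p->offset;
	size_t i;

	if (len > room)
		len = room; /* never write past the destination */
	for (i = 0; i < len; i++)
		p->buf[p->offset + i] = chunk[i] ^ 0x5a;
	p->offset += len;
	return (long)len;
}

/*
 * Stand-in for the splice machinery: feeds `src` to `actor` in pieces
 * of at most `chunk_size` bytes and stops on a short or zero return.
 * Returns the total number of bytes the actor consumed.
 */
size_t feed_chunks(const unsigned char *src, size_t total, size_t chunk_size,
		   long (*actor)(const unsigned char *, size_t, void *),
		   void *cookie)
{
	size_t done = 0;

	while (done < total) {
		size_t n = total - done < chunk_size ? total - done : chunk_size;
		long ret = actor(src + done, n, cookie);

		if (ret <= 0)
			break;
		done += (size_t)ret;
		if ((size_t)ret < n)
			break;
	}
	return done;
}
```

The thing to notice is that the cookie carries the running offset between calls, the same job `sd->u.data` does in the kernel version.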
Asynchronous I/O is its own set of calls: file->f_op->aio_read and file->f_op->aio_write, plus the associated kiocb routines. I'm getting lazy now, and the best example is do_sync_read in fs/read_write.c anyway.
