Linux /proc and seq_file
create_proc_entry removed in Linux kernel 3.10


DRAFT

The create_proc_entry API has been removed from Linux kernel 3.10. Every /proc coding tutorial I’ve seen mentions this function, so I expect this to bite a few people (even though the replacement function, proc_create_data, has been available for 5+ years). It’s good that this went away, because it forces buggy code into the light (where it can be squashed), as I recently experienced.

Races in create_proc_entry

create_proc_entry has some inherent races. The newly created proc entry is already live (visible in /proc) when the function returns. Yet the file operation functions (read, write, etc.) and the data pointer associated with it can only be set afterwards, through the returned struct proc_dir_entry *. Anyone who opens the file in that window sees a half-initialized entry. proc_create_data, by contrast, completely initializes the proc_dir_entry before making it live.

Converting to proc_create_data

Suppose you have nasty old code that looks like this:

static int my_proc_read(char *page, char **start, off_t off, int count, int *eof, void *data)
{
    struct my_type *t = (struct my_type *)data;

    if (!data)
        return 0;  /* never mind; there was a race during creation */
    ...
    return len;
}

void my_init(void)
{
    struct proc_dir_entry *entry = create_proc_entry("config", 0644, my_dir);
    entry->data = my_context;
    entry->read_proc = my_proc_read;
}

A quick attempt to update might yield this, which won’t even compile:

static int my_proc_read(char *page, char **start, off_t off, int count, int *eof, void *data)
{
    struct my_type *t = (struct my_type *)data;

    ...
    return len;
}

static const struct file_operations my_proc_fops = {
    .owner = THIS_MODULE,
    .read = my_proc_read,
};

void my_init(void)
{
    struct proc_dir_entry *entry = proc_create_data("config", 0644, my_dir, &my_proc_fops, my_context);
}

The problem is that struct file_operations has a read member, not a read_proc member, and the two have different signatures. All the extra parameters to my_proc_read exist to help build up the complete output, potentially spread across multiple calls. Userspace might be reading a character at a time, for all we know.
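To make that last point concrete, here is a small userspace sketch that reads a file one byte per read(2) call. read_bytewise is a hypothetical helper of my own, not part of any library; the point is only that nothing stops a real program from doing this to a /proc file, so the kernel side has to cope with it.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read a file one byte per read(2) call, the way a pathological (but
 * perfectly legal) program might read a /proc file. Returns the number of
 * bytes stored in out, or -1 on error. *calls receives the number of
 * read(2) calls made, including the final one that returns 0 at EOF.
 */
static ssize_t read_bytewise(const char *path, char *out, size_t outsz,
                             size_t *calls)
{
    int fd = open(path, O_RDONLY);
    size_t total = 0;
    ssize_t n;

    *calls = 0;
    if (fd < 0)
        return -1;
    while (total < outsz) {
        n = read(fd, out + total, 1);   /* ask for a single byte */
        (*calls)++;
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0)                     /* end of file */
            break;
        total += (size_t)n;
    }
    close(fd);
    return (ssize_t)total;
}
```

As long as out is large enough, a file with N bytes of content costs N + 1 read(2) calls, and the kernel's read handler sees every one of them with a different offset.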

The Bugs Start Scurrying

The ... in the original code elided nasty, buggy code that did a few scnprintf calls into the output page, with no thought given to supporting multiple calls. It was possible to ignore this “detail” because, to be honest, create_proc_entry let us. The interface is complicated enough that many tutorials don’t implement it properly “to save space”. Or error handling is “an exercise left to the reader.” And so on.

On the other hand, the type of file_operations.read is simple and well-understood:

ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);

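This maps very directly to read(2) in userspace: the handler gets the user’s buffer, its size, and a pointer to the current file position, and it must copy out data starting at that position, advance the position, and return the number of bytes copied (0 at end of file). It is now much harder to ignore the fact that the buffer may need to be filled in piecemeal.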
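Here is a userspace model (not kernel code) of what an offset-honoring read handler has to do when serving a fixed buffer. It mirrors the logic of the kernel’s simple_read_from_buffer() helper, with memcpy standing in for copy_to_user() and long long standing in for loff_t; model_read is a hypothetical name for illustration.

```c
#include <string.h>
#include <sys/types.h>

/*
 * Model of a file_operations.read handler over a fixed buffer.
 * Copies from the current position *ppos, advances it, and returns the
 * number of bytes copied; returns 0 once everything has been consumed.
 */
static ssize_t model_read(char *to, size_t count, long long *ppos,
                          const char *from, size_t available)
{
    long long pos = *ppos;
    size_t n;

    if (pos < 0)
        return -1;              /* the kernel helper returns -EINVAL here */
    if ((unsigned long long)pos >= available || count == 0)
        return 0;               /* at or past EOF: nothing left to copy */
    n = available - (size_t)pos;
    if (n > count)
        n = count;              /* never overrun the caller's buffer */
    memcpy(to, from + pos, n);
    *ppos = pos + (long long)n; /* the next call resumes where this one stopped */
    return (ssize_t)n;
}
```

Reading "abcdef" with count = 4 yields "abcd", then "ef", then 0 (EOF), which is exactly the sequence a cat of a well-behaved /proc file goes through.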

Notice how deprecating poor APIs forces bugs into the light. Good API design is hard.

Seekable Interface

So how do we get from here to there? It’s clearer now that we must present a seekable interface, but that’s no easier to do (perhaps even harder, since more state must be maintained). It could be done by hand (by paying careful attention to all the parameters, and probably doing some buffering), but Al Viro did this for us years ago with seq_file. seq_file sits between file_operations.read (you plug in its seq_read helper) and a show function you write, which, much like read_proc, just produces output; it automatically handles repeated and partial calls to read.

Updated code might look like this, with follow-up notes below:

xxx
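A minimal sketch of such a conversion using single_open, written against the 3.10-era API, might look like the following. It is a sketch rather than a drop-in replacement: the contents of struct my_type (the value field), the my_driver directory, and my_context_storage are hypothetical stand-ins for the parts the original elides. Newer kernels have also moved on (since 5.6, proc_create_data takes a struct proc_ops instead of a struct file_operations).

```c
#include <linux/module.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>

/* Hypothetical context type standing in for the original struct my_type. */
struct my_type {
    int value;
};

static struct my_type my_context_storage = { .value = 42 };
static struct proc_dir_entry *my_dir;

/* Emit the complete output in one go; seq_file buffers it and hands out
 * whatever slice each read(2) asks for, honoring the file position. */
static int my_proc_show(struct seq_file *m, void *v)
{
    struct my_type *t = m->private;  /* set by single_open() below */

    seq_printf(m, "value=%d\n", t->value);
    return 0;
}

static int my_proc_open(struct inode *inode, struct file *file)
{
    /* PDE_DATA() returns the data pointer given to proc_create_data(). */
    return single_open(file, my_proc_show, PDE_DATA(inode));
}

static const struct file_operations my_proc_fops = {
    .owner   = THIS_MODULE,
    .open    = my_proc_open,
    .read    = seq_read,       /* seq_file does the offset bookkeeping */
    .llseek  = seq_lseek,
    .release = single_release,
};

static int __init my_init(void)
{
    my_dir = proc_mkdir("my_driver", NULL);
    if (!my_dir)
        return -ENOMEM;
    /* Fully initialized before it becomes visible: no creation race.
     * Read-only (0444) in this sketch, since there is no .write yet. */
    if (!proc_create_data("config", 0444, my_dir, &my_proc_fops,
                          &my_context_storage)) {
        remove_proc_entry("my_driver", NULL);
        return -ENOMEM;
    }
    return 0;
}

static void __exit my_exit(void)
{
    remove_proc_entry("config", my_dir);
    remove_proc_entry("my_driver", NULL);
}

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");
```

Note that the defensive if (!data) check from the old read_proc is gone: by the time anyone can open the file, the data pointer is already in place.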

Writes from Userspace

The write function of file_operations mirrors the userspace write(2), as you would expect. The seq_file abstraction, however, only helps with reads.

Convenience

Plumbing Private “data” Through

How do you get the private data pointer passed to proc_create_data through to single_open? Previously, the data set on the entry returned by create_proc_entry was passed directly to the read_proc method. But now seq_file sits in the middle. What to do?

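proc_create_data stores your data pointer in the proc_dir_entry, and your open function can fetch it from the inode with PDE_DATA(inode) before handing it to single_open, which stashes it in the seq_file’s private field. The void *data parameter of read_proc has no direct counterpart; your show function finds the data in m->private instead.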
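If the same module has to build on kernels on both sides of 3.10, a small compatibility shim is one way to handle it. This is a sketch under my own assumptions about which versions you need to support: PDE_DATA() first appeared in 3.10, older kernels expose the same pointer through PDE(inode)->data, and 5.17 renamed the helper to pde_data().

```c
#include <linux/proc_fs.h>
#include <linux/version.h>

/* Give every supported kernel a PDE_DATA() that returns the pointer
 * passed to proc_create_data(). */
#if LINUX_VERSION_CODE < KERNEL_VERSION(3, 10, 0)
#define PDE_DATA(inode) (PDE(inode)->data)     /* pre-3.10: struct is public */
#elif LINUX_VERSION_CODE >= KERNEL_VERSION(5, 17, 0)
#define PDE_DATA(inode) pde_data(inode)        /* 5.17+: renamed helper */
#endif
```

With that in place, the open function stays the same everywhere: return single_open(file, my_proc_show, PDE_DATA(inode)); where my_proc_show is your show function.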

TODO
PDE_DATA

Writers

seq_file does not help with writes, so if your proc file accepts them, you’ll need to handle that yourself. Probably the most common case is to accept a single write, ignoring the offset; it’s the rare /proc file that really honors seeks while writing.
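A sketch of that common case: accept one short write, parse it, and ignore *ppos. It assumes the file is opened with single_open (as on the read side), so file->private_data is the struct seq_file and its private field holds the proc_create_data pointer; struct my_type, its value field, and the 16-byte limit are hypothetical choices for illustration.

```c
#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/seq_file.h>
#include <linux/uaccess.h>

/* Hypothetical context type; only the value field is used here. */
struct my_type {
    int value;
};

/*
 * Accept one short write such as "echo 7 > /proc/.../config".
 * *ppos is deliberately ignored: each write replaces the whole value.
 */
static ssize_t my_proc_write(struct file *file, const char __user *ubuf,
                             size_t count, loff_t *ppos)
{
    struct seq_file *m = file->private_data;  /* set up by single_open() */
    struct my_type *t = m->private;           /* the proc_create_data() pointer */
    char buf[16];
    int val, ret;

    if (count >= sizeof(buf))
        return -EINVAL;                       /* too long for one number */
    if (copy_from_user(buf, ubuf, count))
        return -EFAULT;
    buf[count] = '\0';

    /* kstrtoint() tolerates the trailing newline that echo adds. */
    ret = kstrtoint(buf, 0, &val);
    if (ret)
        return ret;

    t->value = val;
    return count;                             /* consume the whole write */
}
```

Hooking it up is one more line in the struct file_operations (.write = my_proc_write), plus a writable mode such as the original 0644. Returning count tells userspace the entire write was consumed, so it won’t retry with the remainder.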

Locking

The seq_file.txt doc generally waves its hands, saying it doesn’t take any locks and so you can do whatever you want. This isn’t exactly true: to make your data transparently seekable, seq_file must allocate memory, and kmalloc (at least with GFP_KERNEL, which may sleep) doesn’t mix with spinlocks. Therefore, holding a spinlock over seq_printf won’t work:

TODO

But if your code does this, you need to rethink your locking strategy. Consider the get/put (reference counting) model instead of holding a spinlock. Really, manually manipulating an atomic_t and a spinlock_t is just hand-rolling Linux’s get/put model. Stop it.
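A minimal sketch of the get/put alternative using the kernel’s kref helpers. It assumes a hypothetical refcounted struct my_obj whose current instance is published through the pointer my_current, guarded by my_lock; those names are mine, for illustration. The spinlock is held only long enough to take a reference, and the formatting happens with no lock held.

```c
#include <linux/kernel.h>
#include <linux/kref.h>
#include <linux/seq_file.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

/* Hypothetical refcounted object. */
struct my_obj {
    struct kref ref;
    int value;
};

static DEFINE_SPINLOCK(my_lock);
static struct my_obj *my_current;  /* protected by my_lock */

static void my_obj_release(struct kref *ref)
{
    kfree(container_of(ref, struct my_obj, ref));
}

static int my_proc_show(struct seq_file *m, void *v)
{
    struct my_obj *obj;

    /* "get": hold the spinlock just long enough to take a reference. */
    spin_lock(&my_lock);
    obj = my_current;
    if (obj)
        kref_get(&obj->ref);
    spin_unlock(&my_lock);

    if (!obj)
        return 0;

    /* No lock held here; our reference keeps obj alive. */
    seq_printf(m, "value=%d\n", obj->value);

    /* "put": the last reference to go frees the object. */
    kref_put(&obj->ref, my_obj_release);
    return 0;
}
```

Whoever replaces my_current would swap the pointer under my_lock, drop the lock, and then kref_put() the old object, so whichever of the writer or an in-flight reader finishes last frees it.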

References

Documentation/filesystems/seq_file.txt (in the Linux kernel source tree)

June 13, 2014

Tags
linux kernel

Contact

email

ccoffing on GitHub

ccoffing on LinkedIn