Week 6 — Kernel C Idioms

Goal

Learn the C patterns and abstractions the kernel uses everywhere. Without these, kernel code looks alien even if you know C. After this week, you should be able to read kernel source far more comfortably.

Why This Matters

The kernel doesn't use libc. No malloc(), no printf(), no errno. It has its own implementations of everything, often with different semantics. Networking code especially is dense with these patterns — every packet path uses sk_buff, linked lists, reference counting, and RCU.


Error Handling: ERR_PTR / IS_ERR / PTR_ERR

C has no exceptions, so kernel functions that return pointers often use a trick: they encode a negative errno value as a pointer into the very top of the address space (the last 4095 addresses, MAX_ERRNO), a range that no valid kernel object can occupy.

#include <linux/err.h>

// Returning an error from a function that normally returns a pointer:
struct socket *my_func(void)
{
    struct socket *sock = sock_alloc();  // Returns struct socket *, NULL on failure
    if (!sock)
        return ERR_PTR(-ENOMEM);  // Encode error as pointer
    return sock;
}

// Checking the result (inside a function that returns int):
struct socket *sock = my_func();
if (IS_ERR(sock)) {
    int err = PTR_ERR(sock);     // Extract error code
    pr_err("failed: %d\n", err);
    return err;
}

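This pattern is everywhere, but it is a per-function convention: some pointer-returning functions signal failure with NULL, others with an ERR_PTR. IS_ERR(NULL) is false, so check the function's documentation or source to see which test applies.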
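
To make the encoding concrete, here is a minimal userspace sketch of the three helpers. The macro names mirror <linux/err.h>, but these are simplified reimplementations for illustration, and make_counter() is a hypothetical function invented for the demo.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace model of the kernel helpers; names mirror <linux/err.h>. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
    return (void *)error;            /* a small negative value lands at the top of the address space */
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)ptr;                /* recover the negative errno value */
}

static inline int IS_ERR(const void *ptr)
{
    /* True only for the top MAX_ERRNO addresses; NULL is not an error pointer. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator: reports failure with an error pointer, never NULL. */
static int *make_counter(int start)
{
    int *c;

    if (start < 0)
        return ERR_PTR(-EINVAL);
    c = malloc(sizeof(*c));
    if (!c)
        return ERR_PTR(-ENOMEM);
    *c = start;
    return c;
}
```

A caller tests the result with IS_ERR() and, on failure, turns it back into an errno value with PTR_ERR(). Because a single pointer carries both the result and the reason for failure, no separate out-parameter is needed.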

Linked Lists

The kernel uses intrusive linked lists — the list node is embedded inside your struct, not the other way around.

#include <linux/list.h>

struct my_device {
    char name[32];
    int status;
    struct list_head list;   // This embeds the list node
};

// Declare and initialize a list head
LIST_HEAD(device_list);

// Add an element
struct my_device *dev = kmalloc(sizeof(*dev), GFP_KERNEL);
if (!dev)
    return -ENOMEM;
list_add(&dev->list, &device_list);          // Add to head...
// list_add_tail(&dev->list, &device_list);  // ...or to tail; never both for the same node

// Iterate
struct my_device *d;
list_for_each_entry(d, &device_list, list) {
    pr_info("device: %s\n", d->name);
}

// Safe iteration (if you might delete during iteration)
struct my_device *tmp;
list_for_each_entry_safe(d, tmp, &device_list, list) {
    if (d->status == DEAD) {   // DEAD: placeholder status value defined by your driver
        list_del(&d->list);
        kfree(d);
    }
}

// Check if empty
if (list_empty(&device_list))
    pr_info("no devices\n");

The magic: list_for_each_entry uses container_of() to get the enclosing struct from the embedded list_head. This is a fundamental kernel pattern.
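
The pointer mechanics behind list_add(), list_add_tail(), and list_del() can be sketched in userspace. This is a simplified model, not the kernel's code: list_init(), list_insert(), struct item, and ids_in_order() are names invented for the demo, and the real <linux/list.h> adds debugging checks and pointer poisoning that are left out here.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of a circular doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *head)
{
    head->next = head;               /* an empty list points at itself */
    head->prev = head;
}

/* Link `node` between two neighbours that are currently adjacent. */
static void list_insert(struct list_head *node,
                        struct list_head *prev, struct list_head *next)
{
    next->prev = node;
    node->next = next;
    node->prev = prev;
    prev->next = node;
}

static void list_add(struct list_head *node, struct list_head *head)
{
    list_insert(node, head, head->next);     /* right after the head */
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
    list_insert(node, head->prev, head);     /* right before the head */
}

static void list_del(struct list_head *node)
{
    node->prev->next = node->next;
    node->next->prev = node->prev;
}

struct item {
    int id;
    struct list_head list;           /* embedded node */
};

/* Walk the list and pack the ids into a decimal number, e.g. 2,1,3 -> 213. */
static int ids_in_order(struct list_head *head)
{
    int out = 0;

    for (struct list_head *p = head->next; p != head; p = p->next)
        out = out * 10 + container_of(p, struct item, list)->id;
    return out;
}
```

Adding items 1 and 2 with list_add() and item 3 with list_add_tail() yields the order 2, 1, 3. Because the head is itself a node in the ring, insertion and removal need no special case for an empty list.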

container_of

// Given a pointer to a member, get a pointer to the containing struct
#define container_of(ptr, type, member) ...

// Example: if you have &dev->list, get back dev:
struct my_device *dev = container_of(list_ptr, struct my_device, list);
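
The arithmetic behind the macro is just "member address minus member offset". The sketch below is a simplified userspace version; the kernel's macro adds compile-time type checks on top. struct timer, struct session, and session_id_from_timer() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of: subtract the member's offset from its address. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct timer {
    int ticks;
};

struct session {
    int id;
    char name[16];
    struct timer expiry;             /* not the first member, so its offset is > 0 */
};

/* A callback that is handed only the embedded member, as kernel callbacks often are. */
static int session_id_from_timer(struct timer *t)
{
    struct session *s = container_of(t, struct session, expiry);

    return s->id;
}
```

This is why kernel callbacks can take a pointer to a generic embedded member (a list node, a timer, a work item) and still reach the driver-specific struct around it without any extra back-pointer.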

Memory Allocation

No malloc() / free(). The kernel has multiple allocators for different situations:

#include <linux/slab.h>

// General purpose allocation
void *ptr = kmalloc(size, GFP_KERNEL);   // May sleep
void *ptr = kmalloc(size, GFP_ATOMIC);   // Never sleeps (use in interrupt context)
kfree(ptr);

// Zeroed allocation
void *ptr = kzalloc(size, GFP_KERNEL);

// Array allocation (overflow-safe)
void *ptr = kcalloc(n, size, GFP_KERNEL);

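// Allocate a specific struct (common pattern); always check for NULL
struct my_device *dev = kmalloc(sizeof(*dev), GFP_KERNEL);
if (!dev)
    return -ENOMEM;
// Note: sk_buffs are not allocated with kmalloc(); use alloc_skb() instead.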
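
The "overflow-safe" remark about kcalloc() refers to the n * size multiplication: if it wraps around, a plain kmalloc(n * size) would return a buffer far smaller than the caller expects. A userspace sketch of that check follows. checked_calloc() is a hypothetical name, __builtin_mul_overflow is a GCC/Clang builtin, and the kernel's own implementation is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Zeroed array allocation that fails cleanly instead of wrapping around
 * when n * size does not fit in size_t. */
static void *checked_calloc(size_t n, size_t size)
{
    size_t bytes;
    void *p;

    /* Returns true if the multiplication overflowed. */
    if (__builtin_mul_overflow(n, size, &bytes))
        return NULL;
    p = malloc(bytes);
    if (p)
        memset(p, 0, bytes);
    return p;
}
```

The practical rule is to prefer the array helper over open-coded multiplication whenever either factor can be influenced by user input.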

GFP flags matter:

  • GFP_KERNEL — normal allocation, may sleep waiting for memory. Use in process context.
  • GFP_ATOMIC — never sleeps. Use in interrupt handlers, spinlock sections, softirqs.
  • GFP_NOIO — won't start any I/O. Used in I/O paths to avoid recursion.

Rule of thumb: If you're in a function that might be called from an interrupt or while holding a spinlock, you must use GFP_ATOMIC. Otherwise, use GFP_KERNEL.

Slab Caches

For frequently allocated/freed objects (like sk_buff), the kernel uses slab caches:

// Create a cache (done once at init)
struct kmem_cache *my_cache = kmem_cache_create("my_objects",
    sizeof(struct my_obj), 0, 0, NULL);

// Allocate from cache (fast)
struct my_obj *obj = kmem_cache_alloc(my_cache, GFP_KERNEL);

// Free back to cache
kmem_cache_free(my_cache, obj);

sk_buff has its own slab cache — that's why alloc_skb() is fast.
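
Why a dedicated cache is fast can be shown with a toy model: freed objects are kept on a free list and handed straight back out, so the common case skips the general-purpose allocator. This is only a sketch of the recycling idea under that assumption. The toy_cache_* names are invented, and the kernel's slab allocator is far more elaborate (per-CPU lists, page-sized slabs, constructors).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy fixed-size object cache. A free object's first bytes store the link
 * to the next free object, so no extra bookkeeping memory is needed. */
struct toy_cache {
    size_t obj_size;
    void *free_list;
};

static void toy_cache_init(struct toy_cache *c, size_t obj_size)
{
    /* Objects must be able to hold one pointer while they sit on the free list. */
    c->obj_size = obj_size < sizeof(void *) ? sizeof(void *) : obj_size;
    c->free_list = NULL;
}

static void *toy_cache_alloc(struct toy_cache *c)
{
    void *obj = c->free_list;

    if (obj) {
        c->free_list = *(void **)obj;    /* fast path: pop a recycled object */
        return obj;
    }
    return malloc(c->obj_size);          /* slow path: general allocator */
}

static void toy_cache_free(struct toy_cache *c, void *obj)
{
    *(void **)obj = c->free_list;        /* push onto the free list */
    c->free_list = obj;
}

/* Release every cached object back to the general allocator. */
static void toy_cache_destroy(struct toy_cache *c)
{
    while (c->free_list) {
        void *obj = c->free_list;

        c->free_list = *(void **)obj;
        free(obj);
    }
}
```

An object that is freed and immediately reallocated comes back at the same address, which is also friendly to CPU caches.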

Locking

The kernel is massively concurrent. Locking is non-optional.

Spinlocks — for short, non-sleeping critical sections

#include <linux/spinlock.h>

DEFINE_SPINLOCK(my_lock);

spin_lock(&my_lock);
// Critical section — must not sleep, must be brief
spin_unlock(&my_lock);

// When the same lock is also taken from an interrupt handler,
// disable local interrupts while holding it:
unsigned long flags;

spin_lock_irqsave(&my_lock, flags);
// ...
spin_unlock_irqrestore(&my_lock, flags);
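
The reason a spinlock's critical section must be brief and must not sleep becomes clear from a toy implementation: a waiter does nothing but loop until the holder lets go. The sketch below uses C11 atomic_flag in userspace. The toy_spin_* names are invented, and real kernel spinlocks additionally queue waiters fairly and disable preemption, which this model ignores.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy userspace spinlock built on C11 atomic_flag. */
struct toy_spinlock {
    atomic_flag locked;
};

#define TOY_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static bool toy_spin_trylock(struct toy_spinlock *l)
{
    /* test_and_set returns the previous value: false means we took the lock. */
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

static void toy_spin_lock(struct toy_spinlock *l)
{
    while (!toy_spin_trylock(l))
        ;   /* busy-wait: the waiter burns CPU instead of sleeping */
}

static void toy_spin_unlock(struct toy_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

If the holder slept while other CPUs were spinning in toy_spin_lock(), those CPUs would waste their time for the whole duration, which is exactly what the "must not sleep" rule prevents.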

Mutexes — for longer sections that may sleep

#include <linux/mutex.h>

DEFINE_MUTEX(my_mutex);

mutex_lock(&my_mutex);
// Critical section — may sleep (e.g., kmalloc with GFP_KERNEL)
mutex_unlock(&my_mutex);

RCU (Read-Copy-Update) — for read-heavy data

Networking code uses RCU extensively because packet processing is read-heavy:

#include <linux/rcupdate.h>

// Reader side — extremely cheap, no locks
rcu_read_lock();
struct my_data *data = rcu_dereference(global_ptr);
// Use data...
rcu_read_unlock();

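// Writer side: more expensive, but doesn't block readers.
// Writers still serialize among themselves (update_lock is a placeholder mutex).
struct my_data *old, *new_data;

new_data = kmalloc(sizeof(*new_data), GFP_KERNEL);
if (!new_data)
    return -ENOMEM;
// ... fill in new_data ...
mutex_lock(&update_lock);
old = rcu_dereference_protected(global_ptr, lockdep_is_held(&update_lock));
rcu_assign_pointer(global_ptr, new_data);
mutex_unlock(&update_lock);
synchronize_rcu();  // Wait for all pre-existing readers to finish
kfree(old);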

Why RCU matters for networking: The routing table, socket hash tables, and device lists are all RCU-protected. Every packet lookup is an RCU reader.
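
The copy-and-publish half of an RCU update can be modelled in userspace with C11 release/acquire atomics. This sketch deliberately leaves out the grace-period wait (synchronize_rcu() in the kernel), so it shows only why readers never see a half-updated object. struct config, read_mtu(), and update_mtu() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct config {
    int mtu;
    int ttl;
};

/* Shared pointer that readers follow; plays the role of the RCU-protected pointer. */
static _Atomic(struct config *) current_config;

/* Reader: one acquire load, then plain reads through the snapshot it got. */
static int read_mtu(void)
{
    struct config *c = atomic_load_explicit(&current_config, memory_order_acquire);

    return c ? c->mtu : -1;
}

/* Writer: copy the current version, modify the private copy, then publish it.
 * The old version is handed back through old_out. It may be freed only once
 * no reader can still hold it; that wait is not modelled in this sketch. */
static int update_mtu(int mtu, struct config **old_out)
{
    struct config *old = atomic_load_explicit(&current_config, memory_order_relaxed);
    struct config *new_cfg = malloc(sizeof(*new_cfg));

    if (!new_cfg)
        return -1;
    if (old)
        *new_cfg = *old;                                     /* copy */
    else
        *new_cfg = (struct config){ .mtu = 0, .ttl = 64 };  /* first version */
    new_cfg->mtu = mtu;                                      /* update the copy */
    /* Release store: the fields written above become visible before the pointer does. */
    atomic_store_explicit(&current_config, new_cfg, memory_order_release);
    *old_out = old;
    return 0;
}
```

A reader that fetched the pointer before the update keeps seeing the complete old version, and one that fetches it afterwards sees the complete new one. The old version is never modified in place, which is what lets readers run without locks.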

Reference Counting

Many kernel objects use reference counting to manage lifetime:

#include <linux/refcount.h>

struct my_obj {
    refcount_t refcnt;
    // ...
};

// Initialize
refcount_set(&obj->refcnt, 1);

// Take a reference
refcount_inc(&obj->refcnt);

// Release a reference
if (refcount_dec_and_test(&obj->refcnt)) {
    // Last reference — free the object
    kfree(obj);
}

Sockets (struct sock) use sock_hold() / sock_put(). Network devices use dev_hold() / dev_put(). sk_buffs use skb_get() / kfree_skb().
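
All of these get/put pairs follow the same shape, sketched here in userspace with a C11 atomic counter. struct conn, conn_create(), conn_get(), conn_put(), and freed_count are hypothetical names for the demo. The kernel's refcount_t additionally saturates and warns on overflow or underflow, which this model does not do.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

static int freed_count;              /* exists only so the demo can observe frees */

struct conn {
    atomic_int refcnt;
    int id;
};

static struct conn *conn_create(int id)
{
    struct conn *c = malloc(sizeof(*c));

    if (!c)
        return NULL;
    atomic_init(&c->refcnt, 1);      /* the creator holds the first reference */
    c->id = id;
    return c;
}

/* Take a reference; the caller must already hold one. */
static void conn_get(struct conn *c)
{
    atomic_fetch_add_explicit(&c->refcnt, 1, memory_order_relaxed);
}

/* Drop one reference; returns true if this call freed the object. */
static bool conn_put(struct conn *c)
{
    /* fetch_sub returns the value before the decrement. */
    if (atomic_fetch_sub_explicit(&c->refcnt, 1, memory_order_acq_rel) != 1)
        return false;
    free(c);
    freed_count++;
    return true;
}
```

Whoever drops the last reference frees the object, so no single owner has to know when every other user is done with it.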

Common Patterns in Networking Code

You'll see these constantly in net/:

// Bail-out pattern
int my_func(struct sk_buff *skb)
{
    if (!pskb_may_pull(skb, sizeof(struct iphdr)))
        goto drop;

    // Process packet...
    return NET_RX_SUCCESS;

drop:
    kfree_skb(skb);
    return NET_RX_DROP;
}

// Netlink error reporting (netlink_ack() returns void; shown inside a void function)
if (err) {
    netlink_ack(skb, nlh, err, NULL);
    return;
}
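
The bail-out pattern above generalizes to any function that takes ownership of a buffer: every validation failure jumps to one label that releases it. Here is a userspace sketch with a stand-in packet type. struct packet, packet_new(), packet_rcv(), HDR_LEN, and the RX_* constants are hypothetical substitutes for sk_buff, alloc_skb(), a receive handler, and NET_RX_SUCCESS / NET_RX_DROP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum { RX_SUCCESS = 0, RX_DROP = 1 };

#define HDR_LEN 4                    /* hypothetical fixed header size */

struct packet {
    unsigned char *data;
    size_t len;
};

static int packets_freed;            /* lets the demo observe the cleanup path */

static struct packet *packet_new(const void *bytes, size_t len)
{
    struct packet *pkt = malloc(sizeof(*pkt));

    if (!pkt)
        return NULL;
    pkt->data = malloc(len ? len : 1);
    if (!pkt->data) {
        free(pkt);
        return NULL;
    }
    memcpy(pkt->data, bytes, len);
    pkt->len = len;
    return pkt;
}

static void packet_free(struct packet *pkt)
{
    free(pkt->data);
    free(pkt);
    packets_freed++;
}

/* Takes ownership of pkt: on success it is handed to *delivered,
 * on any failure it is freed before returning. */
static int packet_rcv(struct packet *pkt, struct packet **delivered)
{
    if (pkt->len < HDR_LEN)
        goto drop;                   /* too short to contain a header */
    if (pkt->data[0] != 1)
        goto drop;                   /* unsupported version byte */

    *delivered = pkt;
    return RX_SUCCESS;

drop:
    packet_free(pkt);                /* single cleanup path for every failure */
    return RX_DROP;
}
```

Adding a new check later costs one line (another goto drop), and there is no risk of forgetting the free on some early return.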

Exercises

  1. Open net/ipv4/tcp.c. Find three uses of ERR_PTR / IS_ERR patterns. Trace what errors they're encoding.
  2. Find list_for_each_entry usage in net/core/dev.c. What lists are being iterated?
  3. Search for GFP_ATOMIC in net/. Why is it used instead of GFP_KERNEL? (Hint: look at the calling context.)
  4. Find rcu_read_lock() in the IPv4 routing code (net/ipv4/route.c). What data is being protected?
  5. Look at struct sk_buff in include/linux/skbuff.h. Find the reference counting field. How is it incremented and decremented?

What's Next

Next week you'll write your own kernel module — a self-contained piece of code that loads into a running kernel. This is the standard entry point for kernel development.