Week 6: Kernel C Idioms
Goal
Learn the C patterns and abstractions the kernel uses everywhere. Without these, kernel code looks alien even if you know C. After this week, you'll read kernel source fluently.
Why This Matters
The kernel doesn't use libc. No malloc(), no printf(), no errno. It has its own
implementations of everything, often with different semantics. Networking code especially
is dense with these patterns — every packet path uses sk_buff, linked lists, reference
counting, and RCU.
Error Handling: ERR_PTR / IS_ERR / PTR_ERR
The kernel can't throw exceptions. Functions that return pointers use a trick: they encode a negative errno value as a pointer into the last 4095 bytes of the address space (MAX_ERRNO), a range where no valid kernel object can live.
#include <linux/err.h>
#include <linux/net.h>    // sock_alloc()
#include <linux/printk.h> // pr_err()
// Returning an error from a function that normally returns a pointer:
struct socket *my_func(void)
{
    struct socket *sock = sock_alloc(); // Returns NULL on failure
    if (!sock)
        return ERR_PTR(-ENOMEM); // Encode error as pointer
    return sock;
}
// Checking the result (inside a caller that returns int):
struct socket *sock = my_func();
if (IS_ERR(sock)) {
    int err = PTR_ERR(sock); // Extract error code
    pr_err("failed: %d\n", err);
    return err;
}
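This pattern is everywhere, but it is not universal. Some pointer-returning functions signal failure with NULL instead (kmalloc() is one), and IS_ERR(NULL) is false, so check each function's documentation to see which convention it follows. For the few functions that can return either, use IS_ERR_OR_NULL().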
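To see why the encoding works, here is a minimal userspace sketch of the three helpers. It is a model for experimenting outside the kernel, not a copy of <linux/err.h>; make_counter() and its simulate_failure parameter are hypothetical names invented for the example.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095 /* same bound the kernel uses */

/* Userspace stand-ins for the kernel helpers. */
static inline void *ERR_PTR(long error)
{
    return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    /* True only for the last MAX_ERRNO addresses, i.e. errors -4095..-1. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator: returns a counter or an encoded -ENOMEM. */
static int *make_counter(int simulate_failure)
{
    int *c;

    if (simulate_failure)
        return ERR_PTR(-ENOMEM);
    c = calloc(1, sizeof(*c));
    if (!c)
        return ERR_PTR(-ENOMEM);
    return c;
}
```

A caller tests the result with IS_ERR() and recovers the code with PTR_ERR(). Note that IS_ERR(NULL) is false in this model too, which is why NULL-returning and ERR_PTR-returning functions need different checks.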
Linked Lists
The kernel uses intrusive linked lists — the list node is embedded inside your struct, not the other way around.
#include <linux/list.h>
#include <linux/slab.h>
struct my_device {
    char name[32];
    int status;
    struct list_head list; // This embeds the list node
};
// Declare and initialize a list head
LIST_HEAD(device_list);
// Add an element
struct my_device *dev = kzalloc(sizeof(*dev), GFP_KERNEL);
if (!dev)
    return -ENOMEM;
// Add the node exactly once: list_add() inserts at the head,
// list_add_tail() at the tail. Adding the same node twice corrupts the list.
list_add_tail(&dev->list, &device_list);
// Iterate
struct my_device *d;
list_for_each_entry(d, &device_list, list) {
    pr_info("device: %s\n", d->name);
}
// Safe iteration (if you might delete during iteration)
struct my_device *tmp;
list_for_each_entry_safe(d, tmp, &device_list, list) {
    if (d->status == DEAD) { // DEAD: a status value you define yourself
        list_del(&d->list);
        kfree(d);
    }
}
// Check if empty
if (list_empty(&device_list))
    pr_info("no devices\n");
The magic: list_for_each_entry uses container_of() to get the enclosing struct
from the embedded list_head. This is a fundamental kernel pattern.
container_of
// Given a pointer to a member, get a pointer to the containing struct
#define container_of(ptr, type, member) ...
// Example: if you have &dev->list, get back dev:
struct my_device *dev = container_of(list_ptr, struct my_device, list);
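The pointer arithmetic behind container_of() and intrusive lists can be tried in userspace. The sketch below is a simplified model: struct list_node stands in for the kernel's struct list_head, the macro omits the kernel's type checking, and list_init() and sum_status() are hypothetical helpers written for this example.

```c
#include <stddef.h>

/* Simplified container_of; the kernel's version also type-checks ptr. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for struct list_head: a circular doubly linked node. */
struct list_node {
    struct list_node *next, *prev;
};

/* An empty list is a head that points at itself. */
static void list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert node just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_node *node, struct list_node *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

struct my_device {
    char name[32];
    int status;
    struct list_node list; /* embedded node */
};

/* Walk the nodes, recovering each my_device with container_of. */
static int sum_status(struct list_node *head)
{
    int total = 0;

    for (struct list_node *pos = head->next; pos != head; pos = pos->next) {
        struct my_device *d = container_of(pos, struct my_device, list);

        total += d->status;
    }
    return total;
}
```

The list code never needs to know about struct my_device. Only the caller, which knows the member name, converts a node back into its enclosing struct.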
Memory Allocation
No malloc() / free(). The kernel has multiple allocators for different situations:
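#include <linux/slab.h>
// General-purpose allocation: pick the flag that matches the calling context
void *ptr = kmalloc(size, GFP_KERNEL);    // May sleep; process context only
// void *ptr = kmalloc(size, GFP_ATOMIC); // Never sleeps; safe in interrupt context
if (!ptr)
    return -ENOMEM;                       // Any allocation can fail
kfree(ptr);
// Zeroed allocation
void *zeroed = kzalloc(size, GFP_KERNEL);
// Zeroed array allocation; n * size is checked for overflow
void *array = kcalloc(n, size, GFP_KERNEL);
// Allocate a specific struct (common pattern). sk_buffs are the exception:
// they come from alloc_skb(), never from kmalloc().
struct my_device *dev = kmalloc(sizeof(*dev), GFP_KERNEL);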
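The array allocators (kcalloc() and kmalloc_array()) fail the request when n * size would overflow, instead of silently allocating a too-small buffer. A userspace sketch of that check follows; alloc_array() is a hypothetical name, malloc() stands in for the kernel allocator, and __builtin_mul_overflow is a GCC/Clang builtin.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Userspace sketch of an overflow-checked array allocation.
 * alloc_array() is a hypothetical name, not a kernel function.
 */
static void *alloc_array(size_t n, size_t size)
{
    size_t bytes;

    /* The builtin returns true if n * size does not fit in size_t. */
    if (__builtin_mul_overflow(n, size, &bytes))
        return NULL;
    return malloc(bytes);
}
```

Writing kmalloc(n * size, ...) by hand skips this check, which is why the array helpers are preferred whenever the element count comes from outside the function.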
GFP flags matter:
- GFP_KERNEL: normal allocation, may sleep waiting for memory. Use in process context.
- GFP_ATOMIC: never sleeps. Use in interrupt handlers, spinlock sections, softirqs.
- GFP_NOIO: won't start any I/O. Used in I/O paths to avoid recursion.
Rule of thumb: If you're in a function that might be called from an interrupt or while
holding a spinlock, you must use GFP_ATOMIC. Otherwise, use GFP_KERNEL.
Per-CPU Allocation and Slab Caches
For frequently allocated/freed objects (like sk_buff), the kernel uses slab caches:
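// Create a cache (done once at init)
struct kmem_cache *my_cache = kmem_cache_create("my_objects",
        sizeof(struct my_obj), 0, 0, NULL);
if (!my_cache)
    return -ENOMEM;
// Allocate from cache (fast)
struct my_obj *obj = kmem_cache_alloc(my_cache, GFP_KERNEL);
if (!obj)
    return -ENOMEM;
// Free back to cache
kmem_cache_free(my_cache, obj);
// Destroy the cache at exit, after every object has been freed
kmem_cache_destroy(my_cache);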
sk_buff has its own slab cache — that's why alloc_skb() is fast.
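The core idea of an object cache, recycling freed objects of one fixed size instead of returning them to the general allocator, can be shown with a toy userspace model. This is only an illustration of the recycling idea and is far simpler than the kernel's slab allocator; struct obj_cache and its helpers are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy fixed-size object cache (hypothetical, for illustration only). */
struct obj_cache {
    size_t obj_size; /* size of every object handed out */
    void *free_list; /* singly linked stack of recycled objects */
};

static void obj_cache_init(struct obj_cache *c, size_t obj_size)
{
    /* A free object stores the next pointer in its own first bytes. */
    c->obj_size = obj_size < sizeof(void *) ? sizeof(void *) : obj_size;
    c->free_list = NULL;
}

static void *obj_cache_alloc(struct obj_cache *c)
{
    void *obj = c->free_list;

    if (obj) {
        c->free_list = *(void **)obj; /* pop a recycled object */
        return obj;
    }
    return malloc(c->obj_size); /* cache empty: fall back to malloc */
}

static void obj_cache_free(struct obj_cache *c, void *obj)
{
    *(void **)obj = c->free_list; /* push for reuse instead of freeing */
    c->free_list = obj;
}
```

Allocating right after a free returns the same object without touching the general allocator, which is the fast path a cache is built for.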
Locking
The kernel is massively concurrent. Locking is non-optional.
Spinlocks: for short, non-sleeping critical sections
#include <linux/spinlock.h>
DEFINE_SPINLOCK(my_lock);
unsigned long flags;
spin_lock(&my_lock);
// Critical section: must not sleep, must be brief
spin_unlock(&my_lock);
// If the lock is also taken from an interrupt handler, disable local
// interrupts while holding it so the handler cannot deadlock against you:
spin_lock_irqsave(&my_lock, flags);
// ...
spin_unlock_irqrestore(&my_lock, flags);
Mutexes: for longer sections that may sleep
#include <linux/mutex.h>
DEFINE_MUTEX(my_mutex);
mutex_lock(&my_mutex);
// Critical section — may sleep (e.g., kmalloc with GFP_KERNEL)
mutex_unlock(&my_mutex);
RCU (Read-Copy-Update): for read-heavy data
Networking code uses RCU extensively because packet processing is read-heavy:
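#include <linux/rcupdate.h>
#include <linux/mutex.h>
#include <linux/slab.h>
// Reader side: extremely cheap, no locks
rcu_read_lock();
struct my_data *data = rcu_dereference(global_ptr);
// Use data; it must not be touched after rcu_read_unlock()
rcu_read_unlock();
// Writer side: more expensive, but doesn't block readers.
// Writers must exclude each other, here with a mutex (my_mutex).
struct my_data *new_data = kmalloc(sizeof(*new_data), GFP_KERNEL);
if (!new_data)
    return -ENOMEM;
// ... fill in new_data ...
mutex_lock(&my_mutex);
struct my_data *old = rcu_dereference_protected(global_ptr,
        lockdep_is_held(&my_mutex));
rcu_assign_pointer(global_ptr, new_data);
mutex_unlock(&my_mutex);
synchronize_rcu(); // Wait for pre-existing readers to finish
kfree(old);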
Why RCU matters for networking: The routing table, socket hash tables, and device lists are all RCU-protected. Every packet lookup is an RCU reader.
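The publish half of RCU, fully initializing a new object and only then making its pointer visible, can be modeled in userspace with C11 atomics. This sketch covers only that ordering: it has no grace periods, so it says nothing about when the old object may be freed with readers running concurrently. struct config, publish_config(), and read_mtu() are hypothetical names.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct config {
    int mtu;
};

/* Shared pointer that readers follow; plays the role of global_ptr. */
static _Atomic(struct config *) current_config;

/*
 * Writer: build the new object, then publish it. The release store is
 * the part rcu_assign_pointer() provides. Returns 0 on success and
 * hands the previous object back through old_out.
 */
static int publish_config(int mtu, struct config **old_out)
{
    struct config *new_cfg = malloc(sizeof(*new_cfg));

    if (!new_cfg)
        return -1;
    new_cfg->mtu = mtu; /* fill in before publishing */
    *old_out = atomic_load_explicit(&current_config, memory_order_relaxed);
    atomic_store_explicit(&current_config, new_cfg, memory_order_release);
    return 0;
}

/* Reader: load the pointer once, then use only that snapshot. */
static int read_mtu(void)
{
    struct config *cfg = atomic_load_explicit(&current_config,
                                              memory_order_acquire);

    return cfg ? cfg->mtu : -1;
}
```

In a multithreaded program the caller could not free the old object straight away, because a reader might still hold the pointer. Waiting out those readers is the job synchronize_rcu() does in the kernel.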
Reference Counting
Many kernel objects use reference counting to manage lifetime:
#include <linux/refcount.h>
struct my_obj {
    refcount_t refcnt;
    // ...
};
// Initialize
refcount_set(&obj->refcnt, 1);
// Take a reference
refcount_inc(&obj->refcnt);
// Release a reference
if (refcount_dec_and_test(&obj->refcnt)) {
    // Last reference: free the object
    kfree(obj);
}
Sockets (struct sock) use sock_hold() / sock_put(). Network devices use
dev_hold() / dev_put(). sk_buffs use skb_get() / kfree_skb().
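The hold/put pairs named above all follow the same get/put shape, which can be sketched in userspace with C11 atomics. This is a simplified model: unlike refcount_t it does no saturation or underflow checking, and my_obj_new(), my_obj_get(), and my_obj_put() are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct my_obj {
    atomic_int refcnt;
    int payload;
};

/* Create an object owned by the caller (reference count starts at 1). */
static struct my_obj *my_obj_new(int payload)
{
    struct my_obj *obj = malloc(sizeof(*obj));

    if (!obj)
        return NULL;
    atomic_init(&obj->refcnt, 1);
    obj->payload = payload;
    return obj;
}

/* Take an extra reference; the caller must already hold one. */
static struct my_obj *my_obj_get(struct my_obj *obj)
{
    atomic_fetch_add_explicit(&obj->refcnt, 1, memory_order_relaxed);
    return obj;
}

/* Drop a reference. Returns true if this call freed the object. */
static bool my_obj_put(struct my_obj *obj)
{
    /* fetch_sub returns the old value: 1 means we held the last reference. */
    if (atomic_fetch_sub_explicit(&obj->refcnt, 1,
                                  memory_order_acq_rel) == 1) {
        free(obj);
        return true;
    }
    return false;
}
```

The rule to take from this is that every get must be balanced by exactly one put, and that an object must not be touched after the caller's own put.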
Common Patterns in Networking Code
You'll see these constantly in net/:
// Bail-out pattern
int my_func(struct sk_buff *skb)
{
    if (!pskb_may_pull(skb, sizeof(struct iphdr)))
        goto drop;
    // Process packet...
    return NET_RX_SUCCESS;
drop:
    kfree_skb(skb);
    return NET_RX_DROP;
}
// Netlink error reporting (fragment from a void function).
// netlink_ack() itself returns void, so call it as a statement.
if (err) {
    netlink_ack(skb, nlh, err, NULL);
    return;
}
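The bail-out pattern generalizes to functions that acquire several resources: each failure jumps to a label that releases only what has been acquired so far, in reverse order. Here is a userspace sketch of that staged cleanup; struct conn, conn_init(), and the validation rule are made up for the example, and malloc()/free() stand in for kernel allocators.

```c
#include <errno.h>
#include <stdlib.h>

struct conn {
    char *rx_buf;
    char *tx_buf;
};

/* Hypothetical setup function with staged cleanup, kernel style. */
static int conn_init(struct conn *c, size_t rx_len, size_t tx_len)
{
    int err;

    c->rx_buf = NULL;
    c->tx_buf = NULL;

    c->rx_buf = malloc(rx_len);
    if (!c->rx_buf) {
        err = -ENOMEM;
        goto out;
    }

    c->tx_buf = malloc(tx_len);
    if (!c->tx_buf) {
        err = -ENOMEM;
        goto free_rx;
    }

    if (tx_len > rx_len) { /* made-up validation rule for the example */
        err = -EINVAL;
        goto free_tx;
    }
    return 0;

    /* Labels unwind in reverse order of acquisition. */
free_tx:
    free(c->tx_buf);
    c->tx_buf = NULL;
free_rx:
    free(c->rx_buf);
    c->rx_buf = NULL;
out:
    return err;
}
```

On success the caller owns both buffers. On any failure the function has already released everything, so the caller only needs to check the return value.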
Exercises
- Open net/ipv4/tcp.c. Find three uses of ERR_PTR / IS_ERR patterns. Trace what errors they're encoding.
- Find list_for_each_entry usage in net/core/dev.c. What lists are being iterated?
- Search for GFP_ATOMIC in net/. Why is it used instead of GFP_KERNEL? (Hint: look at the calling context.)
- Find rcu_read_lock() in the IPv4 routing code (net/ipv4/route.c). What data is being protected?
- Look at struct sk_buff in include/linux/skbuff.h. Find the reference counting field. How is it incremented and decremented?
What's Next
Next week you'll write your own kernel module — a self-contained piece of code that loads into a running kernel. This is the standard entry point for kernel development.