Week 9 — Network Drivers: Hardware, Drivers, and the Kernel

Goal

Understand how network drivers bridge hardware and the kernel networking stack. Learn about DMA, ring buffers, the PCI and virtio buses, and the driver model. By the end of this week, you'll be able to read and understand a real network driver.

Why This Matters

The netdev mailing list receives many driver patches. Understanding driver architecture lets you review them, find bugs in them, and eventually write or improve them. Even if you focus on protocol work, understanding the driver layer explains why certain APIs exist.


The Driver Model: Registration

Every network driver follows the same lifecycle:

1. Module loads → probe function called
2. Probe: allocate net_device, register with kernel
3. User runs "ip link set eth0 up" → ndo_open()
4. Packets flow via ndo_start_xmit() and NAPI poll
5. User runs "ip link set eth0 down" → ndo_stop()
6. Module unloads → remove function, free everything
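The hooks named above (ndo_open, ndo_start_xmit, ndo_stop) live in a net_device_ops table that probe wires into the net_device. A minimal hedged sketch — the my_* names are placeholders for illustration, not from a real driver:

```c
/* Hypothetical minimal driver sketch; my_* names are placeholders. */
#include <linux/netdevice.h>

static int my_open(struct net_device *dev)
{
	netif_start_queue(dev);   /* allow the stack to call ndo_start_xmit */
	return 0;
}

static int my_stop(struct net_device *dev)
{
	netif_stop_queue(dev);
	return 0;
}

static netdev_tx_t my_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	/* A real driver hands skb to hardware and frees it on TX completion. */
	dev_kfree_skb(skb);       /* placeholder: drop instead of sending */
	return NETDEV_TX_OK;
}

static const struct net_device_ops my_netdev_ops = {
	.ndo_open       = my_open,
	.ndo_stop       = my_stop,
	.ndo_start_xmit = my_start_xmit,
};
```

In probe, the driver would allocate the device (e.g. with alloc_etherdev()), set dev->netdev_ops = &my_netdev_ops, and call register_netdev().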

Bus Systems

Network hardware sits on a bus. The bus tells the kernel "a device is here" and the kernel matches it to a driver.

PCI (physical hardware):

static struct pci_driver my_pci_driver = {
    .name     = "my_nic",
    .id_table = my_pci_ids,      // Vendor/device IDs this driver handles
    .probe    = my_probe,         // Called when device found
    .remove   = my_remove,        // Called when device removed
};

module_pci_driver(my_pci_driver);

Virtio (virtual hardware, what you use in QEMU):

static struct virtio_driver virtio_net_driver = {
    .driver.name  = "virtio_net",
    .id_table     = id_table,
    .probe        = virtnet_probe,
    .remove       = virtnet_remove,
    .feature_table = features,
};

module_virtio_driver(virtio_net_driver);

Look at the real virtio-net driver:

nvim drivers/net/virtio_net.c
# Search for "virtnet_probe" — this is where the device is set up

DMA: How Data Moves

Network hardware uses Direct Memory Access (DMA) to transfer packet data without CPU involvement:

Receive:

1. Driver allocates memory buffers and tells the NIC their physical addresses
2. NIC writes incoming packet data directly into those buffers via DMA
3. NIC signals completion via interrupt
4. Driver reads the data from the buffers

Transmit:

1. Kernel builds an sk_buff with packet data
2. Driver maps the sk_buff's data to a physical address (DMA mapping)
3. Driver tells the NIC "send this data from this physical address"
4. NIC reads the data via DMA and transmits
5. NIC signals completion; driver unmaps and frees the sk_buff

// DMA mapping for transmit
#include <linux/dma-mapping.h>

dma_addr_t dma_addr = dma_map_single(dev, skb->data, skb->len, DMA_TO_DEVICE);
if (dma_mapping_error(dev, dma_addr)) {
    // Handle error
}

// Tell hardware about this buffer
write_to_nic_register(dma_addr, skb->len);

// After transmission completes (in completion handler):
dma_unmap_single(dev, dma_addr, skb->len, DMA_TO_DEVICE);
consume_skb(skb);
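The receive path is the mirror image: the driver maps empty buffers with DMA_FROM_DEVICE before handing them to the NIC. A hedged sketch — RX_BUF_LEN, received_len, and write_rx_descriptor() are placeholders, like write_to_nic_register() above:

```c
// DMA mapping for receive (sketch; placeholder names)
#include <linux/dma-mapping.h>
#include <linux/skbuff.h>

struct sk_buff *skb = netdev_alloc_skb(netdev, RX_BUF_LEN);
dma_addr_t dma_addr = dma_map_single(dev, skb->data, RX_BUF_LEN,
                                     DMA_FROM_DEVICE);
if (dma_mapping_error(dev, dma_addr)) {
    dev_kfree_skb(skb);
    // Handle error
}

// Give the empty buffer to the NIC
write_rx_descriptor(dma_addr, RX_BUF_LEN);

// After the NIC signals a completed receive:
dma_unmap_single(dev, dma_addr, RX_BUF_LEN, DMA_FROM_DEVICE);
skb_put(skb, received_len);     // extend skb over the bytes the NIC wrote
// hand skb up the stack (e.g. via napi_gro_receive)
```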

Ring Buffers (Descriptor Rings)

NICs use circular buffers (rings) to batch-process packets efficiently:

    Producer (driver/NIC) fills entries →

    ┌───┬───┬───┬───┬───┬───┬───┬───┐
    │ 0 │ 1 │ 2 │ 3 │ 4 │ 5 │ 6 │ 7 │
    └───┴───┴───┴───┴───┴───┴───┴───┘
                    ↑               ↑
                consumer         producer
                (reads)          (writes)

    ← Consumer (NIC/driver) reads entries

Each entry (descriptor) typically contains:

  • Physical address of a data buffer
  • Length
  • Status flags (owned by NIC or driver)

For the receive ring: the driver pre-fills entries with empty buffers. The NIC writes packet data into them and marks them complete.

For the transmit ring: the driver fills entries with packet data. The NIC reads and transmits them, marking them complete.

Virtio Rings (Virtqueues)

Virtio uses a standardized ring buffer called a virtqueue. This is what your QEMU VM uses:

nvim drivers/net/virtio_net.c
# Search for "virtqueue" — you'll see send_queue and receive_queue

The virtio-net driver has at least two queues:

  • receive virtqueue — NIC → driver (incoming packets)
  • transmit virtqueue — driver → NIC (outgoing packets)

Modern virtio-net supports multiple queue pairs (multi-queue) for better multi-core performance.

Anatomy of virtio_net.c

The driver you actually use in QEMU. Key functions:

// Probe: called when virtio-net device detected
virtnet_probe()
     alloc_etherdev_mq()      // Allocate net_device with multiple TX queues
     register_netdev()         // Make it visible to the kernel

// Open: called when interface brought up
virtnet_open()
     Enable NAPI
     Fill receive ring with empty buffers

// Transmit: called for each outgoing packet
start_xmit()
     Map sk_buff data for DMA
     Add to transmit virtqueue
     Kick the virtqueue (notify hypervisor)

// Receive: NAPI poll
virtnet_poll()
     Process completed receive descriptors
     Build sk_buff for each received packet
     napi_gro_receive()        // hand sk_buff to the kernel stack
     Refill receive ring with new empty buffers

// Close: interface brought down
virtnet_close()
     Disable NAPI
     Free remaining buffers

ethtool Interface

Drivers expose configuration and statistics via ethtool:

# In the guest
ethtool eth0                    # Show link settings and status
ethtool -S eth0                 # Show detailed statistics
ethtool -i eth0                 # Driver name and version
ethtool -g eth0                 # Ring buffer sizes
ethtool -G eth0 rx 512 tx 512  # Change ring buffer sizes
ethtool -k eth0                 # Show offload features

The driver implements these via the ethtool_ops structure:

static const struct ethtool_ops virtnet_ethtool_ops = {
    .get_drvinfo     = virtnet_get_drvinfo,
    .get_link        = ethtool_op_get_link,
    .get_ringparam   = virtnet_get_ringparam,
    .get_strings     = virtnet_get_strings,
    .get_sset_count  = virtnet_get_sset_count,
    .get_ethtool_stats = virtnet_get_ethtool_stats,
    // ...
};

Offloading: Hardware Assistance

Modern NICs can offload work from the CPU:

  • Checksum offload — NIC computes TCP/UDP/IP checksums
  • TSO (TCP Segmentation Offload) — NIC splits large TCP segments
  • GRO (Generic Receive Offload) — kernel aggregates small packets
  • RSS (Receive Side Scaling) — NIC distributes packets across CPU cores

# Check what your virtio-net supports
ethtool -k eth0 | grep -E 'checksum|segmentation|receive-offload'

The driver advertises capabilities via netdev->features:

dev->features |= NETIF_F_CSUM_MASK;
dev->features |= NETIF_F_TSO;
dev->features |= NETIF_F_GRO;

Network Namespaces

Every net_device belongs to a network namespace. This is how containers get isolated networking:

# Create a namespace
ip netns add test

# Move an interface into it
ip link set eth1 netns test

# Run commands in the namespace
ip netns exec test ip addr show

In the kernel, network namespaces are struct net (defined in include/net/net_namespace.h). Almost every networking function takes a struct net * parameter or accesses it through dev_net(dev) or sock_net(sk).

This matters for driver development: an interface can be moved between namespaces at runtime, so a driver must not assume it lives in the initial namespace.

Exercises

  1. Read virtnet_probe() in drivers/net/virtio_net.c. List every major step it takes to set up the device. What happens if any step fails?
  2. Find the ndo_start_xmit implementation in virtio_net.c. Trace what happens to an sk_buff from the moment the function is called until the packet is "sent."
  3. Inside your QEMU guest, run ethtool -S eth0 and cat /proc/net/dev. Compare the statistics. Where do these numbers come from in the driver?
  4. Look at net/core/dev.c, function dev_queue_xmit(). Trace how it calls the driver's transmit function. What happens if the device queue is full?
  5. Create a network namespace in your guest, create a veth pair, move one end into the namespace, and ping between them. Then trace the packet path with ftrace.

What's Next

Next week: kernel testing tools. You'll learn kselftest, kunit, static analysis with sparse and smatch, and get an introduction to syzkaller — the tools that find the bugs you'll fix.