Week 9 — Network Drivers: Hardware, Drivers, and the Kernel¶
Goal¶
Understand how network drivers bridge hardware and the kernel networking stack. Learn about DMA, ring buffers, PCI/virtio bus, and the driver model. By the end of this week, you'll be able to read and understand a real network driver.
Why This Matters¶
The netdev mailing list receives many driver patches. Understanding driver architecture lets you review them, find bugs in them, and eventually write or improve them. Even if you focus on protocol work, understanding the driver layer explains why certain APIs exist.
The Driver Model: Registration¶
Every network driver follows the same lifecycle:
1. Module loads → probe function called
2. Probe: allocate net_device, register with kernel
3. User does "ip link set up" → ndo_open()
4. Packets flow via ndo_start_xmit() and NAPI poll
5. User does "ip link set down" → ndo_stop()
6. Module unloads → remove function, free everything
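The lifecycle hooks in steps 3–5 are wired up through struct net_device_ops, which the driver fills in during probe. A minimal sketch — the `ndo_*` field names are the real kernel API, while `my_open`, `my_stop`, and `my_start_xmit` are hypothetical driver functions:

```c
#include <linux/netdevice.h>

/* Sketch: the operations table a driver attaches to its net_device.
 * The ndo_* members are real; the my_* handlers are placeholders. */
static const struct net_device_ops my_netdev_ops = {
	.ndo_open       = my_open,       /* "ip link set up"   */
	.ndo_stop       = my_stop,       /* "ip link set down" */
	.ndo_start_xmit = my_start_xmit, /* one call per outgoing packet */
};

/* In probe, after allocating the net_device: */
/* netdev->netdev_ops = &my_netdev_ops; */
```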
Bus Systems¶
Network hardware sits on a bus. The bus tells the kernel "a device is here" and the kernel matches it to a driver.
PCI (physical hardware):
static struct pci_driver my_pci_driver = {
    .name     = "my_nic",
    .id_table = my_pci_ids, // Vendor/device IDs this driver handles
    .probe    = my_probe,   // Called when device found
    .remove   = my_remove,  // Called when device removed
};
module_pci_driver(my_pci_driver);
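The id_table above is how the PCI core matches devices to this driver: it lists the vendor/device ID pairs the driver claims. A sketch — the numeric IDs here are made up, not a real NIC:

```c
#include <linux/pci.h>

/* Sketch: 0x1234/0x5678 are placeholder vendor/device IDs */
static const struct pci_device_id my_pci_ids[] = {
	{ PCI_DEVICE(0x1234, 0x5678) },
	{ } /* terminating all-zero entry, required */
};
MODULE_DEVICE_TABLE(pci, my_pci_ids); /* lets udev/modprobe autoload the module */
```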
Virtio (virtual hardware, what you use in QEMU):
static struct virtio_driver virtio_net_driver = {
    .driver.name   = "virtio_net",
    .id_table      = id_table,
    .probe         = virtnet_probe,
    .remove        = virtnet_remove,
    .feature_table = features,
};
module_virtio_driver(virtio_net_driver);
Look at the real virtio-net driver in drivers/net/virtio_net.c.
DMA: How Data Moves¶
Network hardware uses Direct Memory Access (DMA) to transfer packet data without CPU involvement:
Receive:
1. Driver allocates memory buffers and tells the NIC their physical addresses
2. NIC writes incoming packet data directly into those buffers via DMA
3. NIC signals completion via interrupt
4. Driver reads the data from the buffers
Transmit:
1. Kernel builds an sk_buff with the packet data
2. Driver maps the sk_buff's data to a physical address (DMA mapping)
3. Driver tells the NIC "send this data from this physical address"
4. NIC reads the data via DMA and transmits it
5. NIC signals completion; driver unmaps and frees the sk_buff
// DMA mapping for transmit
#include <linux/dma-mapping.h>

dma_addr_t dma_addr = dma_map_single(dev, skb->data, skb->len, DMA_TO_DEVICE);
if (dma_mapping_error(dev, dma_addr)) {
    // Handle the error: drop the packet, free the skb, count the drop
}

// Tell hardware about this buffer (placeholder for hardware-specific code)
write_to_nic_register(dma_addr, skb->len);

// After transmission completes (in the completion handler):
dma_unmap_single(dev, dma_addr, skb->len, DMA_TO_DEVICE);
consume_skb(skb);
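The receive side is the mirror image: buffers are mapped with DMA_FROM_DEVICE before being handed to the NIC, and unmapped once the NIC has filled them. A sketch of pre-filling one receive buffer — `RX_BUF_SIZE` and `give_buffer_to_nic()` are hypothetical, and error paths are abbreviated:

```c
#include <linux/dma-mapping.h>
#include <linux/skbuff.h>

/* Pre-fill one receive buffer (sketch) */
struct sk_buff *skb = netdev_alloc_skb(netdev, RX_BUF_SIZE);
dma_addr_t dma_addr = dma_map_single(dev, skb->data, RX_BUF_SIZE,
				     DMA_FROM_DEVICE);
if (dma_mapping_error(dev, dma_addr)) {
	dev_kfree_skb(skb);
	return -ENOMEM;
}
give_buffer_to_nic(dma_addr, RX_BUF_SIZE); /* hypothetical hardware helper */

/* Later, when the NIC signals a completed receive: */
dma_unmap_single(dev, dma_addr, RX_BUF_SIZE, DMA_FROM_DEVICE);
skb_put(skb, received_len); /* set the actual packet length */
```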
Ring Buffers (Descriptor Rings)¶
NICs use circular buffers (rings) to batch-process packets efficiently:
Producer (driver/NIC) fills entries →
┌───┬───┬───┬───┬───┬───┬───┬───┐
│ 0 │ 1 │ 2 │ 3 │ 4 │ 5 │ 6 │ 7 │
└───┴───┴───┴───┴───┴───┴───┴───┘
      ↑               ↑
   consumer        producer
   (reads)         (writes)
← Consumer (NIC/driver) reads entries
Each entry (descriptor) typically contains:
- Physical address of a data buffer
- Length
- Status flags (owned by NIC or driver)
For the receive ring: the driver pre-fills entries with empty buffers. The NIC writes packet data into them and marks them complete.
For the transmit ring: the driver fills entries with packet data. The NIC reads and transmits them, marking them complete.
Virtio Rings (Virtqueues)¶
Virtio uses a standardized ring buffer called a virtqueue. This is what your QEMU VM uses:
The virtio-net driver has at least two queues:
- receive virtqueue — NIC → driver (incoming packets)
- transmit virtqueue — driver → NIC (outgoing packets)
Modern virtio-net supports multiple queue pairs (multi-queue) for better multi-core performance.
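On the driver side, queuing a packet onto the transmit virtqueue goes through the virtio core API. A sketch of the pattern, with error handling abbreviated — `vq`, `skb`, and `err` are assumed to be in scope:

```c
#include <linux/virtio.h>
#include <linux/scatterlist.h>

/* Queue one outgoing packet on the TX virtqueue (sketch) */
struct scatterlist sg;
sg_init_one(&sg, skb->data, skb->len);

/* Add the buffer; the skb pointer is handed back later by
 * virtqueue_get_buf() when the device has consumed it. */
err = virtqueue_add_outbuf(vq, &sg, 1, skb, GFP_ATOMIC);
if (err)
	return NETDEV_TX_BUSY;

virtqueue_kick(vq); /* notify the hypervisor there is work to do */
```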
Anatomy of virtio_net.c¶
The driver you actually use in QEMU. Key functions:
// Probe: called when a virtio-net device is detected
virtnet_probe()
    → alloc_etherdev_mq()  // Allocate net_device with multiple TX queues
    → register_netdev()    // Make it visible to the kernel

// Open: called when the interface is brought up
virtnet_open()
    → Enable NAPI
    → Fill receive ring with empty buffers

// Transmit: called for each outgoing packet
start_xmit()
    → Map sk_buff data for DMA
    → Add to transmit virtqueue
    → Kick the virtqueue (notify hypervisor)

// Receive: NAPI poll
virtnet_poll()
    → Process completed receive descriptors
    → Build an sk_buff for each received packet
    → napi_gro_receive() — hand to the kernel
    → Refill receive ring with new empty buffers

// Close: interface brought down
virtnet_close()
    → Disable NAPI
    → Free remaining buffers
ethtool Interface¶
Drivers expose configuration and statistics via ethtool:
# In the guest
ethtool eth0 # Show link settings (speed, duplex, link status)
ethtool -S eth0 # Show detailed statistics
ethtool -i eth0 # Driver name and version
ethtool -g eth0 # Ring buffer sizes
ethtool -G eth0 rx 512 tx 512 # Change ring buffer sizes
ethtool -k eth0 # Show offload features
The driver implements these via the ethtool_ops structure:
static const struct ethtool_ops virtnet_ethtool_ops = {
    .get_drvinfo       = virtnet_get_drvinfo,
    .get_link          = ethtool_op_get_link,
    .get_ringparam     = virtnet_get_ringparam,
    .get_strings       = virtnet_get_strings,
    .get_sset_count    = virtnet_get_sset_count,
    .get_ethtool_stats = virtnet_get_ethtool_stats,
    // ...
};
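Each handler fills in a structure that ethtool reads over netlink/ioctl. A sketch of what a get_drvinfo handler looks like — the driver name and version strings here are illustrative, not virtio-net's actual values:

```c
#include <linux/ethtool.h>
#include <linux/netdevice.h>

/* Sketch of a get_drvinfo handler; strings are placeholders */
static void my_get_drvinfo(struct net_device *dev,
			   struct ethtool_drvinfo *info)
{
	strscpy(info->driver, "my_nic", sizeof(info->driver));
	strscpy(info->version, "1.0", sizeof(info->version));
	strscpy(info->bus_info, dev_name(dev->dev.parent),
		sizeof(info->bus_info));
}
```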
Offloading: Hardware Assistance¶
Modern NICs can offload work from the CPU:
- Checksum offload — NIC computes TCP/UDP/IP checksums
- TSO (TCP Segmentation Offload) — NIC splits large TCP segments
- GRO (Generic Receive Offload) — kernel aggregates small packets
- RSS (Receive Side Scaling) — NIC distributes packets across CPU cores
# Check what your virtio-net supports
ethtool -k eth0 | grep -E 'checksum|segmentation|receive-offload'
The driver advertises its capabilities to the stack via the netdev->features bitmask.
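A sketch of how a probe function might advertise offloads; which NETIF_F_* flags are safe to set depends entirely on what the hardware can actually do, so treat the specific flags here as an example, not a recipe:

```c
#include <linux/netdevice.h>

/* In probe, after allocating the net_device (sketch):
 * hw_features = what the hardware can do (user-toggleable via ethtool -K),
 * features    = what is currently enabled. */
netdev->hw_features = NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_TSO;
netdev->features    = netdev->hw_features; /* enable everything by default */
```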
Network Namespaces¶
Every net_device belongs to a network namespace. This is how containers get isolated networking:
# Create a namespace
ip netns add test
# Move an interface into it
ip link set eth1 netns test
# Run commands in the namespace
ip netns exec test ip addr show
In the kernel, network namespaces are struct net (defined in include/net/net_namespace.h).
Almost every networking function takes a struct net * parameter or accesses it
through dev_net(dev) or sock_net(sk).
This matters for driver development because drivers must work correctly with namespaces.
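A common pattern is looking up which namespace a device lives in and comparing it against the initial namespace. A sketch, assuming `dev` is a `struct net_device *` already in scope:

```c
#include <linux/netdevice.h>
#include <net/net_namespace.h>

/* Which namespace does this device belong to? (sketch) */
struct net *net = dev_net(dev);

if (net_eq(net, &init_net))
	netdev_info(dev, "device is in the initial network namespace\n");
```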
Exercises¶
- Read virtnet_probe() in drivers/net/virtio_net.c. List every major step it takes to set up the device. What happens if any step fails?
- Find the ndo_start_xmit implementation in virtio_net.c. Trace what happens to an sk_buff from the moment the function is called until the packet is "sent."
- Inside your QEMU guest, run ethtool -S eth0 and cat /proc/net/dev. Compare the statistics. Where do these numbers come from in the driver?
- Look at net/core/dev.c, function dev_queue_xmit(). Trace how it calls the driver's transmit function. What happens if the device queue is full?
- Create a network namespace in your guest, create a veth pair, move one end into the namespace, and ping between them. Then trace the packet path with ftrace.
What's Next¶
Next week: kernel testing tools. You'll learn kselftest, kunit, static analysis with sparse and smatch, and get an introduction to syzkaller — the tools that find the bugs you'll fix.