Week 8 — Linux Networking Architecture
Goal
Understand how a packet travels through the kernel — from the NIC driver to the socket,
and back. Learn the key data structures: sk_buff, net_device, and struct sock. After
this week, you'll be able to trace any packet's path through the source code.
Why This Matters
This is the core of what you'll work on as a netdev contributor. Every patch you submit will touch some part of this pipeline. Understanding the full picture lets you make changes confidently.
The Packet Journey: Receive Path (Ingress)
When a packet arrives at a network interface:
```
Hardware NIC
  → DMA to ring buffer (driver-managed memory)
  → Driver calls napi_schedule()
  → NAPI poll: driver builds sk_buff from ring buffer
  → netif_receive_skb()
  → Protocol demux (ip_rcv, ipv6_rcv, arp_rcv, ...)
  → Netfilter hooks (PREROUTING)
  → Routing decision
  → ip_local_deliver() (if for us)
  → Transport protocol (tcp_v4_rcv, udp_rcv, ...)
  → Socket receive queue
  → Userspace read()/recv()
```
Let's trace the key files:
drivers/net/virtio_net.c → virtio NIC driver (your QEMU device)
net/core/dev.c → Core network device handling
net/ipv4/ip_input.c → IPv4 receive path
net/ipv4/tcp_ipv4.c → TCP receive
net/ipv4/tcp_input.c → TCP input processing
net/ipv4/udp.c → UDP receive
The Packet Journey: Transmit Path (Egress)
When userspace calls send():
```
Userspace send()/write()
  → Socket layer (sock_sendmsg)
  → Transport (tcp_sendmsg / udp_sendmsg)
  → Build sk_buff, fill headers
  → IP layer (ip_queue_xmit / ip_local_out)
  → Netfilter hooks (OUTPUT, POSTROUTING)
  → Routing
  → dev_queue_xmit()
  → Traffic control (qdisc)
  → Driver xmit function (ndo_start_xmit)
  → DMA to NIC hardware
```
Key files:
net/ipv4/tcp.c → tcp_sendmsg()
net/ipv4/ip_output.c → IP output path
net/core/dev.c → dev_queue_xmit()
net/sched/ → Traffic control / queueing disciplines
sk_buff — The Packet
struct sk_buff is the most important networking data structure. Every packet in the
kernel is represented by an sk_buff.
Open it: include/linux/skbuff.h. The struct is huge — dozens of fields, plus unions and bitfields. Focus on the essential ones:
```c
// Simplified — see include/linux/skbuff.h for the full definition
struct sk_buff {
    /* Linked list management */
    struct sk_buff *next, *prev;

    /* When this packet was received/sent */
    ktime_t tstamp;

    /* The network device this packet came from / goes to */
    struct net_device *dev;

    /* The socket that owns this packet */
    struct sock *sk;

    /* Buffer layout */
    unsigned char *head;      // Start of allocated buffer
    unsigned char *data;      // Start of current protocol data
    unsigned char *tail;      // End of current data
    unsigned char *end;       // End of allocated buffer

    unsigned int len;         // Total data length
    unsigned int data_len;    // Length of fragments (non-linear data)

    /* Protocol info */
    __be16 protocol;          // ETH_P_IP, ETH_P_IPV6, etc.

    /* Header offsets from head (set as the packet
     * moves through the layers) */
    __u16 transport_header;   // L4 (TCP/UDP) header
    __u16 network_header;     // L3 (IP) header
    __u16 mac_header;         // L2 (Ethernet) header
};
```
The sk_buff Data Layout

```
head                                           end
  │                                             │
  ▼                                             ▼
  ┌──────────┬─────────────────────┬────────────┐
  │ headroom │   packet content    │  tailroom  │
  │ (for     │                     │            │
  │ prepend- │                     │            │
  │ ing      │                     │            │
  │ headers) │                     │            │
  └──────────┴─────────────────────┴────────────┘
             ▲                     ▲
            data                  tail
```
Why this design? As a packet moves through protocol layers, each layer needs to
add or remove headers. Instead of copying data, the kernel just moves the data
pointer:
```c
// Receive path: remove Ethernet header, expose IP header
skb_pull(skb, ETH_HLEN);                  // data moves forward

// Transmit path: add IP header before existing data
skb_push(skb, sizeof(struct iphdr));      // data moves backward

// Reserve headroom before adding data
skb_reserve(skb, NET_IP_ALIGN + ETH_HLEN + sizeof(struct iphdr));
```
Key sk_buff Functions

```c
// Allocation
struct sk_buff *skb = alloc_skb(size, GFP_KERNEL);
struct sk_buff *skb = netdev_alloc_skb(dev, size);   // For drivers

// Freeing
kfree_skb(skb);     // Packet dropped (counts as a drop)
consume_skb(skb);   // Packet consumed normally

// Reference counting
skb_get(skb);       // Increment refcount
kfree_skb(skb);     // Decrement (free if last)

// Cloning (share data, separate metadata)
struct sk_buff *clone = skb_clone(skb, GFP_ATOMIC);

// Accessing headers
struct iphdr *iph = ip_hdr(skb);
struct tcphdr *th = tcp_hdr(skb);
struct ethhdr *eth = eth_hdr(skb);
```
net_device — The Network Interface
struct net_device represents a network interface (eth0, lo, wlan0, etc.).
Key fields:
```c
struct net_device {
    char name[IFNAMSIZ];                   // "eth0", "lo", etc.
    int ifindex;                           // Unique interface index
    unsigned int mtu;                      // Maximum transmission unit
    unsigned char dev_addr[MAX_ADDR_LEN];  // Hardware (MAC) address

    /* Statistics */
    struct net_device_stats stats;

    /* Operations — the driver fills these in */
    const struct net_device_ops *netdev_ops;

    /* NAPI contexts registered by the driver */
    struct list_head napi_list;
};
```
net_device_ops — The Driver Interface
This is the contract between the kernel and network drivers:
```c
struct net_device_ops {
    int (*ndo_open)(struct net_device *dev);             // ifconfig up
    int (*ndo_stop)(struct net_device *dev);             // ifconfig down
    netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb,   // Send a packet
                                  struct net_device *dev);
    int (*ndo_set_mac_address)(struct net_device *dev,   // Change MAC
                               void *addr);
    void (*ndo_get_stats64)(struct net_device *dev,      // Get statistics
                            struct rtnl_link_stats64 *storage);
    // ... many more
};
```
When the kernel wants to send a packet, it calls dev->netdev_ops->ndo_start_xmit(skb, dev).
The driver takes the sk_buff and DMAs its data to the hardware.
NAPI — The Receive Mechanism
Older kernels used interrupt-per-packet, which was slow under high load. NAPI (New API) uses a hybrid approach:
- NIC raises an interrupt when a packet arrives
- Driver disables further interrupts and schedules a NAPI poll
- The poll function processes multiple packets in a batch
- When the ring buffer is empty, re-enable interrupts
```c
// Driver interrupt handler (simplified; disable_interrupts() and
// enable_interrupts() stand in for device-specific register writes)
static irqreturn_t my_driver_interrupt(int irq, void *data)
{
    struct my_device *dev = data;

    // Disable further interrupts
    disable_interrupts(dev);

    // Schedule NAPI poll
    napi_schedule(&dev->napi);
    return IRQ_HANDLED;
}

// NAPI poll function — processes up to 'budget' packets
static int my_driver_poll(struct napi_struct *napi, int budget)
{
    struct my_device *dev = container_of(napi, struct my_device, napi);
    int processed = 0;

    while (processed < budget) {
        struct sk_buff *skb = get_next_packet_from_ring(dev);
        if (!skb)
            break;

        // Hand the packet to the kernel
        napi_gro_receive(napi, skb);
        processed++;
    }

    if (processed < budget) {
        napi_complete(napi);
        enable_interrupts(dev);
    }
    return processed;
}
```
GRO (Generic Receive Offload): napi_gro_receive() aggregates small packets into
larger ones before passing them up the stack. This dramatically reduces per-packet overhead.
Protocol Registration
The kernel uses a registration pattern for protocol handlers. IPv4 registers like this
(simplified from net/ipv4/af_inet.c):
```c
static struct packet_type ip_packet_type = {
    .type = cpu_to_be16(ETH_P_IP),
    .func = ip_rcv,   // Called for every incoming IPv4 packet
};

static int __init inet_init(void)
{
    // Register TCP as an IP protocol
    inet_add_protocol(&tcp_protocol, IPPROTO_TCP);

    // Register UDP
    inet_add_protocol(&udp_protocol, IPPROTO_UDP);

    // Register the IPv4 packet handler
    dev_add_pack(&ip_packet_type);
    // ...
    return 0;
}
```
When netif_receive_skb() processes a received packet, it checks skb->protocol and
calls the matching handler — ip_rcv() for IPv4.
The Socket Layer — Connecting Userspace
Userspace interacts with networking through sockets. The kernel side:
```c
// When userspace calls socket(AF_INET, SOCK_STREAM, 0):
//   → inet_create() creates a struct sock
//   → and binds it to the TCP protocol operations:
struct proto tcp_prot = {
    .name    = "TCP",
    .connect = tcp_v4_connect,
    .sendmsg = tcp_sendmsg,
    .recvmsg = tcp_recvmsg,
    .close   = tcp_close,
    // ...
};
```
Reading the Code: A Guided Tour
Trace a TCP connection from userspace to wire:
```sh
# 1. Socket creation
nvim net/ipv4/af_inet.c         # inet_create()
# 2. Connect
nvim net/ipv4/tcp_ipv4.c        # tcp_v4_connect()
# 3. Send data
nvim net/ipv4/tcp.c             # tcp_sendmsg()
# 4. Build TCP segment
nvim net/ipv4/tcp_output.c      # tcp_transmit_skb()
# 5. IP layer
nvim net/ipv4/ip_output.c       # ip_queue_xmit()
# 6. Device output
nvim net/core/dev.c             # dev_queue_xmit()
# 7. Driver
nvim drivers/net/virtio_net.c   # start_xmit()
```
Do this trace in one sitting. Follow function calls. Use GDB or ftrace (from Week 5) to verify the call chain with a real connection.
Exercises
- Use ftrace to trace a ping through the receive path. Identify every function from the driver to the ICMP reply.
- Open include/linux/skbuff.h and find skb_push, skb_pull, and skb_reserve. Read their implementations. Draw the head/data/tail pointers for a packet being built.
- Look at drivers/net/virtio_net.c. Find the ndo_start_xmit function. What does it do with the sk_buff?
- In net/core/dev.c, find netif_receive_skb() and trace the path to ip_rcv(). How does protocol demuxing work?
- Look at net/ipv4/tcp_ipv4.c, function tcp_v4_rcv(). This is the entry point for all incoming TCP segments. How does it find the right socket?
What's Next
Next week we go deeper into the relationship between drivers and the kernel — how hardware interacts with the networking stack, DMA, interrupts, and the role of the bus (PCI/virtio).