Week 8 — Linux Networking Architecture
Goal
Understand how a packet travels through the kernel — from the NIC driver to the socket,
and back. Learn the key data structures: sk_buff, net_device, and struct sock. After
this week, you'll be able to trace any packet's path through the source code.
Why This Matters
This is the core of what you'll work on as a netdev contributor. Every patch you submit will touch some part of this pipeline. Understanding the full picture lets you make changes confidently.
The Packet Journey: Receive Path (Ingress)
When a packet arrives at a network interface:
```
Hardware NIC
  → DMA to ring buffer (driver-managed memory)
  → Driver calls napi_schedule()
  → NAPI poll: driver builds sk_buff from ring buffer
  → netif_receive_skb()
  → Protocol demux (ip_rcv, ipv6_rcv, arp_rcv, ...)
  → Netfilter hooks (PREROUTING)
  → Routing decision
  → ip_local_deliver() (if for us)
  → Transport protocol (tcp_v4_rcv, udp_rcv, ...)
  → Socket receive queue
  → Userspace read()/recv()
```
Let's trace the key files:
drivers/net/virtio_net.c → virtio NIC driver (your QEMU device)
net/core/dev.c → Core network device handling
net/ipv4/ip_input.c → IPv4 receive path
net/ipv4/tcp_ipv4.c → TCP receive
net/ipv4/tcp_input.c → TCP input processing
net/ipv4/udp.c → UDP receive
The Packet Journey: Transmit Path (Egress)
When userspace calls send():
```
Userspace send()/write()
  → Socket layer (sock_sendmsg)
  → Transport (tcp_sendmsg / udp_sendmsg)
  → Build sk_buff, fill headers
  → IP layer (ip_queue_xmit / ip_local_out)
  → Netfilter hooks (OUTPUT, POSTROUTING)
  → Routing
  → dev_queue_xmit()
  → Traffic control (qdisc)
  → Driver xmit function (ndo_start_xmit)
  → DMA to NIC hardware
```
Key files:
net/ipv4/tcp.c → tcp_sendmsg()
net/ipv4/ip_output.c → IP output path
net/core/dev.c → dev_queue_xmit()
net/sched/ → Traffic control / queueing disciplines
sk_buff — The Packet
struct sk_buff is the most important networking data structure. Every packet in the
kernel is represented by an sk_buff.
Open it: include/linux/skbuff.h. The struct is huge — dozens of fields, plus unions and bitfields. Focus on the essential ones:
```c
// Simplified — see include/linux/skbuff.h for the full definition
struct sk_buff {
    /* Linked list management */
    struct sk_buff *next, *prev;

    /* When this packet was received/sent */
    ktime_t tstamp;

    /* The network device this packet came from / goes to */
    struct net_device *dev;

    /* The socket that owns this packet */
    struct sock *sk;

    /* Buffer layout */
    unsigned char *head;      // Start of allocated buffer
    unsigned char *data;      // Start of current protocol data
    unsigned char *tail;      // End of current data
    unsigned char *end;       // End of allocated buffer

    unsigned int len;         // Total data length
    unsigned int data_len;    // Length of fragments (non-linear data)

    /* Protocol info */
    __be16 protocol;          // ETH_P_IP, ETH_P_IPV6, etc.

    /* Header offsets from head (set as the packet
     * moves through the layers) */
    __u16 transport_header;   // L4 (TCP/UDP) header
    __u16 network_header;     // L3 (IP) header
    __u16 mac_header;         // L2 (Ethernet) header
};
```
The sk_buff Data Layout

```
head                                           end
  │                                             │
  ▼                                             ▼
  ┌──────────┬─────────────────────┬────────────┐
  │ headroom │   packet content    │  tailroom  │
  │ (for     │                     │            │
  │ prepend- │                     │            │
  │ ing      │                     │            │
  │ headers) │                     │            │
  └──────────┴─────────────────────┴────────────┘
             ▲                     ▲
            data                  tail
```
Why this design? As a packet moves through protocol layers, each layer needs to
add or remove headers. Instead of copying data, the kernel just moves the data
pointer:
```c
// Receive path: remove Ethernet header, expose IP header
skb_pull(skb, ETH_HLEN);                  // data moves forward

// Transmit path: add IP header before existing data
skb_push(skb, sizeof(struct iphdr));      // data moves backward

// Reserve headroom before adding data
skb_reserve(skb, NET_IP_ALIGN + ETH_HLEN + sizeof(struct iphdr));
```
Key sk_buff Functions

```c
// Allocation
struct sk_buff *skb = alloc_skb(size, GFP_KERNEL);
struct sk_buff *skb = netdev_alloc_skb(dev, size);   // For drivers

// Freeing
kfree_skb(skb);     // Packet dropped (counts as a drop)
consume_skb(skb);   // Packet consumed normally

// Reference counting
skb_get(skb);       // Increment refcount
kfree_skb(skb);     // Decrement (free if last)

// Cloning (share data, separate metadata)
struct sk_buff *clone = skb_clone(skb, GFP_ATOMIC);

// Accessing headers
struct iphdr *iph = ip_hdr(skb);
struct tcphdr *th = tcp_hdr(skb);
struct ethhdr *eth = eth_hdr(skb);
```
net_device — The Network Interface
struct net_device represents a network interface (eth0, lo, wlan0, etc.).
Key fields:
```c
struct net_device {
    char name[IFNAMSIZ];                   // "eth0", "lo", etc.
    int ifindex;                           // Unique interface index
    unsigned int mtu;                      // Maximum transmission unit
    unsigned char dev_addr[MAX_ADDR_LEN];  // Hardware (MAC) address

    /* Statistics */
    struct net_device_stats stats;

    /* Operations — the driver fills these in */
    const struct net_device_ops *netdev_ops;

    /* NAPI contexts registered by the driver */
    struct list_head napi_list;
};
```
net_device_ops — The Driver Interface
This is the contract between the kernel and network drivers:
```c
struct net_device_ops {
    int (*ndo_open)(struct net_device *dev);             // ifconfig up
    int (*ndo_stop)(struct net_device *dev);             // ifconfig down
    netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb,   // Send a packet
                                  struct net_device *dev);
    int (*ndo_set_mac_address)(struct net_device *dev,   // Change MAC
                               void *addr);
    void (*ndo_get_stats64)(struct net_device *dev,      // Get statistics
                            struct rtnl_link_stats64 *storage);
    // ... many more
};
```
When the kernel wants to send a packet, it calls dev->netdev_ops->ndo_start_xmit(skb, dev).
The driver takes the sk_buff and DMAs its data to the hardware.
NAPI — The Receive Mechanism
Older kernels used interrupt-per-packet, which was slow under high load. NAPI (New API) uses a hybrid approach:
- NIC raises an interrupt when a packet arrives
- Driver disables further interrupts and schedules a NAPI poll
- The poll function processes multiple packets in a batch
- When the ring buffer is empty, re-enable interrupts
```c
// Driver interrupt handler (simplified; disable_interrupts() and
// enable_interrupts() stand in for device-specific register writes)
static irqreturn_t my_driver_interrupt(int irq, void *data)
{
    struct my_device *dev = data;

    // Disable further interrupts
    disable_interrupts(dev);

    // Schedule NAPI poll
    napi_schedule(&dev->napi);
    return IRQ_HANDLED;
}

// NAPI poll function — processes up to 'budget' packets
static int my_driver_poll(struct napi_struct *napi, int budget)
{
    struct my_device *dev = container_of(napi, struct my_device, napi);
    int processed = 0;

    while (processed < budget) {
        struct sk_buff *skb = get_next_packet_from_ring(dev);
        if (!skb)
            break;

        // Hand the packet to the kernel
        napi_gro_receive(napi, skb);
        processed++;
    }

    if (processed < budget) {
        napi_complete(napi);
        enable_interrupts(dev);
    }
    return processed;
}
```
GRO (Generic Receive Offload): napi_gro_receive() aggregates small packets into
larger ones before passing them up the stack. This dramatically reduces per-packet overhead.
Protocol Registration
The kernel uses a registration pattern for protocol handlers. IPv4 registers like this
(simplified from net/ipv4/af_inet.c):
```c
static struct packet_type ip_packet_type = {
    .type = cpu_to_be16(ETH_P_IP),
    .func = ip_rcv,   // Called for every incoming IPv4 packet
};

static int __init inet_init(void)
{
    // Register TCP as an IP protocol
    inet_add_protocol(&tcp_protocol, IPPROTO_TCP);

    // Register UDP
    inet_add_protocol(&udp_protocol, IPPROTO_UDP);

    // Register the IPv4 packet handler
    dev_add_pack(&ip_packet_type);
    // ...
    return 0;
}
```
When netif_receive_skb() processes a received packet, it checks skb->protocol and
calls the matching handler — ip_rcv() for IPv4.
The Socket Layer — Connecting Userspace
Userspace interacts with networking through sockets. The kernel side:
```c
// When userspace calls socket(AF_INET, SOCK_STREAM, 0):
//   → inet_create() creates a struct sock
//   → and binds it to the TCP protocol operations:
struct proto tcp_prot = {
    .name    = "TCP",
    .connect = tcp_v4_connect,
    .sendmsg = tcp_sendmsg,
    .recvmsg = tcp_recvmsg,
    .close   = tcp_close,
    // ...
};
```
Reading the Code: A Guided Tour
Trace a TCP connection from userspace to wire:
```sh
# 1. Socket creation
nvim net/ipv4/af_inet.c         # inet_create()
# 2. Connect
nvim net/ipv4/tcp_ipv4.c        # tcp_v4_connect()
# 3. Send data
nvim net/ipv4/tcp.c             # tcp_sendmsg()
# 4. Build TCP segment
nvim net/ipv4/tcp_output.c      # tcp_transmit_skb()
# 5. IP layer
nvim net/ipv4/ip_output.c       # ip_queue_xmit()
# 6. Device output
nvim net/core/dev.c             # dev_queue_xmit()
# 7. Driver
nvim drivers/net/virtio_net.c   # start_xmit()
```
Do this trace in one sitting. Follow function calls. Use GDB or ftrace (from Week 5) to verify the call chain with a real connection.
Exercises
- Use ftrace to trace a ping through the receive path. Identify every function from the driver to the ICMP reply.
- Open include/linux/skbuff.h and find skb_push, skb_pull, and skb_reserve. Read their implementations. Draw the head/data/tail pointers for a packet being built.
- Look at drivers/net/virtio_net.c. Find the ndo_start_xmit function. What does it do with the sk_buff?
- In net/core/dev.c, find netif_receive_skb() and trace the path to ip_rcv(). How does protocol demuxing work?
- Look at net/ipv4/tcp_ipv4.c, function tcp_v4_rcv(). This is the entry point for all incoming TCP segments. How does it find the right socket?
What's Next
Next week we go deeper into the relationship between drivers and the kernel — how hardware interacts with the networking stack, DMA, interrupts, and the role of the bus (PCI/virtio).