This discussion started as an off-topic digression in #238, triggered by benchmark results from the updated - now completely buffed - NIC drivers, a change in turn driven by the recently implemented TH/HB interrupt handler regime. (@ghaerr, @Mellvik)
Not a new discussion at all - it has popped in and out for years, usually triggered by benchmarks, which tend to draw attention to the big difference between outgoing and incoming file transfer speeds: outgoing always beats incoming by almost a factor of two, even though outgoing packets are limited to a 512-byte payload, while incoming accepts whatever the sender chooses, usually around 1460 bytes. A 3:1 packet ratio by the looks of it, which is not entirely fair because ktcp, most of the time (when part of a fast flow), ACKs every incoming packet of this size twice, the second ACK reopening the window that the packet had almost closed. Still, the packet count differs by at least 2:1 in the opposite direction, and the slowness of the incoming flow becomes interesting.
While not measured empirically at a low level, it makes sense to assume that the main reason is scheduling. The incoming packet flow looks like
NIC_buffer->driver_buffer[kernel]->ktcp[app level]->socket-layer[kernel]->application[app layer]
And while the outgoing flow is exactly the opposite, there is a key difference: the outgoing flow is 'contiguous' - 'take this data and get it going' - whereas the incoming flow is a series of queries and waits, as in 'is there any data available now?'. This query-and-wait flow gets seriously exacerbated by scheduling: the query gets passed along from level to level until it reaches the driver, while the availability of data (unless it was buffered) first 'kicks' the kernel, which 'kicks' ktcp, which 'kicks' it back to the kernel, which eventually kicks the app.
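From the application side the query-and-wait pattern looks roughly like this - a minimal POSIX-style sketch, not actual ktcp code, and the function name is made up:

```c
#include <sys/select.h>
#include <unistd.h>

/* hypothetical helper: block until the socket has data, then read it */
int wait_for_data(int fd, char *buf, int len)
{
    fd_set rfds;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);

    /* 'is there any data available now?' - the process parks here until
     * the kernel, woken by ktcp, which was woken by the driver, says yes */
    if (select(fd + 1, &rfds, NULL, NULL, NULL) < 0)
        return -1;

    /* and the read itself is another full trip down through the layers */
    return (int)read(fd, buf, (size_t)len);
}
```

Each call is one of the 'queries' above; every wakeup costs at least one scheduling round trip per layer.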
While this - after innumerable hours of coding, testing and discussions over the years - now works amazingly well, it can never be really fast or efficient. The benefit is flexibility: having most of the networking code outside the kernel keeps (or contributes to keeping) the kernel lean and mean, and makes it easy to NOT have networking consume meagre system resources when not needed.
The validity of this argument has diminished over the years. First, networking - even for systems like ELKS and TLVC - is no longer a nice-to-have, but in many (most?) cases a requirement. Secondly, the 64k code limit for the kernel was broken a long time ago. And while our systems do not have 'kernel modules', we do have the capability to free and reuse parts of the kernel after boot, which means that when booting a networking kernel, most of the networking code space may be released - if desirable - in the startup process. Likewise, network buffers may be located anywhere in memory, not only in the kernel heap.
These changes and capabilities make the idea of a kernel TCP/IP implementation interesting and realistic instead of just a pipe dream. More than interesting in fact, more like tempting - in particular from a development perspective: lots of the many rather odd problems we've faced with ktcp over the years have originated in the architecture - events out of sync for reasons beyond our control. As illustrated by the one I've been dealing with today: an old bug that occurs when a long listing is running in a telnet window and a new (incoming) connection is initiated. More often than not, the new connection would hang; the running telnet would continue, but any other networking activity would cause ktcp to hang. The problem turns out to be in this piece of code in tcp.c:
static void tcp_synrecv(struct iptcp_s *iptcp, struct tcpcb_s *cb)
{
    struct tcphdr_s *h = iptcp->tcph;

    if (h->flags & TF_RST)
        cb->state = TS_LISTEN;  /* FIXME: not valid, should dealloc extra CB */
    else if ((h->flags & TF_ACK) == 0)
        debug_tcp("tcp: NO ACK IN SYNRECV\n");
    else {
        cb->state = TS_ESTABLISHED;
        debug_tcp("TS_ESTABLISHED\n");
        usleep(1000L);          /* <----- added today */
        tcpdev_notify_accept(cb);
        tcp_established(iptcp, cb);
    }
}
The call to tcpdev_notify_accept() is assumed to never fail, but it does, because the socket we're trying to accept doesn't exist yet: the heavy telnet traffic has prevented the connecting client's ACK to the server's SYNACK from reaching the application level, and we're stuck. There is no way to handle this - it's a technically impossible situation made possible by the current architecture. And - in this case - instead of figuring out a smart way to manoeuvre out of the situation, adding a 1ms delay fixes it for now, and possibly forever, since it's the simplest solution and it's 'cheap'.
ktcp is full of situations like this, fixed in smart ways after hours of testing and debugging, all of them adding code - and stability, and all of them created by the architecture. A kernel implementation would eliminate most if not all of them.
Issues to discuss:
An interesting benefit that comes to mind is a lowered threshold for UDP support and a generalized ICMP implementation - which would simplify the porting of tools like ping and support for functions like ICMP redirects.
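To illustrate the ICMP point: with the protocol handled in one place in the kernel, an echo reply is little more than flipping the type field and recomputing the standard Internet checksum. A hypothetical sketch - the struct and function names are made up, not existing TLVC/ELKS code:

```c
#include <stdint.h>

#define ICMP_ECHO       8
#define ICMP_ECHOREPLY  0

/* minimal ICMP header layout (RFC 792 echo message) */
struct icmp_hdr {
    uint8_t  type;
    uint8_t  code;
    uint16_t checksum;
    uint16_t id;
    uint16_t seq;
};

/* standard Internet checksum over the ICMP message */
static uint16_t icmp_cksum(const void *data, int len)
{
    const uint16_t *p = data;
    uint32_t sum = 0;

    while (len > 1) { sum += *p++; len -= 2; }
    if (len) sum += *(const uint8_t *)p;        /* odd trailing byte */
    while (sum >> 16) sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* turn an echo request into an echo reply, in place */
int icmp_echo_reply(uint8_t *pkt, int len)
{
    struct icmp_hdr *h = (struct icmp_hdr *)pkt;

    if (len < (int)sizeof(*h) || h->type != ICMP_ECHO)
        return -1;
    h->type = ICMP_ECHOREPLY;
    h->checksum = 0;
    h->checksum = icmp_cksum(pkt, len);
    return 0;
}
```

With ktcp in its own process, even this much has to cross the kernel/userspace boundary; in-kernel it could run straight out of the receive path.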