This discussion started as an off-topic digression in #238, triggered by benchmark results from the updated - now completely buffed - NIC drivers, a change in turn driven by the recently implemented TH/HB interrupt handler regime. (@ghaerr, @Mellvik)
Not a new discussion at all - it has popped in and out for years, usually triggered by benchmarks, which tend to draw attention to the big difference between outgoing and incoming file transfer speeds: outgoing always beats incoming by almost a factor of two, even though outgoing packets are limited to a 512-byte payload, while incoming accepts whatever the sender chooses, usually around 1460 bytes. A 3:1 packet ratio by the looks of it, which is not entirely fair because ktcp, most of the time (when part of a fast flow), ACKs every incoming packet of this size twice, the second ACK reopening the window that the packet had almost closed. Still, the packet count differs by at least 2:1 in the opposite direction, and the slowness of the incoming flow becomes interesting.
While not measured empirically at a low level, it makes sense to assume that the main reason is scheduling. The incoming packet flow looks like
NIC_buffer->driver_buffer[kernel]->ktcp[app level]->socket-layer[kernel]->application[app layer]
And while the outgoing flow is exactly the opposite, there is a key difference: the outgoing flow is 'contiguous' - 'take this data and get it going' - whereas the incoming flow is a series of queries and waits, as in 'is there any data available now?'. This query-and-wait flow gets seriously exacerbated by scheduling: the query gets passed along from level to level until it reaches the driver, while the availability of data (unless it was buffered) first 'kicks' the kernel, which 'kicks' ktcp, which 'kicks' it back to the kernel, which eventually kicks the app.
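From the application side the query-and-wait pattern looks roughly like this - a minimal POSIX-style sketch, not actual ktcp code, and the function name is made up:

```c
#include <sys/select.h>
#include <unistd.h>

/* hypothetical helper: block until the socket has data, then read it */
int wait_for_data(int fd, char *buf, int len)
{
    fd_set rfds;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);

    /* 'is there any data available now?' - the process parks here until
     * the kernel, woken by ktcp, which was woken by the driver, says yes */
    if (select(fd + 1, &rfds, NULL, NULL, NULL) < 0)
        return -1;

    /* and the read itself is another full trip down through the layers */
    return (int)read(fd, buf, (size_t)len);
}
```

Each call is one of the 'queries' above; every wakeup costs at least one scheduling round trip per layer.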
While this - after innumerable hours of coding, testing and discussions over the years - now works amazingly well, it can never be really fast or efficient. The benefit is flexibility: having most of the networking code outside the kernel keeps (or contributes to keeping) the kernel lean and mean, and makes it easy to NOT have networking consume meagre system resources when not needed.
The validity of this argument has diminished over the years. First, networking - even for systems like ELKS and TLVC - is no longer a nice-to-have, but in many (most?) cases a requirement. Secondly, the 64k code limit for the kernel was broken a long time ago. And while our systems do not have 'kernel modules', we do have the capability to free and reuse parts of the kernel after boot, which means that when booting a networking kernel, most of the networking code space may be released - if desirable - in the startup process. Likewise, network buffers may be located anywhere in memory, not only in the kernel heap.
These changes and capabilities make the idea of a kernel TCP/IP implementation interesting and realistic instead of just a pipe dream. More than interesting in fact, more like tempting - in particular from a development perspective: lots of the many rather odd problems we've faced with ktcp over the years have originated in the architecture - events out of sync for reasons beyond our control. As illustrated by the one I've been dealing with today: an old bug that occurs when a long listing is running in a telnet window and a new (incoming) connection is initiated. More often than not, the new connection would hang; the running telnet would continue, but any other networking activity would cause ktcp to hang. The problem turns out to be in this piece of code in tcp.c:
static void tcp_synrecv(struct iptcp_s *iptcp, struct tcpcb_s *cb)
{
    struct tcphdr_s *h = iptcp->tcph;

    if (h->flags & TF_RST)
        cb->state = TS_LISTEN;  /* FIXME: not valid, should dealloc extra CB */
    else if ((h->flags & TF_ACK) == 0)
        debug_tcp("tcp: NO ACK IN SYNRECV\n");
    else {
        cb->state = TS_ESTABLISHED;
        debug_tcp("TS_ESTABLISHED\n");
        usleep(1000L);          /* <----- added today */
        tcpdev_notify_accept(cb);
        tcp_established(iptcp, cb);
    }
}
The call to tcpdev_notify_accept() is assumed to never fail, but it does, because the socket we're trying to accept doesn't exist yet: the heavy telnet traffic has prevented the connecting client's ACK to the server's SYNACK from reaching the application level, and we're stuck. There is no way to handle this - it's a technically impossible situation made possible by the current architecture. And - in this case - instead of figuring out a smart way to manoeuvre out of the situation, adding a 1ms delay fixes it for now, and possibly forever, since it's the simplest solution and it's 'cheap'.
ktcp is full of situations like this, fixed in smart ways after hours of testing and debugging, all of them adding code - and stability, and all of them created by the architecture. A kernel implementation would eliminate most if not all of them.
Issues to discuss:
An interesting benefit that comes to mind is a lowered threshold for UDP support and a generalized ICMP implementation - which would simplify the porting of tools like ping and support for functions like ICMP redirects.
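To illustrate the ICMP point: with the protocol handled in one place in the kernel, an echo reply is little more than flipping the type field and recomputing the standard Internet checksum. A hypothetical sketch - the struct and function names are made up, not existing TLVC/ELKS code:

```c
#include <stdint.h>

#define ICMP_ECHO       8
#define ICMP_ECHOREPLY  0

/* minimal ICMP header layout (RFC 792 echo message) */
struct icmp_hdr {
    uint8_t  type;
    uint8_t  code;
    uint16_t checksum;
    uint16_t id;
    uint16_t seq;
};

/* standard Internet checksum over the ICMP message */
static uint16_t icmp_cksum(const void *data, int len)
{
    const uint16_t *p = data;
    uint32_t sum = 0;

    while (len > 1) { sum += *p++; len -= 2; }
    if (len) sum += *(const uint8_t *)p;        /* odd trailing byte */
    while (sum >> 16) sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* turn an echo request into an echo reply, in place */
int icmp_echo_reply(uint8_t *pkt, int len)
{
    struct icmp_hdr *h = (struct icmp_hdr *)pkt;

    if (len < (int)sizeof(*h) || h->type != ICMP_ECHO)
        return -1;
    h->type = ICMP_ECHOREPLY;
    h->checksum = 0;
    h->checksum = icmp_cksum(pkt, len);
    return 0;
}
```

With ktcp in its own process, even this much has to cross the kernel/userspace boundary; in-kernel it could run straight out of the receive path.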