How We Improved the Performance of a Userspace TCP Stack in Go by 5X
(coder.com)
from tedu to programming on 06 Jun 2024 00:44
https://azorius.net/g/programming/p/ZHsl64223KL7Nj4qmh-How-We-Improved-the-Performance-of-a-Userspace-T
Fortunately, I’m far from the first person to have noticed this problem. Researchers Ha & Rhee described the issue and an algorithmic solution in their 2011 paper, Taming the elephants: New TCP slow start. Their proposed algorithm, called HyStart (short for hybrid start), was eventually implemented in the Linux kernel, and a slightly modified version (called HyStart++) was implemented in Windows and described in an RFC. HyStart works by tracking slight variations in the round-trip time to detect network congestion before packets are dropped.
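To make that concrete, here is a minimal Go sketch of HyStart's delay-increase heuristic: compare the smallest RTT seen in the current round of ACKs against the smallest RTT of the previous round, and leave slow start if the RTT has grown past a threshold. The type, constants, and thresholds here are illustrative, loosely following the HyStart++ description in RFC 9406, and are not gVisor's actual implementation.

```go
package main

import (
	"fmt"
	"time"
)

// hystartDelay tracks per-round minimum RTTs to detect queue buildup
// before any packet is dropped (the delay-increase half of HyStart).
type hystartDelay struct {
	lastRoundMinRTT time.Duration // min RTT observed in the previous round
	currRoundMinRTT time.Duration // min RTT observed so far in this round
	samples         int           // RTT samples taken this round
}

const (
	minRTTThresh  = 4 * time.Millisecond  // lower clamp on the exit threshold
	maxRTTThresh  = 16 * time.Millisecond // upper clamp on the exit threshold
	minRTTDivisor = 8                     // threshold is ~1/8 of the base RTT
	nRTTSamples   = 8                     // samples needed before deciding
)

// onAck records one RTT sample and reports whether slow start should exit.
func (h *hystartDelay) onAck(rtt time.Duration) (exitSlowStart bool) {
	if h.currRoundMinRTT == 0 || rtt < h.currRoundMinRTT {
		h.currRoundMinRTT = rtt
	}
	h.samples++

	if h.samples < nRTTSamples || h.lastRoundMinRTT == 0 {
		return false // not enough data yet, or this is the first round
	}

	// Exit threshold: a fraction of the baseline RTT, clamped to [4ms, 16ms].
	thresh := h.lastRoundMinRTT / minRTTDivisor
	if thresh < minRTTThresh {
		thresh = minRTTThresh
	}
	if thresh > maxRTTThresh {
		thresh = maxRTTThresh
	}
	return h.currRoundMinRTT >= h.lastRoundMinRTT+thresh
}

// onRoundEnd rolls the current round's minimum into the baseline.
func (h *hystartDelay) onRoundEnd() {
	h.lastRoundMinRTT = h.currRoundMinRTT
	h.currRoundMinRTT = 0
	h.samples = 0
}

func main() {
	h := &hystartDelay{}
	// First round: establish a ~10ms baseline.
	for i := 0; i < 8; i++ {
		h.onAck(10 * time.Millisecond)
	}
	h.onRoundEnd()
	// Second round: RTT has crept up to ~25ms, so queues are filling.
	for i := 0; i < 8; i++ {
		if h.onAck(25 * time.Millisecond) {
			fmt.Println("delay increase detected: leave slow start")
			break
		}
	}
}
```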
gVisor, the TCP stack we use in Coder, had not implemented HyStart. Google uses gVisor in its data centers, and with extremely short RTTs between nodes, they hadn’t seen stuttering TCP connections. But they were very happy to accept a PR when I reached out, so we implemented and upstreamed HyStart to gVisor earlier this spring.
If the gVisor stack can send data faster than WireGuard can encrypt it, then something needs to put pressure on the stack to get it to back off, and, as we discussed in the last section, this is exactly what TCP congestion control is designed to do when it detects dropped packets.
However, this is all happening within the same process on a single node, not across the distributed system of interacting routers and switches that makes up the Internet. In this context, dropping packets is a very heavyweight way to apply back pressure to the TCP stack.
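The following Go sketch shows why loss is such an expensive signal. It models a Reno-style reaction to a detected drop (the names, numbers, and struct are hypothetical and not taken from gVisor): the congestion window is halved, and the dropped segment still has to be queued and sent through the stack again.

```go
package main

import "fmt"

// sender holds just enough state to illustrate loss-based backpressure:
// a congestion window, a slow-start threshold, and a retransmission queue.
type sender struct {
	cwnd     int   // congestion window, in segments
	ssthresh int   // slow-start threshold, in segments
	retxq    []int // sequence numbers queued for retransmission
}

// onPacketLoss reacts to a detected drop the way a Reno-style sender would:
// multiplicative decrease, then retransmit. When sender and "network" live in
// the same process, the dropped work was pure waste.
func (s *sender) onPacketLoss(seq int) {
	s.ssthresh = s.cwnd / 2 // multiplicative decrease
	if s.ssthresh < 2 {
		s.ssthresh = 2
	}
	s.cwnd = s.ssthresh
	s.retxq = append(s.retxq, seq) // the dropped bytes must cross the stack again
}

func main() {
	s := &sender{cwnd: 64, ssthresh: 1 << 30}
	s.onPacketLoss(4242)
	fmt.Printf("cwnd=%d ssthresh=%d retransmit=%v\n", s.cwnd, s.ssthresh, s.retxq)
}
```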