autotldr@lemmings.world
on 07 Mar 2024 06:10
This is the best summary I could come up with:
This effort has centered on optimizing cacheline consumption and adding safeguards to ensure future changes don’t regress.
In turn, this optimization of core networking structures increases TCP performance with many concurrent connections by as much as 40% or more!
This patch series attempts to reorganize the core networking stack variables to minimize cacheline consumption during the data transfer phase.
Meanwhile, new Ethernet driver hardware support in Linux 6.8 includes the Octeon CN10K devices, Broadcom 5760X P7, Qualcomm SM8550 SoC, and Texas Instruments DP83TG720S PHY.
NVIDIA Mellanox Ethernet data center switches can also now enjoy firmware updates without a reboot.
The full list of new networking patches for the Linux 6.8 kernel merge window can be found via today’s pull request.
The original article contains 387 words, the summary contains 124 words. Saved 68%. I’m a bot and I’m open source!
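For context on what the “safeguards” mentioned above amount to: the series adds compile-time checks on structure layout. A minimal sketch of the idea in plain C, using a hypothetical struct (hot_path_example) and a plain static_assert rather than the kernel’s actual macros:

#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */

#define CACHELINE_SIZE 64   /* typical x86-64 cacheline size */

/* Hypothetical hot-path struct: the fields touched together on
 * transmit are deliberately grouped at the front. */
struct hot_path_example {
    unsigned long bytes_sent;    /* offset 0 */
    unsigned long packets_sent;  /* offset 8 */
    char cold_fields[256];       /* rarely-touched state */
};

/* Safeguard: if a later change pushes the hot fields past the
 * first cacheline, the build fails instead of silently regressing. */
static_assert(offsetof(struct hot_path_example, packets_sent)
                  + sizeof(unsigned long) <= CACHELINE_SIZE,
              "hot TX fields must fit in the first cacheline");

The kernel series uses its own dedicated macros for this, but the effect is the same: the hot grouping is enforced at build time rather than merely documented.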
just_another_person@lemmy.world
on 07 Mar 2024 08:23
Thought from the headline this was going to be tcp_bbr related, but no. This is a welcome surprise.
acockworkorange@mander.xyz
on 07 Mar 2024 12:19
Why is there a massive chip hanging precariously on a disorganized network rack?
BananaTrifleViolin@lemmy.world
on 07 Mar 2024 12:30
Because a picture speaks a thousand words. In this case it’s a thousand words of gibberish.
I watched a video on this; the way they managed it was by reordering variables in structs. That’s kinda insane.
Can you link the video?
Probably this is the video: www.youtube.com/watch?v=qo1FFNUVB-Q
Here is an alternative Piped link:
https://www.piped.video/watch?v=qo1FFNUVB-Q
Piped is a privacy-respecting open-source alternative frontend to YouTube.
I’m open-source; check me out at GitHub.
Reordering members can lead to better packing and a smaller memory footprint, due to how alignment works. If you’re iterating a large number of objects, having smaller objects is very favorable in terms of cache locality; you get fewer cache misses, and prefetching is more effective.
For the curious: pahole is a very useful tool for this type of code analysis.
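To make the packing point concrete, a toy sketch (illustrative structs, not ones from the kernel): on a typical 64-bit target a long must be 8-byte aligned, so interleaving it with single chars forces padding.

#include <stdio.h>

/* Small and large members interleaved: the compiler inserts
 * padding so that b stays 8-byte aligned. */
struct bad {
    char a;   /* 1 byte + 7 bytes padding */
    long b;   /* 8 bytes */
    char c;   /* 1 byte + 7 bytes tail padding */
};            /* sizeof(struct bad) == 24 */

/* Same members, reordered largest-first: most padding disappears. */
struct good {
    long b;   /* 8 bytes */
    char a;   /* 1 byte */
    char c;   /* 1 byte + 6 bytes tail padding */
};            /* sizeof(struct good) == 16 */

int main(void)
{
    printf("bad: %zu bytes, good: %zu bytes\n",
           sizeof(struct bad), sizeof(struct good));
    return 0;
}

pahole automates exactly this analysis: build with debug info (-g) and run it on the object, e.g. pahole -C bad ./a.out, and it prints the layout with each hole annotated.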
markus99@lemmy.world
on 07 Mar 2024 21:40
Not a surprise, considering the amount of data and processes the kernel manages.
Buddahriffic@lemmy.world
on 08 Mar 2024 05:14
Oh, to increase cache hits?
Edit: OK, I read the article; yes, more cache hits. It’s neat how they put more context for the title in the link, in case one gets curious about it!
gerdesj@lemmy.ml
on 07 Mar 2024 22:45
9th Jan …
“A hell of an improvement especially for the AMD EPYC servers”
Look closely at the stats in the headers of those three tables of test results. The NICs have different line speeds, and the L3 cache sizes are different too. One system was tested with IPv4 and IPv6, the other with only IPv6.
Not exactly like for like!
themoken@startrek.website
on 08 Mar 2024 00:16
This isn’t a benchmark of those systems, it’s showing that the code didn’t regress on either hardware set with some anecdotal data. It makes sense they’re not like for like.
wargreymon2023@sopuli.xyz
on 08 Mar 2024 05:54
Okay, it is up to ~40%, but the underlying change is fundamental.
quinkin@lemmy.world
on 08 Mar 2024 00:20
Why would you compare between the tables? It’s the relative change in each line that is of interest.
Each table contains one column with the patches and one column without the patches - the hardware is unchanged. The different tables are to measure the impact of the patches across different hardware.
aBundleOfFerrets@sh.itjust.works
on 24 Apr 2024 22:20
Good lord the comments on this one are a mess