Computing Adler32 Checksums at 41 GB/s
(wooo.sh)
from tedu@inks.tedunangst.com to inks@inks.tedunangst.com on 30 Apr 2024 04:32
https://inks.tedunangst.com/l/5113
While looking through the fpng source code, I noticed that its vectorized adler32 implementation seemed somewhat complicated, especially given how simple the scalar version of adler32 is. I was curious to see if I could come up with a simpler method, and in doing so, I came up with an algorithm that can be up to 7x faster than fpng’s version, and 109x faster than the simple scalar version.
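For reference, the "simple scalar version" mentioned above can be sketched roughly as follows. This is a minimal illustration of the standard Adler-32 definition, not code taken from fpng or from the linked article; the function name adler32_scalar is made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper name; a straightforward byte-at-a-time Adler-32. */
uint32_t adler32_scalar(const uint8_t *data, size_t len)
{
    const uint32_t MOD = 65521; /* largest prime below 2^16 */
    uint32_t a = 1;             /* running sum of bytes, seeded with 1 */
    uint32_t b = 0;             /* running sum of the successive values of a */

    for (size_t i = 0; i < len; i++) {
        a = (a + data[i]) % MOD;
        b = (b + a) % MOD;
    }

    /* b goes in the high 16 bits, a in the low 16 bits */
    return (b << 16) | a;
}
```

Taking the modulo on every byte keeps the sketch obviously correct but slow; faster versions typically defer the reduction and process many bytes per step, which is where the vectorized approaches come in.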