The Alder Lake SHLX anomaly
(tavianator.com)
from tedu@inks.tedunangst.com to inks@inks.tedunangst.com on 03 Jan 09:54
https://inks.tedunangst.com/l/5146
It seems like SHLX performs differently depending on how the shift count register is initialized. If you use a 64-bit instruction with an immediate, performance is slow. This is also true for instructions like INC (which is similar to ADD with a 1 immediate). On the other hand, 32-bit instructions, and 64-bit instructions without immediates (even no-op ones), make it fast. All of these ways to initialize RCX lead to 1-cycle latency:
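As a rough way to probe the effect described in this excerpt, here is a minimal C sketch using GCC/Clang inline assembly. It is not the article's benchmark or its list of initializations. The function names `shlx_chain` and `seconds_for`, the shift count of 1, and the two `MOV` variants are assumptions chosen to match the excerpt's description: a 64-bit instruction with an immediate versus a 32-bit one. On non-x86-64 targets or CPUs without BMI2 it falls back to a plain C shift, which shows nothing about the anomaly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: shift x left by 1, iters times, as one dependent
 * chain of SHLX instructions. The count register RCX is initialized
 * inside the asm so the compiler cannot pick the instruction for us:
 *   wide_init != 0 -> 64-bit MOV with an immediate (movq $1, %rcx)
 *   wide_init == 0 -> 32-bit MOV with an immediate (movl $1, %ecx) */
static uint64_t shlx_chain(uint64_t x, uint64_t iters, int wide_init) {
    if (iters == 0) return x;  /* the dec/jnz loop needs at least one pass */
#if defined(__x86_64__) && defined(__GNUC__)
    if (__builtin_cpu_supports("bmi2")) {
        if (wide_init) {
            __asm__ volatile(
                "movq $1, %%rcx\n\t"
                "1:\n\t"
                "shlx %%rcx, %0, %0\n\t"  /* AT&T order: count, src, dest */
                "decq %1\n\t"
                "jnz 1b"
                : "+r"(x), "+r"(iters) : : "rcx", "cc");
        } else {
            __asm__ volatile(
                "movl $1, %%ecx\n\t"      /* 32-bit write zero-extends into RCX */
                "1:\n\t"
                "shlx %%rcx, %0, %0\n\t"
                "decq %1\n\t"
                "jnz 1b"
                : "+r"(x), "+r"(iters) : : "rcx", "cc");
        }
        return x;
    }
#endif
    /* Portable fallback: same result, but not guaranteed to use SHLX. */
    while (iters--) x <<= 1;
    return x;
}

/* Hypothetical helper: CPU time in seconds for one chain of `iters` shifts.
 * The chain's final value is stored through `out` so it stays observable. */
static double seconds_for(uint64_t iters, int wide_init, uint64_t *out) {
    assert(out != NULL);
    clock_t t0 = clock();
    *out = shlx_chain(1, iters, wide_init);
    clock_t t1 = clock();
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Calling `seconds_for` with a large iteration count (say, a billion) once with `wide_init` set to 1 and once with 0, then comparing the two times, should show a gap on an affected core if the excerpt's description holds. On other CPUs the two times would be expected to be about the same. `clock()` is coarse, so a cycle counter or `perf stat` would give a cleaner measurement.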