chriselrod: The "Fast instruction, slow instruction" section would be a good place to show off llvm-mca.
chriselrod: I think approximate inverses are an interesting example. Several instructions combined (an approximate inverse plus a couple of Newton-Raphson iterations to improve accuracy) are much faster than a division.
But in practice this will often slow code down, and llvm-mca shows why both are true:
Floating-point division is a lot slower, but it runs on the CPU's floating-point divider. The approximate inverse and the Newton-Raphson iterations use different execution ports. If your code is doing a lot of other (non-division) work at the same time, that slow division suddenly becomes "free" because it doesn't compete for execution resources, while the approximate inverse does.