A recent blog post by fgiesen asks why SIMD integer multiplication instructions are implemented the way they are, and explores possible explanations for their peculiar characteristics. According to the author, Intel's big cores may handle the 32-bit multiply on a 16-bit multiplier-centric datapath, issuing two uops that are slight variations of the PMADDWD flow. This is the author's conjecture rather than a documented design, but it would be consistent with the available data and with the instruction's somewhat odd characteristics.
However, the author notes that this design is not necessarily the most efficient or straightforward approach. AMD, by contrast, has taken a different route by building actual 32x32->32 multipliers into its CPUs. The blog post also touches on the origins of the VNNI instructions and the potential influence of AltiVec-era instruction patents.
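
To make the 16-bit decomposition concrete, here is a minimal sketch of how a 32x32->32 multiply can be assembled from 16-bit products in two passes. It is an illustration of the arithmetic only, not a description of Intel's actual microarchitecture; the function names (pmaddwd_lane, mul32_via_16bit) and the exact split of work between the two passes are assumptions made for this example.

```python
MASK32 = 0xFFFFFFFF


def to_signed16(x):
    """Interpret the low 16 bits of x as a signed 16-bit integer."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def pmaddwd_lane(a, b):
    """Model one 32-bit lane of PMADDWD: multiply the signed 16-bit halves
    pairwise and add the two products, wrapping the sum to 32 bits."""
    a_lo, a_hi = to_signed16(a), to_signed16(a >> 16)
    b_lo, b_hi = to_signed16(b), to_signed16(b >> 16)
    return (a_lo * b_lo + a_hi * b_hi) & MASK32


def mul32_via_16bit(a, b):
    """Low 32 bits of a * b using only 16x16 products (hypothetical two-pass flow)."""
    a &= MASK32
    b &= MASK32

    # Pass 1 (assumed): a PMADDWD-style flow with b's halves swapped yields
    # the cross terms a_lo*b_hi + a_hi*b_lo. Only the low 16 bits of this sum
    # survive the shift below, so the signed interpretation does no harm.
    b_swapped = ((b << 16) | (b >> 16)) & MASK32
    cross = pmaddwd_lane(a, b_swapped)

    # Pass 2 (assumed): an unsigned 16x16->32 product of the low halves,
    # plus the cross terms shifted into the upper 16 bits.
    low = (a & 0xFFFF) * (b & 0xFFFF)
    return (low + (cross << 16)) & MASK32
```

The a_hi*b_hi term never appears because it only contributes to bits 32 and above, which a 32x32->32 multiply discards. A datapath with native 32-bit multipliers, as the post attributes to AMD, would produce the same result in a single step.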
Source: https://fgiesen.wordpress.com/2024/10/26/why-those-particular-integer-multiplies/