Range Finding with Vector Instructions
Intel SSE4.1 (2008): pmins* and pmaxs* vector instructions
Each instruction works on 16 bytes at a time
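L:  pminsd xmm1, xmmword ptr [rax]   ; xmm1 = per-lane min(xmm1, 4 ints at [rax])
    pmaxsd xmm2, xmmword ptr [rax]   ; xmm2 = per-lane max(xmm2, 4 ints at [rax])
    add    rax, 10h                  ; advance 16 bytes
    dec    rcx                       ; one fewer 16-byte block to go
    jne    L                         ; loop until rcx reaches zero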
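The same loop can be sketched in C with compiler intrinsics, for the 4 byte int case. This is a minimal sketch, not the code behind the timings: the function name `find_range` is hypothetical, the `target("sse4.1")` attribute assumes GCC or Clang, and the loads are unaligned, whereas the memory operands in the assembly above require 16-byte-aligned data. It also shows two steps the inner loop leaves out: seeding the running min/max registers and reducing their four lanes to single values at the end.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <smmintrin.h>   /* SSE4.1 intrinsics */
#define RANGE_HAVE_SSE41 1
#endif

/* Hypothetical helper: store the smallest and largest of a[0..n-1]
 * in *lo and *hi. For n == 0 it leaves INT32_MAX / INT32_MIN. */
#ifdef RANGE_HAVE_SSE41
__attribute__((target("sse4.1")))   /* assumption: GCC or Clang */
#endif
static void find_range(const int32_t *a, size_t n, int32_t *lo, int32_t *hi)
{
    int32_t mn = INT32_MAX, mx = INT32_MIN;
    size_t i = 0;

#ifdef RANGE_HAVE_SSE41
    /* Four running minima and four running maxima, one per lane. */
    __m128i vmin = _mm_set1_epi32(INT32_MAX);
    __m128i vmax = _mm_set1_epi32(INT32_MIN);

    for (; i + 4 <= n; i += 4) {
        __m128i v = _mm_loadu_si128((const __m128i *)(a + i));
        vmin = _mm_min_epi32(vmin, v);   /* compiles to pminsd */
        vmax = _mm_max_epi32(vmax, v);   /* compiles to pmaxsd */
    }

    /* Reduce the four lanes of each register to one value. */
    int32_t tmin[4], tmax[4];
    _mm_storeu_si128((__m128i *)tmin, vmin);
    _mm_storeu_si128((__m128i *)tmax, vmax);
    for (int k = 0; k < 4; k++) {
        if (tmin[k] < mn) mn = tmin[k];
        if (tmax[k] > mx) mx = tmax[k];
    }
#endif

    /* Serial loop: handles the leftover tail, or the whole array
     * on targets without SSE4.1. */
    for (; i < n; i++) {
        if (a[i] < mn) mn = a[i];
        if (a[i] > mx) mx = a[i];
    }

    *lo = mn;
    *hi = mx;
}
```

Calling `find_range(data, count, &lo, &hi)` on an array of `int32_t` leaves the minimum in `lo` and the maximum in `hi`. The 1 byte and 2 byte cases would use `_mm_min_epi8`/`_mm_max_epi8` and `_mm_min_epi16`/`_mm_max_epi16` in the same pattern, with 16 or 8 lanes per register.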
Timings:

               serial v vector
1 byte ints    30.7
2 byte ints    11.4
4 byte ints    5.52