Range Finding with Vector Instructions

 
Intel SSE4.1 (2008) pmins* and pmaxs* vector instructions
work on 16 bytes at a time

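L:  pminsd  xmm1,xmmword ptr [rax]   ; xmm1 = lane-wise min(xmm1, next 4 ints)
    pmaxsd  xmm2,xmmword ptr [rax]   ; xmm2 = lane-wise max(xmm2, next 4 ints)
    add     rax,10h                  ; advance pointer by 16 bytes
    dec     rcx                      ; one fewer 16-byte block remaining
    jne     L                        ; loop until rcx reaches zero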
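
The same loop can be sketched in C with SSE4.1 intrinsics. This is a minimal illustration, not the code that was timed: the function name find_range_i32 is hypothetical, the target attribute assumes GCC or Clang on x86-64, and the sketch uses unaligned loads, whereas the memory operands in the listing above require a 16-byte-aligned address. It also adds what the listing leaves out: initializing the accumulators, reducing the four lanes to a single result, and handling a tail of fewer than four elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <smmintrin.h>  /* SSE4.1 intrinsics */

/* Hypothetical helper: min and max of n signed 32-bit ints.
   For n == 0 it returns lo = INT32_MAX and hi = INT32_MIN. */
__attribute__((target("sse4.1")))
static void find_range_i32(const int32_t *a, size_t n,
                           int32_t *lo, int32_t *hi)
{
    assert(a != NULL || n == 0);

    __m128i vmin = _mm_set1_epi32(INT32_MAX);  /* plays the role of xmm1 */
    __m128i vmax = _mm_set1_epi32(INT32_MIN);  /* plays the role of xmm2 */
    size_t i = 0;

    /* Main loop: four ints (16 bytes) per iteration. */
    for (; i + 4 <= n; i += 4) {
        __m128i v = _mm_loadu_si128((const __m128i *)(a + i));
        vmin = _mm_min_epi32(vmin, v);         /* pminsd */
        vmax = _mm_max_epi32(vmax, v);         /* pmaxsd */
    }

    /* Reduce the four lanes of each accumulator to one value. */
    int32_t mins[4], maxs[4];
    _mm_storeu_si128((__m128i *)mins, vmin);
    _mm_storeu_si128((__m128i *)maxs, vmax);
    int32_t mn = mins[0], mx = maxs[0];
    for (int k = 1; k < 4; k++) {
        if (mins[k] < mn) mn = mins[k];
        if (maxs[k] > mx) mx = maxs[k];
    }

    /* Scalar tail for the last n % 4 elements. */
    for (; i < n; i++) {
        if (a[i] < mn) mn = a[i];
        if (a[i] > mx) mx = a[i];
    }

    *lo = mn;
    *hi = mx;
}
```

Calling find_range_i32(data, count, &lo, &hi) leaves the smallest element in lo and the largest in hi. The 1-byte and 2-byte cases follow the same pattern with _mm_min_epi8 / _mm_max_epi8 and _mm_min_epi16 / _mm_max_epi16, processing 16 and 8 elements per iteration respectively.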

Timings:
 
serial v vector
  1 byte ints   30.7
  2 byte ints   11.4
  4 byte ints   5.52