# Summary of Assembly Instructions and Their Corresponding Intrinsics

## 1. Move instruction: movq

`__m128i _mm_loadl_epi64 (__m128i const* mem_addr)`

`#include <emmintrin.h>  // SSE2`

Instruction: movq xmm, m64

CPUID Flags: SSE2

Load 64-bit integer from memory into the first element of dst.

```
dst[63:0] := MEM[mem_addr+63:mem_addr]
dst[MAX:64] := 0
```
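A minimal sketch of `_mm_loadl_epi64` in use, assuming an x86 or x86-64 compiler with SSE2 available (it is baseline on x86-64). The helper name `load_low64` is hypothetical; it writes both 64-bit halves of the register back to memory so you can see that the upper half is cleared.

```c
#include <stdint.h>
#include <emmintrin.h> /* SSE2; assumes an x86/x86-64 target */

/* Hypothetical helper: load 8 bytes from src with _mm_loadl_epi64
 * (movq xmm, m64). Only 8 bytes are read; bits 127:64 become zero. */
void load_low64(const void *src, uint64_t *lo, uint64_t *hi)
{
    __m128i v = _mm_loadl_epi64((const __m128i *)src);

    /* Spill the whole 128-bit register to inspect both halves. */
    uint64_t lanes[2];
    _mm_storeu_si128((__m128i *)lanes, v);
    *lo = lanes[0];
    *hi = lanes[1];
}
```

Because only 64 bits are read, `src` may point at a single `uint64_t`; no 16-byte buffer or alignment is required.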

## 2. Addition instruction: paddw

`__m128i _mm_add_epi16 (__m128i a, __m128i b)`

`#include <emmintrin.h>`

Instruction: paddw xmm, xmm

CPUID Flags: SSE2

Add packed 16-bit integers in a and b, and store the results in dst.

```
FOR j := 0 to 7
    i := j*16
    dst[i+15:i] := a[i+15:i] + b[i+15:i]
ENDFOR
```
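A short sketch of `_mm_add_epi16` adding two arrays of eight `int16_t`, assuming an SSE2-capable x86 target. The function name `add_words` is hypothetical. Unaligned loads/stores are used so the arrays need no 16-byte alignment. Note that `paddw` wraps modulo 2^16; the saturating variant is `_mm_adds_epi16` (`paddsw`).

```c
#include <stdint.h>
#include <emmintrin.h> /* SSE2; assumes an x86/x86-64 target */

/* Hypothetical helper: out[k] = a[k] + b[k] for k = 0..7 using paddw.
 * Overflow wraps around (e.g. 32767 + 1 becomes -32768). */
void add_words(const int16_t a[8], const int16_t b[8], int16_t out[8])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi16(va, vb));
}
```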

## 3. Subtraction instruction: psubw

`__m128i _mm_sub_epi16 (__m128i a, __m128i b)`

`#include <emmintrin.h>`

Instruction: psubw xmm, xmm

CPUID Flags: SSE2

Subtract packed 16-bit integers in b from packed 16-bit integers in a, and store the results in dst.

```
FOR j := 0 to 7
    i := j*16
    dst[i+15:i] := a[i+15:i] - b[i+15:i]
ENDFOR
```
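The subtraction counterpart, again a sketch assuming an SSE2-capable x86 target, with a hypothetical helper name `sub_words`. As with `paddw`, results wrap modulo 2^16; `_mm_subs_epi16` (`psubsw`) is the saturating form.

```c
#include <stdint.h>
#include <emmintrin.h> /* SSE2; assumes an x86/x86-64 target */

/* Hypothetical helper: out[k] = a[k] - b[k] for k = 0..7 using psubw.
 * Underflow wraps around (e.g. -32768 - 1 becomes 32767). */
void sub_words(const int16_t a[8], const int16_t b[8], int16_t out[8])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_sub_epi16(va, vb));
}
```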

## 4. Word interleave, low 64 bits: punpcklwd

`__m128i _mm_unpacklo_epi16 (__m128i a, __m128i b)`

`#include <emmintrin.h>`

Instruction: punpcklwd xmm, xmm

CPUID Flags: SSE2

Unpack and interleave 16-bit integers from the low half of a and b, and store the results in dst.

```
INTERLEAVE_WORDS(src1[127:0], src2[127:0]) {
    dst[15:0] := src1[15:0]
    dst[31:16] := src2[15:0]
    dst[47:32] := src1[31:16]
    dst[63:48] := src2[31:16]
    dst[79:64] := src1[47:32]
    dst[95:80] := src2[47:32]
    dst[111:96] := src1[63:48]
    dst[127:112] := src2[63:48]
    RETURN dst[127:0]
}

dst[127:0] := INTERLEAVE_WORDS(a[127:0], b[127:0])
```
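One common use of `punpcklwd` is zero-extending 16-bit values to 32 bits: interleaving with an all-zero register places a zero word above each source word, and on little-endian x86 each (value, 0) word pair reads back as a single 32-bit value. A sketch assuming an SSE2-capable x86 target; the helper name `widen_low_u16` is hypothetical.

```c
#include <stdint.h>
#include <emmintrin.h> /* SSE2; assumes an x86/x86-64 target */

/* Hypothetical helper: zero-extend src[0..3] (the low 64 bits) to uint32_t.
 * _mm_unpacklo_epi16(v, zero) yields words v0,0,v1,0,v2,0,v3,0. */
void widen_low_u16(const uint16_t src[8], uint32_t out[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    __m128i zero = _mm_setzero_si128();
    _mm_storeu_si128((__m128i *)out, _mm_unpacklo_epi16(v, zero));
}
```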

## 5. Word interleave, high 64 bits: punpckhwd

`__m128i _mm_unpackhi_epi16 (__m128i a, __m128i b)`

`#include <emmintrin.h>`

Instruction: punpckhwd xmm, xmm

CPUID Flags: SSE2

Unpack and interleave 16-bit integers from the high half of a and b, and store the results in dst.

```
INTERLEAVE_HIGH_WORDS(src1[127:0], src2[127:0]) {
    dst[15:0] := src1[79:64]
    dst[31:16] := src2[79:64]
    dst[47:32] := src1[95:80]
    dst[63:48] := src2[95:80]
    dst[79:64] := src1[111:96]
    dst[95:80] := src2[111:96]
    dst[111:96] := src1[127:112]
    dst[127:112] := src2[127:112]
    RETURN dst[127:0]
}

dst[127:0] := INTERLEAVE_HIGH_WORDS(a[127:0], b[127:0])
```
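For signed data, `punpckhwd` can also sign-extend the high four words: interleaving a register with itself makes each 32-bit lane hold two copies of the same word, and an arithmetic right shift by 16 (`_mm_srai_epi32`, `psrad`) leaves the sign-extended value. A sketch assuming an SSE2-capable x86 target; the helper name `widen_high_s16` is hypothetical.

```c
#include <stdint.h>
#include <emmintrin.h> /* SSE2; assumes an x86/x86-64 target */

/* Hypothetical helper: sign-extend src[4..7] (the high 64 bits) to int32_t. */
void widen_high_s16(const int16_t src[8], int32_t out[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    /* punpckhwd v, v: each dword k holds (src[4+k], src[4+k]). */
    __m128i pairs = _mm_unpackhi_epi16(v, v);
    /* Arithmetic shift keeps the upper copy and replicates its sign bit. */
    _mm_storeu_si128((__m128i *)out, _mm_srai_epi32(pairs, 16));
}
```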

## 6. Doubleword interleave, low 64 bits: punpckldq

`__m128i _mm_unpacklo_epi32 (__m128i a, __m128i b)`

`#include <emmintrin.h>`

Instruction: punpckldq xmm, xmm

CPUID Flags: SSE2

Unpack and interleave 32-bit integers from the low half of a and b, and store the results in dst.

```
INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
    dst[31:0] := src1[31:0]
    dst[63:32] := src2[31:0]
    dst[95:64] := src1[63:32]
    dst[127:96] := src2[63:32]
    RETURN dst[127:0]
}

dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
```
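A minimal sketch of `_mm_unpacklo_epi32` on two arrays of four `int32_t`, assuming an SSE2-capable x86 target. The helper name `zip_low_dwords` is hypothetical; the result follows the pseudocode above: a0, b0, a1, b1.

```c
#include <stdint.h>
#include <emmintrin.h> /* SSE2; assumes an x86/x86-64 target */

/* Hypothetical helper: interleave the low two dwords of a and b
 * with punpckldq, giving {a[0], b[0], a[1], b[1]}. */
void zip_low_dwords(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_unpacklo_epi32(va, vb));
}
```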

## 7. Doubleword interleave, high 64 bits: punpckhdq

`__m128i _mm_unpackhi_epi32 (__m128i a, __m128i b)`

`#include <emmintrin.h>`

Instruction: punpckhdq xmm, xmm

CPUID Flags: SSE2

Unpack and interleave 32-bit integers from the high half of a and b, and store the results in dst.

```
INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
    dst[31:0] := src1[95:64]
    dst[63:32] := src2[95:64]
    dst[95:64] := src1[127:96]
    dst[127:96] := src2[127:96]
    RETURN dst[127:0]
}

dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
```
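Pairing `punpckhdq` with `punpckldq` gives a full zip of two 4-element arrays: the low-half interleave produces the first four outputs and the high-half interleave the last four. A sketch assuming an SSE2-capable x86 target; the helper name `zip_dwords` is hypothetical.

```c
#include <stdint.h>
#include <emmintrin.h> /* SSE2; assumes an x86/x86-64 target */

/* Hypothetical helper: out = {a0, b0, a1, b1, a2, b2, a3, b3}. */
void zip_dwords(const int32_t a[4], const int32_t b[4], int32_t out[8])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* punpckldq: a0, b0, a1, b1 */
    _mm_storeu_si128((__m128i *)out, _mm_unpacklo_epi32(va, vb));
    /* punpckhdq: a2, b2, a3, b3 */
    _mm_storeu_si128((__m128i *)(out + 4), _mm_unpackhi_epi32(va, vb));
}
```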