image/svg+xmlPADDB/PADDW/PADDD/PADDQ—Add Packed IntegersOpcode/InstructionOp / En64/32 bit Mode SupportCPUID Feature FlagDescriptionNP 0F FC /r1PADDB mm, mm/m64AV/VMMXAdd packed byte integers from mm/m64 and mm.NP 0F FD /r1PADDW mm, mm/m64AV/VMMXAdd packed word integers from mm/m64 and mm.NP 0F FE /r1PADDD mm, mm/m64AV/VMMXAdd packed doubleword integers from mm/m64 and mm.NP 0F D4 /r1PADDQ mm, mm/m64AV/VMMXAdd packed quadword integers from mm/m64 and mm.66 0F FC /rPADDB xmm1, xmm2/m128AV/VSSE2Add packed byte integers from xmm2/m128 and xmm1.66 0F FD /rPADDW xmm1, xmm2/m128AV/VSSE2Add packed word integers from xmm2/m128 and xmm1.66 0F FE /rPADDD xmm1, xmm2/m128AV/VSSE2Add packed doubleword integers from xmm2/m128 and xmm1.66 0F D4 /rPADDQ xmm1, xmm2/m128AV/VSSE2Add packed quadword integers from xmm2/m128 and xmm1.VEX.128.66.0F.WIG FC /rVPADDB xmm1, xmm2, xmm3/m128BV/VAVXAdd packed byte integers from xmm2, and xmm3/m128 and store in xmm1.VEX.128.66.0F.WIG FD /rVPADDW xmm1, xmm2, xmm3/m128BV/VAVXAdd packed word integers from xmm2, xmm3/m128 and store in xmm1.VEX.128.66.0F.WIG FE /rVPADDD xmm1, xmm2, xmm3/m128BV/VAVXAdd packed doubleword integers from xmm2, xmm3/m128 and store in xmm1.VEX.128.66.0F.WIG D4 /rVPADDQ xmm1, xmm2, xmm3/m128BV/VAVXAdd packed quadword integers from xmm2, xmm3/m128 and store in xmm1.VEX.256.66.0F.WIG FC /rVPADDB ymm1, ymm2, ymm3/m256BV/VAVX2Add packed byte integers from ymm2, and ymm3/m256 and store in ymm1.VEX.256.66.0F.WIG FD /rVPADDW ymm1, ymm2, ymm3/m256BV/VAVX2Add packed word integers from ymm2, ymm3/m256 and store in ymm1.VEX.256.66.0F.WIG FE /rVPADDD ymm1, ymm2, ymm3/m256BV/VAVX2Add packed doubleword integers from ymm2, ymm3/m256 and store in ymm1.VEX.256.66.0F.WIG D4 /rVPADDQ ymm1, ymm2, ymm3/m256BV/VAVX2Add packed quadword integers from ymm2, ymm3/m256 and store in ymm1.EVEX.128.66.0F.WIG FC /rVPADDB xmm1 {k1}{z}, xmm2, xmm3/m128CV/VAVX512VLAVX512BWAdd packed byte integers from xmm2, and xmm3/m128 and store in xmm1 using writemask k1.EVEX.128.66.0F.WIG FD /rVPADDW xmm1 {k1}{z}, xmm2, xmm3/m128CV/VAVX512VLAVX512BWAdd packed word integers from xmm2, and xmm3/m128 and store in xmm1 using writemask k1.EVEX.128.66.0F.W0 FE /rVPADDD xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcstDV/VAVX512VLAVX512FAdd packed doubleword integers from xmm2, and xmm3/m128/m32bcst and store in xmm1 using writemask k1.EVEX.128.66.0F.W1 D4 /rVPADDQ xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcstDV/VAVX512VLAVX512FAdd packed quadword integers from xmm2, and xmm3/m128/m64bcst and store in xmm1 using writemask k1.EVEX.256.66.0F.WIG FC /rVPADDB ymm1 {k1}{z}, ymm2, ymm3/m256CV/VAVX512VLAVX512BWAdd packed byte integers from ymm2, and ymm3/m256 and store in ymm1 using writemask k1.EVEX.256.66.0F.WIG FD /rVPADDW ymm1 {k1}{z}, ymm2, ymm3/m256CV/VAVX512VLAVX512BWAdd packed word integers from ymm2, and ymm3/m256 and store in ymm1 using writemask k1.

image/svg+xmlInstruction Operand EncodingDescription Performs a SIMD add of the packed integers from the source operand (second operand) and the destination operand (first operand), and stores the packed integer results in the destination operand. See Figure 9-4 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for an illustration of a SIMD operation. Overflow is handled with wraparound, as described in the following paragraphs.The PADDB and VPADDB instructions add packed byte integers from the first source operand and second source operand and store the packed integer results in the destination operand. When an individual result is too large to be represented in 8 bits (overflow), the result is wrapped around and the low 8 bits are written to the destination operand (that is, the carry is ignored).The PADDW and VPADDW instructions add packed word integers from the first source operand and second source operand and store the packed integer results in the destination operand. When an individual result is too large to be represented in 16 bits (overflow), the result is wrapped around and the low 16 bits are written to the destination operand (that is, the carry is ignored).The PADDD and VPADDD instructions add packed doubleword integers from the first source operand and second source operand and store the packed integer results in the destination operand. When an individual result is too large to be represented in 32 bits (overflow), the result is wrapped around and the low 32 bits are written to the destination operand (that is, the carry is ignored).The PADDQ and VPADDQ instructions add packed quadword integers from the first source operand and second source operand and store the packed integer results in the destination operand. When a quadword result is too EVEX.256.66.0F.W0 FE /rVPADDD ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcstDV/VAVX512VLAVX512FAdd packed doubleword integers from ymm2, ymm3/m256/m32bcst and store in ymm1 using writemask k1.EVEX.256.66.0F.W1 D4 /rVPADDQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcstDV/VAVX512VLAVX512FAdd packed quadword integers from ymm2, ymm3/m256/m64bcst and store in ymm1 using writemask k1.EVEX.512.66.0F.WIG FC /rVPADDB zmm1 {k1}{z}, zmm2, zmm3/m512CV/VAVX512BWAdd packed byte integers from zmm2, and zmm3/m512 and store in zmm1 using writemask k1.EVEX.512.66.0F.WIG FD /rVPADDW zmm1 {k1}{z}, zmm2, zmm3/m512CV/VAVX512BWAdd packed word integers from zmm2, and zmm3/m512 and store in zmm1 using writemask k1.EVEX.512.66.0F.W0 FE /rVPADDD zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst DV/VAVX512FAdd packed doubleword integers from zmm2, zmm3/m512/m32bcst and store in zmm1 using writemask k1.EVEX.512.66.0F.W1 D4 /rVPADDQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcstDV/VAVX512FAdd packed quadword integers from zmm2, zmm3/m512/m64bcst and store in zmm1 using writemask k1.NOTES:1. See note in Section 2.4, “AVX and SSE Instruction Exception Specification” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A and Section 22.25.3, “Exception Conditions of Legacy SIMD Instructions Operating on MMX Registers” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A.Op/EnTuple TypeOperand 1Operand 2Operand 3Operand 4ANAModRM:reg (r, w)ModRM:r/m (r)NANABNAModRM:reg (w)VEX.vvvv (r)ModRM:r/m (r)NACFull MemModRM:reg (w)EVEX.vvvv (r)ModRM:r/m (r)NADFullModRM:reg (w)EVEX.vvvv (r)ModRM:r/m (r)NAOpcode/InstructionOp / En64/32 bit Mode SupportCPUID Feature FlagDescription

image/svg+xmllarge to be represented in 64 bits (overflow), the result is wrapped around and the low 64 bits are written to the destination operand (that is, the carry is ignored).Note that the (V)PADDB, (V)PADDW, (V)PADDD and (V)PADDQ instructions can operate on either unsigned or signed (two's complement notation) packed integers; however, it does not set bits in the EFLAGS register to indi-cate overflow and/or a carry. To prevent undetected overflow conditions, software must control the ranges of values operated on.EVEX encoded VPADDD/Q: The first source operand is a ZMM/YMM/XMM register. The second source operand is a ZMM/YMM/XMM register, a 512/256/128-bit memory location or a 512/256/128-bit vector broadcasted from a 32/64-bit memory location. The destination operand is a ZMM/YMM/XMM register updated according to the writemask.EVEX encoded VPADDB/W: The first source operand is a ZMM/YMM/XMM register. The second source operand is a ZMM/YMM/XMM register, a 512/256/128-bit memory location. The destination operand is a ZMM/YMM/XMM register updated according to the writemask.VEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register. the upper bits (MAXVL-1:256) of the destination are cleared.VEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (MAXVL-1:128) of the corresponding ZMM register destination are zeroed.128-bit Legacy SSE version: The first source operand is an XMM register. The second operand can be an XMM register or an 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (MAXVL-1:128) of the corresponding ZMM register destination are unmodified.OperationPADDB (with 64-bit operands)DEST[7:0] := DEST[7:0] + SRC[7:0]; (* Repeat add operation for 2nd through 7th byte *)DEST[63:56] := DEST[63:56] + SRC[63:56];PADDW (with 64-bit operands)DEST[15:0] := DEST[15:0] + SRC[15:0];(* Repeat add operation for 2nd and 3th word *)DEST[63:48] := DEST[63:48] + SRC[63:48];PADDD (with 64-bit operands)DEST[31:0] := DEST[31:0] + SRC[31:0];DEST[63:32] := DEST[63:32] + SRC[63:32];PADDQ (with 64-Bit operands)DEST[63:0] := DEST[63:0] + SRC[63:0];PADDB (Legacy SSE instruction)DEST[7:0] := DEST[7:0] + SRC[7:0]; (* Repeat add operation for 2nd through 15th byte *)DEST[127:120] := DEST[127:120] + SRC[127:120];DEST[MAXVL-1:128] (Unmodified)PADDW (Legacy SSE instruction)DEST[15:0] := DEST[15:0] + SRC[15:0];(* Repeat add operation for 2nd through 7th word *)DEST[127:112] := DEST[127:112] + SRC[127:112];DEST[MAXVL-1:128] (Unmodified)

image/svg+xmlPADDD (Legacy SSE instruction)DEST[31:0] := DEST[31:0] + SRC[31:0];(* Repeat add operation for 2nd and 3th doubleword *)DEST[127:96] := DEST[127:96] + SRC[127:96];DEST[MAXVL-1:128] (Unmodified)PADDQ (Legacy SSE instruction)DEST[63:0] := DEST[63:0] + SRC[63:0];DEST[127:64] := DEST[127:64] + SRC[127:64];DEST[MAXVL-1:128] (Unmodified)VPADDB (VEX.128 encoded instruction)DEST[7:0] := SRC1[7:0] + SRC2[7:0]; (* Repeat add operation for 2nd through 15th byte *)DEST[127:120] := SRC1[127:120] + SRC2[127:120];DEST[MAXVL-1:128] := 0;VPADDW (VEX.128 encoded instruction)DEST[15:0] := SRC1[15:0] + SRC2[15:0];(* Repeat add operation for 2nd through 7th word *)DEST[127:112] := SRC1[127:112] + SRC2[127:112];DEST[MAXVL-1:128] := 0;VPADDD (VEX.128 encoded instruction)DEST[31:0] := SRC1[31:0] + SRC2[31:0];(* Repeat add operation for 2nd and 3th doubleword *)DEST[127:96] := SRC1[127:96] + SRC2[127:96];DEST[MAXVL-1:128] := 0;VPADDQ (VEX.128 encoded instruction)DEST[63:0] := SRC1[63:0] + SRC2[63:0];DEST[127:64] := SRC1[127:64] + SRC2[127:64];DEST[MAXVL-1:128] := 0;VPADDB (VEX.256 encoded instruction)DEST[7:0] := SRC1[7:0] + SRC2[7:0]; (* Repeat add operation for 2nd through 31th byte *)DEST[255:248] := SRC1[255:248] + SRC2[255:248];VPADDW (VEX.256 encoded instruction)DEST[15:0] := SRC1[15:0] + SRC2[15:0];(* Repeat add operation for 2nd through 15th word *)DEST[255:240] := SRC1[255:240] + SRC2[255:240];VPADDD (VEX.256 encoded instruction)DEST[31:0] := SRC1[31:0] + SRC2[31:0];(* Repeat add operation for 2nd and 7th doubleword *)DEST[255:224] := SRC1[255:224] + SRC2[255:224];VPADDQ (VEX.256 encoded instruction)DEST[63:0] := SRC1[63:0] + SRC2[63:0];DEST[127:64] := SRC1[127:64] + SRC2[127:64];DEST[191:128] := SRC1[191:128] + SRC2[191:128];DEST[255:192] := SRC1[255:192] + SRC2[255:192];

image/svg+xmlVPADDB (EVEX encoded versions)(KL, VL) = (16, 128), (32, 256), (64, 512)FOR j := 0 TO KL-1i := j * 8IF k1[j] OR *no writemask*THEN DEST[i+7:i] := SRC1[i+7:i] + SRC2[i+7:i]ELSE IF *merging-masking*; merging-maskingTHEN *DEST[i+7:i] remains unchanged*ELSE *zeroing-masking*; zeroing-maskingDEST[i+7:i] = 0FIFI;ENDFOR;DEST[MAXVL-1:VL] := 0VPADDW (EVEX encoded versions)(KL, VL) = (8, 128), (16, 256), (32, 512)FOR j := 0 TO KL-1i := j * 16IF k1[j] OR *no writemask*THEN DEST[i+15:i] := SRC1[i+15:i] + SRC2[i+15:i]ELSE IF *merging-masking*; merging-maskingTHEN *DEST[i+15:i] remains unchanged*ELSE *zeroing-masking*; zeroing-maskingDEST[i+15:i] = 0FIFI;ENDFOR;DEST[MAXVL-1:VL] := 0VPADDD (EVEX encoded versions) (KL, VL) = (4, 128), (8, 256), (16, 512)FOR j := 0 TO KL-1i := j * 32IF k1[j] OR *no writemask*THEN IF (EVEX.b = 1) AND (SRC2 *is memory*)THEN DEST[i+31:i] := SRC1[i+31:i] + SRC2[31:0]ELSE DEST[i+31:i] := SRC1[i+31:i] + SRC2[i+31:i]FI;ELSE IF *merging-masking*; merging-maskingTHEN *DEST[i+31:i] remains unchanged*ELSE *zeroing-masking*; zeroing-maskingDEST[i+31:i] := 0FIFI;ENDFOR;DEST[MAXVL-1:VL] := 0

image/svg+xmlVPADDQ (EVEX encoded versions) (KL, VL) = (2, 128), (4, 256), (8, 512)FOR j := 0 TO KL-1i := j * 64IF k1[j] OR *no writemask*THEN IF (EVEX.b = 1) AND (SRC2 *is memory*)THEN DEST[i+63:i] := SRC1[i+63:i] + SRC2[63:0]ELSE DEST[i+63:i] := SRC1[i+63:i] + SRC2[i+63:i]FI;ELSE IF *merging-masking*; merging-maskingTHEN *DEST[i+63:i] remains unchanged*ELSE *zeroing-masking*; zeroing-maskingDEST[i+63:i] := 0FIFI;ENDFOR;DEST[MAXVL-1:VL] := 0Intel C/C++ Compiler Intrinsic EquivalentsVPADDB__m512i _mm512_add_epi8 ( __m512i a, __m512i b)VPADDW__m512i _mm512_add_epi16 ( __m512i a, __m512i b)VPADDB__m512i _mm512_mask_add_epi8 ( __m512i s, __mmask64 m, __m512i a, __m512i b)VPADDW__m512i _mm512_mask_add_epi16 ( __m512i s, __mmask32 m, __m512i a, __m512i b)VPADDB__m512i _mm512_maskz_add_epi8 (__mmask64 m, __m512i a, __m512i b)VPADDW__m512i _mm512_maskz_add_epi16 (__mmask32 m, __m512i a, __m512i b)VPADDB__m256i _mm256_mask_add_epi8 (__m256i s, __mmask32 m, __m256i a, __m256i b)VPADDW__m256i _mm256_mask_add_epi16 (__m256i s, __mmask16 m, __m256i a, __m256i b)VPADDB__m256i _mm256_maskz_add_epi8 (__mmask32 m, __m256i a, __m256i b)VPADDW__m256i _mm256_maskz_add_epi16 (__mmask16 m, __m256i a, __m256i b)VPADDB__m128i _mm_mask_add_epi8 (__m128i s, __mmask16 m, __m128i a, __m128i b)VPADDW__m128i _mm_mask_add_epi16 (__m128i s, __mmask8 m, __m128i a, __m128i b)VPADDB__m128i _mm_maskz_add_epi8 (__mmask16 m, __m128i a, __m128i b)VPADDW__m128i _mm_maskz_add_epi16 (__mmask8 m, __m128i a, __m128i b)VPADDD __m512i _mm512_add_epi32( __m512i a, __m512i b);VPADDD __m512i _mm512_mask_add_epi32(__m512i s, __mmask16 k, __m512i a, __m512i b);VPADDD __m512i _mm512_maskz_add_epi32( __mmask16 k, __m512i a, __m512i b);VPADDD __m256i _mm256_mask_add_epi32(__m256i s, __mmask8 k, __m256i a, __m256i b);VPADDD __m256i _mm256_maskz_add_epi32( __mmask8 k, __m256i a, __m256i b);VPADDD __m128i _mm_mask_add_epi32(__m128i s, __mmask8 k, __m128i a, __m128i b);VPADDD __m128i _mm_maskz_add_epi32( __mmask8 k, __m128i a, __m128i b);VPADDQ __m512i _mm512_add_epi64( __m512i a, __m512i b);VPADDQ __m512i _mm512_mask_add_epi64(__m512i s, __mmask8 k, __m512i a, __m512i b);VPADDQ __m512i _mm512_maskz_add_epi64( __mmask8 k, __m512i a, __m512i b);VPADDQ __m256i _mm256_mask_add_epi64(__m256i s, __mmask8 k, __m256i a, __m256i b);VPADDQ __m256i _mm256_maskz_add_epi64( __mmask8 k, __m256i a, __m256i b);VPADDQ __m128i _mm_mask_add_epi64(__m128i s, __mmask8 k, __m128i a, __m128i b);VPADDQ __m128i _mm_maskz_add_epi64( __mmask8 k, __m128i a, __m128i b);PADDB __m128i _mm_add_epi8 (__m128i a,__m128i b );PADDW __m128i _mm_add_epi16 ( __m128i a, __m128i b);PADDD __m128i _mm_add_epi32 ( __m128i a, __m128i b);PADDQ __m128i _mm_add_epi64 ( __m128i a, __m128i b);

image/svg+xmlVPADDB __m256i _mm256_add_epi8 (__m256ia,__m256i b );VPADDW __m256i _mm256_add_epi16 ( __m256i a, __m256i b);VPADDD __m256i _mm256_add_epi32 ( __m256i a, __m256i b);VPADDQ __m256i _mm256_add_epi64 ( __m256i a, __m256i b);PADDB __m64 _mm_add_pi8(__m64 m1, __m64 m2)PADDW __m64 _mm_add_pi16(__m64 m1, __m64 m2)PADDD __m64 _mm_add_pi32(__m64 m1, __m64 m2)PADDQ __m64 _mm_add_si64(__m64 m1, __m64 m2)SIMD Floating-Point ExceptionsNoneOther ExceptionsNon-EVEX-encoded instruction, see Table2-21, “Type 4 Class Exception Conditions”.EVEX-encoded VPADDD/Q, see Table2-49, “Type E4 Class Exception Conditions”.EVEX-encoded VPADDB/W, see Exceptions Type E4.nb in Table2-49, “Type E4 Class Exception Conditions”.

This UNOFFICIAL reference was generated from the official Intel® 64 and IA-32 Architectures Software Developer’s Manual by a dumb script. There is no guarantee that some parts aren't mangled or broken and is distributed WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.