VPDPBUSDS - Multiply and Add Unsigned and Signed Bytes with Saturation

| Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description |
|---|---|---|---|---|
| VEX.128.66.0F38.W0 51 /r VPDPBUSDS xmm1, xmm2, xmm3/m128 | A | V/V | AVX-VNNI | Multiply groups of 4 signed bytes in xmm3/m128 by the corresponding unsigned bytes of xmm2, sum the products, and add the sum to the doubleword result in xmm1 with signed saturation. |
| VEX.256.66.0F38.W0 51 /r VPDPBUSDS ymm1, ymm2, ymm3/m256 | A | V/V | AVX-VNNI | Multiply groups of 4 signed bytes in ymm3/m256 by the corresponding unsigned bytes of ymm2, sum the products, and add the sum to the doubleword result in ymm1 with signed saturation. |
| EVEX.128.66.0F38.W0 51 /r VPDPBUSDS xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst | B | V/V | AVX512_VNNI AVX512VL | Multiply groups of 4 signed bytes in xmm3/m128/m32bcst by the corresponding unsigned bytes of xmm2, sum the products, and add the sum to the doubleword result in xmm1 with signed saturation, under writemask k1. |
| EVEX.256.66.0F38.W0 51 /r VPDPBUSDS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst | B | V/V | AVX512_VNNI AVX512VL | Multiply groups of 4 signed bytes in ymm3/m256/m32bcst by the corresponding unsigned bytes of ymm2, sum the products, and add the sum to the doubleword result in ymm1 with signed saturation, under writemask k1. |
| EVEX.512.66.0F38.W0 51 /r VPDPBUSDS zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst | B | V/V | AVX512_VNNI | Multiply groups of 4 signed bytes in zmm3/m512/m32bcst by the corresponding unsigned bytes of zmm2, sum the products, and add the sum to the doubleword result in zmm1 with signed saturation, under writemask k1. |

Instruction Operand Encoding

| Op/En | Tuple | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
|---|---|---|---|---|---|
| A | NA | ModRM:reg (r, w) | VEX.vvvv (r) | ModRM:r/m (r) | NA |
| B | Full | ModRM:reg (r, w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA |

Description

Multiplies the individual unsigned bytes of the first source operand by the corresponding signed bytes of the second source operand, producing intermediate signed word results. The four word results in each group are then summed and accumulated into the corresponding dword element of the destination operand. If the intermediate sum overflows a 32-bit signed number, the result is saturated to 0x7FFF_FFFF for positive numbers or 0x8000_0000 for negative numbers.

This instruction supports memory fault suppression.
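The per-element behavior described above can be illustrated with a scalar model of a single doubleword lane. This is only a sketch: the function name `dpbusds_lane` and its parameter layout are assumptions made for illustration and are not part of any Intel API. It widens the accumulator to 64 bits so that saturation can be checked after the addition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one dword lane of VPDPBUSDS.
 *   acc: original destination dword
 *   a:   four unsigned bytes from the first source operand
 *   b:   four signed bytes from the second source operand */
int32_t dpbusds_lane(int32_t acc, const uint8_t a[4], const int8_t b[4])
{
    int32_t dot = 0;
    for (int j = 0; j < 4; j++) {
        /* Zero-extend a[j], sign-extend b[j]; each product fits in a signed word. */
        dot += (int32_t)a[j] * (int32_t)b[j];
    }
    /* Four products, each in [-32640, 32385], bound the dot product. */
    assert(dot >= -130560 && dot <= 129540);

    /* Accumulate in 64 bits so the overflow check itself cannot wrap. */
    int64_t sum = (int64_t)acc + dot;
    if (sum > INT32_MAX) return INT32_MAX;  /* saturate to 0x7FFF_FFFF */
    if (sum < INT32_MIN) return INT32_MIN;  /* saturate to 0x8000_0000 */
    return (int32_t)sum;
}
```

With an accumulator already at INT32_MAX and a positive dot product, the model returns INT32_MAX rather than wrapping, which is the difference between this instruction and a non-saturating dot-product accumulate.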

Operation

VPDPBUSDS dest, src1, src2 (VEX encoded versions)

    VL=(128, 256)
    KL=VL/32
    ORIGDEST := DEST
    FOR i := 0 TO KL-1:
        // Extending to 16b
        // src1extend := ZERO_EXTEND
        // src2extend := SIGN_EXTEND
        p1word := src1extend(SRC1.byte[4*i+0]) * src2extend(SRC2.byte[4*i+0])
        p2word := src1extend(SRC1.byte[4*i+1]) * src2extend(SRC2.byte[4*i+1])
        p3word := src1extend(SRC1.byte[4*i+2]) * src2extend(SRC2.byte[4*i+2])
        p4word := src1extend(SRC1.byte[4*i+3]) * src2extend(SRC2.byte[4*i+3])
        DEST.dword[i] := SIGNED_DWORD_SATURATE(ORIGDEST.dword[i] + p1word + p2word + p3word + p4word)
    DEST[MAX_VL-1:VL] := 0

VPDPBUSDS dest, src1, src2 (EVEX encoded versions)

    (KL,VL)=(4,128), (8,256), (16,512)
    ORIGDEST := DEST
    FOR i := 0 TO KL-1:
        IF k1[i] or *no writemask*:
            // Byte elements of SRC1 are zero-extended to 16b and
            // byte elements of SRC2 are sign extended to 16b before multiplication.
            IF SRC2 is memory and EVEX.b == 1:
                t := SRC2.dword[0]
            ELSE:
                t := SRC2.dword[i]
            p1word := ZERO_EXTEND(SRC1.byte[4*i]) * SIGN_EXTEND(t.byte[0])
            p2word := ZERO_EXTEND(SRC1.byte[4*i+1]) * SIGN_EXTEND(t.byte[1])
            p3word := ZERO_EXTEND(SRC1.byte[4*i+2]) * SIGN_EXTEND(t.byte[2])
            p4word := ZERO_EXTEND(SRC1.byte[4*i+3]) * SIGN_EXTEND(t.byte[3])
            DEST.dword[i] := SIGNED_DWORD_SATURATE(ORIGDEST.dword[i] + p1word + p2word + p3word + p4word)
        ELSE IF *zeroing*:
            DEST.dword[i] := 0
        ELSE: // Merge masking, dest element unchanged
            DEST.dword[i] := ORIGDEST.dword[i]
    DEST[MAX_VL-1:VL] := 0

Intel C/C++ Compiler Intrinsic Equivalent

    VPDPBUSDS __m128i _mm_dpbusds_avx_epi32(__m128i, __m128i, __m128i);
    VPDPBUSDS __m128i _mm_dpbusds_epi32(__m128i, __m128i, __m128i);
    VPDPBUSDS __m128i _mm_mask_dpbusds_epi32(__m128i, __mmask8, __m128i, __m128i);
    VPDPBUSDS __m128i _mm_maskz_dpbusds_epi32(__mmask8, __m128i, __m128i, __m128i);
    VPDPBUSDS __m256i _mm256_dpbusds_avx_epi32(__m256i, __m256i, __m256i);
    VPDPBUSDS __m256i _mm256_dpbusds_epi32(__m256i, __m256i, __m256i);
    VPDPBUSDS __m256i _mm256_mask_dpbusds_epi32(__m256i, __mmask8, __m256i, __m256i);
    VPDPBUSDS __m256i _mm256_maskz_dpbusds_epi32(__mmask8, __m256i, __m256i, __m256i);
    VPDPBUSDS __m512i _mm512_dpbusds_epi32(__m512i, __m512i, __m512i);
    VPDPBUSDS __m512i _mm512_mask_dpbusds_epi32(__m512i, __mmask16, __m512i, __m512i);
    VPDPBUSDS __m512i _mm512_maskz_dpbusds_epi32(__mmask16, __m512i, __m512i, __m512i);
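The writemask, zeroing, and m32bcst broadcast rules of the EVEX form can be sketched as a portable scalar model. The names `lane` and `dpbusds_evex_model`, and the use of plain arrays in place of vector registers, are assumptions made for illustration; this is not how the intrinsics are implemented, and it does not model the zeroing of bits above VL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Saturating accumulate of four u8 * s8 products into one dword. */
static int32_t lane(int32_t acc, const uint8_t *a, const int8_t *b)
{
    int64_t sum = acc;
    for (int j = 0; j < 4; j++)
        sum += (int32_t)a[j] * (int32_t)b[j];
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}

/* Hypothetical scalar model of the EVEX-encoded operation.
 *   kl:      number of dword lanes (4, 8, or 16)
 *   k1:      writemask, bit i controls lane i (all ones = no writemask)
 *   zeroing: true for {z}, false for merge masking
 *   bcst:    true models a memory source with EVEX.b == 1 (m32bcst),
 *            where dword 0 of src2 is reused for every lane */
void dpbusds_evex_model(int32_t *dest, const uint8_t *src1, const int8_t *src2,
                        int kl, uint16_t k1, bool zeroing, bool bcst)
{
    assert(kl == 4 || kl == 8 || kl == 16);
    for (int i = 0; i < kl; i++) {
        if ((k1 >> i) & 1) {
            const int8_t *t = bcst ? src2 : src2 + 4 * i;
            /* Lanes are independent, so reading dest[i] here is ORIGDEST. */
            dest[i] = lane(dest[i], src1 + 4 * i, t);
        } else if (zeroing) {
            dest[i] = 0;
        }
        /* Merge masking: a masked-off lane keeps its original value. */
    }
}
```

Under these assumptions, the masked intrinsics listed above would correspond to `zeroing = false` (the `_mask_` variants) and `zeroing = true` (the `_maskz_` variants), and the unmasked ones to a mask of all ones.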

SIMD Floating-Point Exceptions

None.

Other Exceptions

Non-EVEX-encoded instruction, see Table 2-21, “Type 4 Class Exception Conditions”.

EVEX-encoded instruction, see Table 2-49, “Type E4 Class Exception Conditions”.

This UNOFFICIAL reference was generated from the official Intel® 64 and IA-32 Architectures Software Developer’s Manual by a dumb script. There is no guarantee that some parts aren't mangled or broken, and it is distributed WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.