VPSHLDV — Concatenate and Variable Shift Packed Data Left Logical

Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F38.W1 70 /r VPSHLDVW xmm1{k1}{z}, xmm2, xmm3/m128 | A | V/V | AVX512_VBMI2 AVX512VL | Concatenate xmm1 and xmm2, extract result shifted to the left by value in xmm3/m128 into xmm1.
EVEX.256.66.0F38.W1 70 /r VPSHLDVW ymm1{k1}{z}, ymm2, ymm3/m256 | A | V/V | AVX512_VBMI2 AVX512VL | Concatenate ymm1 and ymm2, extract result shifted to the left by value in ymm3/m256 into ymm1.
EVEX.512.66.0F38.W1 70 /r VPSHLDVW zmm1{k1}{z}, zmm2, zmm3/m512 | A | V/V | AVX512_VBMI2 | Concatenate zmm1 and zmm2, extract result shifted to the left by value in zmm3/m512 into zmm1.
EVEX.128.66.0F38.W0 71 /r VPSHLDVD xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst | B | V/V | AVX512_VBMI2 AVX512VL | Concatenate xmm1 and xmm2, extract result shifted to the left by value in xmm3/m128 into xmm1.
EVEX.256.66.0F38.W0 71 /r VPSHLDVD ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst | B | V/V | AVX512_VBMI2 AVX512VL | Concatenate ymm1 and ymm2, extract result shifted to the left by value in ymm3/m256 into ymm1.
EVEX.512.66.0F38.W0 71 /r VPSHLDVD zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst | B | V/V | AVX512_VBMI2 | Concatenate zmm1 and zmm2, extract result shifted to the left by value in zmm3/m512 into zmm1.
EVEX.128.66.0F38.W1 71 /r VPSHLDVQ xmm1{k1}{z}, xmm2, xmm3/m128/m64bcst | B | V/V | AVX512_VBMI2 AVX512VL | Concatenate xmm1 and xmm2, extract result shifted to the left by value in xmm3/m128 into xmm1.
EVEX.256.66.0F38.W1 71 /r VPSHLDVQ ymm1{k1}{z}, ymm2, ymm3/m256/m64bcst | B | V/V | AVX512_VBMI2 AVX512VL | Concatenate ymm1 and ymm2, extract result shifted to the left by value in ymm3/m256 into ymm1.
EVEX.512.66.0F38.W1 71 /r VPSHLDVQ zmm1{k1}{z}, zmm2, zmm3/m512/m64bcst | B | V/V | AVX512_VBMI2 | Concatenate zmm1 and zmm2, extract result shifted to the left by value in zmm3/m512 into zmm1.

Description

Concatenate packed data, then extract the result shifted to the left by a variable value.

This instruction supports memory fault suppression.
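As a reading aid for the description above, a minimal scalar sketch of one doubleword lane in C follows. It mirrors the semantics spelled out in the Operation section further down (concatenate DEST above SRC2 into a double-width value, shift left by the count modulo the element width, keep the upper half); the helper name and the example values are illustrative only, not part of the reference.

#include <stdint.h>

/* Illustrative model of one VPSHLDVD lane (32-bit elements).
 * dest:src2 form a 64-bit value; shift it left by (cnt mod 32) and keep
 * the high 32 bits. The word and quadword forms follow the same pattern
 * with element widths 16/64 and count masks 15/63.
 *
 * Example: shldv32_lane(0x00000001, 0x80000000, 4) == 0x00000018
 * (dest shifted left by 4, with the top 4 bits of src2 shifted in). */
static inline uint32_t shldv32_lane(uint32_t dest, uint32_t src2, uint32_t cnt)
{
    uint64_t concat = ((uint64_t)dest << 32) | src2;   /* dest is the high half      */
    return (uint32_t)((concat << (cnt & 31)) >> 32);   /* extract the new high half  */
}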
Instruction Operand Encoding

Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full Mem | ModRM:reg (r, w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
B | Full | ModRM:reg (r, w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA

Operation

FUNCTION concat(a,b):
    IF words:
        d.word[1] := a
        d.word[0] := b
        return d
    ELSE IF dwords:
        q.dword[1] := a
        q.dword[0] := b
        return q
    ELSE IF qwords:
        o.qword[1] := a
        o.qword[0] := b
        return o

VPSHLDVW DEST, SRC2, SRC3
(KL, VL) = (8, 128), (16, 256), (32, 512)
FOR j := 0 TO KL-1:
    IF MaskBit(j) OR *no writemask*:
        tmp := concat(DEST.word[j], SRC2.word[j]) << (SRC3.word[j] & 15)
        DEST.word[j] := tmp.word[1]
    ELSE IF *zeroing*:
        DEST.word[j] := 0
    *ELSE DEST.word[j] remains unchanged*
DEST[MAX_VL-1:VL] := 0

VPSHLDVD DEST, SRC2, SRC3
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j := 0 TO KL-1:
    IF SRC3 is broadcast memop:
        tsrc3 := SRC3.dword[0]
    ELSE:
        tsrc3 := SRC3.dword[j]
    IF MaskBit(j) OR *no writemask*:
        tmp := concat(DEST.dword[j], SRC2.dword[j]) << (tsrc3 & 31)
        DEST.dword[j] := tmp.dword[1]
    ELSE IF *zeroing*:
        DEST.dword[j] := 0
    *ELSE DEST.dword[j] remains unchanged*
DEST[MAX_VL-1:VL] := 0

VPSHLDVQ DEST, SRC2, SRC3
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j := 0 TO KL-1:
    IF SRC3 is broadcast memop:
        tsrc3 := SRC3.qword[0]
    ELSE:
        tsrc3 := SRC3.qword[j]
    IF MaskBit(j) OR *no writemask*:
        tmp := concat(DEST.qword[j], SRC2.qword[j]) << (tsrc3 & 63)
        DEST.qword[j] := tmp.qword[1]
    ELSE IF *zeroing*:
        DEST.qword[j] := 0
    *ELSE DEST.qword[j] remains unchanged*
DEST[MAX_VL-1:VL] := 0

Intel C/C++ Compiler Intrinsic Equivalent

VPSHLDVW __m128i _mm_shldv_epi16(__m128i, __m128i, __m128i);
VPSHLDVW __m128i _mm_mask_shldv_epi16(__m128i, __mmask8, __m128i, __m128i);
VPSHLDVW __m128i _mm_maskz_shldv_epi16(__mmask8, __m128i, __m128i, __m128i);
VPSHLDVW __m256i _mm256_shldv_epi16(__m256i, __m256i, __m256i);
VPSHLDVW __m256i _mm256_mask_shldv_epi16(__m256i, __mmask16, __m256i, __m256i);
VPSHLDVW __m256i _mm256_maskz_shldv_epi16(__mmask16, __m256i, __m256i, __m256i);
VPSHLDVW __m512i _mm512_shldv_epi16(__m512i, __m512i, __m512i);
VPSHLDVW __m512i _mm512_mask_shldv_epi16(__m512i, __mmask32, __m512i, __m512i);
VPSHLDVW __m512i _mm512_maskz_shldv_epi16(__mmask32, __m512i, __m512i, __m512i);
VPSHLDVD __m128i _mm_shldv_epi32(__m128i, __m128i, __m128i);
VPSHLDVD __m128i _mm_mask_shldv_epi32(__m128i, __mmask8, __m128i, __m128i);
VPSHLDVD __m128i _mm_maskz_shldv_epi32(__mmask8, __m128i, __m128i, __m128i);
VPSHLDVD __m256i _mm256_shldv_epi32(__m256i, __m256i, __m256i);
VPSHLDVD __m256i _mm256_mask_shldv_epi32(__m256i, __mmask8, __m256i, __m256i);
VPSHLDVD __m256i _mm256_maskz_shldv_epi32(__mmask8, __m256i, __m256i, __m256i);
VPSHLDVD __m512i _mm512_shldv_epi32(__m512i, __m512i, __m512i);
VPSHLDVD __m512i _mm512_mask_shldv_epi32(__m512i, __mmask16, __m512i, __m512i);
VPSHLDVD __m512i _mm512_maskz_shldv_epi32(__mmask16, __m512i, __m512i, __m512i);
VPSHLDVQ __m128i _mm_shldv_epi64(__m128i, __m128i, __m128i);
VPSHLDVQ __m128i _mm_mask_shldv_epi64(__m128i, __mmask8, __m128i, __m128i);
VPSHLDVQ __m128i _mm_maskz_shldv_epi64(__mmask8, __m128i, __m128i, __m128i);
VPSHLDVQ __m256i _mm256_shldv_epi64(__m256i, __m256i, __m256i);
VPSHLDVQ __m256i _mm256_mask_shldv_epi64(__m256i, __mmask8, __m256i, __m256i);
VPSHLDVQ __m256i _mm256_maskz_shldv_epi64(__mmask8, __m256i, __m256i, __m256i);
VPSHLDVQ __m512i _mm512_shldv_epi64(__m512i, __m512i, __m512i);
VPSHLDVQ __m512i _mm512_mask_shldv_epi64(__m512i, __mmask8, __m512i, __m512i);
VPSHLDVQ __m512i _mm512_maskz_shldv_epi64(__mmask8, __m512i, __m512i, __m512i);
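The list above maps directly onto compiler intrinsics; a minimal usage sketch follows, assuming a toolchain and CPU with AVX512_VBMI2 and AVX512VL support (e.g. GCC or Clang with -mavx512vbmi2 -mavx512vl). The function, array names, and mask value are illustrative only.

#include <immintrin.h>
#include <stdint.h>

/* Shift each 32-bit lane of hi[] left by cnt[] (mod 32), pulling the
 * vacated low bits from the corresponding lane of lo[]. */
void shldv_demo(uint32_t hi[8], const uint32_t lo[8], const uint32_t cnt[8])
{
    __m256i a = _mm256_loadu_si256((const __m256i *)hi);
    __m256i b = _mm256_loadu_si256((const __m256i *)lo);
    __m256i c = _mm256_loadu_si256((const __m256i *)cnt);

    /* VPSHLDVD ymm, ymm, ymm: per lane, high half of ((a:b) << (c & 31)). */
    __m256i r = _mm256_shldv_epi32(a, b, c);

    /* Merge-masked form: only the even lanes are updated; odd lanes keep a. */
    __m256i rm = _mm256_mask_shldv_epi32(a, (__mmask8)0x55, b, c);
    (void)rm;

    _mm256_storeu_si256((__m256i *)hi, r);
}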
SIMD Floating-Point Exceptions

None.

Other Exceptions

See Exceptions Type E4.