image/svg+xmlV4FMADDPS/V4FNMADDPS — Packed Single-Precision Floating-Point Fused Multiply-Add (4-iterations)Instruction Operand EncodingDescriptionThis instruction computes 4 sequential packed fused single-precision floating-point multiply-add instructions with a sequentially selected memory operand in each of the four steps.In the above box, the notation of “+3” is used to denote that the instruction accesses 4 source registers based on that operand; sources are consecutive, start in a multiple of 4 boundary, and contain the encoded register operand.This instruction supports memory fault suppression. The entire memory operand is loaded if any of the 16 lowest significant mask bits is set to 1 or if a “no masking” encoding is used.The tuple type Tuple1_4X implies that four 32-bit elements (16 bytes) are referenced by the memory operation portion of this instruction.Rounding is performed at every FMA (fused multiply and add) boundary. Exceptions are also taken sequentially. Pre- and post-computational exceptions of the first FMA take priority over the pre- and post-computational excep-tions of the second FMA, etc.Opcode/InstructionOp/En64/32 bit Mode SupportCPUID Feature FlagDescriptionEVEX.512.F2.0F38.W0 9A /rV4FMADDPS zmm1{k1}{z}, zmm2+3, m128AV/VAVX512_4FMAPSMultiply packed single-precision floating-point values from source register block indicated by zmm2 by values from m128 and accumulate the result in zmm1.EVEX.512.F2.0F38.W0 AA /rV4FNMADDPS zmm1{k1}{z}, zmm2+3, m128AV/VAVX512_4FMAPSMultiply and negate packed single-precision floating-point values from source register block indicated by zmm2 by values from m128 and accumulate the result in zmm1.Op/EnTupleOperand 1Operand 2Operand 3Operand 4ATuple1_4XModRM:reg (r, w)EVEX.vvvv (r)ModRM:r/m (r)NA

image/svg+xmlOperationsrc_reg_id is the 5 bit index of the vector register specified in the instruction as the src1 register.define NFMA_PS(kl, vl, dest, k1, msrc, regs_loaded, src_base, posneg):tmpdest := dest// reg[] is an array representing the SIMD register file.FOR j := 0 to regs_loaded-1:FOR i := 0 to kl-1:IF k1[i] or *no writemask*:IF posneg = 0:tmpdest.single[i] := RoundFPControl_MXCSR(tmpdest.single[i] - reg[src_base + j ].single[i] * msrc.single[j])ELSE:tmpdest.single[i] := RoundFPControl_MXCSR(tmpdest.single[i] + reg[src_base + j ].single[i] * msrc.single[j])ELSE IF *zeroing*:tmpdest.single[i] := 0dest := tmpdstdest[MAX_VL-1:VL] := 0V4FMADDPS and V4FNMADDPS dest{k1}, src1, msrc (AVX512)KL, VL = (16,512)regs_loaded := 4src_base := src_reg_id & ~3 // for src1 operandposneg := 0 if negative form, 1 otherwiseNFMA_PS(kl, vl, dest, k1, msrc, regs_loaded, src_base, posneg)Intel C/C++ Compiler Intrinsic EquivalentV4FMADDPS __m512 _mm512_4fmadd_ps( __m512, __m512x4, __m128 *);V4FMADDPS __m512 _mm512_mask_4fmadd_ps(__m512, __mmask16, __m512x4, __m128 *);V4FMADDPS __m512 _mm512_maskz_4fmadd_ps(__mmask16, __m512, __m512x4, __m128 *);V4FNMADDPS __m512 _mm512_4fnmadd_ps(__m512, __m512x4, __m128 *);V4FNMADDPS __m512 _mm512_mask_4fnmadd_ps(__m512, __mmask16, __m512x4, __m128 *);V4FNMADDPS __m512 _mm512_maskz_4fnmadd_ps(__mmask16, __m512, __m512x4, __m128 *);SIMD Floating-Point ExceptionsOverflow, Underflow, Invalid, Precision, Denormal.Other ExceptionsSee Type E2; additionally#UDIf the EVEX broadcast bit is set to 1.#UDIf the MODRM.mod = 0b11.

This UNOFFICIAL reference was generated from the official Intel® 64 and IA-32 Architectures Software Developer’s Manual by a dumb script. There is no guarantee that some parts aren't mangled or broken and is distributed WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.