[Mesa-dev] [PATCH 04/22] i965/fs: add lowering step to duplicate sources with stride 0.
Francisco Jerez
currojerez at riseup.net
Wed Jan 11 06:07:00 UTC 2017
Hi Matt,
Matt Turner <mattst88 at gmail.com> writes:
> On Sun, Jan 8, 2017 at 10:53 PM, Matt Turner <mattst88 at gmail.com> wrote:
>> On 01/05, Samuel Iglesias Gonsálvez wrote:
>>>
>>> From: "Juan A. Suarez Romero" <jasuarez at igalia.com>
>>>
>>> When dealing with DF uniforms with just 1 component, we set stride 0 to
>>> use the value along the operation. However, when duplicating the
>>> regioning parameters in IVB/VLV, we are violating the regioning
>>> restrictions.
>>>
>>> So instead of using the value with stride 0, we just duplicate it in a
>>> register, and then use the register instead, avoiding a DF with stride 0.
>>> ---
>>> src/mesa/drivers/dri/i965/brw_fs.cpp | 63
>>> ++++++++++++++++++++++++++++++++++++
>>> src/mesa/drivers/dri/i965/brw_fs.h | 1 +
>>> 2 files changed, 64 insertions(+)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> b/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> index eb3b4aa..78f2124 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> @@ -2168,6 +2168,62 @@ fs_visitor::lower_constant_loads()
>>> invalidate_live_intervals();
>>> }
>>>
>>> +/**
>>> + * When dealing with double-precision floats (DF) in IVB/VLV, we need to
>>> + * duplicate the regioning parameters. This means that for a DF scalar
>>> + * (regioning <0,1,0>) we will end up using regioning <0,2,0>. But
>>> according
>>> + * to General Restrictions on Regioning Parameters (Ivy PRM, Vol. 4 Part
>>> 3,
>>> + * page 69), if VertStride = HorzStride = 0, Width must be 1 regardless
>>> of the
>>> + * value of ExecSize. So we would be violating the restriction. To
>>> overcome
>>> + * it, this lowering step duplicates the scalar in a couple of registers,
>>> + * reading it as two floats to avoid the restriction.
>>
>>
>> Huh, I would have thought that a <0,1,0>DF region would have done what
>> we wanted, without the need to double any of the region parameters.
>>
>> I haven't tested yet, so I'll play with it tomorrow and see if it blows
>> up.
>
> Indeed, it looks like gX<0,2,1>DF works for scalar sources.
>
> The following patch seems to work for me in place of 04/22.
>
> I would expect if we had an instruction like
>
> add(16) gX<1>DF gY<8,8,1>DF gZ<0,2,1>DF
>
> that gX would write 2 registers and gY would read two registers, but
> gZ would read only one register -- and that would violate the
> restriction that an instruction that writes two registers must also
> read two registers.
>
> But it seems that we never generate a DF instruction with exec_size=16
> because of
>
> 4554 /* Note that in IVB/VLV for instructions that handles DF,
> we will duplicate
> 4555 * the exec_size. So take this value for calculus purposes.
> 4556 */
> 4557 unsigned exec_size = inst->exec_size;
> 4558 if (devinfo->gen == 7 && !devinfo->is_haswell &&
> inst->exec_data_size() == 8)
> 4559 exec_size *= 2;
>
That's not the reason they aren't generated. The hunk above had the
opposite of the effect they intended, and didn't double the execution
size value used by any regioning restriction calculation, it only raised
the upper bound that the resulting lowered execution size is clamped to.
That's about half of the reason I NAK'ed the patch.
The reason they aren't generating any 16-wide double float instructions
is that they would violate other regioning restrictions implemented in
get_fpu_lowered_simd_width(), in particular the one that forces
non-force_writemask_all double float instructions to SIMD4 due to the
Gen7 exec mask decompression bug.
> ... which I think Curro NAK'd elsewhere.
>
> I think what we want is to use 0,2,1 regions for scalar sources, and
> split exec_size=16 (i.e., 8 DFs) instructions when they use scalar
> sources in order to avoid violating the restriction. Otherwise, we
> should emit exec_size=16 DF instructions where possible.
Agreed -- And thanks for testing it out on real hardware. We'll
definitely need a new restriction check in get_fpu_lowered_simd_width()
that clamps to 4 the execution size of any instruction with a DF scalar
source on pre-HSW hardware for the patch below to be fully correct.
That way the SIMD lowering pass will get rid of any compressed
instructions with scalar source (whether force_writemask_all or not) and
we avoid tickling the Gen7 region decompression bug.
> From 67bab74148e110171b8f3a999dc1d79616013108 Mon Sep 17 00:00:00 2001
> From: Matt Turner <mattst88 at gmail.com>
> Date: Tue, 10 Jan 2017 19:33:22 -0800
> Subject: [PATCH] i965: Use <0,2,1> region for scalar DF sources on IVB/BYT.
>
> On HSW+, scalar DF sources can be accessed using the normal <0,1,0>
> region, but on IVB and BYT DF regions must be programmed in terms of
> floats. A <0,2,1> region accomplishes this.
> ---
> src/mesa/drivers/dri/i965/brw_eu_emit.c | 26 ++++++++++++++++++++------
> 1 file changed, 20 insertions(+), 6 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> index 2843385..5f81b7a 100644
> --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
> +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> @@ -495,9 +495,16 @@ brw_set_src0(struct brw_codegen *p, brw_inst *inst, struct brw_reg reg)
> brw_inst_set_src0_width(devinfo, inst, BRW_WIDTH_1);
> brw_inst_set_src0_vstride(devinfo, inst, BRW_VERTICAL_STRIDE_0);
> } else {
> - brw_inst_set_src0_hstride(devinfo, inst, reg.hstride);
> - brw_inst_set_src0_width(devinfo, inst, reg.width);
> - brw_inst_set_src0_vstride(devinfo, inst, reg.vstride);
> + if (devinfo->gen == 7 && !devinfo->is_haswell &&
> + reg.type == BRW_REGISTER_TYPE_DF && has_scalar_region(reg)) {
> + brw_inst_set_src0_vstride(devinfo, inst, BRW_VERTICAL_STRIDE_0);
> + brw_inst_set_src0_width(devinfo, inst, BRW_WIDTH_2);
> + brw_inst_set_src0_hstride(devinfo, inst, BRW_HORIZONTAL_STRIDE_1);
> + } else {
> + brw_inst_set_src0_vstride(devinfo, inst, reg.vstride);
> + brw_inst_set_src0_width(devinfo, inst, reg.width);
> + brw_inst_set_src0_hstride(devinfo, inst, reg.hstride);
> + }
> }
> } else {
> brw_inst_set_src0_da16_swiz_x(devinfo, inst,
> @@ -577,9 +584,16 @@ brw_set_src1(struct brw_codegen *p, brw_inst *inst, struct brw_reg reg)
> brw_inst_set_src1_width(devinfo, inst, BRW_WIDTH_1);
> brw_inst_set_src1_vstride(devinfo, inst, BRW_VERTICAL_STRIDE_0);
> } else {
> - brw_inst_set_src1_hstride(devinfo, inst, reg.hstride);
> - brw_inst_set_src1_width(devinfo, inst, reg.width);
> - brw_inst_set_src1_vstride(devinfo, inst, reg.vstride);
> + if (devinfo->gen == 7 && !devinfo->is_haswell &&
> + reg.type == BRW_REGISTER_TYPE_DF && has_scalar_region(reg)) {
> + brw_inst_set_src1_vstride(devinfo, inst, BRW_VERTICAL_STRIDE_0);
> + brw_inst_set_src1_width(devinfo, inst, BRW_WIDTH_2);
> + brw_inst_set_src1_hstride(devinfo, inst, BRW_HORIZONTAL_STRIDE_1);
> + } else {
> + brw_inst_set_src1_vstride(devinfo, inst, reg.vstride);
> + brw_inst_set_src1_width(devinfo, inst, reg.width);
> + brw_inst_set_src1_hstride(devinfo, inst, reg.hstride);
> + }
> }
> } else {
> brw_inst_set_src1_da16_swiz_x(devinfo, inst,
> --
> 2.7.3
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170110/b7333592/attachment.sig>
More information about the mesa-dev
mailing list