[Mesa-dev] [PATCH 04/22] i965/fs: add lowering step to duplicate sources with stride 0.

Matt Turner mattst88 at gmail.com
Wed Jan 11 04:37:36 UTC 2017


On Sun, Jan 8, 2017 at 10:53 PM, Matt Turner <mattst88 at gmail.com> wrote:
> On 01/05, Samuel Iglesias Gonsálvez wrote:
>>
>> From: "Juan A. Suarez Romero" <jasuarez at igalia.com>
>>
>> When dealing with DF uniforms with just 1 component, we set stride 0 to
>> use the value along the operation. However, when duplicating the
>> regioning parameters in IVB/VLV, we are violating the regioning
>> restrictions.
>>
>> So instead of using the value with stride 0, we just duplicate it in a
>> register, and then use the register instead, avoiding a DF with stride 0.
>> ---
>> src/mesa/drivers/dri/i965/brw_fs.cpp | 63 ++++++++++++++++++++++++++++++++++++
>> src/mesa/drivers/dri/i965/brw_fs.h   |  1 +
>> 2 files changed, 64 insertions(+)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> index eb3b4aa..78f2124 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> @@ -2168,6 +2168,62 @@ fs_visitor::lower_constant_loads()
>>    invalidate_live_intervals();
>> }
>>
>> +/**
>> + * When dealing with double-precision floats (DF) in IVB/VLV, we need to
>> + * duplicate the regioning parameters. This means that for a DF scalar
>> + * (regioning <0,1,0>) we will end up using regioning <0,2,0>. But according
>> + * to General Restrictions on Regioning Parameters (Ivy PRM, Vol. 4 Part 3,
>> + * page 69), if VertStride = HorzStride = 0, Width must be 1 regardless of the
>> + * value of ExecSize. So we would be violating the restriction. To overcome
>> + * it, this lowering step duplicates the scalar in a couple of registers,
>> + * reading it as two floats to avoid the restriction.
>
>
> Huh, I would have thought that a <0,1,0>DF region would have done what
> we wanted, without the need to double any of the region parameters.
>
> I haven't tested yet, so I'll play with it tomorrow and see if it blows
> up.

Indeed, it looks like gX<0,2,1>DF works for scalar sources.

The following patch seems to work for me in place of 04/22.

I would expect if we had an instruction like

add(16)  gX<1>DF  gY<8,8,1>DF  gZ<0,2,1>DF

that gX would write 2 registers and gY would read two registers, but
gZ would read only one register -- and that would violate the
restriction that an instruction that writes two registers must also
read two registers.
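
To make that register count concrete, here is a rough sketch (not Mesa
code). It assumes a 32-byte GRF and, as this thread suggests for IVB/VLV,
that DF regions and exec_size are counted in 32-bit units. The names
Region and regs_touched are made up for illustration.

```cpp
#include <cassert>

// Hypothetical model, not Mesa code: a source region <vstride,width,hstride>
// whose parameters are counted in elements of elem_bytes each.
struct Region {
   unsigned vstride, width, hstride;
};

// Assumption: one GRF is 32 bytes.
constexpr unsigned GRF_BYTES = 32;

// Number of GRFs spanned by the bytes that exec_size channels read through
// region r, starting at the beginning of a register.
static unsigned
regs_touched(Region r, unsigned exec_size, unsigned elem_bytes)
{
   unsigned max_offset = 0;
   for (unsigned ch = 0; ch < exec_size; ch++) {
      unsigned row = ch / r.width;
      unsigned col = ch % r.width;
      unsigned offset = (row * r.vstride + col * r.hstride) * elem_bytes;
      if (offset > max_offset)
         max_offset = offset;
   }
   return (max_offset + elem_bytes + GRF_BYTES - 1) / GRF_BYTES;
}

// The add(16) example from above, with 16 channels of 4 bytes each.
static inline void
check_thread_example()
{
   assert(regs_touched(Region{8, 8, 1}, 16, 4) == 2); // gY<8,8,1>
   assert(regs_touched(Region{0, 2, 1}, 16, 4) == 1); // gZ<0,2,1>
}
```

Under those assumptions the contiguous destination covers 16 * 4 = 64
bytes, i.e. two registers, while gZ only ever touches the same 8 bytes.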

But it seems that we never generate a DF instruction with exec_size=16
because of

      /* Note that in IVB/VLV for instructions that handles DF, we will duplicate
       * the exec_size. So take this value for calculus purposes.
       */
      unsigned exec_size = inst->exec_size;
      if (devinfo->gen == 7 && !devinfo->is_haswell && inst->exec_data_size() == 8)
         exec_size *= 2;

... which I think Curro NAK'd elsewhere.

I think what we want is to use 0,2,1 regions for scalar sources, and
split exec_size=16 (i.e., 8 DFs) instructions when they use scalar
sources in order to avoid violating the restriction. Otherwise, we
should emit exec_size=16 DF instructions where possible.
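
As a sketch of that splitting rule only (hypothetical helpers, not the
attached patch and not existing Mesa functions), with exec_size counted
in 32-bit channels so that 16 channels correspond to 8 DFs:

```cpp
// Hypothetical helper, not Mesa code. Returns the exec_size each emitted
// piece should use: a 16-channel DF instruction with a scalar DF source is
// split in half, anything else is left at its original width.
static unsigned
df_piece_exec_size(unsigned exec_size, bool has_scalar_df_src)
{
   if (exec_size == 16 && has_scalar_df_src)
      return 8;
   return exec_size;
}

// Number of instructions emitted for the original one.
static unsigned
df_piece_count(unsigned exec_size, bool has_scalar_df_src)
{
   return exec_size / df_piece_exec_size(exec_size, has_scalar_df_src);
}
```

Each exec_size=8 piece writes only one register, so the rule about
instructions that write two registers no longer applies to it.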
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-i965-Use-0-2-1-region-for-scalar-DF-sources-on-IVB-B.patch
Type: text/x-patch
Size: 3052 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170110/9a55642f/attachment-0001.bin>

