[Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions

Thu Nov 19 11:35:27 PST 2015

On Thu, Nov 19, 2015 at 2:07 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:
> On Thu, Nov 19, 2015 at 1:54 PM, Matt Turner <mattst88 at gmail.com> wrote:
>> On Thu, Nov 19, 2015 at 7:31 AM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>>> On Thu, Nov 19, 2015 at 6:40 AM, Matt Turner <mattst88 at gmail.com> wrote:
>>>> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga <itoral at igalia.com> wrote:
>>>>> From: Connor Abbott <connor.w.abbott at intel.com>
>>>>>
>>>>> It appears that not only math instructions, but also MOV_BYTES or
>>>>> any instruction that uses Align1 mode cannot be in the middle
>>>>> of a dependency control sequence or the GPU will hang (at least on my
>>>>> BDW). This fixes GPU hangs in some fp64 tests.
>>>>
>>>> I'm pretty surprised by this assessment. Doubtful even.
>>>>
>>>>> Reviewed-by: Iago Toral Quiroga <itoral at igalia.com>
>>>>> ---
>>>>>  src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 ++++++++++++-----
>>>>>  1 file changed, 12 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>>>> index 3bcd5cb..bc0a33b 100644
>>>>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>>>> @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const vec4_instruction *inst)
>>>>>     }
>>>>>
>>>>>     /*
>>>>> +    * Instructions that use Align1 mode cause the GPU to hang when inserted
>>>>> +    * between a NoDDClr and NoDDChk in Align16 mode. Discovered empirically.
>>>>> +    */
>>>>> +
>>>>> +   if (inst->opcode == VEC4_OPCODE_PACK_BYTES ||
>>>>> +       inst->opcode == VEC4_OPCODE_MOV_BYTES ||
>>>>
>>>> PACK_BYTES sets depctrl itself in the generator, and at the time I
>>>> added it I made a test that did
>>>>
>>>>   vec4 foo = vec4(packUnorm4x8(...),
>>>>                   packUnorm4x8(...),
>>>>                   packUnorm4x8(...),
>>>>                   packUnorm4x8(...))
>>>>
>>>> and confirmed that it set depctrl properly on the whole sequence.
>>>> There could of course be bugs somewhere, but the "hardware doesn't
>>>> work if you mix align1 and align16 depctrl" seems unlikely.
>>>>
>>>> Do you know of a test that this affects?
>>>
>>> This only affects FP64 tests, since there we use an align1 mov to do
>>> double-to-float and float-to-double. However, I tried commenting out
>>> emit_nir_code() and just doing essentially:
>>>
>>> emit(MOV(...))->force_writemask_all = true;
>>> emit(VEC4_OPCODE_PACK_BYTES, ...);
>>> emit(MOV(...))->force_writemask_all = true;
>>>
>>> and on my BDW it hanged. In case it's not clear: this isn't about
>>> setting depctrl on the instruction itself, it just can't be inside of
>>> a depctrl sequence (which we were already disallowing for math
>>> instructions anyways).
>>
>> Very weird. I'll take a look. So I understand, are the MOV
>> instructions writing different channels of the same register? And
>> VEC4_OPCODE_PACK_BYTES is writing to a different or the same register
>> as the MOVs? (I saw your fixup reply)
>
> Actually, I had them writing the same thing so the second overwrote
> the first one.

Err, just to be clear... in the actual test that led to me discovering
this, the instructions (not mov's but a bfi and a mov IIRC) were
writing different components of the same register. In my hacked-up
test to see if align1 in between a depctrl sequence was the problem, I
just had them both write to the same thing since it was easier. In any
case, I don't think it should matter too much.

> The PACK_BYTES/MOV_BYTES/F2D/D2F (I think I tested all
> of them, not sure) were operating on completely different registers,
> and in the FP64 test that actually hung the GPU they were as well.
> Using d2f since it's simpler and I remember what the operands are
> (it's just an align1 mov with a dest stride of 2), the test code was
> something like:
>
> mov g50, g51 { no_dd_clear }
> d2f g52, g54
> mov g50, g53 { no_dd_check }
>
> and changing the d2f to a normal align16 mov or commenting it out
> prevented the hang. It would be interesting to see if a math
> instruction instead of d2f also hangs.
>
>>
>> By the way, the math code is too heavy handed as far as I know. The
>> BDW+ docs say that the MATH instruction itself cannot take dependency
>> control hints (and empirically earlier platforms seem to have problems
>> with this as well, see
>> tests/shaders/dependency-hints/exp2.shader_test) -- nothing about a
>> math instruction being in the middle of a NoDDC* block. The person who
>> implemented the math did the minimal amount of work to fix the
>> problem.
>>
>> The PRM also says:
>>
>> """
>> Instructions other than send, may use this control as long as
>> operations that have different pipeline latencies are not mixed. The
>> operations that have longer latencies are:
>>
>> Opcodes pln, lrp, dp*.
>> Operations involving double precision computation.
>> Integer DW multiplication where both source operands are DWs.
>> """
>>
>> I would say that mixing a double-precision operation and something
>> else might cause problems, but that seems like we have a different
>> problem thus far.
>
> Yeah, these are all just mov's so I would expect that section to
> apply. It still seems like we're not taking it into account, though...