[Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions

Connor Abbott cwabbott0 at gmail.com
Thu Nov 19 12:18:24 PST 2015


On Thu, Nov 19, 2015 at 3:11 PM, Matt Turner <mattst88 at gmail.com> wrote:
> On Thu, Nov 19, 2015 at 11:35 AM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>> On Thu, Nov 19, 2015 at 2:07 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>>> On Thu, Nov 19, 2015 at 1:54 PM, Matt Turner <mattst88 at gmail.com> wrote:
>>>> On Thu, Nov 19, 2015 at 7:31 AM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>>>>> On Thu, Nov 19, 2015 at 6:40 AM, Matt Turner <mattst88 at gmail.com> wrote:
>>>>>> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga <itoral at igalia.com> wrote:
>>>>>>> From: Connor Abbott <connor.w.abbott at intel.com>
>>>>>>>
>>>>>>> It appears that not only math instructions, but also MOV_BYTES or
>>>>>>> any instruction that uses Align1 mode cannot be in the middle
>>>>>>> of a dependency control sequence or the GPU will hang (at least on my
>>>>>>> BDW). This fixes GPU hangs in some fp64 tests.
>>>>>>
>>>>>> I'm pretty surprised by this assessment. Doubtful even.
>>>>>>
>>>>>>> Reviewed-by: Iago Toral Quiroga <itoral at igalia.com>
>>>>>>> ---
>>>>>>>  src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 ++++++++++++-----
>>>>>>>  1 file changed, 12 insertions(+), 5 deletions(-)
>>>>>>>
>>>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>>>>>> index 3bcd5cb..bc0a33b 100644
>>>>>>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>>>>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>>>>>> @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const vec4_instruction *inst)
>>>>>>>     }
>>>>>>>
>>>>>>>     /*
>>>>>>> +    * Instructions that use Align1 mode cause the GPU to hang when inserted
>>>>>>> +    * between a NoDDClr and NoDDChk in Align16 mode. Discovered empirically.
>>>>>>> +    */
>>>>>>> +
>>>>>>> +   if (inst->opcode == VEC4_OPCODE_PACK_BYTES ||
>>>>>>> +       inst->opcode == VEC4_OPCODE_MOV_BYTES ||
>>>>>>
>>>>>> PACK_BYTES sets depctrl itself in the generator, and at the time I
>>>>>> added it I made a test that did
>>>>>>
>>>>>>   vec4 foo = vec4(packUnorm4x8(...),
>>>>>>                   packUnorm4x8(...),
>>>>>>                   packUnorm4x8(...),
>>>>>>                   packUnorm4x8(...))
>>>>>>
>>>>>> and confirmed that it set depctrl properly on the whole sequence.
>>>>>> There could of course be bugs somewhere, but the "hardware doesn't
>>>>>> work if you mix align1 and align16 depctrl" seems unlikely.
>>>>>>
>>>>>> Do you know of a test that this affects?
>>>>>
>>>>> This only affects FP64 tests, since there we use an align1 mov to do
>>>>> double-to-float and float-to-double. However, I tried commenting out
>>>>> emit_nir_code() and just doing essentially:
>>>>>
>>>>> emit(MOV(...))->force_writemask_all = true;
>>>>> emit(VEC4_OPCODE_PACK_BYTES, ...);
>>>>> emit(MOV(...))->force_writemask_all = true;
>>>>>
>>>>> and on my BDW it hanged. In case it's not clear: this isn't about
>>>>> setting depctrl on the instruction itself, it just can't be inside of
>>>>> a depctrl sequence (which we were already disallowing for math
>>>>> instructions anyways).
>>>>
>>>> Very weird. I'll take a look. So I understand, are the MOV
>>>> instructions writing different channels of the same register? And
>>>> VEC4_OPCODE_PACK_BYTES is writing to a different or the same register
>>>> as the MOVs? (I saw your fixup reply)
>>>
>>> Actually, I had them writing the same thing so the second overwrote
>>> the first one.
>>
>> Err, just to be clear... in the actual test that led to me discovering
>> this, the instructions (not mov's but a bfi and a mov IIRC) were
>> writing different components of the same register. In my hacked-up
>> test to see if align1 in between a depctrl sequence was the problem, I
>> just had them both write to the same thing since it was easier. In any
>> case, I don't think it should matter too much.
>
> Interesting. If it was a BFI and a MOV writing different components of
> the same register, I can see that failing. Though I haven't measured,
> 3-src instructions usually have 2+ extra cycles of latency and with a
> 2-cycle issue time, a BFI followed immediately by a MOV would mean
> that both of them retired in the same cycle. That would seem bad if
> one of them was supposed to signal the scoreboard.
> ×

They weren't right next to each other, though -- there was a sequence
of 5-6 instructions between them, one of which happens to be an align1
mov which (according to my testing) hangs the GPU in between any
depctrl sequence.


More information about the mesa-dev mailing list