[Mesa-dev] [PATCH 1/3] i965: Add reads_accumulator_implicitly() function.

Matt Turner mattst88 at gmail.com
Wed Apr 9 15:16:30 PDT 2014


On Wed, Apr 9, 2014 at 3:06 PM, Eric Anholt <eric at anholt.net> wrote:
> Matt Turner <mattst88 at gmail.com> writes:
>
>> ---
>>  src/mesa/drivers/dri/i965/brw_shader.cpp | 16 ++++++++++++++++
>>  src/mesa/drivers/dri/i965/brw_shader.h   |  1 +
>>  2 files changed, 17 insertions(+)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp
>> index f194437..c8796b3 100644
>> --- a/src/mesa/drivers/dri/i965/brw_shader.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
>> @@ -664,6 +664,22 @@ backend_instruction::can_do_saturate() const
>>  }
>>
>>  bool
>> +backend_instruction::reads_accumulator_implicitly() const
>> +{
>> +   switch (opcode) {
>> +   case BRW_OPCODE_MAC:
>> +   case BRW_OPCODE_MACH:
>> +   /* FINISHME: Enable these if we ever start emitting them.
>> +    * case BRW_OPCODE_SADA:
>> +    * case BRW_OPCODE_SADA2:
>> +    */
>
> Let's just uncomment SADA2 right away to prevent pain in the future.
> SAD2 doesn't read the acc, though.
>
> Other than that, the first 2 patches are:
>
> Reviewed-by: Eric Anholt <eric at anholt.net>
>
> I think scheduling is still broken in the last one, because you're
> removing the barrier deps on implicit-accumulator opcodes and replacing
> them with explicit dependencies, but you're not tracking the accumulator
> updates by almost-all-instructions pre-gen6.  The scheduler would be
> free to slip in some unrelated instruction after the MUL in the
> following snippet from brw_vec4_visitor.cpp:

Ah, that is true.

I went looking for text about this, since I didn't know about it until
you mentioned it recently. I see in the GM45 docs a 'Accumulator
Disable' bit in cr0. I wonder whether all of the false
write-after-write dependencies on the accumulator actually cause
stalls, and if so whether we should attempt to disable accumulator
writes. We don't seem to have any cases where we rely on implicit
accumulator updates that we couldn't replace with explicit accumulator
destinations.

>
>             emit(MUL(acc, op[0], op[1]));
>             emit(MACH(dst_null_d(), op[0], op[1]));
>             emit(MOV(result_dst, src_reg(acc)));
>
> (err, why are we doing MACH and MOV instead of just MACH into
> result_dst?)

mach writes the *high* 32-bits of the result into its destination (so
useful for *mulExtended()).


More information about the mesa-dev mailing list