[Mesa-dev] [PATCH 6/6] i965/gen7: Add instruction latency estimates for untyped atomics and reads.
Matt Turner
mattst88 at gmail.com
Fri Nov 1 11:02:38 PDT 2013
On Fri, Nov 1, 2013 at 10:31 AM, Paul Berry <stereotype441 at gmail.com> wrote:
> On 29 October 2013 16:37, Francisco Jerez <currojerez at riseup.net> wrote:
>>
>> The latency information has been obtained empirically from
>> measurements taken on Haswell and Ivy Bridge.
>> ---
>> .../drivers/dri/i965/brw_schedule_instructions.cpp | 41 ++++++++++++++++++++++
>> 1 file changed, 41 insertions(+)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
>> index 944b5c8..cbfaabe 100644
>> --- a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
>> @@ -329,6 +329,47 @@ schedule_node::set_latency_gen7(bool is_haswell)
>> latency = 200;
>> break;
>>
>> + case SHADER_OPCODE_UNTYPED_ATOMIC:
>> + /* Test code:
>> + * mov(8) g112<1>ud 0x00000000ud { align1 WE_all 1Q };
>> + * mov(1) g112.7<1>ud g1.7<0,1,0>ud { align1 WE_all };
>> + * mov(8) g113<1>ud 0x00000000ud { align1 WE_normal 1Q };
>> + * send(8) g4<1>ud g112<8,8,1>ud
>> + * data (38, 5, 6) mlen 2 rlen 1 { align1 WE_normal 1Q };
>> + *
>> + * Running it 100 times as fragment shader on a 128x128 quad
>> + * gives an average latency of 13867 cycles per atomic op,
>> + * standard deviation 3%. Note that this is a rather
>> + * pessimistic estimate, the actual latency in cases with few
>> + * collisions between threads and favorable pipelining has been
>> + * seen to be reduced by a factor of 100.
>> + */
>> + latency = 14000;
>
>
> Wow, that's a really huge latency. Given your argument in the comment, I
> suspect that in practice, shaders that use atomic counters are going to be a
> lot closer to the "few collisions between threads and favorable pipelining"
> case than they are going to be to this pessimistic estimate. Personally,
> I'd be inclined to make the latency the same as
> SHADER_OPCODE_UNTYPED_SURFACE_READ.
>
> But I'm not an expert on scheduling latencies so I'll defer to Eric and
> Matt. Consider this patch:

That seems reasonable to me. Once the latency is an order of magnitude
more than any other instruction, it kind of stops mattering for
scheduling purposes.

Either way:

Reviewed-by: Matt Turner <mattst88 at gmail.com>
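
To illustrate the "stops mattering" point, here is a rough sketch of a toy
critical-path list scheduler. It is not the i965 scheduler: Node, delay_of,
schedule and make_dag are hypothetical names, and every latency except the
200 and 14000 from the quoted patch is made up for the example. It only
shows that once one instruction's latency dwarfs the rest, the issue order
no longer depends on the exact value.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy instruction node (hypothetical, not the Mesa schedule_node):
// a latency in cycles plus the indices of the instructions that
// consume its result.
struct Node {
    int latency;
    std::vector<int> children;
};

// Critical-path delay: own latency plus the longest delay among consumers.
static int delay_of(const std::vector<Node> &dag, int i, std::vector<int> &memo)
{
    if (memo[i] >= 0)
        return memo[i];
    int longest = 0;
    for (int c : dag[i].children)
        longest = std::max(longest, delay_of(dag, c, memo));
    return memo[i] = dag[i].latency + longest;
}

// Greedy list scheduling: repeatedly issue the ready node (all producers
// already issued) with the largest critical-path delay. Ties go to the
// lowest index.
static std::vector<int> schedule(const std::vector<Node> &dag)
{
    const int n = static_cast<int>(dag.size());
    std::vector<int> memo(n, -1), pending_parents(n, 0), order;
    std::vector<bool> done(n, false);

    for (const Node &node : dag)
        for (int c : node.children)
            pending_parents[c]++;

    for (int step = 0; step < n; step++) {
        int best = -1;
        for (int i = 0; i < n; i++) {
            if (done[i] || pending_parents[i] > 0)
                continue;
            if (best < 0 || delay_of(dag, i, memo) > delay_of(dag, best, memo))
                best = i;
        }
        assert(best >= 0); // holds as long as the input is acyclic
        done[best] = true;
        order.push_back(best);
        for (int c : dag[best].children)
            pending_parents[c]--;
    }
    return order;
}

// Small example DAG. Only the atomic's latency varies between runs.
static std::vector<Node> make_dag(int atomic_latency)
{
    return {
        {atomic_latency, {3}}, // 0: untyped atomic, result used by 3
        {200, {2}},            // 1: a 200-cycle message, result used by 2
        {14, {3}},             // 2: ALU op (made-up latency)
        {14, {}},              // 3: final ALU op (made-up latency)
    };
}
```

With this DAG, schedule(make_dag(14000)) and schedule(make_dag(2000)) pick
the same order, with the atomic issued first. The order only changes when
the atomic's latency drops below the 200-cycle chain, e.g. make_dag(100).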