[Mesa-dev] [RFC] ARB_shader_clock, hardware counters and i965

Emil Velikov emil.l.velikov at gmail.com
Mon Oct 5 10:06:15 PDT 2015

On 5 October 2015 at 17:11, Connor Abbott <cwabbott0 at gmail.com> wrote:
> On Mon, Oct 5, 2015 at 11:36 AM, Emil Velikov <emil.l.velikov at gmail.com> wrote:
>> Hi all,
>> I am looking at ARB_shader_clock with i965 in mind.
>> So far I've got most of the infra/plumbing, and a fancy new intrinsic :)
>> On the hardware side, I was thinking about using the Observability
>> Architecture (OA) counters. The fun part is that those tend to vary
>> quite a bit based on the hardware generation. So far I'm leaning
>> towards:
>>  - "Count of XXX threads dispatched to EUs" for BDW and later.
>>  - "XXX Shader Active Time" for earlier (SNB-HSW/VLV) hardware.
>> Do these sound appropriate, or should we opt for the various knobs in
>> 'Flexible EU event counters' ? Is there some alternative piece of
>> hardware in i965, which I can use ?
>> Going for OA has a small catch. Reading through the PRM, it is not
>> obvious if one can track the same source twice (the
>> GL_AMD_performance_monitor implementation comes to mind). I'm about to
>> take a closer look into brw_performance_monitor.[ch] shortly, but if
>> any gotchas/fancy interactions come to mind let me know.
>> Thanks
>> Emil
>> P.S. Does anyone recall the consensus wrt adding the 2015 extensions
>> to GL3.txt ?
> Hi Emil,
> I don't think you want to use the OA counters to implement
> ARB_shader_clock. They're not exposed to the shader directly, AFAIK,
> and they only measure things on a per-invocation granularity, whereas
> the intent of ARB_shader_clock is to be able to measure the number of
> cycles that individual operations take with very low latency. Instead,
> you should read from the ARF performance register -- see page 822 of
> Vol 7 ("3D Media GPGPU") of the Broadwell PRM (page 858 of the PDF)
> for more details.
I knew that there should be a nicer piece of hardware for this, but
could not find it looking through the spec. The timestamp register
looks exactly like the thing we need there.
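
For illustration, this is roughly how two reads of a free-running
counter turn into a cycle count. It is only a host-side C sketch: the
32-bit counter width and the helper name cycles_elapsed() are my
assumptions, not something taken from the PRM or from i965.

```c
#include <stdint.h>

/* Hypothetical helper: elapsed ticks between two samples of a
 * free-running 32-bit counter. Unsigned subtraction is modulo 2^32,
 * so a single wraparound between the two samples still yields the
 * correct delta. */
static uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}
```

If the counter wraps more than once between the two reads the delta is
ambiguous, so this only works for short measured regions.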

> Another interesting thing is that you can atomically read from that
> register and also get a bit that says whether there was some event,
> such as a context switch, since the last time you read it that would
> make your measurement invalid. It might be useful to expose this
> through a GLSL extension as another set of overloads:
> uint64_t clockARB(out bool valid); //once we get int64 support
> uvec2 clock2x32ARB(out bool valid);
Are you thinking about writing up another extension, or should we just
wire things internally as someone else does it for us ? Would you have
any preference how to handle things when a context switch has occurred
(for the official functions) ?
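
To make the question concrete, here is a rough C model of what a
caller of the proposed "out bool valid" overloads would do with the
result. The struct and function names are hypothetical. I'm assuming
the first 32-bit word holds the least significant bits, and that the
flag on the later sample reports events since the previous read.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side model of one result of the proposed
 * "uvec2 clock2x32ARB(out bool valid)" overload. */
struct clock_sample {
    uint32_t lo;    /* 32 least significant bits */
    uint32_t hi;    /* 32 most significant bits */
    bool     valid; /* false if an event such as a context switch was flagged */
};

/* Combine the two halves into a single 64-bit counter value. */
static uint64_t clock_sample_pack(const struct clock_sample *s)
{
    return ((uint64_t)s->hi << 32) | s->lo;
}

/* Compute end - start. Returns false and leaves *delta untouched when
 * the later sample is flagged invalid (assumption: the flag covers the
 * interval since the previous read). */
static bool clock_sample_delta(const struct clock_sample *start,
                               const struct clock_sample *end,
                               uint64_t *delta)
{
    if (!end->valid)
        return false;
    *delta = clock_sample_pack(end) - clock_sample_pack(start);
    return true;
}
```

What the existing functions without the extra argument should do in
the invalid case is exactly the open question; the sketch only covers
the overloads that report it.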

> and a corresponding NIR intrinsic that outputs an extra component
> that's a boolean (i.e. 0 or ~0). That would help with implementing
> something like INTEL_DEBUG=shader_time generically with fewer outliers
> to throw away.
Exposing it via INTEL_DEBUG will be great, but first I'd stick to getting
the extension bits in place.
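
For when we get there, the filtering step itself is small. A minimal C
sketch, assuming each sample comes with a validity word in the 0 / ~0
convention described above; average_valid_cycles() is a made-up name
and not an existing Mesa function.

```c
#include <stddef.h>
#include <stdint.h>

/* Average the cycle counts whose validity word is non-zero.
 * valid[i] uses the 0 / ~0 boolean convention. Returns 0 when no
 * sample is valid. */
static uint64_t average_valid_cycles(const uint64_t *cycles,
                                     const uint32_t *valid,
                                     size_t count)
{
    uint64_t sum = 0;
    uint64_t kept = 0;

    for (size_t i = 0; i < count; i++) {
        if (valid[i] != 0) {
            sum += cycles[i];
            kept++;
        }
    }

    return kept ? sum / kept : 0;
}
```

Samples flagged invalid are dropped up front, so they never show up as
outliers in the average.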
