[Mesa-dev] [RFC] ARB_shader_clock, hardware counters and i965

Connor Abbott cwabbott0 at gmail.com
Mon Oct 5 09:11:01 PDT 2015


On Mon, Oct 5, 2015 at 11:36 AM, Emil Velikov <emil.l.velikov at gmail.com> wrote:
> Hi all,
>
> I am looking at ARB_shader_clock with i965 in mind.
>
> So far I've got most of the infra/plumbing, and a fancy new intrinsic :)
>
> On the hardware side, I was thinking about using the Observability
> Architecture (OA) counters. The fun part is that those tend to vary
> quite a bit based on the hardware generation. So far I'm leaning
> towards:
>  - "Count of XXX threads dispatched to EUs" for BDW and later.
>  - "XXX Shader Active Time" for earlier (SNB-HSW/VLV) hardware.
>
> Do these sound appropriate, or should we opt for the various knobs in
> 'Flexible EU event counters'? Is there some alternative piece of
> hardware in i965 that I can use?
>
>
> Going for OA has a small catch. Reading through the PRM, it is not
> obvious if one can track the same source twice (the
> GL_AMD_performance_monitor implementation comes to mind). I'm about to
> take a closer look into brw_performance_monitor.[ch] shortly, but if
> any gotchas/fancy interactions come to mind, let me know.
>
> Thanks
> Emil
>
> P.S. Does anyone recall the consensus wrt adding the 2015 extensions
> to GL3.txt?
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Hi Emil,

I don't think you want to use the OA counters to implement
ARB_shader_clock. They're not exposed to the shader directly, AFAIK,
and they only measure things at per-invocation granularity, whereas
the intent of ARB_shader_clock is to be able to measure the number of
cycles that individual operations take with very low latency. Instead,
you should read from the ARF performance register -- see page 822 of
Vol 7 ("3D Media GPGPU") of the Broadwell PRM (page 858 of the PDF)
for more details.

Another interesting thing is that you can atomically read from that
register and also get a bit that says whether some event that would
invalidate your measurement, such as a context switch, has happened
since the last time you read it. It might be useful to expose this
through a GLSL extension as another set of overloads:

uint64_t clockARB(out bool valid); // once we get int64 support
uvec2 clock2x32ARB(out bool valid);

and a corresponding NIR intrinsic that outputs an extra component
that's a boolean (i.e. 0 or ~0). That would help with implementing
something like INTEL_DEBUG=shader_time generically, with fewer outliers
to throw away.

Connor

