[Mesa-dev] [PATCH 00/15] GL_AMD_performance_monitor

Tue Apr 14 14:02:44 PDT 2015

Hi Samuel,

On Tue, Mar 31, 2015 at 5:56 PM, Samuel Pitoiset
<samuel.pitoiset at gmail.com> wrote:
> Hello Robert,
>
> Sorry for the delay, I just saw your message few days ago, and I probably
> removed the mail by mistake too...

And then I was on holiday; so more delay :-)

>
> I have never heard about your work on this area, happy to know right now. :)
>
> Well, regarding the backend stuff, I would prefer to keep the same for both
> GL_AMD_performance_monitor and INTEL_performance_query.

My experience with the Intel backend where I initially aimed to update
both extensions behind one backend is that it was quite a hindrance
and there wasn't a clear benefit to it when there isn't really any
substantial code to speak of in the core infrastructure to share
between the extensions.

We should be careful not to talk at cross purposes here though. In my
mind having orthogonal frontends and even different backend interfaces
wouldn't preclude a driver implementing both extensions with a unified
backend if desirable, there would just be two separate sets of entry
points for the frontend to interact with that unified backend.
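
To make that concrete, here is a rough sketch in C of what I mean by two
sets of entry points over one backend. All of the names below are made up
for illustration and are not Mesa's actual interfaces; the group/counter
and query_id arguments are just placeholders for whatever each frontend
would really pass down.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical unified backend state shared by both extensions. */
struct unified_backend {
    unsigned active; /* number of monitors/queries currently running */
};

/* Shared backend helpers that both sets of entry points funnel into. */
static bool backend_start(struct unified_backend *be)
{
    be->active++;
    return true;
}

static void backend_stop(struct unified_backend *be)
{
    assert(be->active > 0);
    be->active--;
}

/* Entry points an AMD_performance_monitor frontend might call. */
struct perfmon_backend_funcs {
    bool (*begin_monitor)(struct unified_backend *be,
                          unsigned group, unsigned counter);
    void (*end_monitor)(struct unified_backend *be);
};

/* Entry points an INTEL_performance_query frontend might call. */
struct perfquery_backend_funcs {
    bool (*begin_query)(struct unified_backend *be, unsigned query_id);
    void (*end_query)(struct unified_backend *be);
};

static bool amd_begin_monitor(struct unified_backend *be,
                              unsigned group, unsigned counter)
{
    (void)group;   /* a real driver would program the selected counter */
    (void)counter;
    return backend_start(be);
}

static void amd_end_monitor(struct unified_backend *be)
{
    backend_stop(be);
}

static bool intel_begin_query(struct unified_backend *be, unsigned query_id)
{
    (void)query_id; /* no group/counter selection state is needed here */
    return backend_start(be);
}

static void intel_end_query(struct unified_backend *be)
{
    backend_stop(be);
}

/* Two independent interfaces, one backend underneath. */
static const struct perfmon_backend_funcs amd_funcs = {
    .begin_monitor = amd_begin_monitor,
    .end_monitor = amd_end_monitor,
};

static const struct perfquery_backend_funcs intel_funcs = {
    .begin_query = intel_begin_query,
    .end_query = intel_end_query,
};
```

The point is only that neither frontend needs to know about the other's
function table, while the driver stays free to share as much or as little
as it likes behind them.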

Some of the issues I came across were:

The current design expects a common description of counters and their
types, but the current implementation doesn't fully support
INTEL_performance_query semantic types. Fixing this is awkward
because neither extension has data/semantic types that are a strict
subset of the other, so to support all the types I imagine we'd also
need to introduce some mechanism for black/white listing counters for
each extension if we want to keep a common description. Then if we
wanted to utilize the full range of types for both extensions, I have a
feeling a lot of the counters would end up being exclusively declared
for one extension or the other, which would negate some of the benefit
of having a common structure.

The current infrastructure seems somewhat biased towards implementing
AMD_performance_monitor with the concept of groups and counter
selection which doesn't exist in the INTEL_performance_query extension
and that seems unfortunate when the selection mechanism looks to make
the allocation and tracking of query objects more costly/complex if we
don't need it for the INTEL_performance_query extension.

There's no substantial utility code associated with the core
infrastructure that the backends benefit from to help justify sharing
a backend for multiple extensions. The core support just does simple
frontend validation of user arguments to normalize things and handle
gl errors consistently before interacting with the backend so in
practice the INTEL_performance_query and AMD_performance_monitor code
is rather orthogonal.

I think the only things that connect the two extensions currently are
the shared declaration of counters and a tiny amount of utility code
for allocating/freeing monitor objects. Given the issue with the
counter types I found things became simpler if the counter
descriptions were instead moved into the backend. Given that
INTEL_performance_query doesn't need any active group/counter state
per object, the common object allocator also isn't ideal. So making
both of these changes (which seem to make sense even without the goal
of separating the extension) is enough to make the frontends
completely orthogonal. I also really like that, with the counter
declarations in the backend, it's free to use whatever data
structures are appropriate for the various counters. As opposed to
statically declared arrays describing our counters, I needed to update
our backend to programmatically build up the lists of available
counters, and counter descriptions also necessarily became more
detailed, so it was nice that this work could be self-contained in the
backend and we can describe our Observation Architecture counters
differently from our pipeline statistics counters.
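
As a minimal sketch of what building the list programmatically could look
like (this is not the actual i965 code; the struct layout, the counter
names, the offsets and the has_oa_unit flag are all assumptions made up
for the example):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical split between the two kinds of counters mentioned above. */
enum counter_kind {
    COUNTER_KIND_PIPELINE_STAT,
    COUNTER_KIND_OA,
};

struct counter_desc {
    const char *name;
    enum counter_kind kind;
    unsigned offset; /* assumed: byte offset of the value in a result buffer */
};

#define MAX_COUNTERS 16

struct counter_list {
    struct counter_desc counters[MAX_COUNTERS];
    size_t count;
};

static void add_counter(struct counter_list *list, const char *name,
                        enum counter_kind kind, unsigned offset)
{
    assert(list->count < MAX_COUNTERS);
    list->counters[list->count++] =
        (struct counter_desc){ name, kind, offset };
}

/*
 * Build the list at runtime instead of declaring a static array, so the
 * backend can decide which counters exist on the current hardware.
 */
static void build_counter_list(struct counter_list *list, bool has_oa_unit)
{
    list->count = 0;

    /* Placeholder pipeline statistics counters. */
    add_counter(list, "VS Invocations", COUNTER_KIND_PIPELINE_STAT, 0);
    add_counter(list, "PS Invocations", COUNTER_KIND_PIPELINE_STAT, 8);

    /* Placeholder OA counter, only added when the unit is usable. */
    if (has_oa_unit)
        add_counter(list, "GPU Busy", COUNTER_KIND_OA, 0);
}
```

Because the frontend never sees struct counter_desc in this arrangement,
the backend could grow per-kind fields without touching shared code.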

My thinking at the moment is that if the current AMD_perfmon backend
architecture seems to be ok for your needs then it could be for the
best that the extensions can be easily made orthogonal so we can
develop support for both extensions without stepping on each other's
toes. Later if it's desirable to support both extensions in any driver
we can always evaluate what opportunities there are to have a common
backend interface if that could simplify things.

> Currently, I'm trying to implement GL_AMD_perfmon as a state tracker which
> is based on the query interface of Gallium and this looks quite good. Only
> minor changes in the current interface are required to do that.
>
> At this time, most of hardware performance counters are *only* exposed
> through the Gallium HUD and I think it's not very helpful for a large number
> of applications.
> I'm pretty sure that GL_AMD_perfmon will be very useful for exposing GPU
> counters and this is also a requirement for a GSoC project this year.
>
> So, with respect to your work, my question is : why do you want to get rid
> of AMD_perfmon in favour of INTEL_perf_query ?

From my pov, the priority is to at least have one extension that works
fully and can expose our Observation Architecture counters. Currently
neither of our backends is usable in practice so we aren't exactly
getting rid of AMD_perfmon in favour of INTEL_perf_query because
neither extension really works for us yet.

A difficulty for us has been that we've only relatively recently
learned how to configure our Gen graphics Observation Architecture
performance counters. Considering how our supporting kernel
interface works, that makes quite a big difference to how our backend
needs to work, which wasn't possible to consider for the first
implementation.

So to start with it's a question of picking one extension to focus on,
and the INTEL_performance_query extension is a slightly better match
for the performance counters we can get from Gen graphics. It's also
slightly simpler and can express a bit more with its data/semantic
types.

I didn't start out with the plan of dropping our AMD_perfmon backend,
but as I hit issues and looked to evolve the INTEL_perf_query support
I started to see more and more that the current design was quite an
impediment but also saw there was very little really connecting the
two extensions. So from a practical point of view it was just simpler
to draw a line between the two extensions and only have one extension
to worry about.

> Don't you think that the AMD extension is also useful as the INTEL one?

I suppose here, usefulness is mainly dependent on what tooling we can
enable with these extensions.

In terms of the data exposed for tools, the two extensions would expose
more or less the same data, so exposing both would only be useful for
tools that support just one extension or the other. The INTEL extension
has some more data/semantic types so maybe it has the edge in terms of
what tools will want, but there's not really much in it.

I've been experimenting with a tool called gputop
(https://github.com/rib/gputop) based on INTEL_performance_query as a
way to test my work and Mark Janes has also been experimenting with a
UI for fips based on INTEL_performance_query so we at least have some
toys to start with based on INTEL_performance_query.

Based on developing gputop, I see that neither extension is perfect
really, as we have more metadata about our counters than can be
expressed by either extension. For example:

I'd like to be able to report a stable uuid or unique name for
counters that tools can trust won't ever change, so tools can be made to
understand the semantics of specific counters to help implement things
like automatic bottleneck analysis. Currently we can only report a
short + long name for counters, which we want to be human readable
but might want to change to improve readability. I'm hoping to
compromise here and guarantee that our short names will be a stable
part of the api for tools, but it's not guaranteed by either extension.

We don't have a well specified way to report maximum throughputs, e.g.
for bandwidth values, because the INTEL spec technically only
expects drivers to report maximum values for 'raw' counters.

For some of our counters (e.g. sampler bottleneck) we have information
about what threshold should really be highlighted as 'bad' to users
which tools would benefit from, but neither extension gives us a way
to report this.
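
Purely as an illustration of the kind of extra metadata I mean, a tool
might want something like the following per counter. Nothing here is
defined by either extension; the struct, its field names and the helper
functions are hypothetical, and the threshold is assumed to be expressed
as a fraction of the maximum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-counter metadata that neither extension can express. */
struct counter_metadata {
    const char *stable_name;  /* assumed never to change once published */
    const char *display_name; /* human readable, free to be reworded */
    double max_value;         /* maximum throughput, <= 0 if unknown */
    double bad_threshold;     /* fraction of max_value, < 0 if none */
};

/* Look a counter up by its stable name rather than its display name. */
static const struct counter_metadata *
find_counter(const struct counter_metadata *table, size_t n,
             const char *stable_name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].stable_name, stable_name) == 0)
            return &table[i];
    }
    return NULL;
}

/* Should a tool highlight this value as 'bad' to the user? */
static bool counter_is_bad(const struct counter_metadata *md, double value)
{
    if (md->max_value <= 0.0 || md->bad_threshold < 0.0)
        return false; /* not enough metadata to judge */
    return value / md->max_value >= md->bad_threshold;
}
```

A tool doing automatic bottleneck analysis could then key its logic off
stable_name and leave display_name purely for the UI.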

Ok, I hope this helps explain some of what I've found while working on this.

Depending on any further feedback here, I'm currently thinking I'll
rebase my series soon, dropping the patch that removed all
AMD_performance_monitor support, and instead I'll just have a patch
removing the Intel backend. Hopefully I can send out an RFC series
relatively soon, cleaned up a bit more, updated against my latest drm
perf interface and with support for some of our more interesting
counters.

Regards,
- Robert

>
> Best regards,
> Samuel Pitoiset.
>
>