[Mesa-dev] [PATCH 00/15] GL_AMD_performance_monitor

Sun May 3 03:04:41 PDT 2015

On 04/14/2015 11:02 PM, Robert Bragg wrote:
> Hi Samuel,
>
> On Tue, Mar 31, 2015 at 5:56 PM, Samuel Pitoiset
> <samuel.pitoiset at gmail.com> wrote:
>> Hello Robert,
>>
>> Sorry for the delay, I just saw your message a few days ago, and I probably
>> removed the mail by mistake too...
> And then I was on holiday; so more delay :-)
>
>> I had never heard about your work in this area; happy to know about it now. :)
>>
>> Well, regarding the backend stuff, I would prefer to keep the same for both
>> GL_AMD_performance_monitor and INTEL_performance_query.
> My experience with the Intel backend where I initially aimed to update
> both extensions behind one backend is that it was quite a hindrance
> and there wasn't a clear benefit to it when there isn't really any
> substantial code to speak of in the core infrastructure to share
> between the extensions.
>
> We should be careful not to talk at cross purposes here though. In my
> mind having orthogonal frontends and even different backend interfaces
> wouldn't preclude a driver implementing both extensions with a unified
> backend if desirable, there would just be two separate sets of entry
> points for the frontend to interact with that unified backend.
>
> Some of the issues I came across were:
>
> The current design expects a common description of counters and their
> types, but the current implementation doesn't fully support
> INTEL_performance_query semantic types. Fixing this is awkward
> because neither extension has data/semantic types that are a strict
> subset of the other so to support all the types I imagine we'd also
> need to introduce some mechanism for black/white listing counters for
> each extension if we want to keep a common description. Then if we
> wanted to utilize the full range of types for both extensions I have a
> feeling a lot of the counters would end up being exclusively declared
> for one extension or the other which would negate some of the benefit
> of having a common structure.
>
> The current infrastructure seems somewhat biased towards implementing
> AMD_performance_monitor with the concept of groups and counter
> selection which doesn't exist in the INTEL_performance_query extension
> and that seems unfortunate when the selection mechanism looks to make
> the allocation and tracking of query objects more costly/complex if we
> don't need it for the INTEL_performance_query extension.
>
> There's no substantial utility code associated with the core
> infrastructure that the backends benefit from to help justify sharing
> a backend for multiple extensions. The core support just does simple
> frontend validation of user arguments to normalize things and handle
> gl errors consistently before interacting with the backend so in
> practice the INTEL_performance_query and AMD_performance_monitor code
> is rather orthogonal.
>
> I think the only things that connect the two extensions currently are
> the shared declaration of counters and a tiny amount of utility code
> for allocating/freeing monitor objects. Given the issue with the
> counter types I found things became simpler if the counter
> descriptions were instead moved into the backend. Given that
> INTEL_performance_query doesn't need any active group/counter state
> per object, the common object allocator also isn't ideal. So making
> both of these changes (which seem to make sense even without the goal
> of separating the extension) is enough to make the frontends
> completely orthogonal. I also really like that, with the counter
> declarations in the backend, it's free to use whatever data
> structures are appropriate for the various counters. Rather than
> relying on statically declared arrays describing our counters, I needed
> to update our backend to programmatically build up the lists of
> available counters, and the counter descriptions necessarily became
> more detailed, so it was nice that this work could be self-contained in
> the backend and that we can describe our Observation Architecture
> counters differently from our pipeline statistics counters.
>
> My thinking a.t.m is that if the current AMD_perfmon backend
> architecture seems to be ok for your needs then it could be for the
> best that the extensions can be easily made orthogonal so we can
> develop support for both extensions without stepping on each other's
> toes. Later if it's desirable to support both extensions in any driver
> we can always evaluate what opportunities there are to have a common
> backend interface if that could simplify things.
>
>> Currently, I'm trying to implement GL_AMD_perfmon as a state tracker which
>> is based on the query interface of Gallium and this looks quite good. Only
>> minor changes in the current interface are required to do that.
>>
>> At this time, most hardware performance counters are *only* exposed
>> through the Gallium HUD and I think it's not very helpful for a large number
>> of applications.
>> I'm pretty sure that GL_AMD_perfmon will be very useful for exposing GPU
>> counters and this is also a requirement for a GSoC project this year.
>>
>> So, with respect to your work, my question is: why do you want to get rid
>> of AMD_perfmon in favour of INTEL_perf_query?
> From my pov, the priority is to at least have one extension that works
> fully and can expose our Observation Architecture counters. Currently
> neither of our backends is usable in practice so we aren't exactly
> getting rid of AMD_perfmon in favour of INTEL_perf_query because
> neither extension really works for us yet.
>
> A difficulty for us has been that we've only relatively recently
> learned how to configure our Gen graphics Observation Architecture
> performance counters and considering how our supporting kernel
> interface works it makes quite a big difference to how our backend
> needs to work which wasn't possible to consider for the first
> implementation.
>
> So to start with it's a question of picking one extension to focus on,
> and the INTEL_performance_query extension is a slightly better match
> for the performance counters we can get from Gen graphics, it's also
> slightly simpler and can express a bit more with its data/semantic
> types.
>
> I didn't start out with the plan of dropping our AMD_perfmon backend,
> but as I hit issues and looked to evolve the INTEL_perf_query support
> I started to see more and more that the current design was quite an
> impediment but also saw there was very little really connecting the
> two extensions. So from a practical point of view it was just simpler
> to draw a line between the two extensions and only have one extension
> to worry about.
>
>> Don't you think that the AMD extension is as useful as the INTEL one?
> I suppose here, usefulness is mainly dependent on what tooling we can
> enable with these extensions.
>
> In terms of the data exposed for tools, the extensions would expose
> more or less the same data if we exposed both extensions which would
> only be useful in the case of tools that only support one extension or
> the other. The INTEL extension has some more data/semantic types so
> maybe it has the edge in terms of what tools will want but there's not
> really much in it.
>
> I've been experimenting with a tool called gputop
> (https://github.com/rib/gputop) based on INTEL_performance_query as a
> way to test my work and Mark Janes has also been experimenting with a
> UI for fips based on INTEL_performance_query so we at least have some
> toys to start with based on INTEL_performance_query.
>
> Based on developing gputop, I see that neither extension is perfect
> really as we have more meta data about our counters than can be
> expressed by either extension. For example:
>
> I'd like to be able to report a stable uuid or unique name for
> counters that tools can trust won't ever change so tools can be made to
> understand the semantics of specific counters to help implement things
> like automatic bottleneck analysis. Currently we can only report a
> short + long name for counters, which we want to be human readable
> but might want to change to improve readability. I'm hoping to
> compromise here and guarantee that our short names will be a stable
> part of the api for tools but it's not guaranteed by either extension.
>
> We don't have a well specified way to report maximum throughputs, e.g.
> for bandwidth values, because the INTEL spec technically only
> expects drivers to report maximum values for 'raw' counters.
>
> For some of our counters (e.g. sampler bottleneck) we have information
> about what threshold should really be highlighted as 'bad' to users
> which tools would benefit from, but neither extension gives us a way
> to report this.
>
> Ok, I hope this helps explain some of what I've found while working on this.
>
> Depending on any further feedback here; I'm currently thinking I'll
> rebase my series soon, dropping the patch that removed all
> AMD_performance_monitor support and instead I'll just have a patch
> removing the Intel backend. Hopefully I can send out an RFC series
> relatively soon, cleaned up a bit more, updated against my latest drm
> perf interface and with support for some of our more interesting
> counters.

Thanks for this long explanation. I now clearly understand your point of view
regarding INTEL_performance_query and GL_AMD_performance_monitor.

I agree with you that we shouldn't share the same backend for these two
extensions because they are quite different.
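
To illustrate the split discussed above, here is a rough sketch of two
independent sets of entry points, one per extension, each talking to whatever
the driver keeps private. All names here (drv_counter, drv_backend,
amd_perfmon_count_in_group, intel_perfquery_read_all) are hypothetical and
made up for illustration; this is not actual Mesa code, only an assumption of
how such a separation could look.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical driver-private counter description (not a Mesa type). */
struct drv_counter {
    const char *name;
    uint64_t value;
    unsigned group;   /* only meaningful to the AMD-style frontend */
};

/* Hypothetical driver-private state that both frontends may consult. */
struct drv_backend {
    const struct drv_counter *counters;
    size_t num_counters;
};

/* AMD_performance_monitor-style entry point: counters are organised in
 * groups, so the frontend asks about one group at a time. */
size_t
amd_perfmon_count_in_group(const struct drv_backend *be, unsigned group)
{
    size_t n = 0;

    assert(be != NULL);
    for (size_t i = 0; i < be->num_counters; i++) {
        if (be->counters[i].group == group)
            n++;
    }
    return n;
}

/* INTEL_performance_query-style entry point: no groups and no per-object
 * counter selection, so a read simply copies out every counter value
 * (up to the caller's buffer size). Returns the number of values written. */
size_t
intel_perfquery_read_all(const struct drv_backend *be, uint64_t *out,
                         size_t max)
{
    size_t n;

    assert(be != NULL && out != NULL);
    n = be->num_counters < max ? be->num_counters : max;
    for (size_t i = 0; i < n; i++)
        out[i] = be->counters[i].value;
    return n;
}
```

Neither function knows about the other, so each frontend can evolve on its
own, and a driver is still free to back both with the same private data if
that ever turns out to be useful.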

Anyway, GL_AMD_performance_monitor doesn't fit well with the performance
counters on Nouveau because the extension doesn't allow exposing multi-pass
events (same problem with the Gallium HUD, btw). So I'm also working on an
implementation of nvidia-perfkit for Nouveau, which is well designed for the
underlying hardware. :)

> Regards,
> - Robert
>
>> Best regards,
>> Samuel Pitoiset.
>>
>>