[Mesa-dev] [PATCH 1/3] mesa: Add core support for the GL_AMD_performance_monitor extension.

Sat Sep 21 08:19:47 PDT 2013

On Fri, Sep 20, 2013 at 3:42 PM, Kenneth Graunke <kenneth at whitecape.org> wrote:
> On 09/20/2013 07:55 AM, Brian Paul wrote:
>> On Thu, Sep 19, 2013 at 5:27 PM, Kenneth Graunke <kenneth at whitecape.org> wrote:
> [snip]
>>> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
>>> index 6d700ec..70dba6e 100644
>>> --- a/src/mesa/main/mtypes.h
>>> +++ b/src/mesa/main/mtypes.h
>>> @@ -1778,6 +1778,89 @@ struct gl_transform_feedback_state
>>>
>>>
>>>  /**
>>> + * A "performance monitor" as described in AMD_performance_monitor.
>>> + */
>>> +struct gl_perf_monitor_object
>>> +{
>>> +   GLboolean Active;
>>> +
>>> +   /**
>>> +    * A list of groups with currently active counters.
>>> +    *
>>> +    * ActiveGroups[g] == n if there are n counters active from group 'g'.
>>> +    */
>>> +   unsigned *ActiveGroups;
>>
>> GLuint?
>>
>
> I chose 'unsigned' over 'GLuint' here since it's only used internally,
> and never exposed by the API.  But I can switch if you prefer that.

OK, I didn't realize that was the distinction being made.  My impulse
is to either use all GL types in a struct or not (Mesa vs. gallium for
example).  Your way is fine too.

I'm also trying to be watchful of signed vs. unsigned fields and loop
counters.  MSVC likes to complain about signed/unsigned comparisons.

>
>>> +
>>> +   /**
>>> +    * An array of bitsets, subscripted by group ID, then indexed by counter ID.
>>> +    *
>>> +    * Checking whether counter 'c' in group 'g' is active can be done via:
>>> +    *
>>> +    *    BITSET_TEST(ActiveCounters[g], c)
>>> +    */
>>> +   GLuint **ActiveCounters;
>>
>> GLbitfield?
>
> The type here is actually BITSET_WORD (from bitset.h), which is a
> #define for GLuint.  Including bitset.h from mtypes.h led to all kinds
> of problems, so I just used GLuint.
>
> It seems like we could do something better, but I'm not sure what.

OK, that's fine then.

>>> +};
>>> +
>>> +
>>> +union gl_perf_monitor_counter_value
>>> +{
>>> +   float f;
>>> +   uint64_t u64;
>>> +   uint32_t u32;
>>> +};
>>> +
>>> +
>>> +struct gl_perf_monitor_counter
>>> +{
>>> +   /** Human readable name for the counter. */
>>> +   const char *Name;
>>> +
>>> +   /**
>>> +    * Data type of the counter.  Valid values are FLOAT, UNSIGNED_INT,
>>> +    * UNSIGNED_INT64_AMD, and PERCENTAGE_AMD.
>>> +    */
>>> +   GLenum Type;
>>> +
>>> +   /** Minimum counter value. */
>>> +   union gl_perf_monitor_counter_value Minimum;
>>> +
>>> +   /** Maximum counter value. */
>>> +   union gl_perf_monitor_counter_value Maximum;
>>> +};
>>> +
>>> +
>>> +struct gl_perf_monitor_group
>>> +{
>>> +   /** Human readable name for the group. */
>>> +   const char *Name;
>>> +
>>> +   /**
>>> +    * Maximum number of counters in this group which can be active at the
>>> +    * same time.
>>> +    */
>>> +   GLint MaxActiveCounters;
>>
>> GLuint?
>
> That would make sense, but for some reason the AMD_performance_monitor
> extension exposes this value as a GLint:
>
> void GetPerfMonitorCountersAMD(uint group, int *numCounters,
>                                int *maxActiveCounters, sizei countersSize,
>                                uint *counters)
>
> I figured it should probably match...

Well, when I'm reading the code and see GLint, I think "ok, that value
could be negative".  But that's not true here so I guess I'd still
prefer GLuint/unsigned.

>
>>> +
>>> +   /** Array of counters within this group. */
>>> +   const struct gl_perf_monitor_counter *Counters;
>>> +   GLint NumCounters;

Same story with NumCounters??

>>> +};
>>> +
>>> +
>>> +/**
>>> + * Context state for AMD_performance_monitor.
>>> + */
>>> +struct gl_perf_monitor_state
>>> +{
>>> +   /** Array of performance monitor groups (indexed by group ID) */
>>> +   const struct gl_perf_monitor_group *Groups;
>>> +   GLint NumGroups;
>>
>> GLuint?
>
> Likewise, the extension exposes this as a GLint:
>
> void GetPerfMonitorGroupsAMD(int *numGroups, sizei groupsSize,
>                              uint *groups)
>
> I don't know why...GLuint would have made more sense.  Of course, nobody
> is going to have enough groups for it to make a difference :)

Yeah, but like I said, it matters when one's reading the code and
trying to get an understanding of what the legal values for a variable
might be.

>>> +void GLAPIENTRY
>>> +_mesa_GetPerfMonitorGroupStringAMD(GLuint group, GLsizei bufSize,
>>> +                                   GLsizei *length, GLchar *groupString)
>>> +{
>>> +   GET_CURRENT_CONTEXT(ctx);
>>> +
>>> +   const struct gl_perf_monitor_group *group_obj = get_group(ctx, group);
>>> +
>>> +   if (group_obj == NULL) {
>>> +      _mesa_error(ctx, GL_INVALID_VALUE, "glGetPerfMonitorGroupStringAMD");
>>> +      return;
>>> +   }
>>> +
>>> +   if (bufSize == 0) {
>>> +      /* Return the number of characters that would be required to hold the
>>> +       * group string, excluding the null terminator.
>>> +       */
>>> +      if (length != NULL)
>>> +         *length = strlen(group_obj->Name);
>>> +   } else {
>>> +      if (length != NULL)
>>> +         *length = MIN2(strlen(group_obj->Name), bufSize);
>>> +      if (groupString != NULL)
>>> +         strncpy(groupString, group_obj->Name, bufSize);
>>
>> I think you need to replace bufSize there with the value you put in *length.
>
> It shouldn't matter.  The names should all be proper NULL-terminated C
> strings, so the strncpy will automatically stop at
> strlen(group_obj->Name).  Passing bufSize to strncpy catches the case
> where the buffer is smaller than the group name.

Yes, you're right.  For some reason I read strncpy() as memcpy().
Probably because that's what's used in the objectlabel.c code which I
was fixing last week.

>>> +void GLAPIENTRY
>>> +_mesa_BeginPerfMonitorAMD(GLuint monitor)
>>> +{
>>> +   GET_CURRENT_CONTEXT(ctx);
>>> +
>>> +   struct gl_perf_monitor_object *m = lookup_monitor(ctx, monitor);
>>> +
>>> +   if (m == NULL) {
>>> +      _mesa_error(ctx, GL_INVALID_VALUE,
>>> +                  "glBeginPerfMonitorAMD(invalid monitor)");
>>> +      return;
>>> +   }
>>> +
>>> +   /* "INVALID_OPERATION error will be generated if BeginPerfMonitorAMD is
>>> +    *  called when a performance monitor is already active."
>>> +    */
>>> +   if (m->Active) {
>>> +      _mesa_error(ctx, GL_INVALID_OPERATION,
>>> +                  "glBeginPerfMonitor(already active)");
>>> +      return;
>>> +   }
>>> +
>>> +   /* The driver should either set m->Active to true or raise an error:
>>> +    * "INVALID_OPERATION error will be generated if BeginPerfMonitorAMD is
>>> +    *  unable to begin monitoring with the currently selected counters."
>>> +    */
>>> +   ctx->Driver.BeginPerfMonitor(ctx, m);
>>
>> Set m->Active = true here?  We set it to false in the End function below.
>
> See the above comment.  The spec allows drivers to arbitrarily refuse to
> start monitoring due to unknown constraints.  It's sort of a cop-out;
> the extension is general, and each GPU has weird restrictions on
> performance counters that would be hard to expose.
>
> So the driver needs to set m->Active or raise an error.
>
> In contrast, it should always be possible to /stop/ counting, so I made
> the core code set m->Active = false.  I could move that to the driver
> for consistency, I suppose.  Do you have a preference?

Sorry, I totally glanced over the comment.  How about having
BeginPerfMonitor() return a success/failure boolean?  Then, the caller
can do:

m->Active = ctx->Driver.BeginPerfMonitor(ctx, m);
if (!m->Active) {
   _mesa_error(...);
   return;
}

So that the _mesa_error() call doesn't have to be duplicated in all
drivers?  In general, we try to do as much error checking/recording as
possible in core Mesa rather than in the drivers.

-Brian