[Mesa-dev] [PATCH 06/15] mesa/st: implement GL_AMD_performance_monitor v4

Kenneth Graunke kenneth at whitecape.org
Wed Jul 9 12:21:49 PDT 2014


On Wednesday, July 09, 2014 04:34:40 PM Samuel Pitoiset wrote:
> From: Christoph Bumiller <e0425955 at student.tuwien.ac.at>
> 
> This is based on the original patch of Christoph Bumiller.
> (source: http://people.freedesktop.org/~chrisbmr/perfmon.diff)
> 
> Drivers must implement get_driver_query_group_info and
> get_driver_query_info in order to enable this extension.
> 
> V2: (Samuel Pitoiset)
>  - fix compilation
>  - improve the original code
>  - rewrite some parts of the original code
> 
> V3:
>  - do not use pipe_driver_query_info::min_value which is always set to 0
> 
> V4:
>  - only enable AMD_performance_monitor if the driver implements
>  pipe_screen::get_driver_query_group_info and get_driver_query_info
>  - make use of PIPE_DRIVER_QUERY_TYPE_*
>  - use of GL_UNSIGNED_INT in st_GetPerfMonitorResult()
> 
> Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>

Hi Samuel!

I'm really glad to see more drivers implementing this.  Nice work.

Some thoughts below...

> ---
>  src/mesa/Makefile.sources              |   1 +
>  src/mesa/state_tracker/st_cb_perfmon.c | 375 +++++++++++++++++++++++++++++++++
>  src/mesa/state_tracker/st_cb_perfmon.h |  63 ++++++
>  src/mesa/state_tracker/st_context.c    |   3 +
>  src/mesa/state_tracker/st_extensions.c |   3 +
>  5 files changed, 445 insertions(+)
>  create mode 100644 src/mesa/state_tracker/st_cb_perfmon.c
>  create mode 100644 src/mesa/state_tracker/st_cb_perfmon.h
> 
> diff --git a/src/mesa/Makefile.sources b/src/mesa/Makefile.sources
> index f4904fb..0b020e2 100644
> --- a/src/mesa/Makefile.sources
> +++ b/src/mesa/Makefile.sources
> @@ -234,6 +234,7 @@ STATETRACKER_FILES = \
>  	$(SRCDIR)state_tracker/st_cb_fbo.c \
>  	$(SRCDIR)state_tracker/st_cb_feedback.c \
>  	$(SRCDIR)state_tracker/st_cb_msaa.c \
> +	$(SRCDIR)state_tracker/st_cb_perfmon.c \
>  	$(SRCDIR)state_tracker/st_cb_program.c \
>  	$(SRCDIR)state_tracker/st_cb_queryobj.c \
>  	$(SRCDIR)state_tracker/st_cb_rasterpos.c \
> diff --git a/src/mesa/state_tracker/st_cb_perfmon.c b/src/mesa/state_tracker/st_cb_perfmon.c
> new file mode 100644
> index 0000000..1883dc2
> --- /dev/null
> +++ b/src/mesa/state_tracker/st_cb_perfmon.c
> @@ -0,0 +1,375 @@
> +/*
> + * Copyright 2014 Nouveau Project

I'd just say:

Copyright (C) 2013 Christoph Bumiller
Copyright (C) 2014 Samuel Pitoiset

I don't think the Nouveau Project is a legal entity, so it can't really hold 
copyrights.

> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + * Authors: Christoph Bumiller
> + *          Samuel Pitoiset
> + */
> +
> +/**
> + * Performance monitoring counters interface to gallium.
> + */
> +
> +#include "st_context.h"
> +#include "st_cb_perfmon.h"
> +#include "st_cb_bitmap.h"
> +
> +#include "main/bitset.h"
> +
> +/* Avoid multiple definitions */
> +#undef MALLOC_STRUCT
> +#undef CALLOC_STRUCT
> +
> +#include "pipe/p_context.h"
> +#include "pipe/p_defines.h"
> +#include "pipe/p_screen.h"
> +#include "util/u_math.h"
> +#include "util/u_memory.h"
> +
> +static int
> +find_query_type(struct pipe_screen *screen, const char *name)
> +{
> +   int num_counters;
> +   int type = -1;
> +   int i;
> +
> +   num_counters = screen->get_driver_query_info(screen, 0, NULL);
> +   if (!num_counters)
> +      return type;
> +
> +   for (i = 0; i < num_counters; i++) {
> +      struct pipe_driver_query_info info;
> +
> +      if (!screen->get_driver_query_info(screen, i, &info))
> +         continue;
> +
> +      if (!strncmp(info.name, name, strlen(name))) {
> +         type = info.query_type;
> +         break;
> +      }
> +   }
> +   return type;
> +}
> +
> +static void
> +reinitialize_perf_monitor(struct st_perf_monitor_object *stm,
> +                          struct pipe_context *pipe)
> +{
> +   int i;
> +
> +   for (i = 0; i < stm->num_queries; i++) {
> +      if (stm->queries[i].pq) {
> +         pipe->destroy_query(pipe, stm->queries[i].pq);
> +         stm->queries[i].pq = NULL;
> +      }
> +   }
> +   stm->num_queries = 0;
> +   stm->ready = FALSE;
> +}

I wonder whether it would make sense to add a Gallium interface for counting 
multiple things - so you could send a single call to the driver saying, please 
count these 40 different things.

i965 hardware at least has a single command, MI_REPORT_PERF_COUNT, that counts 
all 40 counters even if you just wanted 1 or 2 of them.  A naive 
implementation of queries might snapshot all 40 values for each query.  Having 
a single call might make it easier.

With the per-counter interface in this patch, drivers that count multiple 
values with a single command would want to combine such queries, for 
efficiency.  For drivers that count each value independently, it is already a 
perfect fit.

With a batched interface, drivers that count independently could easily split 
up the call, and drivers that count groups together could handle the whole 
request at once.

It's up to you, though.
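
To make the batched idea a bit more concrete, here is a rough standalone 
sketch.  Every name in it (batch_query, fake_hw, snapshot_all, and so on) is 
made up for illustration; none of this is an existing Gallium interface, and 
the array of fake counters just stands in for hardware that reports all 40 
values with one command.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not an existing Gallium interface. */
#define HW_NUM_COUNTERS 40

struct batch_query {
   unsigned num_types;
   unsigned types[HW_NUM_COUNTERS];  /* which counters the caller asked for */
   uint64_t begin[HW_NUM_COUNTERS];  /* full snapshot taken at begin */
   uint64_t end[HW_NUM_COUNTERS];    /* full snapshot taken at end */
};

/* Stand-in for the hardware counters and for the single report command. */
static uint64_t fake_hw[HW_NUM_COUNTERS];
static unsigned snapshots_taken;

static void
snapshot_all(uint64_t *dst)
{
   for (unsigned i = 0; i < HW_NUM_COUNTERS; i++)
      dst[i] = fake_hw[i];
   snapshots_taken++;
}

/* Returns 1 on success, 0 if the request does not fit the hardware. */
static int
batch_query_init(struct batch_query *q, const unsigned *types, unsigned n)
{
   if (n > HW_NUM_COUNTERS)
      return 0;
   for (unsigned i = 0; i < n; i++) {
      if (types[i] >= HW_NUM_COUNTERS)
         return 0;
      q->types[i] = types[i];
   }
   q->num_types = n;
   return 1;
}

/* One snapshot per begin/end, no matter how many counters were requested. */
static void
batch_query_begin(struct batch_query *q)
{
   snapshot_all(q->begin);
}

static void
batch_query_end(struct batch_query *q)
{
   snapshot_all(q->end);
}

/* Writes one delta per requested counter, in request order. */
static void
batch_query_result(const struct batch_query *q, uint64_t *out)
{
   assert(q->num_types <= HW_NUM_COUNTERS);
   for (unsigned i = 0; i < q->num_types; i++)
      out[i] = q->end[q->types[i]] - q->begin[q->types[i]];
}
```

With this shape, asking for two counters costs two snapshots in total (one at 
begin, one at end) instead of two per counter.  A driver that counts each 
value independently could loop over types[] inside begin/end instead.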

> +
> +static struct gl_perf_monitor_object *
> +st_NewPerfMonitor()
> +{
> +   struct st_perf_monitor_object *stq = ST_CALLOC_STRUCT(st_perf_monitor_object);
> +   if (stq)
> +      return &stq->base;
> +   return NULL;
> +}
> +
> +static void
> +st_DeletePerfMonitor(struct gl_context *ctx, struct gl_perf_monitor_object *m)
> +{
> +   struct st_perf_monitor_object *stm = st_perf_monitor_object(m);
> +   struct pipe_context *pipe = st_context(ctx)->pipe;
> +
> +   reinitialize_perf_monitor(stm, pipe);
> +   FREE(stm);
> +}
> +
> +static GLboolean
> +st_BeginPerfMonitor(struct gl_context *ctx, struct gl_perf_monitor_object *m)
> +{
> +   struct st_perf_monitor_object *stm = st_perf_monitor_object(m);
> +   struct pipe_screen *screen = st_context(ctx)->pipe->screen;
> +   struct pipe_context *pipe = st_context(ctx)->pipe;
> +   int num_active_queries = 0;
> +   int group, counter;
> +   int i;
> +
> +   st_flush_bitmap_cache(st_context(ctx));
> +
> +   /* Initialize a monitor to sane starting state. */
> +   reinitialize_perf_monitor(stm, pipe);
> +
> +   /* Check the number of active queries. */
> +   for (group = 0; group < ctx->PerfMonitor.NumGroups; group++) {
> +      const struct gl_perf_monitor_group *g = &ctx->PerfMonitor.Groups[group];
> +      num_active_queries += m->ActiveGroups[group];
> +      if (m->ActiveGroups[group] >= g->MaxActiveCounters ||
> +          num_active_queries >= ST_MAX_PERFMON_QUERIES) {
> +         /* Maximum number of queries reached. Cannot start the session. */
> +         return false;
> +      }
> +   }
> +
> +   for (group = 0; group < ctx->PerfMonitor.NumGroups; group++) {
> +      const struct gl_perf_monitor_group *g = &ctx->PerfMonitor.Groups[group];
> +      for (counter = 0; counter < g->NumCounters; counter++) {
> +         const struct gl_perf_monitor_counter *c = &g->Counters[counter];
> +         int type;
> +
> +         if (!BITSET_TEST(m->ActiveCounters[group], counter))
> +            continue;
> +
> +         type = find_query_type(screen, c->Name);
> +         assert(type != -1);
> +
> +         stm->queries[stm->num_queries].pq = pipe->create_query(pipe, type, 0);
> +         stm->queries[stm->num_queries].group_id = group;
> +         stm->queries[stm->num_queries].counter_id = counter;
> +         stm->num_queries++;
> +      }
> +   }
> +
> +   for (i = 0; i < stm->num_queries; i++) {
> +      if (stm->queries[i].pq)
> +         if (!pipe->begin_query(pipe, stm->queries[i].pq))
> +            goto fail;
> +   }
> +   return true;
> +
> +fail:
> +   /* Failed to start a monitoring session. */
> +   reinitialize_perf_monitor(stm, pipe);
> +   return false;
> +}
> +
> +static void
> +st_EndPerfMonitor(struct gl_context *ctx, struct gl_perf_monitor_object *m)
> +{
> +   struct st_perf_monitor_object *stm = st_perf_monitor_object(m);
> +   struct pipe_context *pipe = st_context(ctx)->pipe;
> +   int i;
> +
> +   st_flush_bitmap_cache(st_context(ctx));
> +
> +   for (i = 0; i < stm->num_queries; i++)
> +      if (stm->queries[i].pq)
> +         pipe->end_query(pipe, stm->queries[i].pq);
> +}
> +
> +static void
> +st_ResetPerfMonitor(struct gl_context *ctx, struct gl_perf_monitor_object *m)
> +{
> +   struct st_perf_monitor_object *stm = st_perf_monitor_object(m);
> +   struct pipe_context *pipe = st_context(ctx)->pipe;
> +
> +   if (!m->Ended)
> +      st_EndPerfMonitor(ctx, m);

The idea behind the ResetPerfMonitor hook was that you just want to throw away 
the queries and start over.  The driver may need to stop any counting that 
it's doing, but it could skip extra work that EndPerfMonitor might have to do.

For example, EndPerfMonitor might need to take an ending counter snapshot for 
each counter, gather the results (subtracting start/end counter values), and 
so on.  ResetPerfMonitor could avoid that work.

Maybe it's a silly thing to optimize though.  Your implementation certainly 
should work.
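
For what it's worth, the distinction I have in mind looks roughly like the 
sketch below.  It is a standalone mock, not state tracker code: mock_monitor, 
mock_hw_counter and the mock_* functions are all invented names, and the 
"ending snapshot" is reduced to a single subtraction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mock; these names do not exist in Mesa. */
struct mock_monitor {
   bool counting;
   uint64_t start;          /* counter value captured at begin */
   uint64_t result;         /* delta computed at end */
   unsigned end_snapshots;  /* how many times the "expensive" end path ran */
};

static uint64_t mock_hw_counter;

static void
mock_begin(struct mock_monitor *m)
{
   assert(m);
   m->start = mock_hw_counter;
   m->counting = true;
}

/* End: take the closing snapshot and gather the result. */
static void
mock_end(struct mock_monitor *m)
{
   assert(m);
   m->result = mock_hw_counter - m->start;
   m->end_snapshots++;
   m->counting = false;
}

/* Reset: stop counting and discard, without the closing snapshot,
 * then restart if the monitor was active. */
static void
mock_reset(struct mock_monitor *m)
{
   assert(m);
   bool was_counting = m->counting;

   m->counting = false;
   m->result = 0;
   if (was_counting)
      mock_begin(m);
}
```

Here mock_reset never touches end_snapshots, which is the work a real driver 
might be able to skip.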

> +   reinitialize_perf_monitor(stm, pipe);
> +
> +   if (m->Active)
> +      st_BeginPerfMonitor(ctx, m);
> +}
> +
> +static GLboolean
> +st_IsPerfMonitorResultAvailable(struct gl_context *ctx,
> +                                struct gl_perf_monitor_object *m)
> +{
> +   struct st_perf_monitor_object *stm = st_perf_monitor_object(m);
> +   struct pipe_context *pipe = st_context(ctx)->pipe;
> +   int i;
> +
> +   if (stm->ready)
> +      return stm->ready;
> +
> +   for (i = 0; i < stm->num_queries; i++) {
> +      union pipe_query_result result;
> +      if (!pipe->get_query_result(pipe, stm->queries[i].pq, TRUE, &result))
> +         break;

IsPerfMonitorResultAvailable is not supposed to block.  I think you want FALSE 
here, not TRUE (assuming this is the "wait" flag).
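
A small mock of the polling pattern I mean, assuming the flag really is 
"wait".  The mock_query type and mock_get_result() are invented for this 
sketch and only imitate the shape of a get_query_result-style call; they are 
not the Gallium functions themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mock of a query whose result becomes ready after some polls. */
struct mock_query {
   unsigned polls_until_ready;
   uint64_t value;
};

/* Imitates a get-result call with a "wait" flag: with wait == false it
 * returns false immediately if the result is not ready yet. */
static bool
mock_get_result(struct mock_query *q, bool wait, uint64_t *out)
{
   assert(q && out);
   if (q->polls_until_ready > 0) {
      if (!wait) {
         q->polls_until_ready--;
         return false;
      }
      q->polls_until_ready = 0;  /* a blocking call would stall here */
   }
   *out = q->value;
   return true;
}

/* Availability check: never passes wait == true, so it cannot block. */
static bool
all_results_available(struct mock_query *queries, unsigned n)
{
   for (unsigned i = 0; i < n; i++) {
      uint64_t ignored;
      if (!mock_get_result(&queries[i], false, &ignored))
         return false;
   }
   return true;
}
```

The caller can keep polling all_results_available() and only use the blocking 
form once it actually fetches the data.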

> +   }
> +   stm->ready = i == stm->num_queries;
> +   return stm->ready;
> +}
> +
> +static void
> +st_GetPerfMonitorResult(struct gl_context *ctx,
> +                        struct gl_perf_monitor_object *m,
> +                        GLsizei dataSize,
> +                        GLuint *data,
> +                        GLint *bytesWritten)
> +{
> +   struct st_perf_monitor_object *stm = st_perf_monitor_object(m);
> +   struct pipe_context *pipe = st_context(ctx)->pipe;
> +   int i;
> +
> +   /* This hook should only be called when results are available. */
> +   assert(m->Ended);
> +
> +   /* Copy data to the supplied array (data).
> +    *
> +    * The output data format is: <group ID, counter ID, value> for each
> +    * active counter. The API allows counters to appear in any order.
> +    */
> +   GLsizei offset = 0;
> +
> +   for (i = 0; i < stm->num_queries; i++) {
> +      union pipe_query_result result;
> +      int group_id, counter_id;
> +      GLenum type;
> +
> +      if (!pipe->get_query_result(pipe, stm->queries[i].pq, TRUE, &result))
> +         continue;
> +
> +      group_id = stm->queries[i].group_id;
> +      counter_id = stm->queries[i].counter_id;
> +      type = ctx->PerfMonitor.Groups[group_id].Counters[counter_id].Type;
> +
> +      data[offset++] = group_id;
> +      data[offset++] = counter_id;
> +      switch (type) {
> +      case GL_UNSIGNED_INT64_AMD:
> +         *(uint64_t *)&data[offset] = result.u64;
> +         offset += sizeof(uint64_t) / sizeof(GLuint);
> +         break;
> +      case GL_UNSIGNED_INT:
> +         *(uint32_t *)&data[offset] = result.u32;
> +         offset += sizeof(uint32_t) / sizeof(GLuint);
> +         break;
> +      case GL_FLOAT:
> +      case GL_PERCENTAGE_AMD:
> +         *(GLfloat *)&data[offset] = result.f;
> +         offset += sizeof(GLfloat) / sizeof(GLuint);
> +         break;
> +      }
> +   }
> +
> +   if (bytesWritten)
> +      *bytesWritten = offset * sizeof(GLuint);
> +}
> +
> +bool
> +st_init_perfmon(struct st_context *st)
> +{
> +   struct gl_perf_monitor_state *perfmon = &st->ctx->PerfMonitor;
> +   struct pipe_screen *screen = st->pipe->screen;
> +   struct gl_perf_monitor_group *groups = NULL;
> +   int num_counters, num_groups;
> +   int group, counter;
> +
> +   if (!screen->get_driver_query_group_info) {
> +      /* Drivers must implement it for AMD_performance_monitor. */
> +      return false;
> +   }
> +
> +   /* Get the total number of counters. */
> +   num_counters = screen->get_driver_query_info(screen, 0, NULL);
> +   if (!num_counters)
> +      return false;
> +
> +   /* Get the number of available groups. */
> +   num_groups = screen->get_driver_query_group_info(screen, 0, NULL);
> +   if (num_groups)
> +      groups = CALLOC(num_groups,
> +                      sizeof(struct gl_perf_monitor_group));

Do you need to free these somewhere?
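
Something along these lines at context teardown is what I'm thinking of.  
This is a standalone sketch with mock types (mock_group, mock_perfmon_state, 
mock_destroy_perfmon are hypothetical names), using plain calloc/free rather 
than the Mesa allocation macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical mock types standing in for the perf monitor group tables. */
struct mock_counter {
   const char *name;
};

struct mock_group {
   const char *name;
   struct mock_counter *counters;
   unsigned num_counters;
};

struct mock_perfmon_state {
   struct mock_group *groups;
   unsigned num_groups;
};

/* Frees every per-group counter array, then the group array itself.
 * Safe to call on a zeroed or already-destroyed state. */
static void
mock_destroy_perfmon(struct mock_perfmon_state *s)
{
   assert(s);
   for (unsigned i = 0; i < s->num_groups; i++)
      free(s->groups[i].counters);
   free(s->groups);
   s->groups = NULL;
   s->num_groups = 0;
}

/* Allocates ngroups groups with ncounters counters each. */
static bool
mock_init_perfmon(struct mock_perfmon_state *s,
                  unsigned ngroups, unsigned ncounters)
{
   assert(s);
   if (ngroups == 0 || ncounters == 0)
      return false;

   s->groups = calloc(ngroups, sizeof(*s->groups));
   if (!s->groups)
      return false;
   s->num_groups = ngroups;

   for (unsigned i = 0; i < ngroups; i++) {
      s->groups[i].counters = calloc(ncounters, sizeof(struct mock_counter));
      if (!s->groups[i].counters) {
         /* Later entries are still NULL from calloc, so this is safe. */
         mock_destroy_perfmon(s);
         return false;
      }
      s->groups[i].num_counters = ncounters;
   }
   return true;
}
```

The same destroy function doubles as the failure path of init, so there is 
only one place that knows how to unwind the allocations.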

> +   if (!groups)
> +      return false;
> +
> +   for (group = 0; group < num_groups; group++) {
> +      struct gl_perf_monitor_group *g = &groups[group];
> +      struct pipe_driver_query_group_info group_info;
> +      struct gl_perf_monitor_counter *counters = NULL;
> +
> +      if (!screen->get_driver_query_group_info(screen, group, &group_info))
> +         continue;
> +
> +      g->Name = group_info.name;
> +      g->MaxActiveCounters = MIN2(group_info.max_active_queries,
> +                                  ST_MAX_PERFMON_QUERIES);
> +      g->NumCounters = 0;
> +      g->Counters = NULL;
> +
> +      if (group_info.num_queries)
> +         counters = CALLOC(group_info.num_queries,
> +                           sizeof(struct gl_perf_monitor_group));
> +      if (!counters)
> +         goto fail;
> +
> +      for (counter = 0; counter < num_counters; counter++) {
> +         struct gl_perf_monitor_counter *c = &counters[g->NumCounters];
> +         struct pipe_driver_query_info info;
> +
> +         if (!screen->get_driver_query_info(screen, counter, &info))
> +            continue;
> +         if (info.group_id != group)
> +            continue;
> +
> +         c->Name = info.name;
> +         switch (info.type) {
> +            case PIPE_DRIVER_QUERY_TYPE_UINT64:
> +               c->Minimum.u64 = 0;
> +               c->Maximum.u64 = info.max_value.u64;
> +               c->Type = GL_UNSIGNED_INT64_AMD;
> +               break;
> +            case PIPE_DRIVER_QUERY_TYPE_UINT:
> +               c->Minimum.u32 = 0;
> +               c->Maximum.u32 = info.max_value.u32;
> +               c->Type = GL_UNSIGNED_INT;
> +               break;
> +            case PIPE_DRIVER_QUERY_TYPE_FLOAT:
> +               c->Minimum.f = 0.0;
> +               c->Maximum.f = info.max_value.f;
> +               c->Type = GL_FLOAT;
> +               break;
> +            case PIPE_DRIVER_QUERY_TYPE_PERCENTAGE:
> +               c->Minimum.f = 0.0;
> +               c->Maximum.f = 100.0;
> +               c->Type = GL_PERCENTAGE_AMD;
> +               break;
> +            default:
> +               assert(!"Should never happen: invalid driver query type");
> +         }
> +         g->NumCounters++;
> +      }
> +      g->Counters = counters;
> +   }
> +
> +   perfmon->NumGroups = num_groups;
> +   perfmon->Groups = groups;
> +   return true;
> +
> +fail:
> +   for (group = 0; group < num_groups; group++)
> +      FREE((struct gl_perf_monitor_counter *)groups[group].Counters);
> +   FREE(groups);
> +   return false;
> +}
> +
> +void st_init_perfmon_functions(struct dd_function_table *functions)
> +{
> +   functions->NewPerfMonitor = st_NewPerfMonitor;
> +   functions->DeletePerfMonitor = st_DeletePerfMonitor;
> +   functions->BeginPerfMonitor = st_BeginPerfMonitor;
> +   functions->EndPerfMonitor = st_EndPerfMonitor;
> +   functions->ResetPerfMonitor = st_ResetPerfMonitor;
> +   functions->IsPerfMonitorResultAvailable = st_IsPerfMonitorResultAvailable;
> +   functions->GetPerfMonitorResult = st_GetPerfMonitorResult;
> +}
> diff --git a/src/mesa/state_tracker/st_cb_perfmon.h b/src/mesa/state_tracker/st_cb_perfmon.h
> new file mode 100644
> index 0000000..95c2fce
> --- /dev/null
> +++ b/src/mesa/state_tracker/st_cb_perfmon.h
> @@ -0,0 +1,63 @@
> +/*
> + * Copyright 2014 Nouveau Project
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + * Authors: Christoph Bumiller
> + *          Samuel Pitoiset
> + */
> +
> +#ifndef ST_CB_PERFMON_H
> +#define ST_CB_PERFMON_H
> +
> +#include "main/mtypes.h"
> +
> +#define ST_MAX_PERFMON_QUERIES 8
> +
> +/**
> + * Subclass of gl_perf_monitor_object
> + */
> +struct st_perf_monitor_object
> +{
> +   struct gl_perf_monitor_object base;
> +   unsigned num_queries;
> +   struct {
> +      struct pipe_query *pq;
> +      int group_id;
> +      int counter_id;
> +   } queries[ST_MAX_PERFMON_QUERIES];
> +   bool ready;
> +};
> +
> +/**
> + * Cast wrapper
> + */
> +static INLINE struct st_perf_monitor_object *
> +st_perf_monitor_object(struct gl_perf_monitor_object *q)
> +{
> +   return (struct st_perf_monitor_object *)q;
> +}
> +
> +bool
> +st_init_perfmon(struct st_context *st);
> +
> +extern void
> +st_init_perfmon_functions(struct dd_function_table *functions);
> +
> +#endif
> diff --git a/src/mesa/state_tracker/st_context.c b/src/mesa/state_tracker/st_context.c
> index c7f3ec6..8aa2d6f 100644
> --- a/src/mesa/state_tracker/st_context.c
> +++ b/src/mesa/state_tracker/st_context.c
> @@ -51,6 +51,7 @@
>  #include "st_cb_fbo.h"
>  #include "st_cb_feedback.h"
>  #include "st_cb_msaa.h"
> +#include "st_cb_perfmon.h"
>  #include "st_cb_program.h"
>  #include "st_cb_queryobj.h"
>  #include "st_cb_readpixels.h"
> @@ -152,6 +153,7 @@ st_create_context_priv( struct gl_context *ctx, struct pipe_context *pipe,
>     st_init_bitmap(st);
>     st_init_clear(st);
>     st_init_draw( st );
> +   st_init_perfmon(st);
>  
>     /* Choose texture target for glDrawPixels, glBitmap, renderbuffers */
>     if (pipe->screen->get_param(pipe->screen, PIPE_CAP_NPOT_TEXTURES))
> @@ -363,6 +365,7 @@ void st_init_driver_functions(struct dd_function_table *functions)
>     st_init_fbo_functions(functions);
>     st_init_feedback_functions(functions);
>     st_init_msaa_functions(functions);
> +   st_init_perfmon_functions(functions);
>     st_init_program_functions(functions);
>     st_init_query_functions(functions);
>     st_init_cond_render_functions(functions);
> diff --git a/src/mesa/state_tracker/st_extensions.c b/src/mesa/state_tracker/st_extensions.c
> index 4207cb6..47c6035 100644
> --- a/src/mesa/state_tracker/st_extensions.c
> +++ b/src/mesa/state_tracker/st_extensions.c
> @@ -588,6 +588,9 @@ void st_init_extensions(struct st_context *st)
>     ctx->Extensions.OES_EGL_image_external = GL_TRUE;
>     ctx->Extensions.OES_draw_texture = GL_TRUE;
>  
> +   if (screen->get_driver_query_info && screen->get_driver_query_group_info)
> +      ctx->Extensions.AMD_performance_monitor = GL_TRUE;
> +
>     /* Expose the extensions which directly correspond to gallium caps. */
>     for (i = 0; i < Elements(cap_mapping); i++) {
>        if (screen->get_param(screen, cap_mapping[i].cap)) {
> 