[Mesa-dev] [PATCH] anv: implement pipeline statistics queries

Thu Feb 16 14:21:29 UTC 2017

On Wed, Feb 15, 2017 at 11:04 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> On Tue, Jan 24, 2017 at 5:27 PM, Robert Bragg <robert at sixbynine.org> wrote:
>>>>>> Depending on how strictly we consider that the queries should only measure
>>>>>> the commands they bracket then I think some stalling will be necessary to
>>>>>> serialize the work associated with a query and defer reading the end state
>>>>>> until after the relevant stages have completed their work.
>>>>>>
>>>>>> We aren't very precise about this in GL currently, but in Begin maybe we
>>>>>> should stall until everything >= the statistic-stage is idle and in End
>>>>>> stall until everything <= the statistic-stage is idle before reading
>>>>>> (where 'statistic-stage' here is the pipeline stage associated with the
>>>>>> pipeline statistic being queried, or respectively the min/max stage for
>>>>>> a set).
>>>>>>
>>>>>> For reference in my implementation of INTEL_performance_query facing this
>>>>>> same question, I'm currently just stalling before and after queries:
>>>>>>
>>>>>>
>>>>>> https://github.com/rib/mesa/blob/wip/rib/oa-next/src/mesa/drivers/dri/i965/brw_performance_query.c#L994
>>>>>>
>>>>>> https://github.com/rib/mesa/blob/wip/rib/oa-next/src/mesa/drivers/dri/i965/brw_performance_query.c#L1136
>>>>>
>>>>> So that's essentially what I'm doing here, I think. (And what the GL
>>>>> driver does.)
>>
>> Yup, the upshot might just be a comment explaining the need for a
>> stall. I think we probably need a stall in CmdEndQuery too, otherwise
>> the command streamer may read the end counter before the work has
>> finished.
>
> Robert,
>
> Can you give me some examples of how I might implement this? I'm not
> so familiar with the Intel HW to know this offhand. Mostly hoping you
> can point me at a mapping of which bit in what command corresponds to
> which stage.

Heh, actually, just after I sent out my series for
GL_INTEL_performance_query yesterday, I of course remembered that I
needed to fold the command streamer synchronization back from a later
patch into the one for pipeline statistics.

My last reply was just trying to suggest replacing the "TODO: This
might only be necessary for certain stats" comment, so there is nothing
to really implement. I had thought you might be missing a corresponding
stall in CmdEndQuery, but on checking, it looks like you already have
one with the same TODO comment. Sorry I didn't double-check that at the
time.

I'm not sure it's worth worrying about trying to apply fine-grained
control over flushing, even though I suggested that idea originally.
After looking into that possibility more, I don't think the HW actually
supports very detailed control (with one possible exception being the
use of DEPTH_STALL with occlusion queries).

My (limited) understanding is that a PIPE_CONTROL with CS_STALL and
STALL_AT_SCOREBOARD should generally suffice to stall the command
streamer until the pipeline has been drained. Since this is what you
are already doing, my last reply was trying to say that it maybe just
needs a better comment to explain why we need:

+      /* TODO: This might only be necessary for certain stats */
+      anv_batch_emit(&cmd_buffer->batch, GENX(PIPE_CONTROL), pc) {
+         pc.CommandStreamerStallEnable = true;
+         pc.StallAtPixelScoreboard = true;
+      }

instead of "TODO: This might only be necessary for certain stats".
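
To illustrate why the stall matters on both sides of a query, here is a
toy model in plain C. It is only a sketch and not driver code: the
struct and every toy_* name are made up for illustration, and the model
assumes (unrealistically) that queued work is only counted once a stall
drains the pipeline. On real hardware the unstalled result would simply
be unpredictable rather than zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model, not real hardware: 'counter' stands in for a pipeline
 * statistics register, 'in_flight' for work the command streamer has
 * issued but the pipeline has not finished yet. All names here are
 * hypothetical. */
struct toy_gpu {
   uint64_t counter;    /* invocations already counted */
   uint64_t in_flight;  /* invocations issued but not yet counted */
};

/* Issue a draw: the work is queued, the counter does not move yet. */
static void toy_draw(struct toy_gpu *gpu, uint64_t invocations)
{
   gpu->in_flight += invocations;
}

/* Stand-in for a PIPE_CONTROL with a CS stall: wait until the pipeline
 * has drained, so that all queued work has been counted. */
static void toy_stall(struct toy_gpu *gpu)
{
   gpu->counter += gpu->in_flight;
   gpu->in_flight = 0;
}

/* Snapshot the counter, optionally stalling first. */
static uint64_t toy_snapshot(struct toy_gpu *gpu, bool stall_first)
{
   if (stall_first)
      toy_stall(gpu);
   return gpu->counter;
}

/* Bracket one draw with begin/end snapshots and return the delta. */
static uint64_t toy_query(struct toy_gpu *gpu, uint64_t before,
                          uint64_t bracketed, bool stall)
{
   toy_draw(gpu, before);                     /* earlier, unrelated work */
   uint64_t begin = toy_snapshot(gpu, stall); /* "CmdBeginQuery" */
   toy_draw(gpu, bracketed);                  /* the work being measured */
   uint64_t end = toy_snapshot(gpu, stall);   /* "CmdEndQuery" */
   assert(end >= begin);
   return end - begin;
}
```

With stalls in both snapshots, toy_query() reports exactly the
bracketed work. Without them, the begin and end snapshots are both
taken before anything has been counted, so the bracketed work is
missed.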

I don't know if it's a clear explanation, but feel free to steal
anything from my latest attempt to comment the need for stalling in
this patch for INTEL_performance_query.

https://lists.freedesktop.org/archives/mesa-dev/2017-February/144670.html

Btw, in case you ask, I've never found a good explanation of what
'stall at scoreboard' really means since I'm not really familiar with
what the scoreboard is. :-/ One impression I've got is that it's just
the least-restrictive way of satisfying the restrictions on using
CS_STALL. I think the scoreboard is something related to dependency
tracking while scheduling threads to execute on EUs so I currently
imagine it to mean "stall until there are no more threads left to
schedule for pixel shading" - maybe someone else knows better.

One other data point here is that the Intel driver on Windows uses
PIPE_CONTROL + CS_STALL + STALL_AT_SCOREBOARD in its implementation of
INTEL_performance_query and query objects, so hopefully they've found
that to be enough. Even if this does nothing to explain why, all
things being equal, it could be good to be consistent if we're ever
trying to compare metrics between different drivers.

Regards,
- Robert

>
> Thanks,
>
>   -ilia