[Mesa-dev] [PATCH] i965: perf: minimize the chances to spread queries across batchbuffers

Mon Jul 24 15:30:50 UTC 2017

Hi,

I would like to nominate this commit for 17.1.
It can be found as commit adafe4b733c0242720ccfe10d391e5d44c0e7401 in
the master branch (I believe it's already in 17.2).

Thanks,

-
Lionel

On 22/06/17 02:25, Lionel Landwerlin wrote:
> Counters related to timings will be sensitive to any delay introduced
> by the software. In particular if our begin & end of performance
> queries end up in different batches, time related counters will
> exhibit bigger values caused by the time it takes for the kernel
> driver to load new requests into the hardware.
>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
> ---
>   src/mesa/drivers/dri/i965/brw_performance_query.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_performance_query.c b/src/mesa/drivers/dri/i965/brw_performance_query.c
> index 06576a54d03..6b874d0bbee 100644
> --- a/src/mesa/drivers/dri/i965/brw_performance_query.c
> +++ b/src/mesa/drivers/dri/i965/brw_performance_query.c
> @@ -1063,6 +1063,14 @@ brw_end_perf_query(struct gl_context *ctx,
>                                                obj->oa.begin_report_id + 1);
>         }
>   
> +      /* We flush the batchbuffer here to minimize the chances that MI_RPC
> +       * delimiting commands end up in different batchbuffers. If that's the
> +       * case, the measurement will include the time it takes for the kernel
> +       * scheduler to load a new request into the hardware. This is manifested
> +       * in tools like frameretrace by spikes in the "GPU Core Clocks"
> +       * counter.
> +       */
> +      intel_batchbuffer_flush(brw);
>         --brw->perfquery.n_active_oa_queries;
>   
>         /* NB: even though the query has now ended, it can't be accumulated
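
For anyone evaluating the backport, here is a rough toy model of the effect
the patch is after. It is not Mesa code: the toy_* names, the fixed batch
capacity and the "one command per slot" accounting are all assumptions made
up for illustration. It only shows that flushing right after the end marker
makes the next query start in a fresh batch, which reduces (but does not
eliminate) the cases where the begin and end markers land in different
batches.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy batchbuffer, NOT the i965 implementation. */
struct toy_batch {
   unsigned capacity;  /* commands that fit in one batch (assumed fixed) */
   unsigned used;      /* commands emitted into the current batch */
   unsigned batch_id;  /* incremented every time a batch is submitted */
};

/* Submit the current batch and start a new one. Empty batches are skipped. */
static void
toy_flush(struct toy_batch *b)
{
   if (b->used == 0)
      return;
   b->used = 0;
   b->batch_id++;
}

/* Emit one command, flushing first if the current batch is full.
 * Returns the id of the batch the command ended up in.
 */
static unsigned
toy_emit(struct toy_batch *b)
{
   if (b->used == b->capacity)
      toy_flush(b);
   b->used++;
   return b->batch_id;
}

/* Run n_queries back to back. Each query is: begin marker, work_cmds
 * commands, end marker. Returns how many queries had their begin and end
 * markers in different batches.
 */
static unsigned
toy_count_split_queries(unsigned capacity, unsigned n_queries,
                        unsigned work_cmds, bool flush_after_end)
{
   assert(capacity > 0);

   struct toy_batch b = { capacity, 0, 0 };
   unsigned split = 0;

   for (unsigned q = 0; q < n_queries; q++) {
      unsigned begin_batch = toy_emit(&b);

      for (unsigned i = 0; i < work_cmds; i++)
         toy_emit(&b);

      unsigned end_batch = toy_emit(&b);
      if (begin_batch != end_batch)
         split++;

      /* Models the flush the patch adds at the end of a query. */
      if (flush_after_end)
         toy_flush(&b);
   }

   return split;
}
```

With a capacity of 8 and four queries of 6 commands each, the model splits
two queries without the flush and none with it. A single query that is
larger than the batch still splits even with the flush, which is why the
subject line says "minimize the chances" rather than "prevent".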