[Mesa-dev] [PATCH 9/9] i965: Pack simple pipelined query objects into the same buffer

Thu Jun 15 09:49:04 UTC 2017

Quoting Kenneth Graunke (2017-06-15 00:19:35)
> On Friday, June 9, 2017 6:01:40 AM PDT Chris Wilson wrote:
> > Reuse the same query object buffer for multiple queries within the same
> > batch.
> > 
> > A task for the future is propagating the GL_NO_MEMORY errors.
> > 
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Kenneth Graunke <kenneth at whitecape.org>
> > Cc: Matt Turner <mattst88 at gmail.com>
> > ---
> >  src/mesa/drivers/dri/i965/brw_context.c   |  3 +++
> >  src/mesa/drivers/dri/i965/brw_context.h   | 10 ++++---
> >  src/mesa/drivers/dri/i965/brw_queryobj.c  | 16 +++++------
> >  src/mesa/drivers/dri/i965/gen6_queryobj.c | 44 ++++++++++++++++++++++++++-----
> >  4 files changed, 55 insertions(+), 18 deletions(-)
> 
> The benefit is saving memory, right?

I look at things from the angle of reducing overhead in relocation
handling (here by reusing the same bo/execobj).

> The downside seems to be increased WaitQuery() latencies:
> 
> - Start Query A
> - End Query A
> - Start Query B
> - Batch Flush
> - End Query B
> - WaitQuery for A
> 
> The query BO also contains B, and both batches refer to it, so it seems
> like WaitQuery() would wait for two batches to complete instead of one.

No, with a few exceptions. The latency of the wait is still the same,
since you wait for the batch and a qbo only contains queries started in
the same batch. (A new qbo is created if the current one is in use on
the GPU.) However, if the batch is split between BeginQuery and EndQuery,
the waits for the earlier queries in that qbo would then be on the second
batch instead.
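
A minimal sketch of that packing rule, in C. Every name here is
hypothetical and not taken from the i965 code, and "in use on the GPU"
is approximated as "first used in a batch that has already been
flushed":

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names come from the driver. */
#define SLOTS_PER_QBO 4   /* assumed number of result slots per buffer */
#define MAX_QBOS      8   /* fixed pool keeps the sketch allocation-free */

struct qbo {
    unsigned batch;       /* batch in which this buffer was first used */
    unsigned next_slot;   /* next free result slot */
};

struct ctx {
    unsigned batch;       /* index of the batch currently being built */
    struct qbo pool[MAX_QBOS];
    unsigned nr_qbos;
    struct qbo *current;  /* buffer shared by queries of this batch */
};

struct query {
    struct qbo *bo;
    unsigned slot;
};

/* Give the query a slot in the shared buffer. A fresh buffer is taken
 * when there is none yet, when the current one belongs to an already
 * flushed batch (treated as busy), or when it is full. */
static bool begin_query(struct ctx *c, struct query *q)
{
    assert(c && q);

    struct qbo *bo = c->current;
    bool reusable = bo && bo->batch == c->batch &&
                    bo->next_slot < SLOTS_PER_QBO;

    if (!reusable) {
        if (c->nr_qbos == MAX_QBOS)
            return false;             /* out of buffers */
        bo = &c->pool[c->nr_qbos++];
        bo->batch = c->batch;
        bo->next_slot = 0;
        c->current = bo;
    }

    q->bo = bo;
    q->slot = bo->next_slot++;
    return true;
}

/* Submitting the batch makes every buffer it references busy. */
static void flush_batch(struct ctx *c)
{
    c->batch++;
}
```

Two queries begun in the same batch share a buffer; the first query
begun after a flush gets a new one.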

But since the Wait is already on the granularity of the batch, we always
pay the latency cost of whatever is executed after the query, so we only
make the existing problem occasionally worse.

Fortunately nested queries are not a thing. Right? Otherwise the
pathological case would be something like

	for q in A..Z: Begin q
	for q in A..Z: HeavyWork(); End q; glFlush()
	Wait A

In the brw-batch series, queries were tracked with a seqno (via a common
fence) so even when the bo was reused across multiple batches, we would
only wait for the batch containing its EndQuery.
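
Roughly, and only as a sketch: the names below are made up for
illustration and are not the brw-batch code. It assumes one
monotonically increasing seqno per batch and in-order retirement:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fence state: one seqno per batch. */
struct seq_ctx {
    uint32_t building;   /* seqno of the batch currently being built */
    uint32_t completed;  /* last seqno the GPU reported as finished */
};

struct query {
    uint32_t end_seqno;  /* batch that contains this query's EndQuery */
};

/* Wrap-safe "has `completed` reached `target`?" comparison. */
static bool seqno_passed(uint32_t completed, uint32_t target)
{
    return (int32_t)(completed - target) >= 0;
}

/* EndQuery records the batch it lands in, not the buffer it writes to. */
static void end_query(const struct seq_ctx *s, struct query *q)
{
    q->end_seqno = s->building;
}

/* Submitting the batch starts a new one with the next seqno. */
static void flush_batch(struct seq_ctx *s)
{
    s->building++;
}

/* Stand-in for the GPU signalling completion up to `seqno`. */
static void retire(struct seq_ctx *s, uint32_t seqno)
{
    s->completed = seqno;
}

/* The result is ready once the batch holding EndQuery has retired,
 * whichever later batches still reference the shared buffer. */
static bool query_ready(const struct seq_ctx *s, const struct query *q)
{
    return seqno_passed(s->completed, q->end_seqno);
}
```

With A ended in batch 1 and B ended in batch 2, A becomes ready as soon
as batch 1 retires, even though both results live in the same bo.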

Alternatively, instead of waiting for the batch, you can just turn the
wait into a busy-spin. That's a horrible decision to have to make: which
side of the perf/power trade-off should you err on? One benefit of
tracking the seqno there is that at least you can get some idea of
whether the task is currently active.
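
One way that hint could feed the decision, purely as an illustration.
The policy and all names are assumptions, not something any driver is
claimed to do. It assumes in-order execution with one seqno per batch,
so the batch right after the last completed one is presumed to be
executing now:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical outcome of the "how should I wait?" decision. */
enum wait_mode {
    WAIT_DONE,   /* result already available, no wait needed */
    WAIT_SPIN,   /* target batch is presumably running: poll for it */
    WAIT_SLEEP   /* target is further back in the queue: block */
};

static enum wait_mode choose_wait(uint32_t completed, uint32_t target)
{
    /* Wrap-safe check that the target batch has already retired. */
    if ((int32_t)(completed - target) >= 0)
        return WAIT_DONE;

    /* Next batch to retire is ours, so it is likely active right now. */
    if (target - completed == 1)
        return WAIT_SPIN;

    /* Other batches must finish first; spinning would just burn power. */
    return WAIT_SLEEP;
}
```

This does not settle the trade-off; it only limits the spin to the case
where the seqno suggests the wait will be short.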
-Chris