[Intel-gfx] [PATCH 2/3] iris: Create a composite context for both compute and render pipelines

Tue Mar 26 17:01:57 UTC 2019

On Tuesday, March 26, 2019 12:16:20 AM PDT Chris Wilson wrote:
> Quoting Kenneth Graunke (2019-03-26 05:52:10)
> > On Monday, March 25, 2019 3:58:59 AM PDT Chris Wilson wrote:
> > > iris currently uses two distinct GEM contexts to have distinct logical
> > > HW contexts for the compute and render pipelines. However, using two
> > > distinct GEM contexts implies that they are distinct timelines, yet as
> > > they are a single GL context that implies they belong to a single
> > > timeline from the client perspective. Currently, fences are occasionally
> > > inserted to order the two timelines. Using 2 GEM contexts also implies
> > > that we keep 2 ppGTT for identical buffer state. If we can create a
> > > single GEM context, with the right capabilities, we can have a single
> > > VM, a single timeline, but 2 logical HW contexts for the 2 pipelines.
> > > 
> > > This is allowed through the new context interface that allows VM to be
> > > shared, timelines to be specified, and for the logical contexts to be
> > > constructed as the user desires.
> > > 
> > > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > > Cc: Kenneth Graunke <kenneth at whitecape.org>
> > > ---
> > >  src/gallium/drivers/iris/iris_batch.c   | 16 ++-----
> > >  src/gallium/drivers/iris/iris_batch.h   |  5 +--
> > >  src/gallium/drivers/iris/iris_context.c | 56 ++++++++++++++++++++++++-
> > >  3 files changed, 60 insertions(+), 17 deletions(-)
> > 
> > Hi Chris,
> > 
> > I don't think that I want the single timeline option.  It seems like
> > we've been moving away from implicit sync for a long time, and the
> > explicit sync code we have is pretty straightforward and seems to do
> > the trick.  Jason and I also chatted briefly, and we don't necessarily
> > want a strict submission order between render/compute.
> 
> I disagree if you think this means more implicit sync. It is setting up
> the GEM context to an exact match of the GL context, by _explicit_
> control of the timeline. Then the fences you do export from inside the
> GL context do not need to be faked to be a composite of the pair of
> contexts. You still have explicit fences, and you have explicit control
> over the definition of their timeline.

With regard to multiple GL contexts, yes, everything remains explicit.
But having 2-3 separate timelines within a GL context allows us to
reorder work behind GL's back, which is all the rage these days for
performance.  Tilers do it all the time.  Position-only bucketing may
require it.  I'd really like to start treating render and compute as
distinct asynchronous queues.  At the very least, I'd like to experiment
with that and not tie my hands to a particular behavior.
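
To make the "distinct asynchronous queues" idea concrete, here is a toy
model in C.  It is only a sketch and not iris code: struct timeline,
struct fence, and the helper names are all hypothetical.  Each queue has
its own sequence of batches, and ordering between queues exists only
where a batch is given an explicit fence to wait on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not iris code: each queue is its own timeline. */
struct timeline {
   uint64_t last_submitted; /* seqno of the newest batch queued */
   uint64_t last_completed; /* seqno of the newest batch finished */
};

/* An explicit fence names one point on one timeline. */
struct fence {
   const struct timeline *timeline; /* NULL means "no dependency" */
   uint64_t seqno;
};

static bool fence_signaled(struct fence f)
{
   return f.timeline == NULL || f.timeline->last_completed >= f.seqno;
}

/* Queue a batch and return a fence for its completion. */
static struct fence timeline_submit(struct timeline *t)
{
   t->last_submitted++;
   return (struct fence){ .timeline = t, .seqno = t->last_submitted };
}

/* Retire the oldest outstanding batch on this timeline, provided the
 * fence it waits on has signaled.  Returns true if a batch retired. */
static bool timeline_retire_next(struct timeline *t, struct fence wait)
{
   assert(t->last_completed <= t->last_submitted);
   if (t->last_completed == t->last_submitted || !fence_signaled(wait))
      return false;
   t->last_completed++;
   return true;
}
```

In this model a compute batch with no fence can retire while render
batches are still queued, whereas one that waits on a render fence
stalls until that point on the render timeline has completed.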

There may be some use for single timeline, though.  Attaching images as
compute shader inputs may require CCS/HiZ resolves, which have to happen
on the RCS.  Right now, I do those on IRIS_BATCH_RENDER, which means that
they back up behind any queued render work.  Ideally, I'd do those on a
third context, which could be tied to the compute timeline, so the
resolves and the compute job can both execute ahead of queued rendering,
but still back to back.
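
A minimal sketch of that arrangement, assuming a third batch existed.
Only the render and compute batches are real today; BATCH_RESOLVE, the
timeline IDs, and needs_explicit_fence() are made up for illustration.
Batches that share a timeline are ordered by submission, so only
cross-timeline dependencies need a fence.

```c
#include <assert.h>
#include <stdbool.h>

/* BATCH_RESOLVE is hypothetical; the thread only proposes it. */
enum batch_name { BATCH_RENDER, BATCH_COMPUTE, BATCH_RESOLVE, BATCH_COUNT };

/* Hypothetical timeline IDs: the resolve batch shares the compute one. */
enum timeline_id { TIMELINE_RENDER, TIMELINE_COMPUTE };

static const enum timeline_id batch_timeline[BATCH_COUNT] = {
   [BATCH_RENDER]  = TIMELINE_RENDER,
   [BATCH_COMPUTE] = TIMELINE_COMPUTE,
   [BATCH_RESOLVE] = TIMELINE_COMPUTE,
};

/* Batches on the same timeline execute in submission order, so a fence
 * is only needed when producer and consumer sit on different timelines. */
static bool needs_explicit_fence(enum batch_name producer,
                                 enum batch_name consumer)
{
   assert(producer < BATCH_COUNT && consumer < BATCH_COUNT);
   return batch_timeline[producer] != batch_timeline[consumer];
}
```

With this mapping, a resolve followed by a compute job needs no fence
between them, while anything that consumes render output still does.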

> > Separating the VMA from the context state image seems like absolutely
> > the right thing to do - as you said, they're separate in hardware,
> > and there's no real reason to tie them together.  I would be in favor of new
> > uABI for that.
> > 
> > I don't think there will be much overhead reduction from sharing the
> > VMA here though.  It's very plausible that the compositor might want
> > to run between render and compute batches, at which point we end up
> > doing page directory loads anyway.  I have also heard rumors about bit
> > 47 becoming magical at some point which may prohibit us from sharing...
> 
> Yeah, but that doesn't actually affect the context setup, just how you
> decide to use it in the end. And by that point, you'll be forced into using
> this new uABI anyway or something entirely different :-p

Looking into this a bit more, I think we're actually OK.  I thought I
might need to have distinct addresses for render and compute - at which
point nearly every address would differ in terms of bit 47 - but it
looks like the correct answer is "just never use that bit".  *shrug*
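
For what "just never use that bit" could look like in an address
allocator, here is a small sketch.  It assumes the only requirement is
that no buffer overlaps an address with bit 47 set; the names
(ADDR_BIT_47, struct bump_heap, bump_alloc) are hypothetical and this is
not how iris actually assigns addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ADDR_BIT_47 (UINT64_C(1) << 47)

/* True if [addr, addr + size) stays entirely below bit 47. */
static bool range_avoids_bit47(uint64_t addr, uint64_t size)
{
   return size != 0 && addr < ADDR_BIT_47 && size <= ADDR_BIT_47 - addr;
}

/* Hypothetical bump allocator.  Start `next` above zero so that a
 * return value of 0 can unambiguously mean "allocation failed". */
struct bump_heap {
   uint64_t next;
};

static uint64_t bump_alloc(struct bump_heap *heap, uint64_t size,
                           uint64_t align)
{
   assert(align != 0 && (align & (align - 1)) == 0); /* power of two */

   uint64_t start = (heap->next + align - 1) & ~(align - 1);
   if (!range_avoids_bit47(start, size))
      return 0; /* would touch bit 47: refuse instead of wrapping */

   heap->next = start + size;
   return start;
}
```

Capping the heap this way halves the usable range of a 48-bit space,
which is the trade-off implied by not using the bit at all.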

> > Context cloning seems OK, but I'm always pretty hesitant to add new
> > uABI unless it's strictly necessary.  In this case, we can do the same
> > thing with a little bit of userspace code, so I'm not sure it's worth
> > adding that...
> 
> Actually you cannot do the same without some of the new uABI either,
> since previously you did not have all the parameters exposed.

What isn't exposed?  We set up everything the first time, why can't we
do it again?

> > I would love to see an iris patch to use the new
> > I915_CONTEXT_PARAM_RECOVERABLE option without the other dependencies.
> 
> https://gitlab.freedesktop.org/ickle/mesa/commit/84d9cb1d8d98a50dcceea19ccbc3836b15cf73ae
> -Chris