[Intel-gfx] [PATCH v2] drm/i915: Handle Intel igfx + Intel dgfx hybrid graphics setup

Thu Sep 2 14:33:11 UTC 2021

On Tue, Aug 31, 2021 at 02:18:15PM +0100, Tvrtko Ursulin wrote:
> 
> On 31/08/2021 13:43, Daniel Vetter wrote:
> > On Tue, Aug 31, 2021 at 10:15:03AM +0100, Tvrtko Ursulin wrote:
> > > 
> > > On 30/08/2021 09:26, Daniel Vetter wrote:
> > > > On Fri, Aug 27, 2021 at 03:44:42PM +0100, Tvrtko Ursulin wrote:
> > > > > 
> > > > > On 27/08/2021 15:39, Tvrtko Ursulin wrote:
> > > > > > From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > > > > > 
> > > > > > In short this makes i915 work for hybrid setups (DRI_PRIME=1 with Mesa)
> > > > > > when rendering is done on Intel dgfx and scanout/composition on Intel
> > > > > > igfx.
> > > > > > 
> > > > > > Before this patch the driver was not quite ready for that setup, mainly
> > > > > > because it was able to emit a semaphore wait between the two GPUs, which
> > > > > > results in deadlocks because semaphore target location in HWSP is neither
> > > > > > shared between the two, nor mapped in both GGTT spaces.
> > > > > > 
> > > > > > To fix it the patch adds an additional check to a couple of relevant code
> > > > > > paths in order to prevent using semaphores for inter-engine
> > > > > > synchronisation between different driver instances.
> > > > > > 
> > > > > > Patch also moves singly used i915_gem_object_last_write_engine to be
> > > > > > private in its only calling unit (debugfs), while modifying it to only
> > > > > > show activity belonging to the respective driver instance.
> > > > > > 
> > > > > > What remains in this problem space is the question of the GEM busy ioctl.
> > > > > > We have a somewhat ambigous comment there saying only status of native
> > > > > > fences will be reported, which could be interpreted as either i915, or
> > > > > > native to the drm fd. For now I have decided to leave that as is, meaning
> > > > > > any i915 instance activity continues to be reported.
> > > > > > 
> > > > > > v2:
> > > > > >     * Avoid adding rq->i915. (Chris)
> > > > > > 
> > > > > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > > > 
> > > > Can't we just delete semaphore code and done?
> > > > - GuC won't have it
> > > > - media team benchmarked on top of softpin media driver, found no
> > > >     difference
> > > 
> > > You have S-curve for saturated workloads or something else? How thorough and
> > > which media team I guess.
> > > 
> > >  From memory it was a nice win for some benchmarks (non-saturated ones), but
> > > as I have told you previously, we haven't been putting numbers in commit
> > > messages since it wasn't allowed. I may be able to dig out some more details
> > > if I went trawling through GEM channel IRC logs, although probably not the
> > > actual numbers since those were usually on pastebin. Or you go an talk with
> > > Chris since he probably remembers more details. Or you just decide you don't
> > > care and remove it. I wouldn't do that without putting the complete story in
> > > writing, but it's your call after all.
> > 
> > Media has also changed, they're not using relocations anymore.
> 
> Meaning you think it changes the benchmarking story? When coupled with
> removal of GPU relocations then possibly yes.
> 
> > Unless there's solid data performance tuning of any kind that gets in the
> > way simply needs to be removed. Yes this is radical, but the codebase is
> > in a state to require this.
> > 
> > So either way we'd need to rebenchmark this if it's really required. Also
> 
> Therefore can you share what benchmarks have been done or is it secret?  As
> said, I think the non-saturated case was the more interesting one here.
> 
> > if we really need this code still someone needs to fix the design, the
> > current code is making layering violations an art form.
> >
> > > Anyway, without the debugfs churn it is more or less two line patch to fix
> > > igfx + dgfx hybrid setup. So while mulling it over this could go in. I'd
> > > just refine it to use a GGTT check instead of GT. And unless DG1 ends up
> > > being GuC only.
> > 
> > The minimal robust fix here is imo to stop us from upcasting dma_fence to
> > i915_request if it's not for our device. Not sprinkle code here into the
> > semaphore code. We shouldn't even get this far with foreign fences.
> 
> Device check does not work for multi-tile. It was one of my earlier attempts
> before I realized the problem. You'll see v3 which I think handles all the
> cases.

There is no hw semaphores on multi-tile.

But there _is_ a lot more going on than just hw semaphores that spawn
driver instances. Like priority boosting, clock boosting, and all kinds of
other things. I really dont' think it's very robust if we play
whack-a-mole here with things leaking.
-Daniel

> You also forgot to comment on the question lower in the email. I'll just
> send a patch which removes that anyway so you can comment there.
> 
> Regards,
> 
> Tvrtko
> 
> > -Daniel
> > 
> > > 
> > > > - pre-gen8 semaphore code was also silently ditched and no one cared
> > > > 
> > > > Plus removing semaphore code would greatly simplify conversion to
> > > > drm/sched.
> > > > 
> > > > > > ---
> > > > > >     drivers/gpu/drm/i915/gem/i915_gem_object.h | 17 ----------
> > > > > >     drivers/gpu/drm/i915/i915_debugfs.c        | 39 ++++++++++++++++++++--
> > > > > >     drivers/gpu/drm/i915/i915_request.c        | 12 ++++++-
> > > > > >     3 files changed, 47 insertions(+), 21 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > > > > > index 48112b9d76df..3043fcbd31bd 100644
> > > > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > > > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > > > > > @@ -503,23 +503,6 @@ i915_gem_object_finish_access(struct drm_i915_gem_object *obj)
> > > > > >     	i915_gem_object_unpin_pages(obj);
> > > > > >     }
> > > > > > -static inline struct intel_engine_cs *
> > > > > > -i915_gem_object_last_write_engine(struct drm_i915_gem_object *obj)
> > > > > > -{
> > > > > > -	struct intel_engine_cs *engine = NULL;
> > > > > > -	struct dma_fence *fence;
> > > > > > -
> > > > > > -	rcu_read_lock();
> > > > > > -	fence = dma_resv_get_excl_unlocked(obj->base.resv);
> > > > > > -	rcu_read_unlock();
> > > > > > -
> > > > > > -	if (fence && dma_fence_is_i915(fence) && !dma_fence_is_signaled(fence))
> > > > > > -		engine = to_request(fence)->engine;
> > > > > > -	dma_fence_put(fence);
> > > > > > -
> > > > > > -	return engine;
> > > > > > -}
> > > > > > -
> > > > > >     void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
> > > > > >     					 unsigned int cache_level);
> > > > > >     void i915_gem_object_flush_if_display(struct drm_i915_gem_object *obj);
> > > > > > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> > > > > > index 04351a851586..55fd6191eb32 100644
> > > > > > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > > > > > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > > > > > @@ -135,13 +135,46 @@ static const char *stringify_vma_type(const struct i915_vma *vma)
> > > > > >     	return "ppgtt";
> > > > > >     }
> > > > > > +static char *
> > > > > > +last_write_engine(struct drm_i915_private *i915,
> > > > > > +		  struct drm_i915_gem_object *obj)
> > > > > > +{
> > > > > > +	struct intel_engine_cs *engine;
> > > > > > +	struct dma_fence *fence;
> > > > > > +	char *res = NULL;
> > > > > > +
> > > > > > +	rcu_read_lock();
> > > > > > +	fence = dma_resv_get_excl_unlocked(obj->base.resv);
> > > > > > +	rcu_read_unlock();
> > > > > > +
> > > > > > +	if (!fence || dma_fence_is_signaled(fence))
> > > > > > +		goto out;
> > > > > > +
> > > > > > +	if (!dma_fence_is_i915(fence)) {
> > > > > > +		res = "<external-fence>";
> > > > > > +		goto out;
> > > > > > +	}
> > > > > > +
> > > > > > +	engine = to_request(fence)->engine;
> > > > > > +	if (engine->gt->i915 != i915) {
> > > > > > +		res = "<external-i915>";
> > > > > > +		goto out;
> > > > > > +	}
> > > > > > +
> > > > > > +	res = engine->name;
> > > > > > +
> > > > > > +out:
> > > > > > +	dma_fence_put(fence);
> > > > > > +	return res;
> > > > > > +}
> > > > > > +
> > > > > >     void
> > > > > >     i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
> > > > > >     {
> > > > > >     	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
> > > > > > -	struct intel_engine_cs *engine;
> > > > > >     	struct i915_vma *vma;
> > > > > >     	int pin_count = 0;
> > > > > > +	char *engine;
> > > > > >     	seq_printf(m, "%pK: %c%c%c %8zdKiB %02x %02x %s%s%s",
> > > > > >     		   &obj->base,
> > > > > > @@ -230,9 +263,9 @@ i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
> > > > > >     	if (i915_gem_object_is_framebuffer(obj))
> > > > > >     		seq_printf(m, " (fb)");
> > > > > > -	engine = i915_gem_object_last_write_engine(obj);
> > > > > > +	engine = last_write_engine(dev_priv, obj);
> > > > > >     	if (engine)
> > > > > > -		seq_printf(m, " (%s)", engine->name);
> > > > > > +		seq_printf(m, " (%s)", engine);
> > > > > 
> > > > > Or I zap this from the code altogether. Not sure it is very useful since the
> > > > > only caller is i915_gem_framebuffer debugfs file and how much it can care
> > > > > about maybe hitting the timing window when exclusive fence will contain
> > > > > something.
> > > > 
> > > > Ideally we'd just look at the fence timeline name. But i915 has this very
> > > > convoluted typesafe-by-rcu reuse which means we actually can't do that,
> > > > and our fence timeline name is very useless.
> > > 
> > > Why do we even care to output any of this here? I'd just remove it since it
> > > is a very transient state with an extremely short window of opportunity to
> > > make it show anything. Which I think makes it pretty useless in debugfs.
> > > 
> > > Regards,
> > > 
> > > Tvrtko
> > > 
> > > > 
> > > > Would be good to fix that, Matt Auld has started an attempt but didn't get
> > > > very far.
> > > > -Daniel
> > > > 
> > > > > 
> > > > > Regards,
> > > > > 
> > > > > Tvrtko
> > > > > 
> > > > > >     }
> > > > > >     static int i915_gem_object_info(struct seq_file *m, void *data)
> > > > > > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > > > > > index ce446716d092..64adf619fe82 100644
> > > > > > --- a/drivers/gpu/drm/i915/i915_request.c
> > > > > > +++ b/drivers/gpu/drm/i915/i915_request.c
> > > > > > @@ -1152,6 +1152,12 @@ __emit_semaphore_wait(struct i915_request *to,
> > > > > >     	return 0;
> > > > > >     }
> > > > > > +static bool
> > > > > > +can_use_semaphore_wait(struct i915_request *to, struct i915_request *from)
> > > > > > +{
> > > > > > +	return to->engine->gt == from->engine->gt;
> > > > > > +}
> > > > > > +
> > > > > >     static int
> > > > > >     emit_semaphore_wait(struct i915_request *to,
> > > > > >     		    struct i915_request *from,
> > > > > > @@ -1160,6 +1166,9 @@ emit_semaphore_wait(struct i915_request *to,
> > > > > >     	const intel_engine_mask_t mask = READ_ONCE(from->engine)->mask;
> > > > > >     	struct i915_sw_fence *wait = &to->submit;
> > > > > > +	if (!can_use_semaphore_wait(to, from))
> > > > > > +		goto await_fence;
> > > > > > +
> > > > > >     	if (!intel_context_use_semaphores(to->context))
> > > > > >     		goto await_fence;
> > > > > > @@ -1263,7 +1272,8 @@ __i915_request_await_execution(struct i915_request *to,
> > > > > >     	 * immediate execution, and so we must wait until it reaches the
> > > > > >     	 * active slot.
> > > > > >     	 */
> > > > > > -	if (intel_engine_has_semaphores(to->engine) &&
> > > > > > +	if (can_use_semaphore_wait(to, from) &&
> > > > > > +	    intel_engine_has_semaphores(to->engine) &&
> > > > > >     	    !i915_request_has_initial_breadcrumb(to)) {
> > > > > >     		err = __emit_semaphore_wait(to, from, from->fence.seqno - 1);
> > > > > >     		if (err < 0)
> > > > > > 
> > > > 
> > 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch