[Intel-gfx] [PATCH RESEND] drm/i915: Flush buffer pools on driver remove

Matt Roper matthew.d.roper at intel.com
Thu Sep 23 13:51:14 UTC 2021


On Thu, Sep 23, 2021 at 03:07:06PM +0200, Janusz Krzysztofik wrote:
> Hi Matt,
> 
> Thanks for review.
> 
> On czwartek, 23 września 2021 00:24:29 CEST Matt Roper wrote:
> > On Fri, Sep 03, 2021 at 04:23:20PM +0200, Janusz Krzysztofik wrote:
> > > In preparation for clean driver release, attempts to drain work queues
> > > and release freed objects are taken at driver remove time.  However, GT
> > > buffer pools are now not flushed before the driver release phase.
> > > Since unused objects may stay there for up to one second, some may
> > > survive until driver release is attempted.  That can potentially
> > > explain sporadic then hardly reproducible issues observed at driver
> > > release time, like non-zero shrink counter or outstanding address space
> > 
> > So just to make sure I'm understanding the description here:
> >  - We currently do an explicit flush of the buffer pools within the call
> >    path of drm_driver.release(); this removes all buffers, regardless of
> >    their age.
> 
> And also triggers release of the buffers' underlying resources (objects, 
> address space areas).
> 
> >  - However there may be other code that runs *earlier* within the
> >    drm_driver.release() call chain 
> 
> Yes, within the drm_driver.release() call chain, but not necessarily earlier 
> -- that's irrelevant, I believe, ...
> 
> >    that expects buffer pools have
> >    already been flushed and are already empty.
> 
> ... since that other code expects not just buffer pools but resource 
> categories they consume (objects, address space areas) to be flushed already, 
> while some may still be busy with old buffers not auto-flushed yet.
> 
> >  - Since buffer pools auto-flush old buffers once per second in a worker
> >    thread, there's a small window where if we remove the driver while
> >    there are still buffers with an age of less than one second, the
> >    assumptions of the other release code may be violated.
> 
> Correct.
> 
> > So by moving the flush to driver remove (which executes earlier via the
> > pci_driver.remove() flow) you're ensuring that all buffers are flushed
> > before _any_ code in drm_driver.release() executes.
> 
> And also flushed before some other code in pci_driver.remove() flushes those 
> other resource categories released on buffer pools flush, then completeness of 
> all those flushes is checked in drm_driver.release().
> 
> May I copy-paste some of you wording while rephrasing my commit description?

Sure go ahead.

Thanks.


Matt

> 
> Thanks,
> Janusz
> 
> > 
> > I found the wording of the commit message here somewhat confusing since
> > it's talking about flushes we do in driver release, but mentions
> > problems that arise during driver release due to lack of flushing.  You
> > might want to reword the commit message somewhat to help clarify.
> > Otherwise, the code change itself looks reasonable to me.
> > 
> > BTW, I do notice that drm_driver.release() in general is technically
> > deprecated at this point (with a suggestion in the drm_drv.h comments to
> > switch to using drmm_add_action(), drmm_kmalloc(), etc. to manage the
> > cleanup of resources).  At some point in the future me may want to
> > rework the i915 cleanup in general according to that guidance.
> > 
> > 
> > Matt
> > 
> > > areas.
> > > 
> > > Flush buffer pools on GT remove as a fix.  On driver release, don't
> > > flush the pools again, just assert that the flush was called and
> > > nothing added more in between.
> > > 
> > > Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik at linux.intel.com>
> > > Cc: Chris Wilson <chris at chris-wilson.co.uk>
> > > ---
> > > Resending with Cc: dri-devel at lists.freedesktop.org as requested, and a
> > > typo in commit description fixed.
> > > 
> > > Thanks,
> > > Janusz
> > > 
> > >  drivers/gpu/drm/i915/gt/intel_gt.c             | 2 ++
> > >  drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c | 2 --
> > >  2 files changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> > > index 62d40c986642..8f322a4ecd87 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> > > @@ -737,6 +737,8 @@ void intel_gt_driver_remove(struct intel_gt *gt)
> > >  	intel_uc_driver_remove(&gt->uc);
> > >  
> > >  	intel_engines_release(gt);
> > > +
> > > +	intel_gt_flush_buffer_pool(gt);
> > >  }
> > >  
> > >  void intel_gt_driver_unregister(struct intel_gt *gt)
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c b/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c
> > > index aa0a59c5b614..acc49c56a9f3 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c
> > > @@ -245,8 +245,6 @@ void intel_gt_fini_buffer_pool(struct intel_gt *gt)
> > >  	struct intel_gt_buffer_pool *pool = &gt->buffer_pool;
> > >  	int n;
> > >  
> > > -	intel_gt_flush_buffer_pool(gt);
> > > -
> > >  	for (n = 0; n < ARRAY_SIZE(pool->cache_list); n++)
> > >  		GEM_BUG_ON(!list_empty(&pool->cache_list[n]));
> > >  }
> > 
> > 
> 
> 
> 
> 

-- 
Matt Roper
Graphics Software Engineer
VTT-OSGC Platform Enablement
Intel Corporation
(916) 356-2795


More information about the Intel-gfx mailing list