[Intel-gfx] [PATCH RESEND] drm/i915: Flush buffer pools on driver remove
Matt Roper
matthew.d.roper at intel.com
Thu Sep 23 13:51:14 UTC 2021
On Thu, Sep 23, 2021 at 03:07:06PM +0200, Janusz Krzysztofik wrote:
> Hi Matt,
>
> Thanks for review.
>
> On czwartek, 23 września 2021 00:24:29 CEST Matt Roper wrote:
> > On Fri, Sep 03, 2021 at 04:23:20PM +0200, Janusz Krzysztofik wrote:
> > > In preparation for clean driver release, attempts to drain work queues
> > > and release freed objects are taken at driver remove time. However, GT
> > > buffer pools are now not flushed before the driver release phase.
> > > Since unused objects may stay there for up to one second, some may
> > > survive until driver release is attempted. That can potentially
> > > explain sporadic then hardly reproducible issues observed at driver
> > > release time, like non-zero shrink counter or outstanding address space
> >
> > So just to make sure I'm understanding the description here:
> > - We currently do an explicit flush of the buffer pools within the call
> > path of drm_driver.release(); this removes all buffers, regardless of
> > their age.
>
> And also triggers release of the buffers' underlying resources (objects,
> address space areas).
>
> > - However there may be other code that runs *earlier* within the
> > drm_driver.release() call chain
>
> Yes, within the drm_driver.release() call chain, but not necessarily earlier
> -- that's irrelevant, I believe, ...
>
> > that expects buffer pools have
> > already been flushed and are already empty.
>
> ... since that other code expects not just buffer pools but resource
> categories they consume (objects, address space areas) to be flushed already,
> while some may still be busy with old buffers not auto-flushed yet.
>
> > - Since buffer pools auto-flush old buffers once per second in a worker
> > thread, there's a small window where if we remove the driver while
> > there are still buffers with an age of less than one second, the
> > assumptions of the other release code may be violated.
>
> Correct.
>
> > So by moving the flush to driver remove (which executes earlier via the
> > pci_driver.remove() flow) you're ensuring that all buffers are flushed
> > before _any_ code in drm_driver.release() executes.
>
> And also flushed before some other code in pci_driver.remove() flushes those
> other resource categories released on buffer pools flush, then completeness of
> all those flushes is checked in drm_driver.release().
>
> May I copy-paste some of you wording while rephrasing my commit description?
Sure go ahead.
Thanks.
Matt
>
> Thanks,
> Janusz
>
> >
> > I found the wording of the commit message here somewhat confusing since
> > it's talking about flushes we do in driver release, but mentions
> > problems that arise during driver release due to lack of flushing. You
> > might want to reword the commit message somewhat to help clarify.
> > Otherwise, the code change itself looks reasonable to me.
> >
> > BTW, I do notice that drm_driver.release() in general is technically
> > deprecated at this point (with a suggestion in the drm_drv.h comments to
> > switch to using drmm_add_action(), drmm_kmalloc(), etc. to manage the
> > cleanup of resources). At some point in the future me may want to
> > rework the i915 cleanup in general according to that guidance.
> >
> >
> > Matt
> >
> > > areas.
> > >
> > > Flush buffer pools on GT remove as a fix. On driver release, don't
> > > flush the pools again, just assert that the flush was called and
> > > nothing added more in between.
> > >
> > > Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik at linux.intel.com>
> > > Cc: Chris Wilson <chris at chris-wilson.co.uk>
> > > ---
> > > Resending with Cc: dri-devel at lists.freedesktop.org as requested, and a
> > > typo in commit description fixed.
> > >
> > > Thanks,
> > > Janusz
> > >
> > > drivers/gpu/drm/i915/gt/intel_gt.c | 2 ++
> > > drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c | 2 --
> > > 2 files changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> > > index 62d40c986642..8f322a4ecd87 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> > > @@ -737,6 +737,8 @@ void intel_gt_driver_remove(struct intel_gt *gt)
> > > intel_uc_driver_remove(>->uc);
> > >
> > > intel_engines_release(gt);
> > > +
> > > + intel_gt_flush_buffer_pool(gt);
> > > }
> > >
> > > void intel_gt_driver_unregister(struct intel_gt *gt)
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c b/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c
> > > index aa0a59c5b614..acc49c56a9f3 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c
> > > @@ -245,8 +245,6 @@ void intel_gt_fini_buffer_pool(struct intel_gt *gt)
> > > struct intel_gt_buffer_pool *pool = >->buffer_pool;
> > > int n;
> > >
> > > - intel_gt_flush_buffer_pool(gt);
> > > -
> > > for (n = 0; n < ARRAY_SIZE(pool->cache_list); n++)
> > > GEM_BUG_ON(!list_empty(&pool->cache_list[n]));
> > > }
> >
> >
>
>
>
>
--
Matt Roper
Graphics Software Engineer
VTT-OSGC Platform Enablement
Intel Corporation
(916) 356-2795
More information about the Intel-gfx
mailing list