[PATCH] drm/i915/dpt: Restrict shrinker to DPT objects not mapped

Srinivas, Vidya vidya.srinivas at intel.com
Sat Nov 23 04:08:35 UTC 2024



> -----Original Message-----
> From: Ville Syrjälä <ville.syrjala at linux.intel.com>
> Sent: 23 November 2024 04:18
> To: Srinivas, Vidya <vidya.srinivas at intel.com>
> Cc: intel-gfx at lists.freedesktop.org; Syrjala, Ville <ville.syrjala at intel.com>;
> bgeffon at google.com; Lee, Shawn C <shawn.c.lee at intel.com>
> Subject: Re: [PATCH] drm/i915/dpt: Restrict shrinker to DPT objects not
> mapped
> 
> On Fri, Nov 22, 2024 at 04:37:41PM +0530, Vidya Srinivas wrote:
> > Restricting all DPT objects as unshrinkable was causing some
> > chromebooks to run out of memory causing DMA remap failures. Thanks to
> > Brian Geffon for the pointers on debug and suggesting usage of
> > !obj->mm.mapping
> >
> > Fixes: 43e2b37e2ab6 ("drm/i915/dpt: Make DPT object unshrinkable")
> >
> > Credits-to: Brian Geffon <bgeffon at google.com>
> > Suggested-by: Ville Syrj_l_ <ville.syrjala at linux.intel.com>
> > Signed-off-by: Vidya Srinivas <vidya.srinivas at intel.com>
> > ---
> >  drivers/gpu/drm/i915/gem/i915_gem_object.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > index 3dc61cbd2e11..b155f0139d4e 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > @@ -285,7 +285,7 @@ i915_gem_object_is_shrinkable(const struct
> > drm_i915_gem_object *obj)  {
> >  	/* TODO: make DPT shrinkable when it has no bound vmas */
> >  	return i915_gem_object_type_has(obj,
> I915_GEM_OBJECT_IS_SHRINKABLE) &&
> > -		!obj->is_dpt;
> > +		!(obj->is_dpt && obj->mm.mapping);
> 
> AFAICS the mapping could only become NULL if the DPT got evicted from the
> GGTT, but if that can happen then I think the current code must still be busted
> wrt. suspend/resume even with the unshrinkable DPT obj.
> 
> Looking at the vma suspend stuff I think the only way something could still be
> bound in the DPT during resume is if it was pinned during suspend. But we
> should be unpinning all FBs during suspend, so that's a bit weird in itself. Hmm,
> we do the unpinning from the cleanup_work so maybe that's still pending
> when we suspend and thus something is still pinned in the DPT. And indeed
> there is no flush_work()/etc. for that. So perhaps if we add that we could just
> revert the unshrinkable DPT patch.
> 
Hello Ville,
Thank you so much. Yes, Google team also insist on that as they worry the DPT
patch might just be a workaround for the real memory corruption.

> Did we have a good way to reproduce the resume explosion? If not, I suppose
> we could try to confirm the lack of flush_work() as the culprit by simply sticking
> some kind of sleep() into the cleanup function to make sure it doesn't get a
> chance to run during suspend.
We can reproduce the original issue where during the suspend/resume,
the meteorlake chromebook goes for a reboot and we have the crash
captured from console-ramoops.

> 
> I also had this:
> https://patchwork.freedesktop.org/patch/503398/?series=108669&rev=1
> which would be good to have so that we can actully do the obvious
> flush_work(cleanup_work) instead of playing confusing tricks with the the
> commit_work.
> 
Thank you so much. Will try this patch as well and update.

Regards
Vidya

> >  }
> >
> >  static inline bool
> > --
> > 2.34.1
> 
> --
> Ville Syrjälä
> Intel


More information about the Intel-gfx mailing list