[PATCH] drm/i915/dpt: Restrict shrinker to DPT objects not mapped
Srinivas, Vidya
vidya.srinivas at intel.com
Sat Nov 23 04:43:39 UTC 2024
> -----Original Message-----
> From: Srinivas, Vidya
> Sent: 23 November 2024 09:39
> To: Ville Syrjälä <ville.syrjala at linux.intel.com>
> Cc: intel-gfx at lists.freedesktop.org; Syrjala, Ville <ville.syrjala at intel.com>;
> bgeffon at google.com; Lee, Shawn C <shawn.c.lee at intel.com>
> Subject: RE: [PATCH] drm/i915/dpt: Restrict shrinker to DPT objects not
> mapped
>
>
>
> > -----Original Message-----
> > From: Ville Syrjälä <ville.syrjala at linux.intel.com>
> > Sent: 23 November 2024 04:18
> > To: Srinivas, Vidya <vidya.srinivas at intel.com>
> > Cc: intel-gfx at lists.freedesktop.org; Syrjala, Ville
> > <ville.syrjala at intel.com>; bgeffon at google.com; Lee, Shawn C
> > <shawn.c.lee at intel.com>
> > Subject: Re: [PATCH] drm/i915/dpt: Restrict shrinker to DPT objects
> > not mapped
> >
> > On Fri, Nov 22, 2024 at 04:37:41PM +0530, Vidya Srinivas wrote:
> > > Restricting all DPT objects as unshrinkable was causing some
> > > chromebooks to run out of memory causing DMA remap failures. Thanks
> > > to Brian Geffon for the pointers on debug and suggesting usage of
> > > !obj->mm.mapping
> > >
> > > Fixes: 43e2b37e2ab6 ("drm/i915/dpt: Make DPT object unshrinkable")
> > >
> > > Credits-to: Brian Geffon <bgeffon at google.com>
> > > Suggested-by: Ville Syrj_l_ <ville.syrjala at linux.intel.com>
> > > Signed-off-by: Vidya Srinivas <vidya.srinivas at intel.com>
> > > ---
> > > drivers/gpu/drm/i915/gem/i915_gem_object.h | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > > b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > > index 3dc61cbd2e11..b155f0139d4e 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > > @@ -285,7 +285,7 @@ i915_gem_object_is_shrinkable(const struct
> > > drm_i915_gem_object *obj) {
> > > /* TODO: make DPT shrinkable when it has no bound vmas */
> > > return i915_gem_object_type_has(obj,
> > I915_GEM_OBJECT_IS_SHRINKABLE) &&
> > > - !obj->is_dpt;
> > > + !(obj->is_dpt && obj->mm.mapping);
> >
> > AFAICS the mapping could only become NULL if the DPT got evicted from
> > the GGTT, but if that can happen then I think the current code must
> > still be busted wrt. suspend/resume even with the unshrinkable DPT obj.
> >
> > Looking at the vma suspend stuff I think the only way something could
> > still be bound in the DPT during resume is if it was pinned during
> > suspend. But we should be unpinning all FBs during suspend, so that's
> > a bit weird in itself. Hmm, we do the unpinning from the cleanup_work
> > so maybe that's still pending when we suspend and thus something is
> > still pinned in the DPT. And indeed there is no flush_work()/etc. for
> > that. So perhaps if we add that we could just revert the unshrinkable DPT
> patch.
> >
> Hello Ville,
> Thank you so much. Yes, Google team also insist on that as they worry the DPT
> patch might just be a workaround for the real memory corruption.
Sorry, just to be clear. With the DPT patch (43e2b37e2ab6 "drm/i915/dpt: Make DPT object unshrinkable"),
Maybe its not that mapping became NULL. But since all DPT objs are unshrinkable, they are hitting low memory
which is probably causing DMA remap failure when memory is trying to be cleared.
"i915 0000:00:02.0: Failed to DMA remap 147457 pages"
BUG: Bad page state in process chrome pfn:14200
page dumped because: non-NULL mapping
So their suggestion initially was to try and use !obj->is_dpt || !obj->mm.mapping as well so that
shrinker could clear some more.
However as you mentioned, if we could do something and revert the DPT patch entirely, it would
be really very helpful
>
> > Did we have a good way to reproduce the resume explosion? If not, I
> > suppose we could try to confirm the lack of flush_work() as the
> > culprit by simply sticking some kind of sleep() into the cleanup
> > function to make sure it doesn't get a chance to run during suspend.
> We can reproduce the original issue where during the suspend/resume, the
> meteorlake chromebook goes for a reboot and we have the crash captured
> from console-ramoops.
For this, I meant we have a reproduction WITHOUT the DPT patch 43e2b37e2ab6.
But, we don't have reproduction at Intel for the "DMA remap failed case"while having the
unshrinkable DPT 43e2b37e2ab6 "drm/i915/dpt: Make DPT object unshrinkable"
So, we could try reproducing the original issue and try with the pointers you have provided.
>
> >
> > I also had this:
> > https://patchwork.freedesktop.org/patch/503398/?series=108669&rev=1
> > which would be good to have so that we can actully do the obvious
> > flush_work(cleanup_work) instead of playing confusing tricks with the
> > the commit_work.
> >
> Thank you so much. Will try this patch as well and update.
>
> Regards
> Vidya
>
> > > }
> > >
> > > static inline bool
> > > --
> > > 2.34.1
> >
> > --
> > Ville Syrjälä
> > Intel
More information about the Intel-gfx
mailing list