[PATCH] drm/i915/dpt: Restrict shrinker to DPT objects not mapped

Ville Syrjälä ville.syrjala at linux.intel.com
Fri Nov 22 22:47:58 UTC 2024


On Fri, Nov 22, 2024 at 04:37:41PM +0530, Vidya Srinivas wrote:
> Restricting all DPT objects as unshrinkable was causing
> some chromebooks to run out of memory causing
> DMA remap failures. Thanks to Brian Geffon for the
> pointers on debug and suggesting usage of !obj->mm.mapping
> 
> Fixes: 43e2b37e2ab6 ("drm/i915/dpt: Make DPT object unshrinkable")
> 
> Credits-to: Brian Geffon <bgeffon at google.com>
> Suggested-by: Ville Syrj_l_ <ville.syrjala at linux.intel.com>
> Signed-off-by: Vidya Srinivas <vidya.srinivas at intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_object.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> index 3dc61cbd2e11..b155f0139d4e 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> @@ -285,7 +285,7 @@ i915_gem_object_is_shrinkable(const struct drm_i915_gem_object *obj)
>  {
>  	/* TODO: make DPT shrinkable when it has no bound vmas */
>  	return i915_gem_object_type_has(obj, I915_GEM_OBJECT_IS_SHRINKABLE) &&
> -		!obj->is_dpt;
> +		!(obj->is_dpt && obj->mm.mapping);

AFAICS the mapping could only become NULL if the DPT got
evicted from the GGTT, but if that can happen then I think
the current code must still be busted wrt. suspend/resume
even with the unshrinkable DPT obj.

Looking at the vma suspend stuff I think the only way something
could still be bound in the DPT during resume is if it was pinned
during suspend. But we should be unpinning all FBs during suspend,
so that's a bit weird in itself. Hmm, we do the unpinning from
the cleanup_work so maybe that's still pending when we suspend 
and thus something is still pinned in the DPT. And indeed
there is no flush_work()/etc. for that. So perhaps if we add
that we could just revert the unshrinkable DPT patch.

Did we have a good way to reproduce the resume explosion? If not,
I suppose we could try to confirm the lack of flush_work() as the
culprit by simply sticking some kind of sleep() into the cleanup
function to make sure it doesn't get a chance to run during
suspend.

I also had this:
https://patchwork.freedesktop.org/patch/503398/?series=108669&rev=1
which would be good to have so that we can actully do the
obvious flush_work(cleanup_work) instead of playing confusing
tricks with the the commit_work.

>  }
>  
>  static inline bool
> -- 
> 2.34.1

-- 
Ville Syrjälä
Intel


More information about the Intel-gfx mailing list