[Intel-gfx] [PATCH 1/2] drm/i915/gem: Replace reloc chain with terminator on error unwind

Thu Aug 20 07:36:11 UTC 2020

Quoting Pavel Machek (2020-08-19 20:33:26)
> Hi!
> 
> > > > If we hit an error during construction of the reloc chain, we need to
> > > > replace the chain into the next batch with the terminator so that upon
> > > > flushing the relocations so far, we do not execute a hanging batch.
> > > 
> > > Thanks for the patches. I assume this should fix problem from
> > > "5.9-rc1: graphics regression moved from -next to mainline" thread.
> > > 
> > > I have applied them over current -next, and my machine seems to be
> > > working so far (but uptime is less than 30 minutes).
> > > 
> > > If the machine still works tommorow, I'll assume problem is solved.
> > 
> > Aye, best wait until we have to start competing with Chromium for
> > memory... The suspicion is that it was the resource allocation failure
> > path.
> 
> Yep, my machines are low on memory.
> 
> But ... test did not work that well. I have dead X and blinking
> screen. Machine still works reasonably well over ssh, so I guess
> that's an improvement.

Well my last remaining 32bit gen3 device is currently pushing up the
daises, so could you try removing the attempt to use WC? Something like

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 44df98d85b38..b26f7de913c3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -955,10 +955,7 @@ static u32 *__reloc_gpu_map(struct reloc_cache *cache,
 {
        u32 *map;

-       map = i915_gem_object_pin_map(pool->obj,
-                                     cache->has_llc ?
-                                     I915_MAP_FORCE_WB :
-                                     I915_MAP_FORCE_WC);
+       map = i915_gem_object_pin_map(pool->obj, I915_MAP_FORCE_WB);

on top of the previous patch. Faultinjection didn't turn up anything in
eb_relocate_vma, so we need to dig deeper.
-Chris