[Intel-gfx] [PATCH 5/6] drm/i915/gt: Serialize GRDOM access between multiple engine resets

Wed Jun 29 16:02:59 UTC 2022

On 29/06/2022 16:30, Mauro Carvalho Chehab wrote:
> On Tue, 28 Jun 2022 16:49:23 +0100
> Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com> wrote:
> 
>> .. which for me means a different patch 1, followed by patch 6 (moved
>> to be patch 2) would be ideal stable material.
>>
>> Then we have the current patch 2 which is open/unknown (to me at least).
>>
>> And the rest seem like optimisations which shouldn't be tagged as fixes.
>>
>> Apart from patch 5 which should be cc: stable, but no fixes as agreed.
>>
>> Could you please double check if what I am suggesting here is feasible
>> to implement and if it is just send those minimal patches out alone?
> 
> Tested, and porting just those 3 patches is enough to fix the Broadwell
> bug.
> 
> So, I submitted a v2 of this series with just those. They all need to
> be backported to stable.

I would really like to try an even smaller fix first. Something like this, although it is not even compile-tested:

commit 4d5e94aef164772f4d85b3b4c1a46eac9a2bd680
Author: Chris Wilson <chris.p.wilson at intel.com>
Date:   Wed Jun 29 16:25:24 2022 +0100

     drm/i915/gt: Serialize TLB invalidates with GT resets
     
     Avoid trying to invalidate the TLB in the middle of performing an
     engine reset, as this may result in the reset timing out. Currently,
     the TLB invalidate is only serialised by its own mutex, forgoing the
     uncore lock, but we can take the uncore->lock as well to serialise
     the mmio access, thereby serialising with the GDRST.
     
     Tested on a NUC5i7RYB, BIOS RYBDWi35.86A.0380.2019.0517.1530 with
     i915 selftest/hangcheck.
     
     Cc: stable at vger.kernel.org
     Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
     Reported-by: Mauro Carvalho Chehab <mchehab at kernel.org>
     Tested-by: Mauro Carvalho Chehab <mchehab at kernel.org>
     Reviewed-by: Mauro Carvalho Chehab <mchehab at kernel.org>
     Signed-off-by: Chris Wilson <chris.p.wilson at intel.com>
     Cc: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
     Acked-by: Thomas Hellström <thomas.hellstrom at linux.intel.com>
     Reviewed-by: Andi Shyti <andi.shyti at intel.com>
     Signed-off-by: Mauro Carvalho Chehab <mchehab at kernel.org>
     Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 8da3314bb6bf..aaadd0b02043 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -952,7 +952,23 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
         mutex_lock(&gt->tlb_invalidate_lock);
         intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
  
+       spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
+
+       for_each_engine(engine, gt, id) {
+               struct reg_and_bit rb;
+
+               rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
+               if (!i915_mmio_reg_offset(rb.reg))
+                       continue;
+
+               intel_uncore_write_fw(uncore, rb.reg, rb.bit);
+       }
+
+       spin_unlock_irq(&uncore->lock);
+
         for_each_engine(engine, gt, id) {
+               struct reg_and_bit rb;
+
                 /*
                  * HW architecture suggest typical invalidation time at 40us,
                  * with pessimistic cases up to 100us and a recommendation to
@@ -960,13 +976,11 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
                  */
                 const unsigned int timeout_us = 100;
                 const unsigned int timeout_ms = 4;
-               struct reg_and_bit rb;
  
                 rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
                 if (!i915_mmio_reg_offset(rb.reg))
                         continue;
  
-               intel_uncore_write_fw(uncore, rb.reg, rb.bit);
                 if (__intel_wait_for_register_fw(uncore,
                                                  rb.reg, rb.bit, 0,
                                                  timeout_us, timeout_ms,

If this works, it would be the least painful option to backport. The other improvements can then go in without the Fixes: tag.
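
To spell out the shape of the change in the diff: the invalidate requests for all engines are written in one pass while holding the lock, and the waits for completion happen afterwards with the lock dropped. Below is a rough userspace sketch of that two-phase pattern only. All names in it (fake_uncore, fake_engine, poll_done, invalidate_all) are made up for illustration, a pthread mutex stands in for uncore->lock, and the "hardware" is faked, so treat it as a sketch of the idea rather than i915 code:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not i915 types. */
struct fake_engine {
	unsigned int reg;	/* 0 means "no invalidate register", skip it */
	bool pending;		/* set by the request, cleared on completion */
};

struct fake_uncore {
	pthread_mutex_t lock;	/* plays the role of uncore->lock */
};

/* Pretend hardware: a pending invalidate completes as soon as it is polled. */
static bool poll_done(struct fake_engine *e)
{
	e->pending = false;
	return true;
}

/*
 * Phase 1: issue every request while holding the lock, so the writes
 * cannot interleave with any other path that takes the same lock.
 * Phase 2: wait for completion with the lock dropped, so the slow
 * polling does not extend the critical section.
 *
 * Returns the number of engines whose invalidate completed.
 */
static size_t invalidate_all(struct fake_uncore *u,
			     struct fake_engine *engines, size_t count)
{
	size_t i, done = 0;

	assert(u && engines);

	pthread_mutex_lock(&u->lock);
	for (i = 0; i < count; i++) {
		if (!engines[i].reg)
			continue;
		engines[i].pending = true;	/* stands in for the mmio write */
	}
	pthread_mutex_unlock(&u->lock);

	for (i = 0; i < count; i++) {
		if (!engines[i].reg)
			continue;
		if (poll_done(&engines[i]))
			done++;
	}

	return done;
}
```

The point of splitting the loop is that the lock is held only for the writes, which are quick, while the waits (up to the timeout per engine) stay outside the critical section.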

> I still think that other TLB patches are needed/desired upstream, but
> I'll submit them on a separate series. Let's fix the regression first ;-)

Yep, that's exactly right.

Regards,

Tvrtko