[Bug 105888] [CNL] System hang when running piglit.

Wed Apr 4 21:45:44 UTC 2018

https://bugs.freedesktop.org/show_bug.cgi?id=105888

--- Comment #2 from Chris Wilson <chris at chris-wilson.co.uk> ---
Too bad, looks like the GEM_TRACE might have the clue as to what happened.

The complaint is that the CSB value read back from the hw doesn't match the tag
we programmed into the ELSP. It might be a garbage value, or there might be
extra bits in the dword we need to mask. The main worry is that the read is
garbage, and the bug disappearing with GEM_TRACE suggests timing (GEM_TRACE here
also does an extra mmio read to cross-check the HWSP against the register block).
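
To illustrate the difference between "garbage" and "extra bits we need to
mask", here is a minimal sketch. It is not the i915 code: the mask value, the
enum and the function name are all hypothetical, and it simply assumes the tag
sits in the low bits of the dword.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: assume the tag occupies the low 16 bits of the dword. */
#define TAG_MASK 0x0000ffffu

enum csb_check {
	CSB_MATCH,        /* readback equals the programmed tag exactly */
	CSB_MATCH_MASKED, /* equal only once the extra bits are masked off */
	CSB_MISMATCH,     /* differs even after masking: likely a garbage read */
};

/* Classify a value read back from the hw against the tag we programmed. */
static enum csb_check check_csb_tag(uint32_t readback, uint32_t programmed)
{
	if (readback == programmed)
		return CSB_MATCH;

	if ((readback & TAG_MASK) == (programmed & TAG_MASK))
		return CSB_MATCH_MASKED;

	return CSB_MISMATCH;
}
```

A CSB_MATCH_MASKED result would point at missing masking; CSB_MISMATCH would
point at the read itself being bad, which is the timing worry.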

DMAR / VT-d is not mentioned in the log; I presume you have it disabled? The
iommu is always a worry wrt random memory latencies and order of operations.

You can force us to use the older mmio-only path with:

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 12486d8f534b..78e15e1805f4 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -463,6 +463,8 @@ static void intel_engine_init_batch_pool(struct intel_engine_cs *engine)

 static bool csb_force_mmio(struct drm_i915_private *i915)
 {
+       return true;
+
        /*
         * IOMMU adds unpredictable latency causing the CSB write (from the
         * GPU into the HWSP) to only be visible some time after the interrupt

which should be less susceptible to timing. Seeing whether the bug goes away
with that may be helpful.
