[Intel-gfx] [PATCH 1/2] drm/i915: fix up semaphore_waits_for

Mon Mar 24 23:37:12 CET 2014

On Sat, Mar 22, 2014 at 06:52:25PM +0100, Daniel Vetter wrote:
> On Fri, Mar 21, 2014 at 07:33:59PM +0200, Mika Kuoppala wrote:
> > Hi,
> > 
> > Daniel Vetter <daniel.vetter at ffwll.ch> writes:
> > 
> > > There's an entire pile of issues in here:
> > >
> > > - Use the main RING_HEAD register, not ACTHD. ACTHD points at the gtt
> > >   offset of the batch buffer when a batch is executed. Semaphores are
> > >   always emitted to the main ring, so we always want to look at that.
> > 
> > The ipehr check should make sure that we are at the ring and
> > acthd should not be at batch.
> > 
> > Regardless I agree that RING_HEAD is much more safer. When I have
> > tried to get bottom of the snb reset hang, I have noticed that
> > after reset the acthd is sometimes 0x0c even tho head is 0x00,
> > on snb.
> 
> Hm, should we maybe mask out the lower bits, too? If you can confirm this,
> can you please add a follow-up patch?
> 
> > > - Mask the obtained HEAD pointer with the actual ring size, which is
> > >   much smaller. Together with the above issue this resulted us in
> > >   trying to dereference a pointer way outside of the ring mmio
> > >   mapping. The resulting invalid access in interrupt context
> > >   (hangcheck is executed from timers) lead to a full blown kernel
> > >   panic. The fbcon panic handler then tried to frob our driver harder,
> > >   resulting in a full machine hang at least on my snb here where I've
> > >   stumbled over this.
> > >
> > > - Handle ring wrapping correctly and be a bit more explicit about how
> > >   many dwords we're scanning. We probably should also scan more than
> > >   just 4 ...
> > 
> > ipehr dont update on MI_NOOPS and the head should point to
> > the extra MI_NOOP after semaphore sequence. So 4 should be
> > enough. Perhaps revisit this when bdw gains semaphores.
> 
> Yeah, Chris also mentioned that the HEAD should point at one dword past
> the end of the ring, so should even work when there are no MI_NOOPs. In
> any case this code is more robust in case hw engineers suddenly change the
> behaviour.
> 
> > > - Space out some of teh computations for readability.
> > >
> > > This prevents hard-hangs on my snb here. Verdict from QA is still
> > > pending, but the symptoms match.
> > >
> > > Cc: Mika Kuoppala <mika.kuoppala at intel.com>
> > > Cc: Ben Widawsky <ben at bwidawsk.net>
> > > Cc: Chris Wilson <chris at chris-wilson.co.uk>
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74100
> > 
> > The hard hangs are not so frequent with this patch but
> > they are still there. This patch should take care of
> > the oops we have been seeing, but there is still
> > something else to be found until #74100 can be marked as
> > fixed.
> > 
> > Very often after reset, when igt has pushed the quietence
> > batches into rings, blitter and video ring heads gets
> > moved properly but all updates to hardware status page and to
> > the sync registers are missing. And result is hang by hangcheck.
> > Repeat enough times and it is a hard hang.
> > 
> > > Signed-off-by: Daniel Vetter <daniel.vetter at ffwll.ch>
> > 
> > Please remove the blowup comment and also update the
> > bugzilla tag. 
> 
> Yeah, QA also says that it doesn't fix the hard hangs, only seems to
> reduce them a bit on certain setups. I've updated the commit message.
> 
> btw are you testing with FBDEV=n? The lack of a fbcon panic handler should
> greatly increase the chances that the last few message get correctly
> transmitted through other means like netconsole.
> 
> > Patches 1-2 and the followup one are
> > 
> > Reviewed-by: Mika Kuoppala <mika.kuoppala at intel.com>
> 
> Thanks for the review, patches merged.
> -Daniel

Patch 2 was trivial to implement for gen8. This functionality is a lot
less trivial. I volunteered to do patch 2, are you going to port this to
gen8?

-- 
Ben Widawsky, Intel Open Source Technology Center