[Intel-gfx] "BUG: unable to handle kernel NULL pointer dereference at 0000000000000070" in [i915] reset_common_ring

Chris Wilson chris at chris-wilson.co.uk
Thu Oct 19 11:08:02 UTC 2017


Quoting Bjørn Mork (2017-10-19 11:55:18)
> Bjørn Mork <bjorn at mork.no> writes:
> > Chris Wilson <chris at chris-wilson.co.uk> writes:
> >
> >> and at a guess
> >> intel_iommu=igfx_off to avoid the hangs in the first place.
> >
> > Thanks for the tip. I'll try that
> 
> My memory is more than a bit flaky, but this did eventually ring a
> bell.  And googling, I see that I have tried that tip before without
> success:
> 
>  https://bugs.freedesktop.org/show_bug.cgi?id=101288
> 
> 
> If you look at the logs attached to that bug, you'll see that the PC was
> running with "intel_iommu=igfx_off" at the time of the hang:
> 
>  [ 54047.182] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.9.0-3-amd64 root=UUID=71507198-90f4-4c25-be41-efc47d2dedd1 ro intel_iommu=igfx_off

IOMMU enabled?: 0
So no, not that; just a regular bug.

> I believe the GPU hang reported in that bug is the same?  The driver
> just did not Oops while trying to reset.

That oops should only happen when the context switches are out of kilter
with the breadcrumbs, i.e. the context was switched away before the
request had completed. That's something we try to catch during CI as the
hw behaving unexpectedly. The iommu is easy to suspect as we know it can
introduce memory latencies that cause events to be reordered, which is a
nightmare.

We've also fixed a number of races around reset, but I do not recall if
those fixes landed before or after 4.9. The simplest way is to grab
drm-tip, run your existing userspace on it (hoping it still hangs) and
check that it no longer oopses. Then reverse bisect to find the fix to
backport.
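
As a rough illustration of what a reverse bisect does (a sketch only, not
anything taken from the i915 tree): instead of hunting for the first commit
that breaks, it hunts for the first commit that no longer oopses. The
function name first_fixed, the oopses() callback and the commit labels below
are all hypothetical, and the sketch assumes a linear history in which the
fix, once applied, is never reverted.

```python
from typing import Callable, Sequence


def first_fixed(commits: Sequence[str], oopses: Callable[[str], bool]) -> str:
    """Return the earliest commit for which oopses() is False.

    Assumptions (hypothetical setup): commits are ordered oldest to newest,
    the oldest one oopses, the newest one does not, and the behaviour flips
    exactly once in between.
    """
    if len(commits) < 2:
        raise ValueError("need at least two commits to bisect")
    lo, hi = 0, len(commits) - 1  # commits[lo] oopses, commits[hi] is fixed
    if not oopses(commits[lo]) or oopses(commits[hi]):
        raise ValueError("need an oopsing oldest commit and a fixed newest one")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if oopses(commits[mid]):
            lo = mid  # still broken: the fix landed after mid
        else:
            hi = mid  # already fixed: the fix is mid or earlier
    return commits[hi]


# Stand-in for "build, boot, trigger the hang, look for the oops".
history = [f"c{i}" for i in range(8)]
print(first_fixed(history, lambda c: int(c[1:]) < 5))
```

git bisect performs the same search over real history; its --term-old and
--term-new options to "git bisect start" let the labels match a search for
a fix rather than for a regression.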
-Chris

