[Intel-gfx] [PATCH 6/9] drm/i915: driver based PASID handling

Daniel Vetter daniel at ffwll.ch
Fri Oct 9 00:28:37 PDT 2015

On Thu, Oct 08, 2015 at 11:46:08PM +0100, David Woodhouse wrote:
> On Thu, 2015-10-08 at 12:29 +0100, Tomas Elf wrote:
> > 
> > Could someone clarify what this means from the TDR point of view, 
> > please? When you say "context blew up" I'm guessing that you mean that 
> > some context caused the fault handler to get involved somehow?
> > 
> > Does this imply that the offending context will hang and the driver will 
> > have to detect this hang? If so, then yes - if we have the per-engine 
> > hang recovery mode as part of the upcoming TDR work in place then we 
> > could handle it by stepping over the offending batch buffer and moving 
> > on with a minimum of side-effects on the rest of the driver/GPU.
>
> I don't think the context does hang.
> I've made the page-request code artificially fail and report that it
> was an invalid page fault. The gem_svm_fault test seems to complete
> (albeit complaining that the test failed). Whereas if I just don't
> service the page-request at all, *then* the GPU hang is detected.
> I haven't actually looked at precisely what *is* happening.

Hm, if this still works the same way as on older platforms, then faulting
reads from the gpu just return all zeros and faulting writes go nowhere.
That would also explain the ever-increasing CS execution pointer: an
all-zero dword decodes as MI_NOP, so the CS is busy churning through 48
bits' worth of address space filled with MI_NOP. I'd have hoped our hw
would do better than that with svm ...
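
To illustrate why zero-filled reads keep the execution pointer moving, here
is a toy model, not driver code. The only facts taken from the hardware are
that MI_NOP encodes as an all-zero dword and that MI_BATCH_BUFFER_END is
opcode 0x0A shifted left by 23. Everything prefixed with toy_ is a
hypothetical name made up for this sketch, and the model assumes every
command is one dword long.

```c
#include <stddef.h>
#include <stdint.h>

#define MI_NOP              0x00000000u      /* all-zero dword */
#define MI_BATCH_BUFFER_END (0x0Au << 23)

/* Hypothetical toy address space: only [base, base + 4 * len) is mapped. */
struct toy_mem {
	uint64_t base;
	const uint32_t *dwords;
	size_t len;
};

/* Assumption under test: a read that faults simply returns 0. */
static uint32_t toy_read(const struct toy_mem *m, uint64_t addr)
{
	uint64_t idx;

	if (addr < m->base)
		return 0;
	idx = (addr - m->base) / 4;
	return idx < m->len ? m->dwords[idx] : 0;
}

/*
 * Step a toy command streamer for at most max_steps dwords.
 * Returns the final head address; *ended is set to 1 only if an
 * MI_BATCH_BUFFER_END was fetched.
 */
static uint64_t toy_cs_run(const struct toy_mem *m, uint64_t head,
			   unsigned int max_steps, int *ended)
{
	*ended = 0;
	while (max_steps--) {
		uint32_t cmd = toy_read(m, head);

		if (cmd == MI_BATCH_BUFFER_END) {
			*ended = 1;
			break;
		}
		head += 4;	/* MI_NOP: nothing to do, fetch the next dword */
	}
	return head;
}
```

Started at an unmapped address, toy_cs_run() never sees a batch end and the
head just keeps advancing, which matches the symptom of an ever-increasing
execution pointer without any hang being reported.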

If there's really no way to make it hang when we complete the fault then I
guess we'll have to hang it by not completing. Otherwise we'll have to
roll our own fault detection code right from the start.
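
A very rough sketch of what "our own fault detection" could look like,
assuming the page-request handler can tell us which PASID took an invalid
fault: remember the fault per PASID and check it when work for that context
retires. All names here (toy_fault_state, toy_note_invalid_fault,
toy_context_is_guilty, TOY_MAX_PASID) are hypothetical and exist in neither
i915 nor the iommu code; locking and real PASID allocation are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_PASID 64	/* hypothetical table size for the sketch */

struct toy_fault_state {
	bool faulted[TOY_MAX_PASID];
};

/* Would be called from the page-request path when a request is invalid. */
static void toy_note_invalid_fault(struct toy_fault_state *s, uint32_t pasid)
{
	if (pasid < TOY_MAX_PASID)
		s->faulted[pasid] = true;
}

/*
 * Would be called when work for this PASID retires. Returns true if the
 * context took an invalid fault since the last check, and clears the flag.
 */
static bool toy_context_is_guilty(struct toy_fault_state *s, uint32_t pasid)
{
	bool guilty;

	if (pasid >= TOY_MAX_PASID)
		return false;
	guilty = s->faulted[pasid];
	s->faulted[pasid] = false;
	return guilty;
}
```

The point of the sketch is only that the fault is completed (so nothing
hangs) but the driver still learns which context to blame.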
-- 
Daniel Vetter
Software Engineer, Intel Corporation
