[Intel-gfx] [PATCH i-g-t] lib/igt_gt.c : allow changes to stop_rings mode bits

Gore, Tim tim.gore at intel.com
Tue Jul 14 01:55:27 PDT 2015



Tim GoreĀ 
Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ

> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter at ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Tuesday, July 14, 2015 9:09 AM
> To: Gore, Tim
> Cc: Daniel Vetter; intel-gfx at lists.freedesktop.org; Wood, Thomas
> Subject: Re: [Intel-gfx] [PATCH i-g-t] lib/igt_gt.c : allow changes to stop_rings
> mode bits
> 
> On Mon, Jul 13, 2015 at 04:07:14PM +0000, Gore, Tim wrote:
> >
> >
> > Tim Gore
> > Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon
> > SN3 1RJ
> >
> >
> > > -----Original Message-----
> > > From: Daniel Vetter [mailto:daniel.vetter at ffwll.ch] On Behalf Of
> > > Daniel Vetter
> > > Sent: Monday, July 13, 2015 3:59 PM
> > > To: Gore, Tim
> > > Cc: Daniel Vetter; intel-gfx at lists.freedesktop.org; Wood, Thomas
> > > Subject: Re: [Intel-gfx] [PATCH i-g-t] lib/igt_gt.c : allow changes
> > > to stop_rings mode bits
> > >
> > > On Mon, Jul 13, 2015 at 09:43:11AM +0000, Gore, Tim wrote:
> > > >
> > > >
> > > > Tim Gore
> > > > Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way,
> > > > Swindon
> > > > SN3 1RJ
> > > >
> > > > > -----Original Message-----
> > > > > From: Daniel Vetter [mailto:daniel.vetter at ffwll.ch] On Behalf Of
> > > > > Daniel Vetter
> > > > > Sent: Monday, July 13, 2015 10:30 AM
> > > > > To: Gore, Tim
> > > > > Cc: intel-gfx at lists.freedesktop.org; Wood, Thomas
> > > > > Subject: Re: [Intel-gfx] [PATCH i-g-t] lib/igt_gt.c : allow
> > > > > changes to stop_rings mode bits
> > > > >
> > > > > On Fri, Jul 10, 2015 at 02:06:06PM +0100, tim.gore at intel.com wrote:
> > > > > > From: Tim Gore <tim.gore at intel.com>
> > > > > >
> > > > > > In function igt_set_stop_rings, the test
> > > > > >   igt_assert_f(flags == 0 || current == 0, ..
> > > > > >
> > > > > > will fail if we are trying to force a hang but the
> > > > > > STOP_RINGS_ALLOW_BAN or STOP_RINGS_ALLOW_ERROR bit is
> set.
> > > > > > With the introduction of per ring resets in the driver (in
> > > > > > android) these bits do not get cleared to zero when an
> > > > > > individual ring is reset. This causes subsequent attempt to
> > > > > > cause a ring hang via this function to fail, leading to
> > > > > > several igt tests failing (ie gem_reset_stats subtest ban-xxx etc).
> > > > >
> > > > > Fix tdr to reset these instead?
> > > > > -Daniel
> > > > >
> > > > I could change tdr, but why. When the TDR handles a ring hang and
> > > > resets the ring, why would it modify the flag that defines if the
> > > > driver should ban a frequently hanging context? If we get rid of
> > > > the stop_rings interface, as Chris Wilson suggested, we would
> > > > still need to keep the STOP_RING_ALLOW_BAN/ALLOW_ERRORS bits in
> > > > debugfs, but
> > > you
> > > > would not expect to have to re-write these bits each time there is
> > > > a ring
> > > reset.
> > >
> > > The fix current hang recover code to no reset this, add some grace
> > > period, then push this patch to igt. We don't have full-blown abi
> > > guarantees for debugfs/igt stuff, but I want at least a few months
> > > (really last released
> > > kernel&igt) of backwards/forward compatibility. And inconsistent
> > > behaviour isn't great imo.
> > > -Daniel
> >
> > Sorry Daniel, I didn't really follow that.
> > I didn't want a gpu reset to clear the ALLOW_BAN bit, since this will
> > defeat the point of this bit, except perhaps in test situations where
> > you can keep setting it each time you deliberately cause a hang. It
> > seems like the ALLOW_BAN bit has uses in real world situations, although I
> don't know it anyone uses it.
> 
> ALLOW_BAN is only for testing. It's meant to re-enable auto-banning
> because we disable that by default for automated tests. This is all meant to
> be used together with the stop_rings stuff only.
> 
> > Would you be more comfortable with
> > Igt_assert_f ( ( flags ==0 ) ||
> >      (( current & STOP_RING_ALL) ==0)  && ((current ^ flags) & ~
> > STOP_RING_ALL == 0 ) )
> >
> > So the either the new flags must be 0 (currently allowed) or the
> > existing flags must indicate that  all hangs are cleared (0 except
> > possibly the mode bits) AND the mode bits you are writing are the same as
> the current values. ??
> 
> Iirc gem_reset_stats uses stop_rings only to supress gpu reset warnings in
> dmesg (since they're expected) and not for the actual stop_rings logic. It sets
> stop_rings after submitting the hanging batch, but before that one is
> detected as hung. We could just add another bit to stop_rings for that case.
> 
> What I still don't understand is why tdr can't just keep on properly resetting
> stop_rings. It'll break tons of existing tests, and I don't understand the
> upside. stop_rings has become quasi-abi (that's why the ALLOW bits have
> such funky semantics, it's for backwards compat), if you need to change it
> you need to extend it, not break it.
> -Daniel
> --

OK, I hadn't spotted that it was a debug only flag. In that case it is just up to the
tests to keep setting it. It only seems to be used in gem_reset_stats anyway. So
I'll change TDR to clear ALLOW_BAN/WARN when stop_bits gets cleared to 0.

  Tim

> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


More information about the Intel-gfx mailing list