[Nouveau] [PATCH/TESTING(all hw)/DISCUSSION] FIFO (minor) create and (major) destroy instabilities on nv50+

Ben Skeggs skeggsb at gmail.com
Mon Jan 4 19:20:41 PST 2010


On Mon, 2010-01-04 at 23:54 +0100, Maarten Maathuis wrote:
> I forgot to mention that you should run nop from fbcon without X
> running for reliable lockups.
Yup, that's what I've been doing.
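A minimal sketch of a stress loop like the one described in this thread, in C. It reruns a test command until a run fails or the iteration count is exhausted. The function name `stress_loop` and any command path you pass to it (e.g. `./nop`) are hypothetical; the thread does not say how the loops were actually driven.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Runs `cmd` through the shell up to `iterations` times.
 * Returns the 0-based index of the first run that did not exit with
 * status 0, or -1 if every run succeeded. */
long stress_loop(const char *cmd, long iterations)
{
	for (long i = 0; i < iterations; i++) {
		int status = system(cmd);

		/* -1: the shell could not be started; otherwise inspect the
		 * child's exit status as encoded by wait(). */
		if (status == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
			fprintf(stderr, "run %ld of \"%s\" failed (status %d)\n",
			        i, cmd, status);
			return i;
		}
	}
	return -1;
}
```

Called as `stress_loop("./nop", 10000)` from a console with X stopped. Note that a hard GPU lockup will usually hang the call (or the whole machine) rather than produce a non-zero exit, so the kernel log still needs watching.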

> 
> On Mon, Jan 4, 2010 at 11:39 PM, Ben Skeggs <skeggsb at gmail.com> wrote:
> > On Mon, 2010-01-04 at 20:29 +0100, Maarten Maathuis wrote:
> >> I've narrowed it down further: the "pgraph->fifo_access" bit is just
> >> cleanup (register 0x400500 represents pgraph fifo access); the rest
> >> appears to be needed for the desired effect. The reordering of pfifo
> >> and pgraph destroy is needed. As usual, feedback is appreciated.
> > I played a bit yesterday and have the gr/fifoctx unload ordering swap
> > queued up already, as well as unconditionally waiting on a fence at
> > channel destroy (not really needed, but served as a bit of a cleanup
> > anyway).
> >
> > I'll try and look at the rest of the changes.
> >
Mmm OK.  The gr/fifoctx swap appears to just achieve a little extra
delay before we hit the grctx unload, some of the other changes (the
PGRAPH stuff in fifo channel disable specifically) work around the
changed ordering.

For an identical effect, add a nice mdelay(50) right before the
pgraph->fifo_access(dev, false) in nouveau_channel_free().  We have a
race.
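To make the placement of the suggested delay concrete, here is a user-space model, not nouveau code. It only records the order of the teardown steps Ben mentions: delay, disable PGRAPH FIFO access, unload the graphics context. The enum, `fake_pgraph`, `channel_free_model` and `delay_ms` are hypothetical stand-ins, and re-enabling FIFO access after the unload is an assumption about the surrounding driver code. A delay like this only widens the margin around the race; it does not remove it.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Teardown steps we want to observe, in the order they happen. */
enum teardown_step {
	STEP_DELAY,
	STEP_FIFO_ACCESS_OFF,
	STEP_UNLOAD_GRCTX,
	STEP_FIFO_ACCESS_ON,
};

#define MAX_STEPS 8

/* Hypothetical stand-in for the PGRAPH engine: it only records calls. */
struct fake_pgraph {
	enum teardown_step log[MAX_STEPS];
	int n;
};

static void record(struct fake_pgraph *pg, enum teardown_step step)
{
	assert(pg->n < MAX_STEPS);
	pg->log[pg->n++] = step;
}

/* User-space substitute for the kernel's mdelay(). */
static void delay_ms(long ms)
{
	struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
	nanosleep(&ts, NULL);
}

/* Models the relevant slice of channel teardown: let in-flight work
 * settle, then block PGRAPH's FIFO access, unload the graphics context,
 * and (assumed) re-enable access afterwards. */
void channel_free_model(struct fake_pgraph *pg, long settle_ms)
{
	delay_ms(settle_ms);              /* where the mdelay(50) would go */
	record(pg, STEP_DELAY);
	record(pg, STEP_FIFO_ACCESS_OFF); /* pgraph->fifo_access(dev, false) */
	record(pg, STEP_UNLOAD_GRCTX);    /* the grctx unload that races */
	record(pg, STEP_FIFO_ACCESS_ON);  /* assumption: access restored */
}
```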

Ben.
> > Ben.
> >>
> >> Maarten.
> >>
> >> On Sat, Jan 2, 2010 at 4:36 PM, Maarten Maathuis <madman2003 at gmail.com> wrote:
> >> > Many people using nv50+ hardware are aware of gpu lockups when a fifo
> >> > closes under certain conditions. Based on an mmiotrace and some trial
> >> > and error testing, I've come up with a patch that improves the
> >> > situation on my NV96.
> >> >
> >> > This patch needs testing on NV50+ hardware and regression testing on
> >> > older hardware, since I did change some of the common codepaths. This
> >> > is very much a work in progress, and if you have anything to
> >> > add/correct, please share it.
> >> >
> >> > I've also attached two test apps. One is bitscan-fail from mwk; use
> >> > it like ./bitscan-fail 0x200 to trigger PGRAPH errors. The other, a
> >> > modified version, only emits NOPs (method 0x100) and represents the
> >> > no-error situation.
> >> >
> >> > For me, I can run the NOP program in loops of 10000 iterations with no
> >> > problems (I've done so several times). bitscan-fail sometimes survives
> >> > 10000 iterations, but can also fail after a few thousand. In
> >> > comparison, a single run of bitscan-fail could cause a gpu lockup for
> >> > me in the past.
> >> >
> >> > Please try the gallium driver, the test apps, and suspend to RAM.
> >> > Suspend to RAM isn't 100% reliable for me yet (this was always the
> >> > case after strange experiments/hammering/etc.), but it should not
> >> > regress. The same goes for older hw: whatever worked should still
> >> > work, but I wouldn't expect serious improvements there.
> >> >
> >> > As always, feedback is appreciated, especially since this is a touchy subject.
> >> >
> >> > Maarten.
> >> >
> >
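For readers unfamiliar with the "method 0x100" wording in Maarten's description of the NOP test app: as far as I understand the NV04-and-later pushbuffer format (as documented by envytools and encoded by libdrm's nouveau macros of this era), an increasing-method command header packs the argument count into bits 18-28, the subchannel into bits 13-15 and the method offset into bits 2-12. The helper below is a hypothetical illustration of that encoding, not code taken from the attached test apps.

```c
#include <stdint.h>

/* Builds an "increasing methods" pushbuffer command header:
 *   bits 18..28  number of argument words that follow
 *   bits 13..15  subchannel the bound object lives on
 *   bits  2..12  method byte offset (0x100 is NOP on nv50 objects) */
uint32_t nv_method_header(unsigned subc, unsigned mthd, unsigned count)
{
	return ((uint32_t)(count & 0x7ff) << 18) |
	       ((uint32_t)(subc & 0x7) << 13) |
	       ((uint32_t)mthd & 0x1ffc);
}
```

Under this encoding, a NOP-only stream is just `nv_method_header(subc, 0x100, 1)` followed by a single argument word, repeated.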
