[Nouveau] [PATCH/TESTING(all hw)/DISCUSSION] FIFO (minor) create and (major) destroy instabilities on nv50+

Maarten Maathuis madman2003 at gmail.com
Tue Jan 5 13:19:31 PST 2010


On Tue, Jan 5, 2010 at 9:41 AM, Maarten Maathuis <madman2003 at gmail.com> wrote:
> On Tue, Jan 5, 2010 at 4:20 AM, Ben Skeggs <skeggsb at gmail.com> wrote:
>> On Mon, 2010-01-04 at 23:54 +0100, Maarten Maathuis wrote:
>>> I forgot to mention that you should run nop from fbcon without X
>>> running for reliable lockups.
>> Yup, that's what I've been doing.
>>
>>>
>>> On Mon, Jan 4, 2010 at 11:39 PM, Ben Skeggs <skeggsb at gmail.com> wrote:
>>> > On Mon, 2010-01-04 at 20:29 +0100, Maarten Maathuis wrote:
>>> >> I've narrowed it down further, the "pgraph->fifo_access" bit is still
>>> >> cleanup (register 0x400500 represents pgraph fifo access), the rest
>>> >> appears needed for the desired effect. The reordering of pfifo and
>>> >> pgraph destroy is needed. As usual, feedback is appreciated.
>>> > I played a bit yesterday and have the gr/fifoctx unload ordering swap
>>> > and queued up already, as well as unconditionally waiting on a fence at
>>> > channel destroy (not really needed, but served as a bit of a cleanup
>>> > anyway).
>>> >
>>> > I'll try and look at the rest of the changes.
>>> >
>> Mmm OK.  The gr/fifoctx swap appears to just achieve a little extra
>> delay before we hit the grctx unload, some of the other changes (the
>> PGRAPH stuff in fifo channel disable specifically) work around the
>> changed ordering.
>>
>> For an identical effect, add a nice mdelay(50) right before the
>> pgraph->fifo_access(dev, false) in nouveau_channel_free()..  We have a
>> race.
>
> So what do you propose as the preferred solution?
>
>>
>> Ben.
>>> > Ben.
>>> >>
>>> >> Maarten.
>>> >>
>>> >> On Sat, Jan 2, 2010 at 4:36 PM, Maarten Maathuis <madman2003 at gmail.com> wrote:
>>> >> > Many people using nv50+ hardware are aware of gpu lockups when a fifo
>>> >> > closes under certain conditions. Based on an mmio-trace and some trial
>>> >> > and error testing I've come up with a patch that improves the
>>> >> > situation on my NV96.
>>> >> >
>>> >> > This patch needs testing on NV50+ hardware and regression testing on
>>> >> > older hardware, since I did change some of the common codepaths. This
>>> >> > is very much a work in progress, and if you have anything to
>>> >> > add/correct, please share it.
>>> >> >
>>> >> > I've also attached 2 test apps. One is bitscan-fail from mwk; use
>>> >> > it like ./bitscan-fail 0x200 to trigger PGRAPH errors. A modified
>>> >> > version only emits NOPs (method 0x100) and represents the no error
>>> >> > situation.
>>> >> >
>>> >> > For me, I can run the NOP program in loops of 10000 iterations with no
>>> >> > problems (I've done so several times); bitscan-fail survives 10000
>>> >> > iterations sometimes, but can also fail after a few thousand. In
>>> >> > comparison, a single run of bitscan-fail could cause a gpu lockup for
>>> >> > me in the past.
>>> >> >
>>> >> > Please try the gallium driver, the test apps, suspend to ram. Suspend
>>> >> > to ram isn't 100% reliable yet for me (this was always the case after
>>> >> > strange experiments/hammering/etc), but should not regress. This goes
>>> >> > for older hw as well, whatever worked should still work, but I
>>> >> > wouldn't expect serious improvements there.
>>> >> >
>>> >> > As always, feedback is appreciated, especially since this is a touchy subject.
>>> >> >
>>> >> > Maarten.
>>> >> >
>>> >> _______________________________________________
>>> >> Nouveau mailing list
>>> >> Nouveau at lists.freedesktop.org
>>> >> http://lists.freedesktop.org/mailman/listinfo/nouveau
>>> >
>>> >
>>> >
>>
>>
>>
>

I've isolated a small part of an mmiotrace, which is one of the few
cases where bit 28 of 0x40032c is unset. The end is most interesting,
the beginning is just to be sure everything is there. Maybe it helps.

W 4 543.049438 3 0xc6100c80 0x50001 0x0 0
R 4 543.049496 3 0xc6100c80 0x50000 0x0 0
R 4 543.049548 3 0xc6400500 0x10010001 0x0 0
R 4 543.049596 3 0xc6400500 0x10010001 0x0 0
W 4 543.049644 3 0xc6400500 0x10010000 0x0 0
R 4 543.049693 3 0xc6400700 0x0 0x0 0
R 4 543.049741 3 0xc6400380 0x0 0x0 0
R 4 543.049797 3 0xc6400384 0x0 0x0 0
R 4 543.049845 3 0xc6400388 0x0 0x0 0
W 4 543.049900 3 0xc6100c80 0x1 0x0 0
R 4 543.049958 3 0xc6100c80 0x0 0x0 0
W 4 543.050009 3 0xc6400500 0x10010001 0x0 0
W 4 543.050150 10 0xc41f04c8 0x1 0x0 0
W 4 543.050175 10 0xc41f04cc 0x4 0x0 0
W 4 543.050282 3 0xc6070000 0x1 0x0 0
R 4 543.050358 3 0xc6070000 0x0 0x0 0
R 4 543.050418 3 0xc661002c 0x370 0x0 0
R 4 543.050462 3 0xc661002c 0x370 0x0 0
W 4 543.050588 10 0xc41f0440 0x1 0x0 0
W 4 543.050614 10 0xc41f0444 0x4 0x0 0
W 4 543.050719 3 0xc6070000 0x1 0x0 0
R 4 543.050793 3 0xc6070000 0x0 0x0 0
W 4 543.050896 10 0xc41f03c0 0x1 0x0 0
W 4 543.050922 10 0xc41f03c4 0x4 0x0 0
W 4 543.051028 3 0xc6070000 0x1 0x0 0
R 4 543.051101 3 0xc6070000 0x0 0x0 0
W 4 543.051227 10 0xc41f05e0 0x1 0x0 0
W 4 543.051253 10 0xc41f05e4 0x4 0x0 0
W 4 543.051360 3 0xc6070000 0x1 0x0 0
R 4 543.051434 3 0xc6070000 0x0 0x0 0
W 4 543.051529 10 0xc41f0200 0x1 0x0 0
W 4 543.051554 10 0xc41f0204 0x4 0x0 0
W 4 543.051659 3 0xc6070000 0x1 0x0 0
R 4 543.051732 3 0xc6070000 0x0 0x0 0
W 4 543.051784 10 0xc439e000 0x7e 0x0 0
W 4 543.051807 10 0xc439e004 0x7e 0x0 0
W 4 543.051829 10 0xc439e008 0x1 0x0 0
W 4 543.051851 10 0xc439e00c 0x2 0x0 0
W 4 543.051926 3 0xc6070000 0x1 0x0 0
R 4 543.051999 3 0xc6070000 0x0 0x0 0
W 4 543.052158 3 0xc60032f4 0x1ff64 0x0 0
W 4 543.052228 3 0xc60032ec 0x4 0x0 0
R 4 543.052296 3 0xc60032ec 0x4 0x0 0
R 4 543.052377 3 0xc6002504 0x0 0x0 0
W 4 543.052451 3 0xc6002504 0x1 0x0 0
R 4 543.052745 3 0xc6000100 0x0 0x0 0
R 4 543.052849 3 0xc6002080 0x0 0x0 0
R 4 543.053007 3 0xc6003220 0xd06191 0x0 0
R 4 543.053075 3 0xc6003250 0x90000001 0x0 0
R 4 543.053154 3 0xc6002504 0x11 0x0 0
R 4 543.053226 3 0xc6002508 0x340 0x0 0
R 4 543.053295 3 0xc6003220 0xd06191 0x0 0
R 4 543.053365 3 0xc6003250 0x90000001 0x0 0
R 4 543.053444 3 0xc6000200 0xdff3d113 0x0 0
R 4 543.053516 3 0xc600251c 0x3f 0x0 0
R 4 543.053581 3 0xc640032c 0x8001fd9a 0x0 0
R 4 543.053630 3 0xc640032c 0x8001fd9a 0x0 0
W 4 543.053678 3 0xc640032c 0x1fd9a 0x0 0
R 4 543.053753 3 0xc60032f0 0x3 0x0 0
W 4 543.053843 3 0xc60032f0 0x7f 0x0 0
R 4 543.053921 3 0xc6003220 0xd06191 0x0 0
W 4 543.053990 3 0xc6003220 0xd06191 0x0 0
R 4 543.054054 3 0xc6002504 0x11 0x0 0
W 4 543.054123 3 0xc6002504 0x10 0x0 0
R 4 543.054195 3 0xc600260c 0x801fd99f 0x0 0
W 4 543.054268 3 0xc600260c 0x1ff68 0x0 0
W 4 543.054371 10 0xc43cdd10 0x0 0x0 0
W 4 543.054393 10 0xc43cdd14 0x0 0x0 0
W 4 543.054415 10 0xc43cdd18 0x0 0x0 0
W 4 543.054437 10 0xc43cdd1c 0x0 0x0 0
W 4 543.054460 10 0xc43cdd20 0x0 0x0 0
W 4 543.054482 10 0xc43cdd24 0x0 0x0 0
W 4 543.054504 10 0xc43cdd28 0x0 0x0 0
W 4 543.054526 10 0xc43cdd2c 0x0 0x0 0
W 4 543.054549 10 0xc43cdd30 0x0 0x0 0
W 4 543.054571 10 0xc43cdd34 0x0 0x0 0
W 4 543.054593 10 0xc43cdd38 0x0 0x0 0
W 4 543.054616 10 0xc43cdd3c 0x0 0x0 0
W 4 543.054638 10 0xc43cdd40 0x0 0x0 0
W 4 543.054660 10 0xc43cdd44 0x0 0x0 0
W 4 543.054823 3 0xc6070000 0x1 0x0 0
R 4 543.054921 3 0xc6070000 0x0 0x0 0
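
For anyone who wants to scan a longer trace for the same condition, here is a
minimal sketch of a filter. It assumes the usual mmiotrace text layout
(op, width, timestamp, map id, physical address, value, pc, pid) and that the
register window starts at 0xc6000000, which is only inferred from 0xc6400500
lining up with register 0x400500 above. The helper names (parse_line,
accesses, bit_is_set) are made up for illustration and are not part of any
existing tool.

```python
# Assumption: BAR0 is mapped at this physical address, inferred from the
# trace (0xc6400500 <-> register 0x400500). Adjust for other machines.
MMIO_BASE = 0xC6000000


def parse_line(line):
    """Parse one mmiotrace R/W line into a dict, or return None."""
    parts = line.split()
    if len(parts) < 6 or parts[0] not in ("R", "W"):
        return None
    op, width, ts, map_id, addr, value = parts[:6]
    return {
        "op": op,
        "width": int(width),
        "time": float(ts),
        "map": int(map_id),
        "addr": int(addr, 16),
        "value": int(value, 16),
    }


def accesses(lines, reg, base=MMIO_BASE):
    """Yield parsed accesses that hit register offset `reg`."""
    for line in lines:
        ev = parse_line(line)
        if ev is not None and ev["addr"] - base == reg:
            yield ev


def bit_is_set(value, bit):
    """Return True if bit number `bit` is set in `value`."""
    return (value >> bit) & 1 == 1


# The three 0x40032c accesses from the excerpt above.
SAMPLE = """\
R 4 543.053581 3 0xc640032c 0x8001fd9a 0x0 0
R 4 543.053630 3 0xc640032c 0x8001fd9a 0x0 0
W 4 543.053678 3 0xc640032c 0x1fd9a 0x0 0
""".splitlines()

if __name__ == "__main__":
    for ev in accesses(SAMPLE, 0x40032C):
        state = "set" if bit_is_set(ev["value"], 28) else "unset"
        print(ev["op"], hex(ev["value"]), "bit 28", state)
```

To run it over a full trace, pass the lines of the trace file to accesses()
instead of SAMPLE and keep only the reads where bit 28 is unset.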

