[Nouveau] [PATCH/TESTING(all hw)/DISCUSSION] FIFO (minor) create and (major) destroy instabilities on nv50+

Maarten Maathuis madman2003 at gmail.com
Tue Jan 5 13:21:20 PST 2010


On Tue, Jan 5, 2010 at 10:19 PM, Maarten Maathuis <madman2003 at gmail.com> wrote:
> On Tue, Jan 5, 2010 at 9:41 AM, Maarten Maathuis <madman2003 at gmail.com> wrote:
>> On Tue, Jan 5, 2010 at 4:20 AM, Ben Skeggs <skeggsb at gmail.com> wrote:
>>> On Mon, 2010-01-04 at 23:54 +0100, Maarten Maathuis wrote:
>>>> I forgot to mention that you should run nop from fbcon without X
>>>> running for reliable lockups.
>>> Yup, that's what I've been doing.
>>>
>>>>
>>>> On Mon, Jan 4, 2010 at 11:39 PM, Ben Skeggs <skeggsb at gmail.com> wrote:
>>>> > On Mon, 2010-01-04 at 20:29 +0100, Maarten Maathuis wrote:
>>>> >> I've narrowed it down further: the "pgraph->fifo_access" bit is still
>>>> >> just cleanup (register 0x400500 represents pgraph fifo access); the rest
>>>> >> appears to be needed for the desired effect. The reordering of pfifo and
>>>> >> pgraph destroy is needed. As usual, feedback is appreciated.
>>>> > I played a bit yesterday and have the gr/fifoctx unload ordering swap
>>>> > queued up already, as well as unconditionally waiting on a fence at
>>>> > channel destroy (not really needed, but served as a bit of a cleanup
>>>> > anyway).
>>>> >
>>>> > I'll try and look at the rest of the changes.
>>>> >
>>> Mmm OK.  The gr/fifoctx swap appears to just achieve a little extra
>>> delay before we hit the grctx unload, some of the other changes (the
>>> PGRAPH stuff in fifo channel disable specifically) work around the
>>> changed ordering.
>>>
>>> For an identical effect, add a nice mdelay(50) right before the
>>> pgraph->fifo_access(dev, false) in nouveau_channel_free(). We have a
>>> race.
>>
>> So what do you propose as the preferred solution?
>>
>>>
>>> Ben.
>>>> > Ben.
>>>> >>
>>>> >> Maarten.
>>>> >>
>>>> >> On Sat, Jan 2, 2010 at 4:36 PM, Maarten Maathuis <madman2003 at gmail.com> wrote:
>>>> >> > Many people using nv50+ hardware are aware of GPU lockups when a fifo
>>>> >> > closes under certain conditions. Based on an mmio-trace and some trial
>>>> >> > and error testing I've come up with a patch that improves the
>>>> >> > situation on my NV96.
>>>> >> >
>>>> >> > This patch needs testing on NV50+ hardware and regression testing on
>>>> >> > older hardware, since I did change some of the common codepaths. This
>>>> >> > is very much a work in progress, and if you have anything to
>>>> >> > add/correct, please share it.
>>>> >> >
>>>> >> > I've also attached two test apps. One is bitscan-fail from mwk; use
>>>> >> > it like ./bitscan-fail 0x200 to trigger PGRAPH errors. A modified
>>>> >> > version only emits NOPs (method 0x100) and represents the no-error
>>>> >> > situation.
>>>> >> >
>>>> >> > For me, the NOP program runs in loops of 10000 iterations with no
>>>> >> > problems (I've done so several times). bitscan-fail sometimes survives
>>>> >> > 10000 iterations, but can also fail after a few thousand. In
>>>> >> > comparison, a single run of bitscan-fail could cause a GPU lockup for
>>>> >> > me in the past.
>>>> >> >
>>>> >> > Please try the gallium driver, the test apps, and suspend to ram. Suspend
>>>> >> > to ram isn't 100% reliable yet for me (this was always the case after
>>>> >> > strange experiments/hammering/etc), but should not regress. This goes
>>>> >> > for older hw as well: whatever worked should still work, but I
>>>> >> > wouldn't expect serious improvements there.
>>>> >> >
>>>> >> > As always, feedback is appreciated, especially since this is a touchy subject.
>>>> >> >
>>>> >> > Maarten.
>>>> >> >
>>>> >> _______________________________________________
>>>> >> Nouveau mailing list
>>>> >> Nouveau at lists.freedesktop.org
>>>> >> http://lists.freedesktop.org/mailman/listinfo/nouveau
>>>> >
>>>> >
>>>> >
>>>
>>>
>>>
>>
>
> I've isolated a small part of a mmiotrace, which is one of the few
> cases where bit28 of 0x40032c is unset. The end is most interesting,
> the beginning is just to be sure everything is there. Maybe it helps.

I meant to say bit31.

>
> W 4 543.049438 3 0xc6100c80 0x50001 0x0 0
> R 4 543.049496 3 0xc6100c80 0x50000 0x0 0
> R 4 543.049548 3 0xc6400500 0x10010001 0x0 0
> R 4 543.049596 3 0xc6400500 0x10010001 0x0 0
> W 4 543.049644 3 0xc6400500 0x10010000 0x0 0
> R 4 543.049693 3 0xc6400700 0x0 0x0 0
> R 4 543.049741 3 0xc6400380 0x0 0x0 0
> R 4 543.049797 3 0xc6400384 0x0 0x0 0
> R 4 543.049845 3 0xc6400388 0x0 0x0 0
> W 4 543.049900 3 0xc6100c80 0x1 0x0 0
> R 4 543.049958 3 0xc6100c80 0x0 0x0 0
> W 4 543.050009 3 0xc6400500 0x10010001 0x0 0
> W 4 543.050150 10 0xc41f04c8 0x1 0x0 0
> W 4 543.050175 10 0xc41f04cc 0x4 0x0 0
> W 4 543.050282 3 0xc6070000 0x1 0x0 0
> R 4 543.050358 3 0xc6070000 0x0 0x0 0
> R 4 543.050418 3 0xc661002c 0x370 0x0 0
> R 4 543.050462 3 0xc661002c 0x370 0x0 0
> W 4 543.050588 10 0xc41f0440 0x1 0x0 0
> W 4 543.050614 10 0xc41f0444 0x4 0x0 0
> W 4 543.050719 3 0xc6070000 0x1 0x0 0
> R 4 543.050793 3 0xc6070000 0x0 0x0 0
> W 4 543.050896 10 0xc41f03c0 0x1 0x0 0
> W 4 543.050922 10 0xc41f03c4 0x4 0x0 0
> W 4 543.051028 3 0xc6070000 0x1 0x0 0
> R 4 543.051101 3 0xc6070000 0x0 0x0 0
> W 4 543.051227 10 0xc41f05e0 0x1 0x0 0
> W 4 543.051253 10 0xc41f05e4 0x4 0x0 0
> W 4 543.051360 3 0xc6070000 0x1 0x0 0
> R 4 543.051434 3 0xc6070000 0x0 0x0 0
> W 4 543.051529 10 0xc41f0200 0x1 0x0 0
> W 4 543.051554 10 0xc41f0204 0x4 0x0 0
> W 4 543.051659 3 0xc6070000 0x1 0x0 0
> R 4 543.051732 3 0xc6070000 0x0 0x0 0
> W 4 543.051784 10 0xc439e000 0x7e 0x0 0
> W 4 543.051807 10 0xc439e004 0x7e 0x0 0
> W 4 543.051829 10 0xc439e008 0x1 0x0 0
> W 4 543.051851 10 0xc439e00c 0x2 0x0 0
> W 4 543.051926 3 0xc6070000 0x1 0x0 0
> R 4 543.051999 3 0xc6070000 0x0 0x0 0
> W 4 543.052158 3 0xc60032f4 0x1ff64 0x0 0
> W 4 543.052228 3 0xc60032ec 0x4 0x0 0
> R 4 543.052296 3 0xc60032ec 0x4 0x0 0
> R 4 543.052377 3 0xc6002504 0x0 0x0 0
> W 4 543.052451 3 0xc6002504 0x1 0x0 0
> R 4 543.052745 3 0xc6000100 0x0 0x0 0
> R 4 543.052849 3 0xc6002080 0x0 0x0 0
> R 4 543.053007 3 0xc6003220 0xd06191 0x0 0
> R 4 543.053075 3 0xc6003250 0x90000001 0x0 0
> R 4 543.053154 3 0xc6002504 0x11 0x0 0
> R 4 543.053226 3 0xc6002508 0x340 0x0 0
> R 4 543.053295 3 0xc6003220 0xd06191 0x0 0
> R 4 543.053365 3 0xc6003250 0x90000001 0x0 0
> R 4 543.053444 3 0xc6000200 0xdff3d113 0x0 0
> R 4 543.053516 3 0xc600251c 0x3f 0x0 0
> R 4 543.053581 3 0xc640032c 0x8001fd9a 0x0 0
> R 4 543.053630 3 0xc640032c 0x8001fd9a 0x0 0
> W 4 543.053678 3 0xc640032c 0x1fd9a 0x0 0
> R 4 543.053753 3 0xc60032f0 0x3 0x0 0
> W 4 543.053843 3 0xc60032f0 0x7f 0x0 0
> R 4 543.053921 3 0xc6003220 0xd06191 0x0 0
> W 4 543.053990 3 0xc6003220 0xd06191 0x0 0
> R 4 543.054054 3 0xc6002504 0x11 0x0 0
> W 4 543.054123 3 0xc6002504 0x10 0x0 0
> R 4 543.054195 3 0xc600260c 0x801fd99f 0x0 0
> W 4 543.054268 3 0xc600260c 0x1ff68 0x0 0
> W 4 543.054371 10 0xc43cdd10 0x0 0x0 0
> W 4 543.054393 10 0xc43cdd14 0x0 0x0 0
> W 4 543.054415 10 0xc43cdd18 0x0 0x0 0
> W 4 543.054437 10 0xc43cdd1c 0x0 0x0 0
> W 4 543.054460 10 0xc43cdd20 0x0 0x0 0
> W 4 543.054482 10 0xc43cdd24 0x0 0x0 0
> W 4 543.054504 10 0xc43cdd28 0x0 0x0 0
> W 4 543.054526 10 0xc43cdd2c 0x0 0x0 0
> W 4 543.054549 10 0xc43cdd30 0x0 0x0 0
> W 4 543.054571 10 0xc43cdd34 0x0 0x0 0
> W 4 543.054593 10 0xc43cdd38 0x0 0x0 0
> W 4 543.054616 10 0xc43cdd3c 0x0 0x0 0
> W 4 543.054638 10 0xc43cdd40 0x0 0x0 0
> W 4 543.054660 10 0xc43cdd44 0x0 0x0 0
> W 4 543.054823 3 0xc6070000 0x1 0x0 0
> R 4 543.054921 3 0xc6070000 0x0 0x0 0
>
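
For anyone who wants to look for the same pattern in their own trace, here
is a rough Python sketch that picks out accesses to 0x40032c and reports
whether bit31 is set. It assumes the usual mmiotrace field order (op, width,
timestamp, map id, address, value, ...) and that map id 3 is the register
BAR mapped at 0xc6000000. That base is inferred from the addresses in the
excerpt (0xc6400500 lining up with register 0x400500) and will differ on
other machines. The function names are made up for illustration.

```python
# Assumption: map id 3 is the MMIO register BAR, mapped at 0xc6000000 in this
# particular trace. Both values must be adjusted for any other trace.
BAR0_BASE = 0xC6000000
BAR0_MAP_ID = 3
REG = 0x40032C
BIT = 31


def parse_line(line):
    """Parse one mmiotrace R/W line, tolerating leading '>' quote marks.

    Returns (op, map_id, address, value) or None for lines that are not
    read/write records.
    """
    fields = line.lstrip("> ").split()
    if len(fields) < 6 or fields[0] not in ("R", "W"):
        return None
    op, _width, _timestamp, map_id, addr, value = fields[:6]
    return op, int(map_id), int(addr, 16), int(value, 16)


def bit_accesses(lines, reg=REG, bit=BIT, base=BAR0_BASE, map_id=BAR0_MAP_ID):
    """Return (op, value, bit_is_set) for every access to `reg`."""
    hits = []
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        op, mid, addr, value = rec
        if mid == map_id and addr - base == reg:
            hits.append((op, value, bool((value >> bit) & 1)))
    return hits


# Three lines taken from the excerpt above.
SAMPLE = [
    "> R 4 543.053581 3 0xc640032c 0x8001fd9a 0x0 0",
    "> W 4 543.053678 3 0xc640032c 0x1fd9a 0x0 0",
    "> R 4 543.053753 3 0xc60032f0 0x3 0x0 0",
]

if __name__ == "__main__":
    for op, value, is_set in bit_accesses(SAMPLE):
        state = "set" if is_set else "clear"
        print(f"{op} 0x{REG:06x} = 0x{value:08x} (bit{BIT} {state})")
```

Feeding it a whole trace file works the same way, e.g.
bit_accesses(open("trace.txt")), since it just iterates over lines.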

