[Nouveau] [REGRESSION] nouveau: Crash in gk104_fifo_intr_runlist()

Tue Aug 11 20:53:51 PDT 2015

I am sending the revert patch to Dave now that he has given his green
light, and I will investigate the issue on my side. I should be able to
find a gk107 somewhere...

On Wed, Aug 12, 2015 at 12:35 PM, Alexandre Courbot <gnurou at gmail.com> wrote:
> Mmm in that case it is probably best to revert that commit for the
> time being. It was targeting GM20B (and maybe other Maxwells too) so
> reverting it should not hurt anyone at the moment. I think Ben is on
> holidays for now, is there anyone else who can send a pull request to
> Dave Airlie for this? We don't want 4.2 to ship with a crash every
> other reboot...
>
> On Wed, Aug 12, 2015 at 10:01 AM, Eric Biggers <ebiggers3 at gmail.com> wrote:
>> Hi,
>>
>> I think I've done about 10 reboots with the commit reverted and I never
>> experienced the crash.  But with 4.2.0-rc6 I get the crash on about every
>> other reboot.
>>
>> Probably relevant: the computer on which the crash occurs has two GPUs (one
>> Intel and one Nvidia).  The Intel one is actually being used, whereas I
>> presume the Nvidia one is being automatically disabled shortly after boot,
>> perhaps when the crash occurs...
>>
>> Eric
>>
>> On Mon, Aug 10, 2015 at 11:28 PM, Alexandre Courbot <gnurou at gmail.com> wrote:
>>>
>>> Indeed, and I am actually surprised to see one here. I will
>>> double-check that patch.
>>>
>>> Eric, would you be able to give an estimate of the repro rate for this
>>> issue? More testing with and without the patch would be welcome; it'd
>>> be good to know whether it is actually the culprit or not.
>>>
>>> On Mon, Aug 10, 2015 at 2:28 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>>> > Alexandre, could you take a look? 0xbad* generally comes from bad mmio
>>> > reads.
>>> >
>>> > On Aug 9, 2015 1:08 PM, "Eric Biggers" <ebiggers3 at gmail.com> wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> I am testing Linux v4.2-rc5 and I am sporadically getting crashes shortly
>>> >> after startup in gk104_fifo_intr_runlist().  What I've found is that the
>>> >> 'mask' value read from offset 0x2a00 comes back as '0xbad0da00'.  This
>>> >> causes the 'engn' variable to be assigned the value 9, which is invalid;
>>> >> then wake_up() is called on an uninitialized waitqueue, which causes the
>>> >> crash.
>>> >>
>>> >> Reverting commit 1addc12648521d ("drm/nouveau/fifo/gk104: kick channels
>>> >> when deactivating them") seemed to make the problem go away, although I
>>> >> can't be 100% sure because the problem is sporadic.
>>> >>
>>> >> Attached an example of the kernel log up to the crash.
>>> >>
>>> >> Eric
>>> >>
>>> >
>>
>>
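
For reference, the arithmetic in the original report (a read of offset
0x2a00 returning 0xbad0da00, which yields engn = 9) can be reproduced in
plain userspace C. This is only a sketch, not the driver code: it assumes
the handler uses the lowest set bit of the mask as the engine index, the
engine count is a made-up placeholder, and treating a value whose top 12
bits are 0xbad as a failed read is a heuristic based on the "0xbad*" remark
in the thread. All names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: placeholder engine count; the real value depends on the chipset. */
#define HYPOTHETICAL_ENGINE_COUNT 7u

/* Index of the lowest set bit of a non-zero mask (userspace stand-in for a
 * find-first-set helper). */
static inline unsigned lowest_set_bit(uint32_t mask)
{
    unsigned bit = 0;

    assert(mask != 0);
    while (!(mask & 1u)) {
        mask >>= 1;
        bit++;
    }
    return bit;
}

/* Heuristic only: flag values whose top 12 bits read 0xbad. */
static inline bool looks_like_bad_mmio_read(uint32_t value)
{
    return (value & 0xfff00000u) == 0xbad00000u;
}

/* An engine index is usable only if it falls inside the engine table. */
static inline bool engine_index_is_valid(unsigned engn)
{
    return engn < HYPOTHETICAL_ENGINE_COUNT;
}
```

With the reported value, lowest_set_bit(0xbad0da00) gives 9, because the low
byte is zero and the lowest set bit of 0xda is bit 1. A handler that checked
either the read value or the resulting index before calling wake_up() would
skip the uninitialized waitqueue instead of crashing; whether that is the
right fix, as opposed to finding out why the read fails, is a separate
question.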