[Nouveau] CCACHE and VFETCH FAULTs causing lockups

Maarten Maathuis madman2003 at gmail.com
Mon Mar 7 13:51:05 PST 2011


On Sun, Mar 6, 2011 at 2:24 PM, Ben Skeggs <skeggsb at gmail.com> wrote:
>
>
> Sent from my iPhone
>
> On 07/03/2011, at 0:03, Maarten Maathuis <madman2003 at gmail.com> wrote:
>
>> On Sun, Mar 6, 2011 at 1:44 PM, Ben Skeggs <skeggsb at gmail.com> wrote:
>>> Sorry for the top posting, it's late and typing from my phone in bed lol.
>>>
>>> Just wanted to see if you had an update? And, this is NV86 I guess?
>>>
>>> Ben.
>>>
>>> Sent from my iPhone
>>>
>>> On 02/03/2011, at 8:20, Maarten Maathuis <madman2003 at gmail.com> wrote:
>>>
>>>> On Tue, Mar 1, 2011 at 9:51 PM, Ben Skeggs <bskeggs at redhat.com> wrote:
>>>>> On Tue, 2011-03-01 at 21:08 +0000, Maarten Maathuis wrote:
>>>>>
>>>>>> Those come after 15-30 minutes of running warzone2100. I haven't
>>>>>> played any games for a while, so I have no idea how long this has been
>>>>>> going on.
>>>>>> I also got a TRAP_CCACHE on channel 2 a little while ago; it takes
>>>>>> much longer to trigger (a few hours). I'm using today's "nouveau
>>>>>> kernel" git.
>>>>> You're not the first person to have reported this fwiw, personally, I
>>>>> haven't seen it yet..
>>>>>
>>>>>>
>>>>>> I'm guessing something is being unmapped too early or without reason,
>>>>>> or some cache is stale. But it isn't obvious what exactly it is.
>>>>>>
>>>>>> Because I don't remember having these lockups before, I'm inclined to
>>>>>> guess that this commit is involved:
>>>>>> http://cgit.freedesktop.org/nouveau/linux-2.6/commit/?id=6330d8f5ecc4a19fd2ad3c7fa128b2f4c2ce3360
>>>>>>
>>>>>> Any ideas?
>>>>> Not really.  If this commit *is* the cause, the problem is still
>>>>> somewhere else.  That commit just makes sure PTEs are marked invalid, so
>>>>> if it's causing your faults, then previously the GPU would still have
>>>>> been reading/writing invalid data.
>>>>>
>>>>> Plus, I expect you should probably have seen a VM fault..
>>>>
>>>> So these faults are just generic errors? Unrelated to page faults?
>>>>
>>>>>
>>>>> Ben.
>>>>>>
>>>>>> Maarten.
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Far away from the primal instinct, the song seems to fade away, the
>>>> river get wider between your thoughts and the things we do and say.
>>>
>>
>> No this is NV96. The revert definitely helps, but no luck so far in
>> finding a plausible cause for the problem.
> Hey,
>
> Ok. Hmm. I thought you had NV86 for some reason! It's a long shot and I'm not entirely convinced it'll help at all, but can you switch graph.tlb_flush pointer to the nv86 version and see if anything changes?

I used to have an NV86, but it died more than a year ago in the typical
way for that generation of card, due to thermal issues I guess (it was
a passively cooled card). I haven't tried the nv86 tlb flush yet.
Out of curiosity, is this something nvidia does (a lot) on nv86?

>
> The *other* possible thing is that the ttm delayed delete queue is causing multiple tlb flushes to happen at the same time.  I'll add locking for that in the morning, that was a complete oversight.

I've had no lockups since you added the spinlocks, so maybe that was
it. Time will tell.
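
To illustrate the theory, here is a rough userspace model of what the
locking is meant to guarantee. It is only a sketch and not the driver
code: TlbModel, flush() and run() are hypothetical names, the sleep stands
in for polling the hardware until the flush completes, and an ordinary
Python lock plays the role of the spinlock. Several threads (standing in
for the delayed delete queue and normal unmaps) request flushes, and the
lock ensures no flush starts while another is still in flight.

```python
import threading
import time


class TlbModel:
    """Toy stand-in for a GPU TLB flush sequence (hypothetical, not nouveau)."""

    def __init__(self):
        self.lock = threading.Lock()   # plays the role of the spinlock
        self.in_flight = False         # True while a flush is "running"
        self.overlaps = 0              # flushes that started during another one
        self.flushes = 0               # total flushes completed

    def _flush_unlocked(self):
        # Trigger the flush, then wait for it to complete.
        if self.in_flight:
            self.overlaps += 1         # two flushes at once: the suspected bug
        self.in_flight = True
        time.sleep(0.001)              # pretend to poll for completion
        self.in_flight = False
        self.flushes += 1

    def flush(self):
        # Serialise callers: only one may drive the flush at a time.
        with self.lock:
            self._flush_unlocked()


def run(model, callers=4, per_caller=5):
    """Issue flushes from several threads; return (flushes, overlaps)."""

    def worker():
        for _ in range(per_caller):
            model.flush()

    threads = [threading.Thread(target=worker) for _ in range(callers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return model.flushes, model.overlaps
```

With the lock held around the whole trigger-and-wait sequence, run()
should never count an overlap. Calling _flush_unlocked() directly from
several threads is the unserialised case I suspect I was hitting.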

>
> Ben.
>



-- 
Far away from the primal instinct, the song seems to fade away, the
river get wider between your thoughts and the things we do and say.

