GPU lockup CP stall for more than 10000msec on latest vanilla git

Maarten Lankhorst maarten.lankhorst at canonical.com
Tue Dec 18 07:24:59 PST 2012


On 18-12-12 14:38, Markus Trippelsdorf wrote:
> On 2012.12.18 at 12:20 +0100, Michel Dänzer wrote:
>> On Mon, 2012-12-17 at 23:55 +0100, Markus Trippelsdorf wrote: 
>>> On 2012.12.17 at 23:25 +0100, Markus Trippelsdorf wrote:
>>>> On 2012.12.17 at 17:00 -0500, Alex Deucher wrote:
>>>>> On Mon, Dec 17, 2012 at 4:48 PM, Markus Trippelsdorf
>>>>> <markus at trippelsdorf.de> wrote:
>>>>>> On 2012.12.17 at 16:32 -0500, Alex Deucher wrote:
>>>>>>> On Mon, Dec 17, 2012 at 1:27 PM, Markus Trippelsdorf
>>>>>>> <markus at trippelsdorf.de> wrote:
>>>>>>>> As soon as I open the following website:
>>>>>>>> http://www.boston.com/bigpicture/2012/12/2012_year_in_pictures_part_i.html
>>>>>>>>
>>>>>>>> my Radeon RS780 stalls (GPU lockup) leaving the machine unusable:
>>>>>>> Is this a regression?  Most likely a 3D driver bug unless you are only
>>>>>>> seeing it with specific kernels.  What browser are you using and do
>>>>>>> you have hw accelerated webgl, etc. enabled?  If so, what version of
>>>>>>> mesa are you using?
>>>>>> This is a regression, because it is caused by yesterday's merge of
>>>>>> drm-next by Linus. IOW I only see this bug when running a
>>>>>> v3.7-9432-g9360b53 kernel.
>>>>> Can you bisect?  I'm guessing it may be related to the new DMA rings.  Possibly:
>>>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=2d6cc7296d4ee128ab0fa3b715f0afde511f49c2
>>>> Yes, the commit above causes the issue. 
>>>>
>>>>  2d6cc72  GPU lockups
>>> With 2d6cc72 reverted I get:
>>>
>>> Dec 17 23:09:35 x4 kernel: ------------[ cut here ]------------
>> Probably a separate issue, can you bisect this one as well?
> Yes. Git-bisect points to:
>
> 85b144f860176ec18db927d6d9ecdfb24d9c6483 is the first bad commit
> commit 85b144f860176ec18db927d6d9ecdfb24d9c6483
> Author: Maarten Lankhorst <maarten.lankhorst at canonical.com>
> Date:   Thu Nov 29 11:36:54 2012 +0000
>
>     drm/ttm: call ttm_bo_cleanup_refs with reservation and lru lock
>     held, v3
>
> (Please note that this bug is a little harder to reproduce, but
> when you scroll up and down for ~10 seconds on the webpage mentioned
> above, it triggers the oops.
> So while I'm not 100% sure that the issue is caused by exactly this
> commit, the vicinity should be right.)
>
Those dmesg warnings look suspicious; something seems to be going very wrong there.

Can you revert the one before it? "drm/radeon: allow move_notify to be called without reservation"
Reservation should be held at this point; that commit got in accidentally.

I doubt that not holding a reservation is causing it, and I don't really see how that commit could
cause it either. Can you please double-check that it never happened before that point and only started at that commit?

Also, for good measure, slap a BUG_ON(!ttm_bo_is_reserved(bo)) into ttm_bo_cleanup_refs_and_unlock,
and a BUG_ON(spin_trylock(&bdev->fence_lock)); into ttm_bo_wait.
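For readers unfamiliar with the second check: it inverts a trylock. If the trylock succeeds, the lock was *not* held on entry, so the caller broke the locking rule and the BUG_ON fires. Below is a minimal userspace sketch of both checks, assuming a pthread mutex in place of the kernel spinlock and a plain bool in place of the TTM reservation. The names demo_bo, demo_fence_lock, demo_lock_was_free, demo_cleanup_refs_and_unlock and demo_bo_wait are hypothetical illustrations, not TTM code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a TTM buffer object; the real kernel
 * types and reservation mechanism differ. */
struct demo_bo {
    bool reserved;               /* models ttm_bo_is_reserved(bo) */
};

/* Hypothetical stand-in for bdev->fence_lock. */
static pthread_mutex_t demo_fence_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns true if nobody held the lock on entry. A successful trylock
 * means the lock was free, which is the case that
 * BUG_ON(spin_trylock(...)) is meant to catch. */
static bool demo_lock_was_free(pthread_mutex_t *lock)
{
    if (pthread_mutex_trylock(lock) == 0) {
        /* We just took a lock nobody held; release it so the check
         * itself leaves the lock state unchanged. */
        pthread_mutex_unlock(lock);
        return true;
    }
    return false;                /* EBUSY: someone holds the lock */
}

/* Analogue of ttm_bo_cleanup_refs_and_unlock(): bo must be reserved. */
static void demo_cleanup_refs_and_unlock(struct demo_bo *bo)
{
    assert(bo->reserved);        /* mirrors BUG_ON(!ttm_bo_is_reserved(bo)) */
    bo->reserved = false;        /* the "unlock" part: drop the reservation */
}

/* Analogue of ttm_bo_wait(): the caller must hold demo_fence_lock. */
static int demo_bo_wait(void)
{
    /* mirrors BUG_ON(spin_trylock(&bdev->fence_lock)) */
    assert(!demo_lock_was_free(&demo_fence_lock));
    return 0;
}
```

The sketch relies on a default (non-recursive) mutex, for which pthread_mutex_trylock() fails with EBUSY whenever the mutex is locked, including by the calling thread. Inside the kernel, lockdep_assert_held() is the more common way to express the same "caller must hold this lock" check.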

I really don't see how that specific commit can be wrong, though, so I'll wait for your results before I dig further into it.

~Maarten



More information about the dri-devel mailing list