GPU lockup+blackout on 2.6.35 with r600...

Alex Deucher alexdeucher at gmail.com
Thu Aug 5 08:23:30 PDT 2010


On Thu, Aug 5, 2010 at 9:50 AM, Jerome Glisse <glisse at freedesktop.org> wrote:
> On 08/05/2010 06:52 AM, Daniel J Blueman wrote:
>>
>> After around 10-30 mins of typical usage (ie cairo rendering to
>> surfaces + compiz) on my Dell Studio 15 (model 1557), my R600 radeon
>> locks up, giving a blank screen.
>>
>> The only information I have is a backtrace [1]. 'radeontool regs'
>> doesn't give anything - what information, /sys files etc would be
>> useful here, and would it be useful to log a freedesktop.org bug
>> report at this stage, assuming I haven't found a reliable reproducer?
>>
>> OS is Ubuntu 10.10 development release, mesa 7.8.2, libdrm2 2.4.21.
>>
>> Thanks,
>>   Daniel
>>
>> --- [1]
>>
>> radeon 0000:02:00.0: GPU lockup CP stall for more than 1000msec
>> ------------[ cut here ]------------
>> WARNING: at
>> /home/kernel-ppa/COD/linux/drivers/gpu/drm/radeon/radeon_fence.c:235
>> radeon_fence_wait+0x2b7/0x320 [radeon]()
>> Hardware name: Studio 1557
>> GPU lockup (waiting for 0x00008C35 last fence id 0x00008C33)
>> Modules linked in: binfmt_misc parport_pc ppdev kvm_intel kvm
>> microcode snd_hda_codec_atihdmi joydev ipt_REJECT xt_limit xt_tcpudp
>> ipt_addrtype xt_state snd_hda_codec_idt ip6table_filter ip6_tables
>> snd_hda_intel nf_nat_irc nf_conntrack_irc snd_hda_codec nf_nat_ftp
>> nf_nat snd_hwdep nf_conntrack_ipv4 snd_seq_midi arc4 snd_pcm
>> snd_rawmidi nf_defrag_ipv4 radeon nf_conntrack_ftp nf_conntrack
>> iptable_filter snd_seq_midi_event snd_seq ip_tables ttm iwlagn video
>> snd_timer uvcvideo x_tables snd_seq_device output drm_kms_helper lp
>> dell_laptop iwlcore drm videodev dcdbas dell_wmi v4l1_compat mac80211
>> i7core_edac parport v4l2_compat_ioctl32 snd psmouse edac_core
>> i2c_algo_bit soundcore cfg80211 snd_page_alloc serio_raw sdhci_pci
>> sdhci ahci libahci led_class r8169 mii btrfs zlib_deflate crc32c
>> libcrc32c
>> Pid: 1624, comm: Xorg Tainted: G      D     2.6.35-020635-generic #020635
>> Call Trace:
>> ? radeon_fence_wait+0x2b7/0x320 [radeon]
>> warn_slowpath_common+0x90/0xc0
>> warn_slowpath_fmt+0x6e/0x70
>> ? schedule_timeout+0x15a/0x2e0
>> ? r600_irq_set+0x27d/0xc00 [radeon]
>> ? radeon_ring_commit+0xa3/0xb0 [radeon]
>> ? r100_gpu_cp_is_lockup+0xc2/0xd0 [radeon]
>> ? r600_gpu_is_lockup+0x1cb/0x220 [radeon]
>> radeon_fence_wait+0x2b7/0x320 [radeon]
>> ? autoremove_wake_function+0x0/0x40
>> radeon_sync_obj_wait+0x11/0x20 [radeon]
>> ttm_bo_wait+0x102/0x1b0 [ttm]
>> ttm_bo_move_accel_cleanup+0x19e/0x230 [ttm]
>> radeon_move_blit+0x124/0x170 [radeon]
>> radeon_bo_move+0xda/0x1a0 [radeon]
>> ttm_bo_handle_move_mem+0xec/0x370 [ttm]
>> ttm_bo_evict+0x1cc/0x270 [ttm]
>> ? drm_mm_split_at_start+0x1d/0x80 [drm]
>> ttm_mem_evict_first+0xed/0x180 [ttm]
>> ? ttm_bo_man_get_node+0xd5/0xe0 [ttm]
>> ttm_bo_mem_force_space+0xab/0x110 [ttm]
>> ttm_bo_mem_space+0x305/0x370 [ttm]
>> ttm_bo_move_buffer+0xcf/0x140 [ttm]
>> ? drm_mm_split_at_start+0x1d/0x80 [drm]
>> ttm_bo_validate+0xd5/0x100 [ttm]
>> ttm_bo_init+0x1f2/0x240 [ttm]
>> radeon_bo_create+0x121/0x240 [radeon]
>> ? radeon_ttm_bo_destroy+0x0/0x80 [radeon]
>> radeon_gem_object_create+0x89/0x100 [radeon]
>> ? drm_gem_handle_create+0x8b/0xa0 [drm]
>> radeon_gem_create_ioctl+0x58/0xe0 [radeon]
>> drm_ioctl+0x283/0x460 [drm]
>> ? radeon_gem_create_ioctl+0x0/0xe0 [radeon]
>> ? do_readv_writev+0x16b/0x1e0
>> vfs_ioctl+0x3a/0xc0
>> do_vfs_ioctl+0x6d/0x1f0
>> sys_ioctl+0x87/0xa0
>> ? sys_writev+0x55/0xb0
>> system_call_fastpath+0x16/0x1b
>> ---[ end trace c0bc12025fa8386c ]---
>> radeon 0000:02:00.0: GPU softreset
>> radeon 0000:02:00.0:   R_008010_GRBM_STATUS=0xA0003028
>> radeon 0000:02:00.0:   R_008014_GRBM_STATUS2=0x00000002
>> radeon 0000:02:00.0:   R_000E50_SRBM_STATUS=0x200000C0
>> radeon 0000:02:00.0:   R_008020_GRBM_SOFT_RESET=0x00007FEE
>> radeon 0000:02:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
>> radeon 0000:02:00.0:   R_008010_GRBM_STATUS=0x00003028
>> radeon 0000:02:00.0:   R_008014_GRBM_STATUS2=0x00000002
>> radeon 0000:02:00.0:   R_000E50_SRBM_STATUS=0x200000C0
>> radeon 0000:02:00.0: GPU reset succeed
>> Clocks initialized !
>> ring test succeeded in 1 usecs
>> ib test succeeded in 1 usecs
>
>
> SO it keep reseting ? According to log there is a GPU lockup but then
> a successfull GPU reset so GPU should resume fine (that's what log says)
> Best is to open a bug and attach full dmesg + lspci -vv and context
> in which the lockup happen
>

Perhaps the lockup timeout is too short or interrupts are being
delivered late or something?  There's a fdo or kernel bugzilla entry
where removing the lockup timeout fixed the issue.  Does something
like this:

--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -237,9 +237,11 @@ retry:
                         * as signaled for now
                         */
                        rdev->gpu_lockup = true;
+#if 0
                        r = radeon_gpu_reset(rdev);
                        if (r)
                                return r;
+#endif
                        WREG32(rdev->fence_drv.scratch_reg, fence->seq);
                        rdev->gpu_lockup = false;
                }

fix the issue?

Alex


More information about the dri-devel mailing list