[Mesa-dev] GPU lockup CP stall when calling clBuildProgram on Cayman

Tom Stellard tom at stellard.net
Mon Jan 13 10:00:52 PST 2014


On Thu, Jan 09, 2014 at 02:57:20PM +0000, christophe choquet wrote:
> Hi,
> 
> I am using kernel 3.12.6-gentoo and Mesa 10.0.1. Every second call to clBuildProgram hangs, and the GPU is reset after 10 seconds.
> This also happens on Debian unstable with Mesa 9.2. The first hello_world run works, the next one hangs, the third works, and so on.
> 
> Apart from the hang on this particular OpenCL call, everything is just fine. I tried commenting out the DMA flushing code in r600/r600_hw_context.c, but this does not look like the issue that was discovered on R600 HW.
> 
> After the hang, opencl_examples/hello_world returns the correct value (when the machine does not hang completely, which sometimes happens). The get-global-id test program behaves the same way.
> 

This is likely the same issue as https://bugs.freedesktop.org/show_bug.cgi?id=73418

Are you running the OpenCL programs with or without X? Can you reply in the comments of the bug?

Thanks,
Tom

> Here is my config & logs:
> lspci:
> 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cayman PRO [Radeon HD 6950]
> 
> dmesg:
> [  826.250105] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
> [  826.250110] radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000000037bc last fence id 0x00000000000037ba)
> [  826.250118] [drm] Disabling audio 0 support
> [  826.257466] radeon 0000:01:00.0: Saved 111 dwords of commands on ring 0.
> [  826.257496] radeon 0000:01:00.0: GPU softreset: 0x00000008
> [  826.257498] radeon 0000:01:00.0:   GRBM_STATUS               = 0xB0001828
> [  826.257500] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000003
> [  826.257502] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000003
> [  826.257504] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
> [  826.257526] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
> [  826.257528] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
> [  826.257529] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x40000000
> [  826.257531] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00010006
> [  826.257533] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80228647
> [  826.257535] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
> [  826.257537] radeon 0000:01:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
> [  826.257539] radeon 0000:01:00.0:   VM_CONTEXT0_PROTECTION_FAULT_ADDR   0x00000000
> [  826.257541] radeon 0000:01:00.0:   VM_CONTEXT0_PROTECTION_FAULT_STATUS 0x00000000
> [  826.257542] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
> [  826.257544] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
> [  826.264350] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00004001
> [  826.264403] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
> [  826.265558] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00001828
> [  826.265560] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000003
> [  826.265561] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000003
> [  826.265563] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
> [  826.265585] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
> [  826.265587] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
> [  826.265589] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
> [  826.265590] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
> [  826.265592] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
> [  826.265594] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
> [  826.265596] radeon 0000:01:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
> [  826.265623] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
> [  826.283559] [drm] PCIE gen 2 link speeds already enabled
> [  826.285981] [drm] PCIE GART of 1024M enabled (table at 0x0000000000273000).
> [  826.286049] radeon 0000:01:00.0: WB enabled
> [  826.286051] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000080000c00 and cpu addr 0xffff8800cbaa3c00
> ......
> 
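
For anyone reading the dump above, here is a small sketch of how the register values before and after the soft reset, and the two fence ids, can be compared. It assumes that "last fence id" is the last fence the GPU signalled, and it deliberately does not name any register bits, since the log does not. The function names are made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Bits that were set before the soft reset and are clear after it. */
uint32_t bits_cleared_by_reset(uint32_t before, uint32_t after)
{
    return before & ~after;
}

/* Fences still pending when the lockup was reported.
 * Assumption: "last fence id" is the last fence the GPU signalled. */
uint64_t outstanding_fences(uint64_t waiting_for, uint64_t last_signalled)
{
    return waiting_for - last_signalled;
}

/* Print the positions of the set bits in mask, highest bit first. */
void print_set_bits(const char *name, uint32_t mask)
{
    printf("%s: 0x%08X ->", name, (unsigned)mask);
    for (int bit = 31; bit >= 0; bit--) {
        if (mask & (UINT32_C(1) << bit))
            printf(" %d", bit);
    }
    printf("\n");
}
```

With the values from the log, bits_cleared_by_reset(0xB0001828, 0x00001828) gives 0xB0000000 (bits 31, 29 and 28 of GRBM_STATUS), and outstanding_fences(0x37bc, 0x37ba) gives 2.
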
> 
> In hello_world.c, the program hangs every second time at this line:
>  error = clBuildProgram(program,
>                           1, /* Number of devices */
>                           &device_id,
>                           NULL, /* options */
>                           NULL, /* callback function when compile is complete */
>                           NULL); /* user data for callback */
> 
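
To tell the hanging runs from the good ones without watching dmesg, the call above could be timed. This is only a sketch: the helper names and the 5000 ms threshold are hypothetical, chosen to sit well below the 10000 msec the kernel reports for the CP stall.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* Hypothetical threshold: anything slower than this is treated as a stall. */
#define STALL_THRESHOLD_MS 5000.0

/* Sample the monotonic clock, which is not affected by wall-clock changes. */
struct timespec now_monotonic(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts;
}

/* Milliseconds between two samples taken with now_monotonic(). */
double elapsed_ms(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec) * 1000.0
         + (double)(end.tv_nsec - start.tv_nsec) / 1.0e6;
}

/* Nonzero if a measured duration looks like a GPU stall. */
int looks_like_stall(double ms)
{
    return ms >= STALL_THRESHOLD_MS;
}

/* Print one line per measured build so alternating runs are easy to spot. */
void report_build_time(double ms)
{
    fprintf(stderr, "clBuildProgram took %.1f ms%s\n",
            ms, looks_like_stall(ms) ? " (possible stall)" : "");
}

/* Usage around the call in hello_world.c (needs the OpenCL headers):
 *
 *   struct timespec t0 = now_monotonic();
 *   error = clBuildProgram(program, 1, &device_id, NULL, NULL, NULL);
 *   struct timespec t1 = now_monotonic();
 *   report_build_time(elapsed_ms(t0, t1));
 */
```

If every second run reports roughly ten seconds, that should line up with the lockup messages in dmesg.
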
> 
> Thanks for your help,
> Regards

