GPU hang trying to run OpenCL kernels on x86_64

Luís Mendes luis.p.mendes at gmail.com
Tue Jun 26 09:03:17 UTC 2018


I've tested Ubuntu 18.04 with kernel 4.17.2, libdrm-2.4.92 and mesa-18.1.0,
and the AMD RX 550 4GB still hangs when running the previously identified
OpenCL kernels.

[  548.704916] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, last signaled seq=30, last emitted seq=33
[  548.704988] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, last signaled seq=29, last emitted seq=31
[  548.704992] [drm] IP block:gmc_v8_0 is hung!
[  548.704994] [drm] IP block:tonga_ih is hung!
[  548.704996] [drm] IP block:gmc_v8_0 is hung!
[  548.704997] [drm] IP block:gfx_v8_0 is hung!
[  548.704998] [drm] IP block:sdma_v3_0 is hung!
[  548.704999] [drm] IP block:tonga_ih is hung!
[  548.705000] [drm] IP block:uvd_v6_0 is hung!
[  548.705001] [drm] IP block:gfx_v8_0 is hung!
[  548.705002] [drm] IP block:sdma_v3_0 is hung!
[  548.705003] [drm] IP block:uvd_v6_0 is hung!
[  548.705004] [drm] IP block:vce_v3_0 is hung!
[  548.705005] [drm] GPU recovery disabled.
[  548.705006] [drm] IP block:vce_v3_0 is hung!
[  548.705007] [drm] GPU recovery disabled.
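
Each IP block in the log above is reported twice, which makes the excerpt harder to scan. As a rough aid for comparing logs between kernel versions, here is a minimal Python sketch that condenses such an excerpt into the unique hung blocks and the per-ring sequence numbers. The function name `summarize_hang` and the two regular expressions are assumptions of this sketch, derived only from the message format shown here; this is not an amdgpu tool, and other kernel versions may word these messages differently.

```python
import re

# Patterns based only on the message format seen in the log excerpt above.
HUNG_RE = re.compile(r"IP block:(\w+) is hung!")
# \s+ between the ring name and "timeout" also tolerates a mail-wrapped line.
TIMEOUT_RE = re.compile(
    r"ring (\w+)\s+timeout, last signaled seq=(\d+), last emitted seq=(\d+)"
)


def summarize_hang(log_text):
    """Return (hung_blocks, ring_timeouts) for a dmesg excerpt.

    hung_blocks: IP block names in first-seen order, duplicates removed.
    ring_timeouts: dict mapping ring name -> (last signaled, last emitted).
    """
    hung_blocks = []
    for name in HUNG_RE.findall(log_text):
        if name not in hung_blocks:
            hung_blocks.append(name)

    ring_timeouts = {
        ring: (int(signaled), int(emitted))
        for ring, signaled, emitted in TIMEOUT_RE.findall(log_text)
    }
    return hung_blocks, ring_timeouts


if __name__ == "__main__":
    import sys

    # Example use: dmesg | python3 summarize_hang.py
    blocks, rings = summarize_hang(sys.stdin.read())
    for ring, (signaled, emitted) in rings.items():
        print(f"{ring}: signaled={signaled} emitted={emitted}")
    print("hung blocks:", ", ".join(blocks))
```

The gap between the last emitted and last signaled sequence numbers shows how many jobs were still outstanding on each ring when the timeout fired.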

Is there any news regarding this issue?

Regards,
Luís

On Fri, May 25, 2018 at 11:23 AM, Luís Mendes <luis.p.mendes at gmail.com>
wrote:

> I've just tested Ubuntu 18.04 with kernel 4.17-rc6 using libdrm-2.4.92 and
> mesa-18.1.0.
> Now both sdma0 and sdma1 time out, as can be seen in the attached logs.
>
> ~agd5f -b drm-next-4.18 doesn't improve things either.
>
> I have also tried amdgpu-pro 18.20 on both Ubuntu 18.04 and 16.04, with no
> improvement.
> I have tried amdgpu-pro 18.10 and 17.50 as well, also with no improvement.
>
> ./amdgpu-pro-install -opencl=legacy,pal --headless
>
> On Thu, May 24, 2018 at 11:18 AM, Luís Mendes <luis.p.mendes at gmail.com>
> wrote:
>
>> Additional update...
>>
>> I was able to boot and enter X by installing an NVIDIA GTX 1050 Ti as the
>> primary display card and using an AMD RX 550 as the secondary card on the
>> Tyan S7025 with the same Ubuntu 18.04 and the same Linux kernel 4.17-rc6.
>> However, once I try to run an OpenCL kernel on the RX 550 I get an sdma1
>> timeout and the GPU hangs, which is likely what happens when I boot with
>> the RX 550 as the single GPU card in the system.
>>
>> This means it is not an issue introduced in 4.17-rc6; I just hadn't
>> noticed the difference between the system with two GPUs and the system
>> with a single AMD GPU.
>>
>> The dmesg log is attached.
>>
>> Luís
>>
>> On Thu, May 24, 2018 at 10:13 AM, Luís Mendes <luis.p.mendes at gmail.com>
>> wrote:
>>
>>> Hi Michel,
>>>
>>> I also work as a researcher at a university and we are considering
>>> buying AMD cards to do OpenCL computations for numerical modelling, but
>>> currently I am unable to try out the AMD cards I have at home.
>>> I couldn't find any working driver for them... the amdgpu-pro drivers
>>> don't work either, or at least I have been unable to make them work.
>>>
>>> Regards,
>>> Luís
>>>
>>> On Thu, May 24, 2018 at 10:01 AM, Luís Mendes <luis.p.mendes at gmail.com>
>>> wrote:
>>>
>>>> Hi Michel,
>>>>
>>>> So, to summarize: with Linux kernel 4.17-rc6 on Ubuntu 18.04 using an AMD
>>>> RX 460/RX 550, I am not able to enter X.
>>>> The same system with an AMD Radeon R7 240 not only enters X but also runs
>>>> the OpenCL kernels that the RX 460 / RX 550 are unable to run, for all
>>>> the kernels that I have tested.
>>>> Could this also be a Mesa issue regarding OpenCL on the RX 460?
>>>>
>>>> Regards,
>>>> Luís
>>>>
>>>> On Thu, May 24, 2018 at 9:55 AM, Luís Mendes <luis.p.mendes at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Michel,
>>>>>
>>>>> I will have to check previous rc releases of 4.17 to see whether it was
>>>>> already happening, before trying any git bisect.
>>>>> As an update, I can say that an AMD Radeon R7 240 works fine on the
>>>>> same system with the same kernel, and I am able to run the OpenCL
>>>>> kernels that I couldn't run with the RX 460/RX 550.
>>>>>
>>>>> Regards,
>>>>> Luís
>>>>>
>>>>> On Thu, May 24, 2018 at 9:30 AM, Michel Dänzer <michel at daenzer.net>
>>>>> wrote:
>>>>>
>>>>>> On 2018-05-24 12:06 AM, Luís Mendes wrote:
>>>>>> > I've tried Linux 4.17-rc6 with Ubuntu 18.04 on Tyan S7002 and I am not
>>>>>> > even able to see lightdm/gdm3, as the system hangs when starting X.
>>>>>> > Having SR-IOV enabled or disabled makes no difference.
>>>>>> > Tested with AMD RX 460.
>>>>>> > When X is supposed to start, the system hangs and only a rectangular
>>>>>> > region in the top left corner of the screen remains, showing console
>>>>>> > text messages from the boot process, while the rest of the screen is
>>>>>> > just black. I am unable to do anything with the keyboard: switching to
>>>>>> > a console does not work, and ctrl-alt-del doesn't work either. I have
>>>>>> > to do a cold reset.
>>>>>>
>>>>>> Can you isolate which change introduced this new issue with git
>>>>>> bisect?
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Earthling Michel Dänzer               |               http://www.amd.com
>>>>>> Libre software enthusiast             |             Mesa and X developer
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>