Hard lockups with ROCM

Alex Deucher alexdeucher at gmail.com
Thu May 16 01:43:12 UTC 2019


On Wed, May 15, 2019 at 8:33 PM Daniel Kasak <d.j.kasak.dk at gmail.com> wrote:
>
> On Mon, May 13, 2019 at 11:44 AM Daniel Kasak <d.j.kasak.dk at gmail.com> wrote:
>>
>> Hi all. I had version 2.2.0 of the ROCm stack running on 5.0.x and 5.1.0 kernels. Things were going great with various BOINC GPU tasks, but there is a setiathome GPU task that reliably gives me a hard lockup within about 30 minutes of running. I actually had to do *two* emergency reinstalls over the past week. Perhaps part of this was my fault (running btrfs with lzo compression on my root partition...), but the hard lockups were absolutely part of it. I've tested all kinds of other things (e.g. rebuilding lots of stuff under Gentoo)... I don't have a general stability issue, even under hours of high load. But after restarting BOINC with that same setiathome task... <bang>!
>>
>> If someone wants me to sacrifice another installation, they can point me to instructions for gathering more information.
>>
>> Anyway... perhaps more work on detecting and recovering from GPU lockups is in order?
>>
>> Dan
>
>
> <sigh>
>
> That's what I was afraid of :(

Not sure what you were afraid of.  I don't think anyone has looked at
setiathome on ROCm.  I'd suggest filing a bug
(https://bugs.freedesktop.org) and attaching your dmesg output and
Xorg log (if using X).  If there is a GPU reset, note that you will
need to restart your desktop environment, because currently neither
glamor nor any compositor supports the GL robustness extensions that
would let them reset their contexts after a GPU reset.

Alex


More information about the amd-gfx mailing list