<div dir="ltr"><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, May 16, 2019 at 11:43 AM Alex Deucher <<a href="mailto:alexdeucher@gmail.com">alexdeucher@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Wed, May 15, 2019 at 8:33 PM Daniel Kasak <<a href="mailto:d.j.kasak.dk@gmail.com" target="_blank">d.j.kasak.dk@gmail.com</a>> wrote:<br>
><br>
> On Mon, May 13, 2019 at 11:44 AM Daniel Kasak <<a href="mailto:d.j.kasak.dk@gmail.com" target="_blank">d.j.kasak.dk@gmail.com</a>> wrote:<br>
>><br>
>> Hi all. I had version 2.2.0 of the ROCm stack running on 5.0.x and 5.1.0 kernels. Things were going great with various boinc GPU tasks. But there is a setiathome GPU task which reliably gives me a hard lockup within about 30 minutes of running. I actually had to do *two* emergency re-installs over the past week. Perhaps part of this was my fault ( running btrfs with lzo compression on my root partition ... ). But absolutely part of this was the hard lockups. I've tested all kinds of other things ( e.g. rebuilding lots of stuff under Gentoo ) ... I don't have a general stability issue even under hours of high load. But after restarting boinc with that same setiathome task ... &lt;bang&gt;!<br>
>><br>
>> If someone wants me to sacrifice another installation, they can point me to instructions for trying to gather more information.<br>
>><br>
>> Anyway ... perhaps more work around detecting and recovering from GPU lockups is in order?<br>
>><br>
>> Dan<br>
><br>
><br>
> &lt;sigh&gt;<br>
><br>
> That's what I was afraid of :(<br>
<br>
Not sure what you were afraid of. I don't think anyone has looked at<br>
setiathome on ROCm. I'd suggest filing a bug<br>
(<a href="https://bugs.freedesktop.org" rel="noreferrer" target="_blank">https://bugs.freedesktop.org</a>) and attaching your dmesg output and<br>
xorg log (if using X). If there is a GPU reset, note that you will<br>
need to restart your desktop environment because currently neither<br>
glamor nor any compositors support GL robustness extensions to reset<br>
their contexts after a GPU reset.<br>
<br>
Alex<br></blockquote><div><br></div><div>Hi Alex. dmesg output is not available ... this is a *hard* lockup. I need to power-cycle after it happens ( ALT + SysRq + { S , U , B } doesn't even work ). That's why I asked for instructions to possibly gather more info. I did check the xorg log after I did an emergency export of my filesystem ... nothing of interest in there. It seems like I currently don't really have enough info to make a bug report worthwhile.</div><div><br></div><div>Dan<br></div></div></div>