<html>
<head>
<base href="https://bugs.freedesktop.org/">
</head>
<body>
<p>
<div>
<b><a class="bz_bug_link
bz_status_NEW "
title="NEW - [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{,tail-}calls{,-struct,-workitem-id}.cl cause GPU VM error and ring stalled GPU lockup"
href="https://bugs.freedesktop.org/show_bug.cgi?id=105113#c8">Comment # 8</a>
on <a class="bz_bug_link
bz_status_NEW "
title="NEW - [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{,tail-}calls{,-struct,-workitem-id}.cl cause GPU VM error and ring stalled GPU lockup"
href="https://bugs.freedesktop.org/show_bug.cgi?id=105113">bug 105113</a>
from <span class="vcard"><a class="email" href="mailto:mail@maciej.szmigiero.name" title="Maciej S. Szmigiero <mail@maciej.szmigiero.name>"> <span class="fn">Maciej S. Szmigiero</span></a>
</span></b>
<pre>(In reply to Jan Vesely from <a href="show_bug.cgi?id=105113#c7">comment #7</a>)
<span class="quote">> (In reply to Maciej S. Szmigiero from <a href="show_bug.cgi?id=105113#c6">comment #6</a>)
> > There are really two issues at play here:
> > 1) If the LLVM-generated code cannot be run properly then it should be simply
> > rejected by whatever is actually in charge of submitting it to the GPU (I
> > guess
> > this would be Mesa?).
> > This way an application will know it cannot use OpenCL for computation, at
> > least
> > not with this compute kernel.
> >
> > Instead, it currently looks like many of these test run but give incorrect
> > results, which is obviously rather bad.
>
> Do you have an example of this? clover should return OUT_OF_RESOURCES error
> when the compute state creation fails (like in the presence of code
> relocations).
> It does not change the content of the buffer, so it will return whatever was
> stored in the buffer on creation.</span >
Aren't the program@execute@calls-struct and program@execute@tail-calls tests
from <a href="show_bug.cgi?id=105113#c4">comment 4</a> examples of this behavior?
They seem to run but return wrong results, or am I misreading the piglit
test results?
<span class="quote">> > 2) Some (previous) Mesa + LLVM versions generate a command stream that
> > crashes the GPU and, as far as I can remember, sometimes even lockup the
> > whole machine.
> >
> > It should not be possible to crash the GPU, regardless how incorrect a
> > command stream that userspace sends to it is - because otherwise it is
> > possible for
> > an unprivileged user with GPU access to DoS the machine.
>
> This is a separate issue. GPU hangs are generally addressed via gpu reset
> which should be enabled for gfx8/9 GPUs in recent amdgpu.ko [0]
>
> [0] <a href="https://patchwork.freedesktop.org/patch/257994/">https://patchwork.freedesktop.org/patch/257994/</a></span >
This would explain why "amdgpu" did not even seem to attempt a GPU reset
after a crash.
However, I think I got at least one lockup when testing this issue half a
year ago on the "radeon" driver ("amdgpu" is still marked as experimental for SI
parts).
If I am able to reproduce it in the future, I will report it then.</pre>
</div>
</p>
</body>
</html>