<html>
<head>
<base href="https://bugs.freedesktop.org/">
</head>
<body>
<p>
<div>
<b><a class="bz_bug_link
bz_status_NEW "
title="NEW - [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{,tail-}calls{,-struct,-workitem-id}.cl cause GPU VM error and ring stalled GPU lockup"
href="https://bugs.freedesktop.org/show_bug.cgi?id=105113#c9">Comment # 9</a>
on <a class="bz_bug_link
bz_status_NEW "
title="NEW - [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{,tail-}calls{,-struct,-workitem-id}.cl cause GPU VM error and ring stalled GPU lockup"
href="https://bugs.freedesktop.org/show_bug.cgi?id=105113">bug 105113</a>
from <span class="vcard"><a class="email" href="mailto:jan.vesely@rutgers.edu" title="Jan Vesely <jan.vesely@rutgers.edu>"> <span class="fn">Jan Vesely</span></a>
</span></b>
<pre>(In reply to Maciej S. Szmigiero from <a href="show_bug.cgi?id=105113#c8">comment #8</a>)
<span class="quote">> Aren't program@execute@calls-struct and program@execute@tail-calls tests
> from <a href="show_bug.cgi?id=105113#c4">comment 4</a> examples of this behavior?
> These seem to run but return wrong results, or am I not parsing the piglit
> test results correctly?</span >
This is more of a piglit problem. Piglit uses a combination of enqueue and
clFinish, but the error happens at kernel launch. Thus:
1.) clEnqueueNDRangeKernel returns success
2.) The driver tries to launch the kernel and fails on relocations
3.) The application (piglit) calls clFinish
Depending on the order of 2. and 3., clFinish can either see an empty queue and
succeed, or try to wait for kernel execution and fail.
The following series should address that:
<a href="https://patchwork.freedesktop.org/series/52857/">https://patchwork.freedesktop.org/series/52857/</a>
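
To illustrate the ordering in steps 1.) to 3.) above, here is a toy model in
Python. It is only a sketch: ModelQueue, driver_launch, run and CL_FAILURE are
made-up names for illustration, not clover or piglit code, and the model
assumes a failed launch simply drops the command without recording the error.

```python
from collections import deque

CL_SUCCESS = 0
CL_FAILURE = -1  # placeholder value, not a real OpenCL error code


class ModelQueue:
    """Toy stand-in for a command queue (hypothetical, not clover code)."""

    def __init__(self):
        self.pending = deque()

    def enqueue_kernel(self, launch_ok):
        # Step 1: enqueueing only records the command, so it always succeeds.
        self.pending.append(launch_ok)
        return CL_SUCCESS

    def driver_launch(self):
        # Step 2: the driver tries to launch. In this model a failed launch
        # just drops the command and the error is not stored anywhere.
        if self.pending:
            self.pending.popleft()

    def finish(self):
        # Step 3: an empty queue looks like "all done". Otherwise finish
        # performs the launch itself and reports its result.
        status = CL_SUCCESS
        while self.pending:
            if not self.pending.popleft():
                status = CL_FAILURE
        return status


def run(launch_before_finish):
    q = ModelQueue()
    assert q.enqueue_kernel(launch_ok=False) == CL_SUCCESS
    if launch_before_finish:
        q.driver_launch()  # 2. happens before 3.
    return q.finish()
```

In this model run(True) reports success even though the kernel never ran,
while run(False) reports the failure, which matches the two outcomes piglit
can observe.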
<span class="quote">> This would explain why "amdgpu" seemed to not even attempt to reset the GPU
> after a crash.
>
> However, I think I've got at least one lockup when testing this issue half a
> year ago on "radeon" driver ("amdgpu" is still marked as experimental for SI
> parts).
> If I am able to reproduce it in the future I will report it then.</span >
<a href="show_bug.cgi?id=105113#c1">comment #1</a> shows an example of a successful restart using radeon.ko, so I guess
it worked for at least some ASICs. At any rate, restarting the GPU is a separate,
kernel-side problem.
Feel free to remove the relocation guard if you want to investigate GPU reset.</pre>
</div>
</p>
</body>
</html>