[Bug 108272] [polaris10] opencl-mesa: Anything using OpenCL segfaults, XFX Radeon RX 580

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Mon Dec 17 17:46:12 UTC 2018


https://bugs.freedesktop.org/show_bug.cgi?id=108272

--- Comment #12 from Jan Vesely <jan.vesely at rutgers.edu> ---
Hi,

sorry for the delay. somehow I missed the notifications.
(In reply to jamespharvey20 from comment #11)
> When I originally filed this, I assumed it was 1 bug since I tried 2 things
> with OpenCL, and both failed with opencl-mesa but worked with opencl-amd.
> 
> Jan Vesely was correct that there were two separate problems.
> 
> I'm hoping Jan Vesely can give guidance on whether to leave this bug open
> for any of the reasons below, or if I should close it and potentially open
> up 1-2 new bugs.
> 
> The original luxmark bug (segfault) is solved, but that exposes 2 new
> opencl-mesa bugs when running luxmark.
> 
> The original IndigoBenchmark bug (segfault) isn't solved, but as explained
> below, I understand if we have to consider that unsolvable for now.
> 
> I don't think this affects any of these bugs, but I'll mention a few weeks
> ago, I switched back to my Asus Radeon R9 390.  The same behaviors discussed
> in this entire bug report occur.  (i.e. 18.2.3 and before crash luxmark.) 
> If someone really wants me to do so, I can switch back to the RX 580 to test
> 18.2.4, but I'm betting since it works properly with the R9 390 that the
> problem is fixed.
> 
> ORIGINAL LUXMARK BUG #1
> -----------------------------------------
> 
> Using mesa 18.2.4, the luxmark segfault is solved.

As this was the first bug. I'd close this one and open new bugs for both indigo
and incorrect rendering in luxmark.

> 
> NEW - LUXMARK BUG #2
> ------------------------------------
> 
> Jan Vesely's comment on 2018-10-09 mentions: "bumping MAX_GLOBAL_BUFFERS to
> 32 allows luxmark to run, albeit still with many incorrect pixels -- libclc
> rounding conversions are incorrect."
> 
> That's what I'm seeing out of 18.2.4.  Using LuxBall HDR (Simple Benchmark):
> 
> MESA 18.2.4: 40626 (Image validation OK (65739 different pixels, 10.27%)
> 
> AMDGPU-PRO: 15739 (Image validation OK (5736 different pixels, 0.90%)
> 
> There's no typos there.  opencl-mesa scores almost unbelievably higher than
> opencl-amd, but the different pixels percentage increases by a factor of
> 11.4.
> 
> As Jan's other comment on 2018-10-09 mentions, the image looks garbled and
> the results are incorrect.
> 
> Not sure if this bug should be left open for this issue, or if I should
> create a new bug.  (Or, if there is a bug already open for it.)  Or, if mesa
> will say it's purely libclc's problem, and to go to them about it.

I'd say this is probably a purely libclc problem, but feel free to open the bug
against clover on freedesktop. 10% is rather good I usually saw ~30% wrong
pixels on my machines.

> 
> NEW - LUXMARK BUG #3
> ------------------------------------
> 
> Although luxmark can now benchmark, when doing so, all input becomes
> unusably awful.  It reminds me of when Windows has too many things open,
> suddenly decided it can't cope, and you're waiting to see if it's going to
> recover or crash.  Keystrokes take too long to be printed, and the mouse
> becomes slow and jumpy.  Top shows cpu and memory usage are fine, which was
> my first thought.  BTW, running xf86-video-amdgpu 18.1.0, and when I
> upgraded mesa, it was both mesa and opencl-mesa.
> 
> In comparison, if I use opencl-amd, input is not affected.  I wouldn't even
> know the GPU is being slammed.
> 
> Using the program radeontop, I can see when using mesa, "Graphics pipe",
> "Texture Addresser", and "Shader Interpolator" are between 95-100%, usually
> 98-100%.
> 
> When using opencl-amd, radeontop shows the same.  (Granted, Vertex Grouper +
> Tesselator / Shader Export/Scan Converter/Depth Block/Color Block bounce
> between 5-20% vs on opencl-mesa, they bounce between 1-5%.)

This sounds like GPU priority/scheduling problem. I haven't looked into whether
it can be solved via opening lower priority pipe for compute, or we need to
enable advanced features like CWSR. Please open a separate bug. Hogging a large
portion of the GPU might explain some of that high score.

> 
> INDIGO BUG
> ------------------
> 
> I edited 18.2.4's si_get.c to be very short:
> 
>     snprintf(sscreen->renderer_string, sizeof(sscreen->renderer_string),
>        "%s",
>        chip_name);
> 
> And compiled/installed it, but it didn't affect the crash.
> 
> IndigoBenchmark said they're statically linking with LLVM 3.4, which is
> quite old.  But, it runs fine with opencl-amd, and only crashes on
> opencl-mesa.  I just posted a followup "where do we go from here"-ish
> comment there which has to be moderator approved so isn't showing yet. 
>  https://www.indigorenderer.com/forum/viewtopic.php?f=37&t=14986
> 
> Part of me thinks it needs to be given up on, being a closed-source
> precompiled binary statically linked against LLVM 3.4.
> 
> Part of me thinks since it only crashes with opencl-mesa, and runs perfectly
> fine with opencl-amd, there's probably (but not definitely) a bug in
> opencl-mesa.
> 
> But, I understand since they don't seem to be paying this any attention, we
> may have to give up on the Indigo Bug as being unable to be realistically
> investigated further.

Can you check if indigo exports any LLVM symbols? It might be that we end up
using those instead of the new ones from libLLVM.*
If that's the case one solution would be to link mesa/clover with static LLVM.
Enabling symbol versioning for LLVM should work as well.

-- 
You are receiving this mail because:
You are the assignee for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/dri-devel/attachments/20181217/296b3c9e/attachment.html>


More information about the dri-devel mailing list