<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Hi Lauri,</p>
<p>Thanks for your persistence. Knowing that this reproduces on several boards with up-to-date BIOSes is really helpful. It gives me some confidence that this is more than a weird vendor- or board-specific corner case, and that we should be able to reproduce it.
Yong is going to start looking into this problem.</p>
<p>Regards,<br>
Felix<br>
</p>
<div class="moz-cite-prefix">On 3/14/2019 12:41 PM, Lauri Ehrenpreis wrote:<br>
</div>
<blockquote type="cite" cite="mid:CAGyaPbCvOijB=pWe3zkMze3aT+Yr86aQYK2OP=P8qpOdCzaORg@mail.gmail.com">
<div dir="ltr">Yes, it has some effect, but it doesn't bring the speed up to the "normal" level. I got the best results with "profile_peak"; with that setting the memcpy speed on the CPU is roughly 40% of what it is without OpenCL initialization:
<div><br>
</div>
<div>
<div> echo "profile_peak" > /sys/class/drm/card0/device/power_dpm_force_performance_level<br>
</div>
<div><span class="gmail-im" style="color:rgb(80,0,80)">
<div>./cl_slow_test 1 5</div>
<div>got 1 platforms 1 devices</div>
</span>
<div>speed 3710.360352 avg 3710.360352 mbytes/s</div>
<div>speed 3713.660400 avg 3712.010254 mbytes/s</div>
<div>speed 3797.630859 avg 3740.550537 mbytes/s</div>
<div>speed 3708.004883 avg 3732.414062 mbytes/s</div>
<div>speed 3796.403076 avg 3745.211914 mbytes/s</div>
</div>
<div><br>
</div>
<div>Without calling clCreateContext:</div>
<div>
<div>./cl_slow_test 0 5</div>
<div>speed 7299.201660 avg 7299.201660 mbytes/s</div>
<div>speed 9298.841797 avg 8299.021484 mbytes/s</div>
<div>speed 9360.181641 avg 8652.742188 mbytes/s</div>
<div>speed 9004.759766 avg 8740.746094 mbytes/s</div>
<div>speed 9414.607422 avg 8875.518555 mbytes/s</div>
</div>
</div>
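<div>To put a number on the gap between the two runs above, here is a minimal sketch in Python. It is not part of cl_slow_test; the helper names mean_speed and slowdown_ratio are hypothetical, and it assumes the "speed ... avg ... mbytes/s" output format shown above.</div>
<pre>
```python
import re

# Matches lines such as "speed 3710.360352 avg 3710.360352 mbytes/s".
# Assumption: this is the only line format that carries measurements.
SPEED_LINE = re.compile(r"speed\s+([0-9.]+)\s+avg\s+([0-9.]+)\s+mbytes/s")


def mean_speed(output):
    """Return the mean of the per-iteration speeds found in the output text."""
    speeds = [float(m.group(1)) for m in SPEED_LINE.finditer(output)]
    if not speeds:
        raise ValueError("no 'speed ... avg ... mbytes/s' lines found")
    return sum(speeds) / len(speeds)


def slowdown_ratio(with_cl, without_cl):
    """Mean speed with an OpenCL context divided by mean speed without one."""
    return mean_speed(with_cl) / mean_speed(without_cl)


# Output of "./cl_slow_test 1 5" with profile_peak forced (copied from above).
WITH_CL = """
speed 3710.360352 avg 3710.360352 mbytes/s
speed 3713.660400 avg 3712.010254 mbytes/s
speed 3797.630859 avg 3740.550537 mbytes/s
speed 3708.004883 avg 3732.414062 mbytes/s
speed 3796.403076 avg 3745.211914 mbytes/s
"""

# Output of "./cl_slow_test 0 5", without clCreateContext (copied from above).
WITHOUT_CL = """
speed 7299.201660 avg 7299.201660 mbytes/s
speed 9298.841797 avg 8299.021484 mbytes/s
speed 9360.181641 avg 8652.742188 mbytes/s
speed 9004.759766 avg 8740.746094 mbytes/s
speed 9414.607422 avg 8875.518555 mbytes/s
"""

if __name__ == "__main__":
    print(f"with context:    {mean_speed(WITH_CL):.1f} mbytes/s")
    print(f"without context: {mean_speed(WITHOUT_CL):.1f} mbytes/s")
    print(f"ratio:           {slowdown_ratio(WITH_CL, WITHOUT_CL):.2f}")
```
</pre>
<div>The same helper could be pointed at captured output from other runs (for example over ssh versus in the desktop env) to compare them on equal terms.</div>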
<div><br>
</div>
<div>--</div>
<div>Lauri</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, Mar 14, 2019 at 5:46 PM Ernst Sjöstrand <<a href="mailto:ernstp@gmail.com" moz-do-not-send="true">ernstp@gmail.com</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Does<br>
echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level<br>
or setting cpu scaling governor to performance affect it at all?<br>
<br>
Regards<br>
//Ernst<br>
<br>
Den tors 14 mars 2019 kl 14:31 skrev Lauri Ehrenpreis <<a href="mailto:laurioma@gmail.com" target="_blank" moz-do-not-send="true">laurioma@gmail.com</a>>:<br>
><br>
> I have now also tried with these two boards:<br>
> <a href="https://www.asrock.com/MB/AMD/Fatal1ty%20B450%20Gaming-ITXac/index.asp" rel="noreferrer" target="_blank" moz-do-not-send="true">
https://www.asrock.com/MB/AMD/Fatal1ty%20B450%20Gaming-ITXac/index.asp</a><br>
> <a href="https://www.msi.com/Motherboard/B450I-GAMING-PLUS-AC" rel="noreferrer" target="_blank" moz-do-not-send="true">
https://www.msi.com/Motherboard/B450I-GAMING-PLUS-AC</a><br>
><br>
> Both are using the latest BIOS, Ubuntu 18.10, and kernel <a href="https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0.2/" rel="noreferrer" target="_blank" moz-do-not-send="true">
https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0.2/</a><br>
><br>
> There are some differences in dmesg (the ASRock board shows an amdgpu assert in dmesg), but otherwise the results are exactly the same.<br>
> In the desktop env cl_slow_test runs fast; over an ssh terminal it doesn't. If I move the mouse, it starts running fast in the terminal as well.<br>
><br>
> So one can't use OpenCL without a monitor and a desktop env running, and this happens with 2 different chipsets (B350 and B450), the latest BIOS from 3 different vendors, the latest kernel and the latest ROCm. This doesn't look like an edge case with an unusual setup to me.<br>
><br>
> Attached dmesg, dmidecode, and clinfo from both boards.<br>
><br>
> --<br>
> Lauri<br>
><br>
> On Wed, Mar 13, 2019 at 10:15 PM Lauri Ehrenpreis <<a href="mailto:laurioma@gmail.com" target="_blank" moz-do-not-send="true">laurioma@gmail.com</a>> wrote:<br>
>><br>
>> To reproduce the problem, only the tiny cl_slow_test.cpp attached to my first e-mail is needed.<br>
>><br>
>> System information is as follows:<br>
>> CPU: Ryzen5 2400G<br>
>> Main board: Gigabyte AMD B450 AORUS mini itx: <a href="https://www.gigabyte.com/Motherboard/B450-I-AORUS-PRO-WIFI-rev-10#kf" rel="noreferrer" target="_blank" moz-do-not-send="true">
https://www.gigabyte.com/Motherboard/B450-I-AORUS-PRO-WIFI-rev-10#kf</a><br>
>> BIOS: F5 8.47 MB 2019/01/25 (latest)<br>
>> Kernel: <a href="https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0/" rel="noreferrer" target="_blank" moz-do-not-send="true">
https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0/</a> (amd64)<br>
>> OS: Ubuntu 18.04 LTS<br>
>> rocm-opencl-dev installation:<br>
>> wget -qO - <a href="http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key" rel="noreferrer" target="_blank" moz-do-not-send="true">
http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key</a> | sudo apt-key add -<br>
>> echo 'deb [arch=amd64] <a href="http://repo.radeon.com/rocm/apt/debian/" rel="noreferrer" target="_blank" moz-do-not-send="true">
http://repo.radeon.com/rocm/apt/debian/</a> xenial main' | sudo tee /etc/apt/sources.list.d/rocm.list<br>
>> sudo apt install rocm-opencl-dev<br>
>><br>
>> Also exactly the same issue happens with this board: <a href="https://www.gigabyte.com/Motherboard/GA-AB350-Gaming-3-rev-1x#kf" rel="noreferrer" target="_blank" moz-do-not-send="true">
https://www.gigabyte.com/Motherboard/GA-AB350-Gaming-3-rev-1x#kf</a><br>
>><br>
>> I have MSI and ASRock mini-ITX boards ready as well. So far I haven't gotten amdgpu and OpenCL working there, but I'll try again tomorrow.<br>
>><br>
>> --<br>
>> Lauri<br>
>><br>
>><br>
>> On Wed, Mar 13, 2019 at 8:51 PM Kuehling, Felix <<a href="mailto:Felix.Kuehling@amd.com" target="_blank" moz-do-not-send="true">Felix.Kuehling@amd.com</a>> wrote:<br>
>>><br>
>>> Hi Lauri,<br>
>>><br>
>>> I still think the SMU is doing something funny, but rocm-smi isn't<br>
>>> showing enough information to really see what's going on.<br>
>>><br>
>>> On APUs the SMU firmware is embedded in the system BIOS. Unlike discrete<br>
>>> GPUs, the SMU firmware is not loaded by the driver. You could try<br>
>>> updating your system BIOS to the latest version available from your main<br>
>>> board vendor and see if that makes a difference. It may include a newer<br>
>>> version of the SMU firmware, potentially with a fix.<br>
>>><br>
>>> If that doesn't help, we'd have to reproduce the problem in house to see<br>
>>> what's happening, which may require the same main board and BIOS version<br>
>>> you're using. We can ask our SMU firmware team if they've ever<br>
>>> encountered your type of problem. But I don't want to give you too much<br>
>>> hope. It's a tricky problem involving HW, firmware and multiple driver<br>
>>> components in a fairly unusual configuration.<br>
>>><br>
>>> Regards,<br>
>>> Felix<br>
>>><br>
>>> On 2019-03-13 7:28 a.m., Lauri Ehrenpreis wrote:<br>
>>> > What I observe is that moving the mouse made the memory speed go up<br>
>>> > and also it made mclk=1200Mhz in rocm-smi output.<br>
>>> > However if I force mclk to 1200Mhz myself then memory speed is still<br>
>>> > slow.<br>
>>> ><br>
>>> > So rocm-smi output when memory speed went fast due to mouse movement:<br>
>>> > rocm-smi<br>
>>> > ======================== ROCm System Management Interface<br>
>>> > ========================<br>
>>> > ================================================================================================<br>
>>> > GPU Temp AvgPwr SCLK MCLK PCLK Fan Perf<br>
>>> > PwrCap SCLK OD MCLK OD GPU%<br>
>>> > GPU[0] : WARNING: Empty SysFS value: pclk<br>
>>> > GPU[0] : WARNING: Unable to read<br>
>>> > /sys/class/drm/card0/device/gpu_busy_percent<br>
>>> > 0 44.0c N/A 400Mhz 1200Mhz N/A 0% manual N/A<br>
>>> > 0% 0% N/A<br>
>>> > ================================================================================================<br>
>>> > ======================== End of ROCm SMI Log<br>
>>> > ========================<br>
>>> ><br>
>>> > And rocm-smi output when I forced memclk=1200MHz myself:<br>
>>> > rocm-smi --setmclk 2<br>
>>> > rocm-smi<br>
>>> > ======================== ROCm System Management Interface<br>
>>> > ========================<br>
>>> > ================================================================================================<br>
>>> > GPU Temp AvgPwr SCLK MCLK PCLK Fan Perf<br>
>>> > PwrCap SCLK OD MCLK OD GPU%<br>
>>> > GPU[0] : WARNING: Empty SysFS value: pclk<br>
>>> > GPU[0] : WARNING: Unable to read<br>
>>> > /sys/class/drm/card0/device/gpu_busy_percent<br>
>>> > 0 39.0c N/A 400Mhz 1200Mhz N/A 0% manual N/A<br>
>>> > 0% 0% N/A<br>
>>> > ================================================================================================<br>
>>> > ======================== End of ROCm SMI Log<br>
>>> > ========================<br>
>>> ><br>
>>> > So only difference is that temperature shows 44c when memory speed was<br>
>>> > fast and 39c when it was slow. But mclk was 1200MHz and sclk was<br>
>>> > 400MHz in both cases.<br>
>>> > Can it be that rocm-smi just has a bug in reporting and mclk was not<br>
>>> > actually 1200MHz when I forced it with rocm-smi --setmclk 2 ?<br>
>>> > That would explain the different behaviour..<br>
>>> ><br>
>>> > If so, is there a programmatic way to really guarantee the<br>
>>> > high-speed mclk? Basically I want to do something in my program similar to<br>
>>> > what happens if I move<br>
>>> > the mouse in desktop env and this way guarantee the normal memory<br>
>>> > speed each time the program starts.<br>
>>> ><br>
>>> > --<br>
>>> > Lauri<br>
>>> ><br>
>>> ><br>
>>> > On Tue, Mar 12, 2019 at 11:36 PM Deucher, Alexander<br>
>>> > <<a href="mailto:Alexander.Deucher@amd.com" target="_blank" moz-do-not-send="true">Alexander.Deucher@amd.com</a> <mailto:<a href="mailto:Alexander.Deucher@amd.com" target="_blank" moz-do-not-send="true">Alexander.Deucher@amd.com</a>>> wrote:<br>
>>> ><br>
>>> > Forcing the sclk and mclk high may impact the CPU frequency since<br>
>>> > they share TDP.<br>
>>> ><br>
>>> > Alex<br>
>>> > ------------------------------------------------------------------------<br>
>>> > *From:* amd-gfx <<a href="mailto:amd-gfx-bounces@lists.freedesktop.org" target="_blank" moz-do-not-send="true">amd-gfx-bounces@lists.freedesktop.org</a><br>
>>> > <mailto:<a href="mailto:amd-gfx-bounces@lists.freedesktop.org" target="_blank" moz-do-not-send="true">amd-gfx-bounces@lists.freedesktop.org</a>>> on behalf of Lauri<br>
>>> > Ehrenpreis <<a href="mailto:laurioma@gmail.com" target="_blank" moz-do-not-send="true">laurioma@gmail.com</a> <mailto:<a href="mailto:laurioma@gmail.com" target="_blank" moz-do-not-send="true">laurioma@gmail.com</a>>><br>
>>> > *Sent:* Tuesday, March 12, 2019 5:31 PM<br>
>>> > *To:* Kuehling, Felix<br>
>>> > *Cc:* Tom St Denis; <a href="mailto:amd-gfx@lists.freedesktop.org" target="_blank" moz-do-not-send="true">
amd-gfx@lists.freedesktop.org</a><br>
>>> > <mailto:<a href="mailto:amd-gfx@lists.freedesktop.org" target="_blank" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a>><br>
>>> > *Subject:* Re: Slow memory access when using OpenCL without X11<br>
>>> > However it's not only related to mclk and sclk. I tried this:<br>
>>> > rocm-smi --setsclk 2<br>
>>> > rocm-smi --setmclk 3<br>
>>> > rocm-smi<br>
>>> > ======================== ROCm System Management Interface<br>
>>> > ========================<br>
>>> > ================================================================================================<br>
>>> > GPU Temp AvgPwr SCLK MCLK PCLK Fan Perf<br>
>>> > PwrCap SCLK OD MCLK OD GPU%<br>
>>> > GPU[0] : WARNING: Empty SysFS value: pclk<br>
>>> > GPU[0] : WARNING: Unable to read<br>
>>> > /sys/class/drm/card0/device/gpu_busy_percent<br>
>>> > 0 34.0c N/A 1240Mhz 1333Mhz N/A 0%<br>
>>> > manual N/A 0% 0% N/A<br>
>>> > ================================================================================================<br>
>>> > ======================== End of ROCm SMI Log<br>
>>> > ========================<br>
>>> ><br>
>>> > ./cl_slow_test 1<br>
>>> > got 1 platforms 1 devices<br>
>>> > speed 3919.777100 avg 3919.777100 mbytes/s<br>
>>> > speed 3809.373291 avg 3864.575195 mbytes/s<br>
>>> > speed 585.796814 avg 2771.649170 mbytes/s<br>
>>> > speed 188.721848 avg 2125.917236 mbytes/s<br>
>>> > speed 188.916367 avg 1738.517090 mbytes/s<br>
>>> ><br>
>>> > So despite forcing max sclk and mclk the memory speed is still slow..<br>
>>> ><br>
>>> > --<br>
>>> > Lauri<br>
>>> ><br>
>>> ><br>
>>> > On Tue, Mar 12, 2019 at 11:21 PM Lauri Ehrenpreis<br>
>>> > <<a href="mailto:laurioma@gmail.com" target="_blank" moz-do-not-send="true">laurioma@gmail.com</a> <mailto:<a href="mailto:laurioma@gmail.com" target="_blank" moz-do-not-send="true">laurioma@gmail.com</a>>> wrote:<br>
>>> ><br>
>>> > In the case when memory is slow, rocm-smi outputs this:<br>
>>> > ======================== ROCm System Management<br>
>>> > Interface ========================<br>
>>> > ================================================================================================<br>
>>> > GPU Temp AvgPwr SCLK MCLK PCLK Fan<br>
>>> > Perf PwrCap SCLK OD MCLK OD GPU%<br>
>>> > GPU[0] : WARNING: Empty SysFS value: pclk<br>
>>> > GPU[0] : WARNING: Unable to read<br>
>>> > /sys/class/drm/card0/device/gpu_busy_percent<br>
>>> > 0 30.0c N/A 400Mhz 933Mhz N/A 0%<br>
>>> > auto N/A 0% 0% N/A<br>
>>> > ================================================================================================<br>
>>> > ======================== End of ROCm SMI Log<br>
>>> > ========================<br>
>>> ><br>
>>> > normal memory speed case gives following:<br>
>>> > ======================== ROCm System Management<br>
>>> > Interface ========================<br>
>>> > ================================================================================================<br>
>>> > GPU Temp AvgPwr SCLK MCLK PCLK Fan<br>
>>> > Perf PwrCap SCLK OD MCLK OD GPU%<br>
>>> > GPU[0] : WARNING: Empty SysFS value: pclk<br>
>>> > GPU[0] : WARNING: Unable to read<br>
>>> > /sys/class/drm/card0/device/gpu_busy_percent<br>
>>> > 0 35.0c N/A 400Mhz 1200Mhz N/A 0%<br>
>>> > auto N/A 0% 0% N/A<br>
>>> > ================================================================================================<br>
>>> > ======================== End of ROCm SMI Log<br>
>>> > ========================<br>
>>> ><br>
>>> > So there is a difference in MCLK - can this cause such a huge<br>
>>> > slowdown?<br>
>>> ><br>
>>> > --<br>
>>> > Lauri<br>
>>> ><br>
>>> > On Tue, Mar 12, 2019 at 6:39 PM Kuehling, Felix<br>
>>> > <<a href="mailto:Felix.Kuehling@amd.com" target="_blank" moz-do-not-send="true">Felix.Kuehling@amd.com</a> <mailto:<a href="mailto:Felix.Kuehling@amd.com" target="_blank" moz-do-not-send="true">Felix.Kuehling@amd.com</a>>> wrote:<br>
>>> ><br>
>>> > [adding the list back]<br>
>>> ><br>
>>> > I'd suspect a problem related to memory clock. This is an<br>
>>> > APU where<br>
>>> > system memory is shared with the CPU, so if the SMU<br>
>>> > changes memory<br>
>>> > clocks that would affect CPU memory access performance. If<br>
>>> > the problem<br>
>>> > only occurs when OpenCL is running, then the compute power<br>
>>> > profile could<br>
>>> > have an effect here.<br>
>>> ><br>
>>> > Lauri, can you monitor the clocks during your tests using<br>
>>> > rocm-smi?<br>
>>> ><br>
>>> > Regards,<br>
>>> > Felix<br>
>>> ><br>
>>> > On 2019-03-11 1:15 p.m., Tom St Denis wrote:<br>
>>> > > Hi Lauri,<br>
>>> > ><br>
>>> > > I don't have ROCm installed locally (not on that team at<br>
>>> > AMD) but I<br>
>>> > > can rope in some of the KFD folk and see what they say :-).<br>
>>> > ><br>
>>> > > (in the meantime I should look into installing the ROCm<br>
>>> > stack on my<br>
>>> > > Ubuntu disk for experimentation...).<br>
>>> > ><br>
>>> > > Only other thing that comes to mind is some sort of<br>
>>> > stutter due to<br>
>>> > > power/clock gating (or gfx off/etc). But that typically<br>
>>> > affects the<br>
>>> > > display/gpu side not the CPU side.<br>
>>> > ><br>
>>> > > Felix: Any known issues with Raven and ROCm interacting<br>
>>> > over memory<br>
>>> > > bus performance?<br>
>>> > ><br>
>>> > > Tom<br>
>>> > ><br>
>>> > > On Mon, Mar 11, 2019 at 12:56 PM Lauri Ehrenpreis<br>
>>> > <<a href="mailto:laurioma@gmail.com" target="_blank" moz-do-not-send="true">laurioma@gmail.com</a> <mailto:<a href="mailto:laurioma@gmail.com" target="_blank" moz-do-not-send="true">laurioma@gmail.com</a>><br>
>>> > > <mailto:<a href="mailto:laurioma@gmail.com" target="_blank" moz-do-not-send="true">laurioma@gmail.com</a> <mailto:<a href="mailto:laurioma@gmail.com" target="_blank" moz-do-not-send="true">laurioma@gmail.com</a>>>><br>
>>> > wrote:<br>
>>> > ><br>
>>> > > Hi!<br>
>>> > ><br>
>>> > > The 100x memory slowdown is hard to believe indeed. I<br>
>>> > attached the<br>
>>> > > test program with my first e-mail which depends only on<br>
>>> > > rocm-opencl-dev package. Would you mind compiling it<br>
>>> > and checking<br>
>>> > > if it slows down memory for you as well?<br>
>>> > ><br>
>>> > > steps:<br>
>>> > > 1) g++ cl_slow_test.cpp -o cl_slow_test -I<br>
>>> > > /opt/rocm/opencl/include/ -L<br>
>>> > /opt/rocm/opencl/lib/x86_64/ -lOpenCL<br>
>>> > > 2) logout from desktop env and disconnect<br>
>>> > hdmi/displayport etc<br>
>>> > > 3) log in over ssh<br>
>>> > > 4) run the program ./cl_slow_test 1<br>
>>> > ><br>
>>> > > For me it reproduced even without step 2 as well but<br>
>>> > less<br>
>>> > > reliably. moving mouse for example could make the<br>
>>> > memory speed<br>
>>> > > fast again.<br>
>>> > ><br>
>>> > > --<br>
>>> > > Lauri<br>
>>> > ><br>
>>> > ><br>
>>> > ><br>
>>> > > On Mon, Mar 11, 2019 at 6:33 PM Tom St Denis<br>
>>> > <<a href="mailto:tstdenis82@gmail.com" target="_blank" moz-do-not-send="true">tstdenis82@gmail.com</a> <mailto:<a href="mailto:tstdenis82@gmail.com" target="_blank" moz-do-not-send="true">tstdenis82@gmail.com</a>><br>
>>> > > <mailto:<a href="mailto:tstdenis82@gmail.com" target="_blank" moz-do-not-send="true">tstdenis82@gmail.com</a><br>
>>> > <mailto:<a href="mailto:tstdenis82@gmail.com" target="_blank" moz-do-not-send="true">tstdenis82@gmail.com</a>>>> wrote:<br>
>>> > ><br>
>>> > > Hi Lauri,<br>
>>> > ><br>
>>> > > There's really no connection between the two<br>
>>> > other than they<br>
>>> > > run in the same package. I too run a 2400G (as my<br>
>>> > > workstation) and I got the same ~6.6GB/sec<br>
>>> > transfer rate but<br>
>>> > > without a CL app running ... The only logical<br>
>>> > reason is your<br>
>>> > > CL app is bottlenecking the APU's memory bus but<br>
>>> > you claim<br>
>>> > > "simply opening a context is enough" so<br>
>>> > something else is<br>
>>> > > going on.<br>
>>> > ><br>
>>> > > Your last reply though says "with it running in the<br>
>>> > > background" so it's entirely possible the CPU<br>
>>> > isn't busy but<br>
>>> > > the package memory controller (shared between<br>
>>> > both the CPU and<br>
>>> > > GPU) is busy. For instance running xonotic in a<br>
>>> > 1080p window<br>
>>> > > on my 4K display reduced the memory test to<br>
>>> > 5.8GB/sec and<br>
>>> > > that's hardly a heavy memory bound GPU app.<br>
>>> > ><br>
>>> > > The only other possible connection is the GPU is<br>
>>> > generating so<br>
>>> > > much heat that it's throttling the package which<br>
>>> > is also<br>
>>> > > unlikely if you have a proper HSF attached (I<br>
>>> > use the ones<br>
>>> > > that came in the retail boxes).<br>
>>> > ><br>
>>> > > Cheers,<br>
>>> > > Tom<br>
>>> > ><br>
>>> ><br>
><br>
</blockquote>
</div>
</blockquote>
</body>
</html>