<div dir="ltr"><div dir="ltr"><div dir="ltr">In the case when memory is slow, rocm-smi outputs this:<div>======================== ROCm System Management Interface ========================</div><div>================================================================================================</div><div>GPU Temp AvgPwr SCLK MCLK PCLK Fan Perf PwrCap SCLK OD MCLK OD GPU%</div><div>GPU[0] <span style="white-space:pre"> </span>: WARNING: Empty SysFS value: pclk</div><div>GPU[0] <span style="white-space:pre"> </span>: WARNING: Unable to read /sys/class/drm/card0/device/gpu_busy_percent</div><div>0 30.0c N/A 400Mhz 933Mhz N/A 0% auto N/A 0% 0% N/A </div><div>================================================================================================</div><div>======================== End of ROCm SMI Log ========================</div><div><br></div><div>The normal memory speed case gives the following:</div><div>======================== ROCm System Management Interface ========================</div><div>================================================================================================</div><div>GPU Temp AvgPwr SCLK MCLK PCLK Fan Perf PwrCap SCLK OD MCLK OD GPU%</div><div>GPU[0] <span style="white-space:pre"> </span>: WARNING: Empty SysFS value: pclk</div><div>GPU[0] <span style="white-space:pre"> </span>: WARNING: Unable to read /sys/class/drm/card0/device/gpu_busy_percent</div><div>0 35.0c N/A 400Mhz 1200Mhz N/A 0% auto N/A 0% 0% N/A </div><div>================================================================================================</div><div>======================== End of ROCm SMI Log ========================</div><div><br></div><div>So there is a difference in MCLK - can this cause such a huge slowdown?</div><div><br></div><div>--</div><div>Lauri <br></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Mar 12, 2019 at 6:39 PM Kuehling, Felix <<a 
href="mailto:Felix.Kuehling@amd.com">Felix.Kuehling@amd.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">[adding the list back]<br>
<br>
I'd suspect a problem related to the memory clock. This is an APU where <br>
system memory is shared with the CPU, so if the SMU changes memory <br>
clocks, that would affect CPU memory access performance. If the problem <br>
only occurs when OpenCL is running, then the compute power profile could <br>
have an effect here.<br>
<br>
Lauri, can you monitor the clocks during your tests using rocm-smi?<br>
<br>
Regards,<br>
Felix<br>
<br>
On 2019-03-11 1:15 p.m., Tom St Denis wrote:<br>
> Hi Lauri,<br>
><br>
> I don't have ROCm installed locally (not on that team at AMD) but I <br>
> can rope in some of the KFD folk and see what they say :-).<br>
><br>
> (in the meantime I should look into installing the ROCm stack on my <br>
> Ubuntu disk for experimentation...).<br>
><br>
> The only other thing that comes to mind is some sort of stutter due to <br>
> power/clock gating (or gfx off, etc.). But that typically affects the <br>
> display/GPU side, not the CPU side.<br>
><br>
> Felix: Any known issues with Raven and ROCm interacting over memory <br>
> bus performance?<br>
><br>
> Tom<br>
><br>
> On Mon, Mar 11, 2019 at 12:56 PM Lauri Ehrenpreis <<a href="mailto:laurioma@gmail.com" target="_blank">laurioma@gmail.com</a>> wrote:<br>
><br>
> Hi!<br>
><br>
> The 100x memory slowdown is hard to believe indeed. I attached the<br>
> test program with my first e-mail which depends only on<br>
> rocm-opencl-dev package. Would you mind compiling it and checking<br>
> if it slows down memory for you as well?<br>
><br>
> steps:<br>
> 1) g++ cl_slow_test.cpp -o cl_slow_test -I<br>
> /opt/rocm/opencl/include/ -L /opt/rocm/opencl/lib/x86_64/ -lOpenCL<br>
> 2) log out of the desktop environment and disconnect HDMI/DisplayPort etc.<br>
> 3) log in over ssh<br>
> 4) run the program ./cl_slow_test 1<br>
><br>
> For me it reproduced even without step 2, but less<br>
> reliably. Moving the mouse, for example, could make the memory speed<br>
> fast again.<br>
><br>
> --<br>
> Lauri<br>
><br>
><br>
><br>
> On Mon, Mar 11, 2019 at 6:33 PM Tom St Denis <<a href="mailto:tstdenis82@gmail.com" target="_blank">tstdenis82@gmail.com</a>> wrote:<br>
><br>
> Hi Lauri,<br>
><br>
> There's really no connection between the two other than they<br>
> run in the same package. I too run a 2400G (as my<br>
> workstation) and I got the same ~6.6GB/sec transfer rate but<br>
> without a CL app running ... The only logical reason is your<br>
> CL app is bottlenecking the APU's memory bus but you claim<br>
> "simply opening a context is enough" so something else is<br>
> going on.<br>
><br>
> Your last reply though says "with it running in the<br>
> background" so it's entirely possible the CPU isn't busy but<br>
> the package memory controller (shared between both the CPU and<br>
> GPU) is busy. For instance running xonotic in a 1080p window<br>
> on my 4K display reduced the memory test to 5.8GB/sec and<br>
> that's hardly a heavy memory bound GPU app.<br>
><br>
> The only other possible connection is the GPU is generating so<br>
> much heat that it's throttling the package which is also<br>
> unlikely if you have a proper HSF attached (I use the ones<br>
> that came in the retail boxes).<br>
><br>
> Cheers,<br>
> Tom<br>
><br>
</blockquote></div>
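<div>For comparing MCLK between the slow and normal runs, the value can be pulled out of saved rocm-smi output with a small script. This is only a rough sketch: it assumes the default table layout exactly as pasted in this thread (a header row starting with "GPU" and one data row per GPU), which may differ between rocm-smi versions, and parse_mclk_mhz is a made-up helper name, not part of any ROCm tool.</div>

```python
import re

# Data and header rows as pasted in this thread (assumed table layout).
SLOW_LOG = """\
GPU Temp AvgPwr SCLK MCLK PCLK Fan Perf PwrCap SCLK OD MCLK OD GPU%
GPU[0] : WARNING: Empty SysFS value: pclk
0 30.0c N/A 400Mhz 933Mhz N/A 0% auto N/A 0% 0% N/A
"""

NORMAL_LOG = """\
GPU Temp AvgPwr SCLK MCLK PCLK Fan Perf PwrCap SCLK OD MCLK OD GPU%
GPU[0] : WARNING: Empty SysFS value: pclk
0 35.0c N/A 400Mhz 1200Mhz N/A 0% auto N/A 0% 0% N/A
"""


def parse_mclk_mhz(smi_output):
    """Return {gpu_index: MCLK in MHz} from rocm-smi table text.

    Hypothetical helper: relies on the MCLK column sitting at the same
    whitespace-separated position in the header and in each data row.
    """
    mclk_col = None
    clocks = {}
    for line in smi_output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        # Header row: remember the position of the first MCLK column.
        if tokens[0] == "GPU" and "MCLK" in tokens:
            mclk_col = tokens.index("MCLK")
            continue
        # Data rows start with the numeric GPU index; warnings and
        # separator lines do not, so they are skipped.
        if mclk_col is not None and tokens[0].isdigit() and len(tokens) > mclk_col:
            match = re.fullmatch(r"(\d+(?:\.\d+)?)mhz", tokens[mclk_col], re.IGNORECASE)
            if match:
                clocks[int(tokens[0])] = float(match.group(1))
    return clocks


if __name__ == "__main__":
    slow = parse_mclk_mhz(SLOW_LOG)[0]
    normal = parse_mclk_mhz(NORMAL_LOG)[0]
    print(f"MCLK slow={slow} MHz, normal={normal} MHz, ratio={slow / normal:.3f}")
```

<div>The two logs above give 933 MHz against 1200 MHz, a ratio of about 0.78. If memory bandwidth scaled roughly linearly with MCLK (an assumption), that would account for a slowdown of around 22%, far short of the 100x reported, so the clock difference alone probably does not explain it.</div>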