Slow memory access when using OpenCL without X11
Lauri Ehrenpreis
laurioma at gmail.com
Tue Mar 12 21:31:03 UTC 2019
However, it's not only related to mclk and sclk. I tried this:
rocm-smi --setsclk 2
rocm-smi --setmclk 3
rocm-smi
======================== ROCm System Management Interface ========================
================================================================================================
GPU  Temp   AvgPwr  SCLK     MCLK     PCLK  Fan  Perf    PwrCap  SCLK OD  MCLK OD  GPU%
GPU[0] : WARNING: Empty SysFS value: pclk
GPU[0] : WARNING: Unable to read /sys/class/drm/card0/device/gpu_busy_percent
0    34.0c  N/A     1240Mhz  1333Mhz  N/A   0%   manual  N/A     0%       0%       N/A
================================================================================================
======================== End of ROCm SMI Log ========================
./cl_slow_test 1
got 1 platforms 1 devices
speed 3919.777100 avg 3919.777100 mbytes/s
speed 3809.373291 avg 3864.575195 mbytes/s
speed 585.796814 avg 2771.649170 mbytes/s
speed 188.721848 avg 2125.917236 mbytes/s
speed 188.916367 avg 1738.517090 mbytes/s
So despite forcing max sclk and mclk, the memory speed is still slow.
--
Lauri
On Tue, Mar 12, 2019 at 11:21 PM Lauri Ehrenpreis <laurioma at gmail.com>
wrote:
> In the case when memory is slow, rocm-smi outputs this:
> ======================== ROCm System Management Interface ========================
> ================================================================================================
> GPU  Temp   AvgPwr  SCLK    MCLK    PCLK  Fan  Perf  PwrCap  SCLK OD  MCLK OD  GPU%
> GPU[0] : WARNING: Empty SysFS value: pclk
> GPU[0] : WARNING: Unable to read /sys/class/drm/card0/device/gpu_busy_percent
> 0    30.0c  N/A     400Mhz  933Mhz   N/A   0%   auto  N/A     0%       0%       N/A
> ================================================================================================
> ======================== End of ROCm SMI Log ========================
>
> The normal memory speed case gives the following:
> ======================== ROCm System Management Interface ========================
> ================================================================================================
> GPU  Temp   AvgPwr  SCLK    MCLK     PCLK  Fan  Perf  PwrCap  SCLK OD  MCLK OD  GPU%
> GPU[0] : WARNING: Empty SysFS value: pclk
> GPU[0] : WARNING: Unable to read /sys/class/drm/card0/device/gpu_busy_percent
> 0    35.0c  N/A     400Mhz  1200Mhz  N/A   0%   auto  N/A     0%       0%       N/A
> ================================================================================================
> ======================== End of ROCm SMI Log ========================
>
> So there is a difference in MCLK - can this cause such a huge slowdown?
>
> --
> Lauri
>
> On Tue, Mar 12, 2019 at 6:39 PM Kuehling, Felix <Felix.Kuehling at amd.com>
> wrote:
>
>> [adding the list back]
>>
>> I'd suspect a problem related to memory clock. This is an APU where
>> system memory is shared with the CPU, so if the SMU changes memory
>> clocks that would affect CPU memory access performance. If the problem
>> only occurs when OpenCL is running, then the compute power profile could
>> have an effect here.
>>
>> Lauri, can you monitor the clocks during your tests using rocm-smi?
>>
>> Regards,
>> Felix
>>
>> On 2019-03-11 1:15 p.m., Tom St Denis wrote:
>> > Hi Lauri,
>> >
>> > I don't have ROCm installed locally (not on that team at AMD) but I
>> > can rope in some of the KFD folk and see what they say :-).
>> >
>> > (in the meantime I should look into installing the ROCm stack on my
>> > Ubuntu disk for experimentation...).
>> >
>> > Only other thing that comes to mind is some sort of stutter due to
>> > power/clock gating (or gfx off/etc). But that typically affects the
>> > display/gpu side not the CPU side.
>> >
>> > Felix: Any known issues with Raven and ROCm interacting over memory
>> > bus performance?
>> >
>> > Tom
>> >
>> > On Mon, Mar 11, 2019 at 12:56 PM Lauri Ehrenpreis <laurioma at gmail.com
>> > <mailto:laurioma at gmail.com>> wrote:
>> >
>> > Hi!
>> >
>> > The 100x memory slowdown is indeed hard to believe. I attached the
>> > test program with my first e-mail; it depends only on the
>> > rocm-opencl-dev package. Would you mind compiling it and checking
>> > whether it slows down memory for you as well?
>> >
>> > steps:
>> > 1) g++ cl_slow_test.cpp -o cl_slow_test -I
>> > /opt/rocm/opencl/include/ -L /opt/rocm/opencl/lib/x86_64/ -lOpenCL
>> > 2) log out of the desktop environment and disconnect hdmi/displayport etc.
>> > 3) log in over ssh
>> > 4) run the program ./cl_slow_test 1
>> >
>> > For me it reproduced even without step 2, but less
>> > reliably; moving the mouse, for example, could make the memory
>> > speed fast again.
>> >
>> > --
>> > Lauri
>> >
>> >
>> >
>> > On Mon, Mar 11, 2019 at 6:33 PM Tom St Denis <tstdenis82 at gmail.com
>> > <mailto:tstdenis82 at gmail.com>> wrote:
>> >
>> > Hi Lauri,
>> >
>> > There's really no connection between the two other than they
>> > run in the same package. I too run a 2400G (as my
>> > workstation) and I got the same ~6.6GB/sec transfer rate but
>> > without a CL app running ... The only logical reason is your
>> > CL app is bottlenecking the APU's memory bus, but you claim
>> > "simply opening a context is enough", so something else is
>> > going on.
>> >
>> > Your last reply though says "with it running in the
>> > background" so it's entirely possible the CPU isn't busy but
>> > the package memory controller (shared between both the CPU and
>> > GPU) is busy. For instance running xonotic in a 1080p window
>> > on my 4K display reduced the memory test to 5.8GB/sec and
>> > that's hardly a heavy memory-bound GPU app.
>> >
>> > The only other possible connection is the GPU is generating so
>> > much heat that it's throttling the package which is also
>> > unlikely if you have a proper HSF attached (I use the ones
>> > that came in the retail boxes).
>> >
>> > Cheers,
>> > Tom
>> >
>>
>
More information about the amd-gfx mailing list