Slow memory access when using OpenCL without X11

Lauri Ehrenpreis laurioma at gmail.com
Tue Mar 12 21:21:21 UTC 2019


In the case when memory is slow, rocm-smi outputs this:
========================        ROCm System Management Interface        ========================
================================================================================================
GPU   Temp   AvgPwr   SCLK    MCLK    PCLK           Fan     Perf    PwrCap   SCLK OD   MCLK OD  GPU%
GPU[0] : WARNING: Empty SysFS value: pclk
GPU[0] : WARNING: Unable to read /sys/class/drm/card0/device/gpu_busy_percent
0     30.0c  N/A      400Mhz  933Mhz  N/A            0%      auto    N/A      0%        0%       N/A
================================================================================================
========================               End of ROCm SMI Log              ========================

The normal memory speed case gives the following:
========================        ROCm System Management Interface        ========================
================================================================================================
GPU   Temp   AvgPwr   SCLK    MCLK    PCLK           Fan     Perf    PwrCap   SCLK OD   MCLK OD  GPU%
GPU[0] : WARNING: Empty SysFS value: pclk
GPU[0] : WARNING: Unable to read /sys/class/drm/card0/device/gpu_busy_percent
0     35.0c  N/A      400Mhz  1200Mhz N/A            0%      auto    N/A      0%        0%       N/A
================================================================================================
========================               End of ROCm SMI Log              ========================

So there is a difference in MCLK - can this cause such a huge slowdown?
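
For watching the memory clock from inside a test program rather than through rocm-smi, here is a minimal sketch. It assumes the amdgpu driver exposes the memory clock levels in /sys/class/drm/card0/device/pp_dpm_mclk, one level per line with the active level marked by '*'. That file name, the function names and the sample levels in the comment are assumptions for illustration, not something taken from the outputs above.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Parse text in the (assumed) amdgpu pp_dpm_mclk format, e.g.
//   0: 400Mhz
//   1: 933Mhz *
//   2: 1200Mhz
// The levels above are illustrative only.
// Returns the clock in MHz of the level marked with '*',
// or -1 if no level is marked or the marked line cannot be parsed.
int active_clock_mhz(std::istream& in) {
    std::string line;
    while (std::getline(in, line)) {
        if (line.find('*') == std::string::npos) continue;  // not the active level
        std::size_t colon = line.find(':');
        if (colon == std::string::npos) return -1;
        std::istringstream rest(line.substr(colon + 1));
        int mhz = -1;
        if (rest >> mhz) return mhz;  // reads the digits, stops at "Mhz"
        return -1;
    }
    return -1;
}

// Hypothetical convenience wrapper: read the active memory clock from sysfs.
// The default path is an assumption; the card index may differ per system.
int read_active_mclk_mhz(
        const std::string& path = "/sys/class/drm/card0/device/pp_dpm_mclk") {
    std::ifstream f(path);
    if (!f) return -1;  // file missing or not readable
    return active_clock_mhz(f);
}
```

Calling read_active_mclk_mhz() before and after creating the OpenCL context, and again around the memory test, should show whether the clock drop lines up with the slowdown, assuming the sysfs file reports the same value rocm-smi prints.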

--
Lauri

On Tue, Mar 12, 2019 at 6:39 PM Kuehling, Felix <Felix.Kuehling at amd.com>
wrote:

> [adding the list back]
>
> I'd suspect a problem related to memory clock. This is an APU where
> system memory is shared with the CPU, so if the SMU changes memory
> clocks that would affect CPU memory access performance. If the problem
> only occurs when OpenCL is running, then the compute power profile could
> have an effect here.
>
> Lauri, can you monitor the clocks during your tests using rocm-smi?
>
> Regards,
>    Felix
>
> On 2019-03-11 1:15 p.m., Tom St Denis wrote:
> > Hi Lauri,
> >
> > I don't have ROCm installed locally (not on that team at AMD) but I
> > can rope in some of the KFD folk and see what they say :-).
> >
> > (In the meantime I should look into installing the ROCm stack on my
> > Ubuntu disk for experimentation...).
> >
> > Only other thing that comes to mind is some sort of stutter due to
> > power/clock gating (or gfx off/etc).  But that typically affects the
> > display/gpu side not the CPU side.
> >
> > Felix:  Any known issues with Raven and ROCm interacting over memory
> > bus performance?
> >
> > Tom
> >
> > On Mon, Mar 11, 2019 at 12:56 PM Lauri Ehrenpreis <laurioma at gmail.com>
> > wrote:
> >
> >     Hi!
> >
> >     The 100x memory slowdown is hard to believe indeed. I attached the
> >     test program with my first e-mail which depends only on
> >     rocm-opencl-dev package. Would you mind compiling it and checking
> >     if it slows down memory for you as well?
> >
> >     steps:
> >     1) g++ cl_slow_test.cpp -o cl_slow_test -I
> >     /opt/rocm/opencl/include/ -L /opt/rocm/opencl/lib/x86_64/  -lOpenCL
> >     2) log out of the desktop environment and disconnect HDMI/DisplayPort etc.
> >     3) log in over ssh
> >     4) run the program ./cl_slow_test 1
> >
> >     For me it reproduced even without step 2, but less reliably.
> >     Moving the mouse, for example, could make the memory speed
> >     fast again.
> >
> >     --
> >     Lauri
> >
> >
> >
> >     On Mon, Mar 11, 2019 at 6:33 PM Tom St Denis <tstdenis82 at gmail.com>
> >     wrote:
> >
> >         Hi Lauri,
> >
> >         There's really no connection between the two other than they
> >         run in the same package.  I too run a 2400G (as my
> >         workstation) and I got the same ~6.6GB/sec transfer rate but
> >         without a CL app running ...  The only logical reason is your
> >         CL app is bottlenecking the APU's memory bus but you claim
> >         "simply opening a context is enough" so something else is
> >         going on.
> >
> >         Your last reply though says "with it running in the
> >         background" so it's entirely possible the CPU isn't busy but
> >         the package memory controller (shared between both the CPU and
> >         GPU) is busy.  For instance running xonotic in a 1080p window
> >         on my 4K display reduced the memory test to 5.8GB/sec and
> >         that's hardly a heavy memory bound GPU app.
> >
> >         The only other possible connection is the GPU is generating so
> >         much heat that it's throttling the package which is also
> >         unlikely if you have a proper HSF attached (I use the ones
> >         that came in the retail boxes).
> >
> >         Cheers,
> >         Tom
> >
>