Slow memory access when using OpenCL without X11

Kuehling, Felix Felix.Kuehling at amd.com
Tue Mar 12 16:39:16 UTC 2019


[adding the list back]

I'd suspect a problem related to the memory clock. This is an APU, where
system memory is shared with the CPU, so if the SMU changes memory
clocks, that would affect CPU memory access performance. If the problem
only occurs when OpenCL is running, then the compute power profile could
have an effect here.

Lauri, can you monitor the clocks during your tests using rocm-smi?
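
The clocks can also be sampled without depending on rocm-smi's output format, by reading the amdgpu DPM tables from sysfs while the test runs. Below is a minimal sketch. It assumes the APU is `card0` and that the driver exposes `pp_dpm_sclk` and `pp_dpm_mclk`, where the active level is the line ending in `*`. The example level values (e.g. `1: 933Mhz *`) are illustrative only. `active_level`, `read_file` and `poll_clocks` are hypothetical helpers and are not part of any ROCm API.

```cpp
#include <cassert>
#include <chrono>
#include <fstream>
#include <iostream>
#include <optional>
#include <sstream>
#include <string>
#include <thread>

// Return the DPM level marked with '*' (the active one), e.g. "1: 933Mhz *"
// yields "1: 933Mhz". Returns std::nullopt if no line carries the marker.
std::optional<std::string> active_level(const std::string& table) {
    std::istringstream in(table);
    std::string line;
    while (std::getline(in, line)) {
        // Strip trailing spaces/CR so the '*' marker is the last character.
        while (!line.empty() && (line.back() == ' ' || line.back() == '\r'))
            line.pop_back();
        if (!line.empty() && line.back() == '*') {
            line.pop_back();
            while (!line.empty() && line.back() == ' ') line.pop_back();
            return line;
        }
    }
    return std::nullopt;
}

// Read a whole (sysfs) file; returns an empty string if it cannot be opened.
std::string read_file(const std::string& path) {
    std::ifstream f(path);
    if (!f) return {};
    std::ostringstream ss;
    ss << f.rdbuf();
    return ss.str();
}

// Print the active shader (sclk) and memory (mclk) level `samples` times.
// The default directory assumes the APU is card0; adjust for your system.
void poll_clocks(int samples, std::chrono::milliseconds interval,
                 const std::string& dev = "/sys/class/drm/card0/device/") {
    assert(!dev.empty() && dev.back() == '/');  // paths are built by appending
    for (int i = 0; i < samples; ++i) {
        auto sclk = active_level(read_file(dev + "pp_dpm_sclk"));
        auto mclk = active_level(read_file(dev + "pp_dpm_mclk"));
        std::cout << "sclk=" << sclk.value_or("?")
                  << " mclk=" << mclk.value_or("?") << std::endl;
        std::this_thread::sleep_for(interval);
    }
}
```

Calling `poll_clocks(60, std::chrono::milliseconds(500))` from a small `main` while `cl_slow_test` runs in another ssh session would show whether `mclk` changes when the slowdown appears.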

Regards,
   Felix

On 2019-03-11 1:15 p.m., Tom St Denis wrote:
> Hi Lauri,
>
> I don't have ROCm installed locally (not on that team at AMD), but I
> can rope in some of the KFD folk and see what they say :-).
>
> (In the meantime, I should look into installing the ROCm stack on my
> Ubuntu disk for experimentation...)
>
> The only other thing that comes to mind is some sort of stutter due to
> power/clock gating (or GFXOFF, etc.). But that typically affects the
> display/GPU side, not the CPU side.
>
> Felix: Are there any known issues with Raven and ROCm affecting memory
> bus performance?
>
> Tom
>
> On Mon, Mar 11, 2019 at 12:56 PM Lauri Ehrenpreis <laurioma at gmail.com> wrote:
>
>     Hi!
>
>     The 100x memory slowdown is hard to believe indeed. I attached the
>     test program, which depends only on the rocm-opencl-dev package, to
>     my first e-mail. Would you mind compiling it and checking whether
>     it slows down memory for you as well?
>
>     steps:
>     1) g++ cl_slow_test.cpp -o cl_slow_test -I /opt/rocm/opencl/include/ -L /opt/rocm/opencl/lib/x86_64/ -lOpenCL
>     2) log out of the desktop environment and disconnect HDMI/DisplayPort etc.
>     3) log in over ssh
>     4) run the program ./cl_slow_test 1
>
>     For me it reproduced even without step 2, but less reliably:
>     moving the mouse, for example, could make the memory speed
>     fast again.
>
>     --
>     Lauri
>
>
>
>     On Mon, Mar 11, 2019 at 6:33 PM Tom St Denis <tstdenis82 at gmail.com> wrote:
>
>         Hi Lauri,
>
>         There's really no connection between the two other than that they
>         run in the same package.  I too run a 2400G (as my
>         workstation) and I got the same ~6.6GB/sec transfer rate but
>         without a CL app running...  The only logical explanation is that your
>         CL app is bottlenecking the APU's memory bus, but you claim
>         "simply opening a context is enough", so something else is
>         going on.
>
>         Your last reply, though, says "with it running in the
>         background", so it's entirely possible the CPU isn't busy but
>         the package memory controller (shared between the CPU and
>         GPU) is busy.  For instance, running xonotic in a 1080p window
>         on my 4K display reduced the memory test to 5.8GB/sec, and
>         that's hardly a heavily memory-bound GPU app.
>
>         The only other possible connection is that the GPU is generating
>         so much heat that it's throttling the package, which is also
>         unlikely if you have a proper HSF (heatsink and fan) attached (I
>         use the ones that came in the retail boxes).
>
>         Cheers,
>         Tom
>
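
The test program Lauri attached is not reproduced in this archive. As a rough stand-in for the CPU memory test discussed in the thread, here is a hypothetical `memcpy` throughput probe. `memcpy_gbps` is an illustrative helper, and the buffer size and repetition count are arbitrary choices. Its numbers are not directly comparable with the ~6.6 GB/s Tom quotes.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Copy a `bytes`-sized buffer `reps` times and return the copy rate in GB/s
// (bytes copied per second / 1e9). Every copy reads and writes `bytes`, so the
// traffic on the memory bus is at least twice the figure returned.
double memcpy_gbps(std::size_t bytes, int reps) {
    assert(bytes > 0 && reps > 0);
    // Filling both buffers up front faults their pages in before timing starts.
    std::vector<char> src(bytes, 1), dst(bytes, 0);
    volatile char sink = 0;  // read back from dst so the copies are not dead code
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) {
        std::memcpy(dst.data(), src.data(), bytes);
        sink = dst[static_cast<std::size_t>(i) % bytes];
    }
    auto t1 = std::chrono::steady_clock::now();
    (void)sink;
    double secs = std::chrono::duration<double>(t1 - t0).count();
    return static_cast<double>(bytes) * reps / secs / 1e9;
}
```

For a before/after comparison, print `memcpy_gbps(64u << 20, 20)` over ssh once with nothing else running, then again while `./cl_slow_test 1` holds its OpenCL context open.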

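For scale, the following calculation uses only the figures in Tom's reply and assumes Lauri's fast-case baseline is the same ~6.6 GB/s. The xonotic load cost about 12% of the memory-test rate. A 100x slowdown would put the same test near 66 MB/s. The gap between those two figures fits the suspicion in this thread that ordinary memory-controller contention is not the whole story.

```latex
\frac{6.6 - 5.8}{6.6} \approx 0.12
\qquad\text{vs.}\qquad
\frac{6.6\ \text{GB/s}}{100} = 0.066\ \text{GB/s} \approx 66\ \text{MB/s}
```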

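Tom's thermal-throttling hypothesis can be checked by logging temperatures during the test. Below is a minimal sketch. It assumes the standard Linux hwmon sysfs layout (`/sys/class/hwmon/hwmon*/name` and `temp1_input`, with temperatures in millidegrees Celsius). Which sensors appear is system-dependent; on a Raven APU one might expect `amdgpu` and `k10temp`, but that is an assumption. `millideg_to_c` and `print_hwmon_temps` are hypothetical helpers.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iostream>
#include <string>
#include <system_error>

// hwmon reports temperatures in millidegrees Celsius ("54250" -> 54.25 C).
double millideg_to_c(long millideg) { return millideg / 1000.0; }

// Print "<sensor name>: <temp1_input in C>" for every hwmon device under root.
// Devices without a readable name or temp1_input are skipped.
void print_hwmon_temps(const std::filesystem::path& root = "/sys/class/hwmon") {
    namespace fs = std::filesystem;
    std::error_code ec;  // a missing root yields an empty iteration, not a throw
    for (const auto& entry : fs::directory_iterator(root, ec)) {
        std::ifstream name_file(entry.path() / "name");
        std::ifstream temp_file(entry.path() / "temp1_input");
        std::string name;
        long millideg = 0;
        if (std::getline(name_file, name) && (temp_file >> millideg))
            std::cout << name << ": " << millideg_to_c(millideg) << " C\n";
    }
}
```

Calling `print_hwmon_temps()` periodically alongside the memory test would show whether the slow runs coincide with high package temperatures.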
More information about the amd-gfx mailing list