Slow memory access when using OpenCL without X11
Lauri Ehrenpreis
laurioma at gmail.com
Tue Mar 12 21:21:21 UTC 2019
In the slow-memory case, rocm-smi outputs this:
========================        ROCm System Management Interface        ========================
================================================================================================
GPU  Temp   AvgPwr  SCLK    MCLK     PCLK  Fan  Perf  PwrCap  SCLK OD  MCLK OD  GPU%
GPU[0] : WARNING: Empty SysFS value: pclk
GPU[0] : WARNING: Unable to read /sys/class/drm/card0/device/gpu_busy_percent
0    30.0c  N/A     400Mhz  933Mhz   N/A   0%   auto  N/A     0%       0%       N/A
================================================================================================
========================              End of ROCm SMI Log               ========================
The normal memory speed case gives the following:
========================        ROCm System Management Interface        ========================
================================================================================================
GPU  Temp   AvgPwr  SCLK    MCLK     PCLK  Fan  Perf  PwrCap  SCLK OD  MCLK OD  GPU%
GPU[0] : WARNING: Empty SysFS value: pclk
GPU[0] : WARNING: Unable to read /sys/class/drm/card0/device/gpu_busy_percent
0    35.0c  N/A     400Mhz  1200Mhz  N/A   0%   auto  N/A     0%       0%       N/A
================================================================================================
========================              End of ROCm SMI Log               ========================
So MCLK differs: 933Mhz in the slow case versus 1200Mhz in the normal case.
Can this cause such a huge slowdown?
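
To watch the memory clock while the test runs, without going through
rocm-smi each time, something like the following could be used. This is
only a minimal sketch and not part of the attached test program. It
assumes the amdgpu driver exposes pp_dpm_mclk next to the
gpu_busy_percent path above, and that the file lists one level per line
as "<level>: <clock>Mhz" with the active level marked by "*". The
function names are made up for illustration.

```cpp
#include <cassert>
#include <exception>
#include <fstream>
#include <optional>
#include <sstream>
#include <string>

// Parse text laid out as "<level>: <clock>Mhz [*]", one level per line, and
// return the clock (in MHz) of the line marked with '*'.
// Returns std::nullopt if no line is marked or the marked line is malformed.
// ASSUMPTION: this layout is what pp_dpm_mclk prints on the system at hand.
std::optional<int> active_clock_mhz(const std::string& text) {
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.find('*') == std::string::npos) continue;  // not the active level
        const auto colon = line.find(':');
        if (colon == std::string::npos) continue;            // unexpected layout
        try {
            // std::stoi skips leading blanks and stops at the first non-digit,
            // so " 933Mhz *" yields 933.
            return std::stoi(line.substr(colon + 1));
        } catch (const std::exception&) {
            continue;  // no number after the colon
        }
    }
    return std::nullopt;
}

// Read the file and return the active memory clock in MHz, if available.
// ASSUMPTION: the default path is a guess based on the card0 path that
// rocm-smi complains about above; adjust it for other systems.
std::optional<int> read_active_mclk_mhz(
    const std::string& path = "/sys/class/drm/card0/device/pp_dpm_mclk") {
    std::ifstream in(path);
    if (!in) return std::nullopt;  // file missing or unreadable
    std::stringstream buf;
    buf << in.rdbuf();
    return active_clock_mhz(buf.str());
}
```

Calling read_active_mclk_mhz() in a loop before and after the OpenCL
context is opened should show whether the active level moves between
933 and 1200 at the same moment the memory speed changes. Build with
-std=c++17 because of std::optional.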
--
Lauri
On Tue, Mar 12, 2019 at 6:39 PM Kuehling, Felix <Felix.Kuehling at amd.com>
wrote:
> [adding the list back]
>
> I'd suspect a problem related to memory clock. This is an APU where
> system memory is shared with the CPU, so if the SMU changes memory
> clocks that would affect CPU memory access performance. If the problem
> only occurs when OpenCL is running, then the compute power profile could
> have an effect here.
>
> Lauri, can you monitor the clocks during your tests using rocm-smi?
>
> Regards,
> Felix
>
> On 2019-03-11 1:15 p.m., Tom St Denis wrote:
> > Hi Lauri,
> >
> > I don't have ROCm installed locally (not on that team at AMD) but I
> > can rope in some of the KFD folk and see what they say :-).
> >
> > (in the meantime I should look into installing the ROCm stack on my
> > Ubuntu disk for experimentation...).
> >
> > Only other thing that comes to mind is some sort of stutter due to
> > power/clock gating (or gfx off/etc). But that typically affects the
> > display/gpu side not the CPU side.
> >
> > Felix: Any known issues with Raven and ROCm interacting over memory
> > bus performance?
> >
> > Tom
> >
> > On Mon, Mar 11, 2019 at 12:56 PM Lauri Ehrenpreis <laurioma at gmail.com> wrote:
> >
> > Hi!
> >
> > The 100x memory slowdown is hard to believe indeed. I attached the
> > test program to my first e-mail; it depends only on the
> > rocm-opencl-dev package. Would you mind compiling it and checking
> > if it slows down memory for you as well?
> >
> > steps:
> > 1) g++ cl_slow_test.cpp -o cl_slow_test -I /opt/rocm/opencl/include/ -L /opt/rocm/opencl/lib/x86_64/ -lOpenCL
> > 2) log out of the desktop environment and disconnect HDMI/DisplayPort etc.
> > 3) log in over ssh
> > 4) run the program ./cl_slow_test 1
> >
> > For me it reproduced even without step 2, but less reliably.
> > Moving the mouse, for example, could make the memory speed
> > fast again.
> >
> > --
> > Lauri
> >
> >
> >
> > On Mon, Mar 11, 2019 at 6:33 PM Tom St Denis <tstdenis82 at gmail.com> wrote:
> >
> > Hi Lauri,
> >
> > There's really no connection between the two other than they
> > run in the same package. I too run a 2400G (as my
> > workstation) and I got the same ~6.6GB/sec transfer rate but
> > without a CL app running ... The only logical reason is your
> > CL app is bottlenecking the APU's memory bus but you claim
> > "simply opening a context is enough" so something else is
> > going on.
> >
> > Your last reply though says "with it running in the
> > background" so it's entirely possible the CPU isn't busy but
> > the package memory controller (shared between both the CPU and
> > GPU) is busy. For instance running xonotic in a 1080p window
> > on my 4K display reduced the memory test to 5.8GB/sec and
> > that's hardly a heavily memory-bound GPU app.
> >
> > The only other possible connection is the GPU is generating so
> > much heat that it's throttling the package which is also
> > unlikely if you have a proper HSF attached (I use the ones
> > that came in the retail boxes).
> >
> > Cheers,
> > Tom
> >
>