Slow memory access when using OpenCL without X11
Lauri Ehrenpreis
laurioma at gmail.com
Wed Mar 13 20:15:16 UTC 2019
For reproduction, only the tiny cl_slow_test.cpp attached to the first e-mail
is needed.
System information is as follows:
CPU: Ryzen5 2400G
Main board: Gigabyte AMD B450 AORUS mini-ITX:
https://www.gigabyte.com/Motherboard/B450-I-AORUS-PRO-WIFI-rev-10#kf
BIOS: F5 8.47 MB 2019/01/25 (latest)
Kernel: https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0/ (amd64)
OS: Ubuntu 18.04 LTS
rocm-opencl-dev installation:
wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo
apt-key add -
echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main'
| sudo tee /etc/apt/sources.list.d/rocm.list
sudo apt install rocm-opencl-dev
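As a quick sanity check that the runtime actually sees the GPU before running the test, something like the following should work (the clinfo path below is the one ROCm 2.x-era packages install; treat it as an assumption and adjust for your install):

```shell
# Add the user to the video group, which KFD requires for GPU access,
# then list the OpenCL platforms/devices the runtime can see.
sudo usermod -a -G video "$LOGNAME"
/opt/rocm/opencl/bin/x86_64/clinfo | grep -iE 'platform name|device name'
```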
Also exactly the same issue happens with this board:
https://www.gigabyte.com/Motherboard/GA-AB350-Gaming-3-rev-1x#kf
I have MSI and ASRock mini-ITX boards ready as well. So far I didn't get
amdgpu & OpenCL working there, but I'll try again tomorrow.
--
Lauri
On Wed, Mar 13, 2019 at 8:51 PM Kuehling, Felix <Felix.Kuehling at amd.com>
wrote:
> Hi Lauri,
>
> I still think the SMU is doing something funny, but rocm-smi isn't
> showing enough information to really see what's going on.
>
> On APUs the SMU firmware is embedded in the system BIOS. Unlike discrete
> GPUs, the SMU firmware is not loaded by the driver. You could try
> updating your system BIOS to the latest version available from your main
> board vendor and see if that makes a difference. It may include a newer
> version of the SMU firmware, potentially with a fix.
>
> If that doesn't help, we'd have to reproduce the problem in house to see
> what's happening, which may require the same main board and BIOS version
> you're using. We can ask our SMU firmware team if they've ever
> encountered your type of problem. But I don't want to give you too much
> hope. It's a tricky problem involving HW, firmware and multiple driver
> components in a fairly unusual configuration.
>
> Regards,
> Felix
>
> On 2019-03-13 7:28 a.m., Lauri Ehrenpreis wrote:
> > What I observe is that moving the mouse made the memory speed go up,
> > and it also made mclk=1200MHz in the rocm-smi output.
> > However, if I force mclk to 1200MHz myself, the memory speed is still
> > slow.
> >
> > So rocm-smi output when the memory speed went fast due to mouse movement:
> > rocm-smi
> > ======================== ROCm System Management Interface ========================
> > ==================================================================================
> > GPU  Temp   AvgPwr  SCLK    MCLK     PCLK  Fan  Perf    PwrCap  SCLK OD  MCLK OD  GPU%
> > GPU[0] : WARNING: Empty SysFS value: pclk
> > GPU[0] : WARNING: Unable to read /sys/class/drm/card0/device/gpu_busy_percent
> > 0    44.0c  N/A     400Mhz  1200Mhz  N/A   0%   manual  N/A     0%       0%       N/A
> > ==================================================================================
> > ======================== End of ROCm SMI Log ========================
> >
> > And rocm-smi output when I forced memclk=1200MHz myself:
> > rocm-smi --setmclk 2
> > rocm-smi
> > ======================== ROCm System Management Interface ========================
> > ==================================================================================
> > GPU  Temp   AvgPwr  SCLK    MCLK     PCLK  Fan  Perf    PwrCap  SCLK OD  MCLK OD  GPU%
> > GPU[0] : WARNING: Empty SysFS value: pclk
> > GPU[0] : WARNING: Unable to read /sys/class/drm/card0/device/gpu_busy_percent
> > 0    39.0c  N/A     400Mhz  1200Mhz  N/A   0%   manual  N/A     0%       0%       N/A
> > ==================================================================================
> > ======================== End of ROCm SMI Log ========================
> >
> > So the only difference is that the temperature shows 44°C when the
> > memory speed was fast and 39°C when it was slow. But mclk was 1200MHz
> > and sclk was 400MHz in both cases.
> > Can it be that rocm-smi just has a bug in reporting, and mclk was not
> > actually 1200MHz when I forced it with rocm-smi --setmclk 2?
> > That would explain the different behaviour.
> >
> > If so, then is there a programmatic way to really guarantee the
> > high-speed mclk? Basically I want to do in my program something
> > similar to what happens when I move the mouse in the desktop
> > environment, and this way guarantee normal memory speed each time the
> > program starts.
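An editorial note on the question above: rocm-smi is a wrapper around the amdgpu sysfs interface, so one way to try pinning clocks from a program or script is to write those files directly. A sketch (assumes the APU is card0 and requires root; whether this actually restores full memory speed is exactly what is in question in this thread):

```shell
# Force the highest DPM performance level (sclk and mclk at maximum):
echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level

# Or pick an explicit mclk level while in manual mode:
echo manual > /sys/class/drm/card0/device/power_dpm_force_performance_level
echo 2 > /sys/class/drm/card0/device/pp_dpm_mclk   # index into the mclk level table
cat /sys/class/drm/card0/device/pp_dpm_mclk        # verify which level is active
```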
> >
> > --
> > Lauri
> >
> >
> > On Tue, Mar 12, 2019 at 11:36 PM Deucher, Alexander
> > <Alexander.Deucher at amd.com <mailto:Alexander.Deucher at amd.com>> wrote:
> >
> > Forcing the sclk and mclk high may impact the CPU frequency since
> > they share TDP.
> >
> > Alex
> >
> ------------------------------------------------------------------------
> > *From:* amd-gfx <amd-gfx-bounces at lists.freedesktop.org
> > <mailto:amd-gfx-bounces at lists.freedesktop.org>> on behalf of Lauri
> > Ehrenpreis <laurioma at gmail.com <mailto:laurioma at gmail.com>>
> > *Sent:* Tuesday, March 12, 2019 5:31 PM
> > *To:* Kuehling, Felix
> > *Cc:* Tom St Denis; amd-gfx at lists.freedesktop.org
> > <mailto:amd-gfx at lists.freedesktop.org>
> > *Subject:* Re: Slow memory access when using OpenCL without X11
> > However it's not only related to mclk and sclk. I tried this:
> > rocm-smi --setsclk 2
> > rocm-smi --setmclk 3
> > rocm-smi
> > ======================== ROCm System Management Interface ========================
> > ==================================================================================
> > GPU  Temp   AvgPwr  SCLK     MCLK     PCLK  Fan  Perf    PwrCap  SCLK OD  MCLK OD  GPU%
> > GPU[0] : WARNING: Empty SysFS value: pclk
> > GPU[0] : WARNING: Unable to read /sys/class/drm/card0/device/gpu_busy_percent
> > 0    34.0c  N/A     1240Mhz  1333Mhz  N/A   0%   manual  N/A     0%       0%       N/A
> > ==================================================================================
> > ======================== End of ROCm SMI Log ========================
> >
> > ./cl_slow_test 1
> > got 1 platforms 1 devices
> > speed 3919.777100 avg 3919.777100 mbytes/s
> > speed 3809.373291 avg 3864.575195 mbytes/s
> > speed 585.796814 avg 2771.649170 mbytes/s
> > speed 188.721848 avg 2125.917236 mbytes/s
> > speed 188.916367 avg 1738.517090 mbytes/s
> >
> > So despite forcing max sclk and mclk, the memory speed is still slow.
> >
> > --
> > Lauri
> >
> >
> > On Tue, Mar 12, 2019 at 11:21 PM Lauri Ehrenpreis
> > <laurioma at gmail.com <mailto:laurioma at gmail.com>> wrote:
> >
> > In the case when memory is slow, rocm-smi outputs this:
> > ======================== ROCm System Management Interface ========================
> > ==================================================================================
> > GPU  Temp   AvgPwr  SCLK    MCLK    PCLK  Fan  Perf  PwrCap  SCLK OD  MCLK OD  GPU%
> > GPU[0] : WARNING: Empty SysFS value: pclk
> > GPU[0] : WARNING: Unable to read /sys/class/drm/card0/device/gpu_busy_percent
> > 0    30.0c  N/A     400Mhz  933Mhz  N/A   0%   auto  N/A     0%       0%       N/A
> > ==================================================================================
> > ======================== End of ROCm SMI Log ========================
> >
> > The normal memory speed case gives the following:
> > ======================== ROCm System Management Interface ========================
> > ==================================================================================
> > GPU  Temp   AvgPwr  SCLK    MCLK     PCLK  Fan  Perf  PwrCap  SCLK OD  MCLK OD  GPU%
> > GPU[0] : WARNING: Empty SysFS value: pclk
> > GPU[0] : WARNING: Unable to read /sys/class/drm/card0/device/gpu_busy_percent
> > 0    35.0c  N/A     400Mhz  1200Mhz  N/A   0%   auto  N/A     0%       0%       N/A
> > ==================================================================================
> > ======================== End of ROCm SMI Log ========================
> >
> > So there is a difference in MCLK - can this cause such a huge
> > slowdown?
> >
> > --
> > Lauri
> >
> > On Tue, Mar 12, 2019 at 6:39 PM Kuehling, Felix
> > <Felix.Kuehling at amd.com <mailto:Felix.Kuehling at amd.com>> wrote:
> >
> > [adding the list back]
> >
> > I'd suspect a problem related to memory clock. This is an APU where
> > system memory is shared with the CPU, so if the SMU changes memory
> > clocks, that would affect CPU memory access performance. If the
> > problem only occurs when OpenCL is running, then the compute power
> > profile could have an effect here.
> >
> > Lauri, can you monitor the clocks during your tests using rocm-smi?
> >
> > Regards,
> > Felix
> >
> > On 2019-03-11 1:15 p.m., Tom St Denis wrote:
> > > Hi Lauri,
> > >
> > > I don't have ROCm installed locally (not on that team at AMD), but
> > > I can rope in some of the KFD folk and see what they say :-).
> > >
> > > (In the meantime I should look into installing the ROCm stack on my
> > > Ubuntu disk for experimentation...)
> > >
> > > The only other thing that comes to mind is some sort of stutter due
> > > to power/clock gating (or gfx off/etc). But that typically affects
> > > the display/GPU side, not the CPU side.
> > >
> > > Felix: Any known issues with Raven and ROCm interacting
> > over memory
> > > bus performance?
> > >
> > > Tom
> > >
> > > On Mon, Mar 11, 2019 at 12:56 PM Lauri Ehrenpreis
> > <laurioma at gmail.com <mailto:laurioma at gmail.com>
> > > <mailto:laurioma at gmail.com <mailto:laurioma at gmail.com>>>
> > wrote:
> > >
> > > Hi!
> > >
> > > The 100x memory slowdown is hard to believe indeed. I attached the
> > > test program to my first e-mail; it depends only on the
> > > rocm-opencl-dev package. Would you mind compiling it and checking if
> > > it slows down memory for you as well?
> > >
> > > steps:
> > > 1) g++ cl_slow_test.cpp -o cl_slow_test -I /opt/rocm/opencl/include/ -L /opt/rocm/opencl/lib/x86_64/ -lOpenCL
> > > 2) log out from the desktop environment and disconnect HDMI/DisplayPort etc.
> > > 3) log in over ssh
> > > 4) run the program: ./cl_slow_test 1
> > >
> > > For me it reproduced even without step 2 as well, but less
> > > reliably. Moving the mouse, for example, could make the memory
> > > speed fast again.
> > >
> > > --
> > > Lauri
> > >
> > >
> > >
> > > On Mon, Mar 11, 2019 at 6:33 PM Tom St Denis
> > <tstdenis82 at gmail.com <mailto:tstdenis82 at gmail.com>
> > > <mailto:tstdenis82 at gmail.com
> > <mailto:tstdenis82 at gmail.com>>> wrote:
> > >
> > > Hi Lauri,
> > >
> > > There's really no connection between the two other than they run in
> > > the same package. I too run a 2400G (as my workstation) and I got
> > > the same ~6.6GB/sec transfer rate, but without a CL app running...
> > > The only logical reason is your CL app is bottlenecking the APU's
> > > memory bus, but you claim "simply opening a context is enough", so
> > > something else is going on.
> > >
> > > Your last reply though says "with it running in the background", so
> > > it's entirely possible the CPU isn't busy but the package memory
> > > controller (shared between both the CPU and GPU) is. For instance,
> > > running xonotic in a 1080p window on my 4K display reduced the
> > > memory test to 5.8GB/sec, and that's hardly a heavy memory-bound
> > > GPU app.
> > >
> > > The only other possible connection is that the GPU is generating so
> > > much heat that it's throttling the package, which is also unlikely
> > > if you have a proper HSF attached (I use the ones that came in the
> > > retail boxes).
> > >
> > > Cheers,
> > > Tom
> > >
> >
>