amdgpu: [powerplay] failed to send message 148 ret is 0

Mikulas Patocka mpatocka at redhat.com
Tue Oct 30 16:32:16 UTC 2018



On Mon, 29 Oct 2018, Alex Deucher wrote:

> On Thu, Oct 25, 2018 at 4:46 PM Mikulas Patocka <mpatocka at redhat.com> wrote:
> >
> >
> >
> > On Wed, 24 Oct 2018, Mikulas Patocka wrote:
> >
> > > Hi
> > >
> > > I have a Sapphire Pulse RX 570 ITX graphics card.
> > >
> > > On Linux, I get errors "amdgpu: [powerplay] failed to send message 148 ret
> > > is 0" and the system is stuck for several seconds when they happen. The
> > > card works, except for these errors and occasional delays.
> >
> > I've found that PP_PCIE_DPM_MASK causes these errors. If I turn this bit
> > off in amdgpu.ppfeaturemask, the errors no longer occur. (Turning it
> > off also fixes hibernation problems.)
> >
> > Should it be turned off automatically in response to these errors?
> 
> What platform are you running on?  Are you running in a VM?  The
> driver accesses pci config space on the bridge to determine the pcie
> gen and lane caps of the platform to determine what clocks and lanes
> are valid.  See amdgpu_device_get_pcie_info().  It would be good to
> figure out why this is not working on your platform.
> 
> Alex

It's not a VM. It's an old dual Socket F motherboard. It has an HT2000 
north bridge and an HT1000 south bridge. It has two 8-lane PCIe v1 slots.

I've found the bug - pcie_get_speed_cap() incorrectly tests the lnkcap 
variable against values that are not bit-masks, so the PCIe port is 
incorrectly reported as 8GT/s capable. When I fix these tests, the errors 
are gone.
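
To illustrate the class of bug, here is a minimal standalone sketch. It is 
not the kernel code itself: the function names and the link_speed enum are 
hypothetical, and the register constants are defined locally on the 
assumption that they match the Linux pci_regs.h values. The Supported Link 
Speeds field is a 4-bit enumeration, so AND-ing against an enumerated value 
as if it were a flag gives the wrong answer.

```c
#include <stdint.h>

/* Assumed to mirror the Supported Link Speeds (SLS) definitions in the
 * Linux pci_regs.h header. SLS is an enumerated 4-bit field, not a set
 * of independent flag bits. */
#define PCI_EXP_LNKCAP_SLS        0x0000000f /* mask for the whole field */
#define PCI_EXP_LNKCAP_SLS_2_5GB  0x00000001
#define PCI_EXP_LNKCAP_SLS_5_0GB  0x00000002
#define PCI_EXP_LNKCAP_SLS_8_0GB  0x00000003
#define PCI_EXP_LNKCAP_SLS_16_0GB 0x00000004

/* Hypothetical result codes for this sketch, in tenths of GT/s. */
enum link_speed {
    SPEED_UNKNOWN = 0,
    SPEED_2_5GT   = 25,
    SPEED_5_0GT   = 50,
    SPEED_8_0GT   = 80,
    SPEED_16_0GT  = 160
};

/* Buggy pattern: treats enumerated values as bit flags. Because
 * 0x3 shares bits with 0x1 and 0x2, a 2.5GT/s or 5GT/s port passes
 * the 8GT/s test. */
static enum link_speed speed_cap_buggy(uint32_t lnkcap)
{
    if (lnkcap & PCI_EXP_LNKCAP_SLS_16_0GB)
        return SPEED_16_0GT;
    if (lnkcap & PCI_EXP_LNKCAP_SLS_8_0GB)
        return SPEED_8_0GT;
    if (lnkcap & PCI_EXP_LNKCAP_SLS_5_0GB)
        return SPEED_5_0GT;
    if (lnkcap & PCI_EXP_LNKCAP_SLS_2_5GB)
        return SPEED_2_5GT;
    return SPEED_UNKNOWN;
}

/* Corrected pattern: mask out the field, then compare for equality. */
static enum link_speed speed_cap_fixed(uint32_t lnkcap)
{
    switch (lnkcap & PCI_EXP_LNKCAP_SLS) {
    case PCI_EXP_LNKCAP_SLS_16_0GB: return SPEED_16_0GT;
    case PCI_EXP_LNKCAP_SLS_8_0GB:  return SPEED_8_0GT;
    case PCI_EXP_LNKCAP_SLS_5_0GB:  return SPEED_5_0GT;
    case PCI_EXP_LNKCAP_SLS_2_5GB:  return SPEED_2_5GT;
    default:                        return SPEED_UNKNOWN;
    }
}
```

For example, a made-up lnkcap value of 0x81 (SLS = 1, i.e. a 2.5GT/s port) 
is classified as 8GT/s by the buggy variant and as 2.5GT/s by the fixed one.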

Mikulas

