amdgpu: [powerplay] failed to send message 148 ret is 0
Mikulas Patocka
mpatocka at redhat.com
Tue Oct 30 16:32:16 UTC 2018
On Mon, 29 Oct 2018, Alex Deucher wrote:
> On Thu, Oct 25, 2018 at 4:46 PM Mikulas Patocka <mpatocka at redhat.com> wrote:
> >
> >
> >
> > On Wed, 24 Oct 2018, Mikulas Patocka wrote:
> >
> > > Hi
> > >
> > > I have a Sapphire Pulse RX 570 ITX graphics card.
> > >
> > > On Linux, I get errors "amdgpu: [powerplay] failed to send message 148 ret
> > > is 0" and the system is stuck for several seconds when they happen. The
> > > card works, except for these errors and occasional delays.
> >
> > I've found that PP_PCIE_DPM_MASK causes these errors. If I turn this bit
> > off in amdgpu.ppfeaturemask, there are no more errors. (Turning it off
> > also fixes hibernation problems.)
> >
> > Should it be turned off automatically in response to these errors?
>
> What platform are you running on? Are you running in a VM? The
> driver accesses pci config space on the bridge to determine the pcie
> gen and lane caps of the platform to determine what clocks and lanes
> are valid. See amdgpu_device_get_pcie_info(). It would be good to
> figure out why this is not working on your platform.
>
> Alex

It's not a VM. It's an old motherboard with dual Socket F. It has an HT2000
north bridge and an HT1000 south bridge. It has two PCIe-v1 8-lane slots.

I've found the bug - pcie_get_speed_cap incorrectly tests the lnkcap
variable against values that are not bit-masks, so that the PCIe port is
incorrectly reported as 8 GT/s capable. When I fix these tests, the errors
are gone.
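
A minimal sketch of this kind of mistake, assuming the supported-link-speed
codes are small enumerated values (1 to 4) in the low four bits of the Link
Capabilities register. The constant and function names below are
illustrative only; this is not the actual kernel code or the patch:

```c
#include <stdint.h>

/* Assumed to mirror the Linux pci_regs.h values: these are enumerated
 * codes inside a 4-bit field, not independent bit flags. */
#define LNKCAP_SLS        0x0000000fu /* field mask */
#define LNKCAP_SLS_2_5GB  0x00000001u
#define LNKCAP_SLS_5_0GB  0x00000002u
#define LNKCAP_SLS_8_0GB  0x00000003u
#define LNKCAP_SLS_16_0GB 0x00000004u

enum link_speed {
    SPEED_UNKNOWN,
    SPEED_2_5GT,
    SPEED_5_0GT,
    SPEED_8_0GT,
    SPEED_16_0GT
};

/* Buggy pattern: the codes are ANDed as if they were bit-masks.
 * A 2.5 GT/s port has code 1, and 1 & 3 is non-zero, so it is
 * misreported as 8 GT/s capable. */
static enum link_speed speed_cap_buggy(uint32_t lnkcap)
{
    if (lnkcap & LNKCAP_SLS_16_0GB)
        return SPEED_16_0GT;
    if (lnkcap & LNKCAP_SLS_8_0GB)
        return SPEED_8_0GT;
    if (lnkcap & LNKCAP_SLS_5_0GB)
        return SPEED_5_0GT;
    if (lnkcap & LNKCAP_SLS_2_5GB)
        return SPEED_2_5GT;
    return SPEED_UNKNOWN;
}

/* Corrected pattern: extract the field first, then compare for equality. */
static enum link_speed speed_cap_fixed(uint32_t lnkcap)
{
    switch (lnkcap & LNKCAP_SLS) {
    case LNKCAP_SLS_16_0GB:
        return SPEED_16_0GT;
    case LNKCAP_SLS_8_0GB:
        return SPEED_8_0GT;
    case LNKCAP_SLS_5_0GB:
        return SPEED_5_0GT;
    case LNKCAP_SLS_2_5GB:
        return SPEED_2_5GT;
    default:
        return SPEED_UNKNOWN;
    }
}
```

With a register value whose speed field is 1 (a PCIe-v1 port), the first
function returns the 8 GT/s result and the second returns 2.5 GT/s.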
Mikulas