amdgpu: [powerplay] failed to send message 148 ret is 0
Alex Deucher
alexdeucher at gmail.com
Tue Oct 30 18:41:07 UTC 2018
Nice work. Thanks for tracking this down!
Alex
On Tue, Oct 30, 2018 at 12:32 PM Mikulas Patocka <mpatocka at redhat.com> wrote:
>
>
>
> On Mon, 29 Oct 2018, Alex Deucher wrote:
>
> > On Thu, Oct 25, 2018 at 4:46 PM Mikulas Patocka <mpatocka at redhat.com> wrote:
> > >
> > >
> > >
> > > On Wed, 24 Oct 2018, Mikulas Patocka wrote:
> > >
> > > > Hi
> > > >
> > > > I have a Sapphire Pulse RX 570 ITX graphics card.
> > > >
> > > > On Linux, I get errors "amdgpu: [powerplay] failed to send message 148 ret
> > > > is 0" and the system is stuck for several seconds when they happen. The
> > > > card works, except for these errors and occasional delays.
> > >
> > > I've found that PP_PCIE_DPM_MASK causes these errors. If I turn this bit
> > > off in amdgpu.ppfeaturemask, the errors are gone (and turning it off
> > > also fixes hibernation problems).
> > >
> > > Should it be turned off automatically in response to these errors?
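Turning one feature off through amdgpu.ppfeaturemask means clearing its bit in the mask and leaving every other bit alone. The sketch below shows that, assuming PP_PCIE_DPM_MASK is bit 2 (0x4), as in the PP_FEATURE_MASK enum of amd_shared.h from around this time. Check your kernel's header before relying on the value. The function name is hypothetical.

```c
#include <stdint.h>

/* Assumption: mirrors PP_PCIE_DPM_MASK from the PP_FEATURE_MASK enum in
 * drivers/gpu/drm/amd/include/amd_shared.h; verify against your kernel. */
#define PP_PCIE_DPM_MASK 0x4u

/* Hypothetical helper: return the feature mask with PCIe DPM disabled,
 * all other feature bits unchanged. */
uint32_t ppfeaturemask_without_pcie_dpm(uint32_t mask)
{
    return mask & ~PP_PCIE_DPM_MASK;
}
```

Start from the mask the driver already uses rather than from all ones, which would also enable features that are off by default. Then pass the result on the kernel command line as `amdgpu.ppfeaturemask=0x...`.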
> >
> > What platform are you running on? Are you running in a VM? The
> > driver reads the PCI config space of the upstream bridge to find the
> > platform's PCIe gen and lane caps, and uses them to decide which
> > clocks and lane widths are valid. See amdgpu_device_get_pcie_info().
> > It would be good to figure out why this is not working on your platform.
> >
> > Alex
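The lane caps of a PCIe port are advertised in the Link Capabilities (LNKCAP) register of its PCIe capability, where the Maximum Link Width field occupies bits 9:4. The following is a hedged illustration of decoding that field, not amdgpu's actual code. The mask mirrors PCI_EXP_LNKCAP_MLW from include/uapi/linux/pci_regs.h, and the function name is hypothetical.

```c
#include <stdint.h>

/* Mirrors include/uapi/linux/pci_regs.h: Maximum Link Width, bits 9:4. */
#define PCI_EXP_LNKCAP_MLW 0x000003f0u

/* Hypothetical helper: number of lanes the port advertises (1, 2, 4, 8, 16...). */
unsigned int lnkcap_max_width(uint32_t lnkcap)
{
    return (lnkcap & PCI_EXP_LNKCAP_MLW) >> 4;
}
```

Consider a PCIe 1.x x8 port like the ones described in this thread. The low bits of its LNKCAP would read 0x81, and the helper returns a width of 8.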
>
> It's not a VM. It's an old dual Socket F motherboard with an HT2000
> north bridge and an HT1000 south bridge. It has two PCIe v1 8-lane slots.
>
> I've found the bug: pcie_get_speed_cap() tests the lnkcap variable
> against values that are not bit masks, so the PCIe port is incorrectly
> reported as 8 GT/s (PCIe 3.0) capable. When I fix these tests, the
> errors are gone.
>
> Mikulas
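The bug described above comes from treating the Link Capabilities speed field as flag bits. In that field, bits 3:0 hold an enumerated value: 1 = 2.5 GT/s, 2 = 5 GT/s, 3 = 8 GT/s, 4 = 16 GT/s. Testing it with `&` therefore matches the wrong value. The userspace sketch below reproduces that class of mistake and the equality-based fix. It is not the kernel's code. The constants mirror include/uapi/linux/pci_regs.h, and the enum and function names are hypothetical.

```c
#include <stdint.h>

/* Mirrors include/uapi/linux/pci_regs.h: a 4-bit enumerated field. */
#define PCI_EXP_LNKCAP_SLS        0x0000000fu
#define PCI_EXP_LNKCAP_SLS_2_5GB  0x00000001u
#define PCI_EXP_LNKCAP_SLS_5_0GB  0x00000002u
#define PCI_EXP_LNKCAP_SLS_8_0GB  0x00000003u
#define PCI_EXP_LNKCAP_SLS_16_0GB 0x00000004u

/* Hypothetical stand-in for the kernel's enum pci_bus_speed. */
enum link_speed { SPEED_UNKNOWN, SPEED_2_5GT, SPEED_5_0GT, SPEED_8_0GT, SPEED_16_0GT };

/* Buggy: uses enumerated values as bit masks. A 2.5 GT/s port (value 1)
 * or a 5 GT/s port (value 2) also matches the 8 GT/s test (value 3). */
enum link_speed speed_cap_buggy(uint32_t lnkcap)
{
    if (lnkcap & PCI_EXP_LNKCAP_SLS_16_0GB)
        return SPEED_16_0GT;
    else if (lnkcap & PCI_EXP_LNKCAP_SLS_8_0GB)
        return SPEED_8_0GT;
    else if (lnkcap & PCI_EXP_LNKCAP_SLS_5_0GB)
        return SPEED_5_0GT;
    else if (lnkcap & PCI_EXP_LNKCAP_SLS_2_5GB)
        return SPEED_2_5GT;
    return SPEED_UNKNOWN;
}

/* Fixed: extract the whole field first, then compare for equality. */
enum link_speed speed_cap_fixed(uint32_t lnkcap)
{
    switch (lnkcap & PCI_EXP_LNKCAP_SLS) {
    case PCI_EXP_LNKCAP_SLS_16_0GB: return SPEED_16_0GT;
    case PCI_EXP_LNKCAP_SLS_8_0GB:  return SPEED_8_0GT;
    case PCI_EXP_LNKCAP_SLS_5_0GB:  return SPEED_5_0GT;
    case PCI_EXP_LNKCAP_SLS_2_5GB:  return SPEED_2_5GT;
    default:                        return SPEED_UNKNOWN;
    }
}
```

Take a PCIe 1.x x8 port whose low LNKCAP bits read 0x81. `speed_cap_buggy(0x81)` yields `SPEED_8_0GT` because `0x81 & 0x3` is nonzero. `speed_cap_fixed(0x81)` correctly yields `SPEED_2_5GT`.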
More information about the amd-gfx mailing list