Expecting to revert commit 55285e21f045 "fbdev/efifb: Release PCI device ..."

Daniel Vetter daniel.vetter at ffwll.ch
Mon Dec 20 19:25:02 UTC 2021


Adding more amdgpu folks.

Smells like this runtime pm leak is papering over an issue in the
amdgpu driver. Other bug reports are also only about amdgpu.ko it
seems, would an unconditional pm_runtime_get_sync() in
amdgpu_pci_probe() also work? If you do it before the
drm_aperture_remove_conflicting_pci_framebuffers() which throws out
efifb it should be equivalent to the efifb revert.

That would avoid the revert for everyone else, but revert's fine too
imo this close to holidays. I don't think actually fixing this rpm fun
in a late -rc is realistic, these tend to be very gnarly.
-Daniel

On Mon, Dec 20, 2021 at 6:38 PM Linus Torvalds
<torvalds at linux-foundation.org> wrote:
>
> So my lovely eldest daughter came back from NYC for xmas, and two days
> later was informed that she had been in contact with somebody who
> tested positive. And sure enough, so did she.
>
> I'm all vaxxed up and boostered, and feeling fine with just a slightly
> sore throat and no other symptoms, but I did test positive too a few
> days after that. And as a result I'm distancing in my office. It turns
> out that is not so hugely different from my normal life.
>
> One difference *is* that I now have an inflatable mattress and am
> sleeping here by my desktop, and as a result last night I noticed that
> my monitors didn't go to sleep. I had to turn them off manually to get
> the backlight to go away.
>
> So this morning I start to search for the reason, and find this listed
> by the regression bot, and sure enough, reverting commit 55285e21f045
> ("fbdev/efifb: Release PCI device's runtime PM ref during FB destroy")
> makes it all work again.
>
> I've reverted it in my private testing tree, in the hope that somebody
> figures out the actual real cause. But if that doesn't happen, I
> expect to revert it in my main git tree too before rc7.
>
> So this is mainly just a heads-up. I'm not seeing anything interesting
> in the kernel logs, apart from the usual unending stream of
>
>   [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
>   [drm] UVD and UVD ENC initialized successfully.
>   [drm] VCE initialized successfully.
>   [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
>   [drm] UVD and UVD ENC initialized successfully.
>   [drm] VCE initialized successfully.
>
> that happens randomly (sometimes every minute, sometimes hours apart)
> with or without that commit.
>
> This is some random Radeon card with two monitors attached (PCI ID
> 1002:67df rev e7, subsystem ID 1da2:e353).
>
> The symptoms are exactly as reported elsewhere:
>
>       https://lore.kernel.org/linux-fbdev/8a27c986-4767-bd29-2073-6c4ffed49bba@jetfuse.net/
>       https://linux-regtracking.leemhuis.info/regzbot/regression/8a27c986-4767-bd29-2073-6c4ffed49bba@jetfuse.net/
>       https://bugzilla.kernel.org/show_bug.cgi?id=215203
>
> ie I can lock my desktop, wait a few seconds, see the "No signal,
> enabling power management" on my monitors, but when that actually
> happens, the desktop lockscreen just comes right back.
>
> I suspect/assume that the screen off event triggers some status event
> on the GPU side, and now some pm logic then just wakes things right up
> again.
>
>                Linus



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


More information about the amd-gfx mailing list