Annoying AMDGPU boot-time warning due to simplefb / amdgpu resource clash
Jocelyn Falempe
jfalempe at redhat.com
Tue Jun 28 12:41:53 UTC 2022
On 28/06/2022 10:43, Thomas Zimmermann wrote:
> Hi
>
> Am 27.06.22 um 19:25 schrieb Linus Torvalds:
>> On Mon, Jun 27, 2022 at 1:02 AM Javier Martinez Canillas
>> <javierm at redhat.com> wrote:
>>>
>>> The flag was dropped because it was causing drivers that requested their
>>> memory resource with pci_request_region() to fail with -EBUSY (e.g. the
>>> vmwgfx driver):
>>>
>>> https://www.spinics.net/lists/dri-devel/msg329672.html
>>
>> See, *that* link would have been useful in the commit.
>>
>> Rather than the useless link it has.
>>
>> Anyway, removing the busy bit just made things worse.
>>
>>>> If simplefb is actually still using that frame buffer, it's a problem.
>>>> If it isn't, then maybe that resource should have been released?
>>>
>>> It's supposed to be released once amdgpu asks for conflicting framebuffers
>>> to be removed by calling drm_aperture_remove_conflicting_pci_framebuffers().
>>
>> That most definitely doesn't happen. This is on a running system:
>>
>> [torvalds@ryzen linux]$ cat /proc/iomem | grep BOOTFB
>> 00000000-00000000 : BOOTFB
>>
>> so I suspect that the BUSY bit was never the problem (even for
>> vmwgfx). The problem was that simplefb doesn't remove its resource.
>>
>> Guys, the *reason* for resource management is to catch people that
>> trample over each other's resources.
>>
>> You literally basically disabled the code that checked for it by
>> removing the BUSY flag, and just continued to have conflicting
>> resources.
>>
>> That isn't a "fix", that is literally "we are ignoring and breaking
>> the whole reason that the resource tree exists, but we'll still use it
>> for no good reason".
>
> The EFI/VESA framebuffer is represented by a platform device. The BUSY
> flag we removed is in the 'sysfb' code that creates this device. The
> BOOTFB resource you see in your /proc/iomem is the framebuffer memory.
> The code is in sysfb_create_simplefb() [1].
>
> Later during boot a device driver, 'simplefb' or 'simpledrm', binds to
> the device and reserves the framebuffer memory for rendering into it.
> See, for example, simpledrm [2]. At that point a BUSY flag is set for that
> reservation.
>
>>
>> Yeah, yeah, most modern drivers ignore the IO resource tree, because
>> they end up working on another resource level entirely: they work on
>> not the IO resources, but on the "driver level" instead, and just
>> attach to PCI devices.
>>
>> So these days, few enough drivers even care about the IO resource
>> tree, and it's mostly used for (a) legacy devices (think ISA) and (b)
>> the actual bus resource handling (so the PCI code itself uses it to
>> sort out resource use and avoid conflicts, but PCI drivers themselves
>> generally then don't care, because the bus has "taken care of it").
>>
>> So that's why the amdgpu driver itself doesn't care about resource
>> allocations, and we only get a warning for that memory type case, not
>> for any deeper resource case.
>>
>> And apparently the vmwgfx driver still uses that legacy "let's claim
>> all PCI resources in the resource tree" instead of just claiming the
>> device itself. Which is why it hit this whole BOOTFB resource thing
>> even harder.
>>
>> But the real bug is that BOOTFB seems to claim this resource even
>> after it is done with it and other drivers want to take over.
>
> Once amdgpu wants to take over, it has to remove the platform device
> that represents the EFI framebuffer. It does so by calling the
> drm_aperture_ function, which in turn calls
> platform_device_unregister(). Afterwards, the platform device, driver
> and BOOTFB range are supposed to be entirely gone.
>
> Unfortunately, this currently only works if a driver is bound to the
> platform device. Without simpledrm or simplefb, amdgpu won't find the
> platform device to remove.
>
> I guess what happens on your system is that sysfb creates a device for
> the EFI framebuffer and then amdgpu comes and doesn't find it for
> removal. And later you see these warnings because BOOTFB is still around.
>
> Javier already provided patches for this scenario, which are in the DRM
> tree. From drm-next, please cherry-pick
>
> 0949ee75da6c ("firmware: sysfb: Make sysfb_create_simplefb() return a
> pdev pointer")
>
> bc824922b264 ("firmware: sysfb: Add sysfb_disable() helper function")
>
> 873eb3b11860 ("fbdev: Disable sysfb device registration when removing
> conflicting FBs")
>
> for testing. With these patches, amdgpu will find the sysfb device and
> unregister it.
>
> The patches are queued up for the next merge window. If they resolve the
> issue, we'll send them sooner, with the next round of fixes.
I was able to reproduce the warning with kernel v5.19-rc4, a radeon GPU
and the following config:
CONFIG_SYSFB=y
CONFIG_SYSFB_SIMPLEFB=y
# CONFIG_DRM_SIMPLEDRM is not set
# CONFIG_FB_SIMPLE is not set
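One way to confirm whether the stale range is still there is to look for
BOOTFB in /proc/iomem, as in Linus's output above. Below is a small Python
sketch that lists any BOOTFB entries. It assumes the usual "start-end : name"
layout with two spaces of indentation per nesting level; the helper names
(parse_iomem, bootfb_entries) are my own. As a non-root user the addresses
read as zeros, so run it as root to see the actual range.

```python
"""Sketch: report whether a BOOTFB range is still listed in /proc/iomem.

Assumes the "start-end : name" line format, with nested resources
indented by two spaces per level. Helper names are hypothetical.
"""
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class IomemEntry:
    start: int
    end: int
    name: str
    depth: int  # nesting level in the resource tree


def parse_iomem(lines: Iterable[str]) -> List[IomemEntry]:
    """Parse /proc/iomem-style lines into entries, skipping blank lines."""
    entries = []
    for line in lines:
        if not line.strip():
            continue
        stripped = line.lstrip(" ")
        depth = (len(line) - len(stripped)) // 2
        span, _, name = stripped.partition(" : ")
        start_s, _, end_s = span.partition("-")
        entries.append(
            IomemEntry(int(start_s, 16), int(end_s, 16), name.strip(), depth)
        )
    return entries


def bootfb_entries(lines: Iterable[str]) -> List[IomemEntry]:
    """Return only the entries named exactly BOOTFB."""
    return [e for e in parse_iomem(lines) if e.name == "BOOTFB"]


def main() -> None:
    try:
        with open("/proc/iomem") as f:
            stale = bootfb_entries(f)
    except OSError as err:
        print(f"cannot read /proc/iomem: {err}")
        return
    if not stale:
        print("no BOOTFB entry")
    for e in stale:
        print(f"BOOTFB still claimed: {e.start:#x}-{e.end:#x}")


if __name__ == "__main__":
    main()
```

On an affected boot (sysfb enabled, no simpledrm/simplefb, amdgpu or radeon
loaded) this should still print a BOOTFB line; with the fix it should not.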
After applying the three patches you mentioned, the issue is resolved (at
least on my setup).
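To summarise the failure mode Thomas describes, here is a toy Python model
(not kernel code; every class, method and address in it is hypothetical) of
who ends up dropping the BOOTFB range. It assumes that registering the sysfb
platform device claims BOOTFB, that conflicting-framebuffer removal only finds
the device when simpledrm or simplefb is bound to it, and that the patches add
a sysfb_disable()-like step that unregisters the device regardless.

```python
"""Toy model (not kernel code) of the BOOTFB hand-over.

All names and addresses are hypothetical illustrations.
"""
from typing import Dict, Optional


class ToySystem:
    def __init__(self) -> None:
        self.iomem: Dict[str, range] = {}      # name -> claimed address range
        self.sysfb_dev: Optional[str] = None   # firmware framebuffer device
        self.bound_driver: Optional[str] = None

    def sysfb_init(self, base: int, size: int) -> None:
        # Creating the platform device adds its BOOTFB memory resource.
        self.sysfb_dev = "simple-framebuffer"
        self.iomem["BOOTFB"] = range(base, base + size)

    def bind_generic_driver(self, name: str) -> None:
        # simpledrm or simplefb binding makes the device discoverable.
        if self.sysfb_dev is not None:
            self.bound_driver = name

    def _unregister_sysfb(self) -> None:
        # Unregistering the platform device also drops BOOTFB.
        self.sysfb_dev = None
        self.bound_driver = None
        self.iomem.pop("BOOTFB", None)

    def remove_conflicting(self, with_sysfb_disable: bool) -> None:
        if self.bound_driver is not None:
            self._unregister_sysfb()   # old path: found via the bound driver
        elif with_sysfb_disable and self.sysfb_dev is not None:
            self._unregister_sysfb()   # patched path: sysfb disabled directly


def boot(generic_driver: Optional[str], with_sysfb_disable: bool) -> bool:
    """Return True if BOOTFB is still claimed after the GPU driver loads."""
    system = ToySystem()
    system.sysfb_init(base=0xE0000000, size=0x300000)  # made-up range
    if generic_driver:
        system.bind_generic_driver(generic_driver)
    system.remove_conflicting(with_sysfb_disable)      # amdgpu taking over
    return "BOOTFB" in system.iomem


if __name__ == "__main__":
    for drv in ("simpledrm", None):
        for fix in (False, True):
            print(f"driver={drv!s:9} fix={fix!s:5} stale BOOTFB={boot(drv, fix)}")
```

Only the combination "no generic driver, no fix" leaves BOOTFB behind, which
matches the config above (CONFIG_SYSFB_SIMPLEFB=y with neither
CONFIG_DRM_SIMPLEDRM nor CONFIG_FB_SIMPLE).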
Best regards,
--
Jocelyn
>
> Best regards
> Thomas
>
> [1]
> https://elixir.bootlin.com/linux/latest/source/drivers/firmware/sysfb_simplefb.c#L115
>
> [2]
> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/tiny/simpledrm.c#L544
>
>
>>
>> Not the BUSY bit.
>>
>> Linus
>
More information about the amd-gfx mailing list