Annoying AMDGPU boot-time warning due to simplefb / amdgpu resource clash

Jocelyn Falempe jfalempe at redhat.com
Tue Jun 28 12:41:53 UTC 2022


On 28/06/2022 10:43, Thomas Zimmermann wrote:
> Hi
> 
> Am 27.06.22 um 19:25 schrieb Linus Torvalds:
>> On Mon, Jun 27, 2022 at 1:02 AM Javier Martinez Canillas
>> <javierm at redhat.com> wrote:
>>>
>>> The flag was dropped because it was causing drivers that requested their
>>> memory resource with pci_request_region() to fail with -EBUSY (e.g: the
>>> vmwgfx driver):
>>>
>>> https://www.spinics.net/lists/dri-devel/msg329672.html
>>
>> See, *that* link would have been useful in the commit.
>>
>> Rather than the useless link it has.
>>
>> Anyway, removing the busy bit just made things worse.
>>
>>>> If simplefb is actually still using that frame buffer, it's a problem.
>>>> If it isn't, then maybe that resource should have been released?
>>>
>>> It's supposed to be released once amdgpu asks for conflicting
>>> framebuffers to be removed by calling
>>> drm_aperture_remove_conflicting_pci_framebuffers().
>>
>> That most definitely doesn't happen. This is on a running system:
>>
>>    [torvalds at ryzen linux]$ cat /proc/iomem | grep BOOTFB
>>          00000000-00000000 : BOOTFB
>>
>> so I suspect that the BUSY bit was never the problem (even for
>> vmwgfx). The problem was that simplefb doesn't remove its resource.
>>
>> Guys, the *reason* for resource management is to catch people that
>> trample over each other's resources.
>>
>> You literally basically disabled the code that checked for it by
>> removing the BUSY flag, and just continued to have conflicting
>> resources.
>>
>> That isn't a "fix", that is literally "we are ignoring and breaking
>> the whole reason that the resource tree exists, but we'll still use it
>> for no good reason".
> 
> The EFI/VESA framebuffer is represented by a platform device. The BUSY 
> flag we removed is in the 'sysfb' code that creates this device. The 
> BOOTFB resource you see in your /proc/iomem is the framebuffer memory. 
> The code is in sysfb_create_simplefb() [1]
> 
> Later during boot a device driver, 'simplefb' or 'simpledrm', binds to 
> the device and reserves the framebuffer memory for rendering into it. 
> For example in simpledrm. [2] At that point a BUSY flag is set for that 
> reservation.
> 
>>
>> Yeah, yeah, most modern drivers ignore the IO resource tree, because
>> they end up working on another resource level entirely: they work on
>> not the IO resources, but on the "driver level" instead, and just
>> attach to PCI devices.
>>
>> So these days, few enough drivers even care about the IO resource
>> tree, and it's mostly used for (a) legacy devices (think ISA) and (b)
>> the actual bus resource handling (so the PCI code itself uses it to
>> sort out resource use and avoid conflicts, but PCI drivers themselves
>> generally then don't care, because the bus has "taken care of it").
>>
>> So that's why the amdgpu driver itself doesn't care about resource
>> allocations, and we only get a warning for that memory type case, not
>> for any deeper resource case.
>>
>> And apparently the vmwgfx driver still uses that legacy "let's claim
>> all PCI resources in the resource tree" instead of just claiming the
>> device itself. Which is why it hit this whole BOOTFB resource thing
>> even harder.
>>
>> But the real bug is that BOOTFB seems to claim this resource even
>> after it is done with it and other drivers want to take over.
> 
> Once amdgpu wants to take over, it has to remove the platform device 
> that represents the EFI framebuffer. It does so by calling the 
> drm_aperture_ function, which in turn calls 
> platform_device_unregister(). Afterwards, the platform device, driver 
> and BOOTFB range are supposed to be entirely gone.
> 
> Unfortunately, this currently only works if a driver is bound to the 
> platform device. Without simpledrm or simplefb, amdgpu won't find the 
> platform device to remove.
> 
> I guess what happens on your system is that sysfb creates a device for 
> the EFI framebuffer and then amdgpu comes and doesn't find it for 
> removal. And later you see these warnings because BOOTFB is still around.
> 
> Javier already provided patches for this scenario, which are in the DRM 
> tree. From drm-next, please cherry-pick
> 
>    0949ee75da6c ("firmware: sysfb: Make sysfb_create_simplefb() return a pdev pointer")
> 
>    bc824922b264 ("firmware: sysfb: Add sysfb_disable() helper function")
> 
>    873eb3b11860 ("fbdev: Disable sysfb device registration when removing conflicting FBs")
> 
> for testing. With these patches, amdgpu will find the sysfb device and 
> unregister it.
> 
> The patches are queued up for the next merge window. If they resolve the 
> issue, we'll send them earlier, with the next round of fixes.

I was able to reproduce the warning with kernel v5.19-rc4, a radeon GPU 
and the following config:

CONFIG_SYSFB=y
CONFIG_SYSFB_SIMPLEFB=y
# CONFIG_DRM_SIMPLEDRM is not set
# CONFIG_FB_SIMPLE is not set

After applying the 3 patches you mentioned, the issue is resolved (at 
least on my setup).
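
For anyone who wants to check whether the BOOTFB range is still claimed on
a running system (the same check as the "cat /proc/iomem | grep BOOTFB"
quoted above), here is a minimal Python sketch. The helper names
find_resources() and bootfb_still_present() are made up for illustration,
and the line format is assumed to be the usual "start-end : name" layout of
/proc/iomem. Note that the addresses read as zero unless the file is read
as root, as in the quoted output.

```python
import re
from pathlib import Path

# Assumed /proc/iomem line layout: optional indentation, then
# "<start hex>-<end hex> : <resource name>".
IOMEM_LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.+)$")


def find_resources(iomem_text, name="BOOTFB"):
    """Return (start, end) tuples for every resource called `name`."""
    hits = []
    for line in iomem_text.splitlines():
        m = IOMEM_LINE.match(line)
        if m and m.group(3).strip() == name:
            hits.append((int(m.group(1), 16), int(m.group(2), 16)))
    return hits


def bootfb_still_present(path="/proc/iomem"):
    """Hypothetical helper: True if a BOOTFB resource is still listed."""
    return bool(find_resources(Path(path).read_text()))
```

If bootfb_still_present() returns True after the native GPU driver has
loaded, the sysfb device was most likely not unregistered, which matches
the situation described above.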

Best regards,

-- 

Jocelyn

> 
> Best regards
> Thomas
> 
> [1] 
> https://elixir.bootlin.com/linux/latest/source/drivers/firmware/sysfb_simplefb.c#L115 
> 
> [2] 
> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/tiny/simpledrm.c#L544 
> 
> 
>>
>> Not the BUSY bit.
>>
>>                       Linus
> 


