Couple of issues with amdgpu on my WX4100

Christian König christian.koenig at amd.com
Mon Jan 4 17:39:33 UTC 2021


On 04.01.21 at 17:45, Alex Williamson wrote:
> On Mon, 4 Jan 2021 12:34:34 +0100
> Christian König <christian.koenig at amd.com> wrote:
>
>> Hi Maxim,
>>
>> I can't help with the display-related stuff. The best approach to get
>> this fixed is probably to open a bug report for it on FDO.
>>
>> But I'm the one who implemented the resizable BAR support, and your
>> analysis of the problem sounds correct to me.
>>
>> This most likely works on Linux because we restore the BAR size on
>> resume (and possibly during initial boot as well).
>>
>> See this patch for reference:
>>
>> commit d3252ace0bc652a1a244455556b6a549f969bf99
>> Author: Christian König <ckoenig.leichtzumerken at gmail.com>
>> Date:   Fri Jun 29 19:54:55 2018 -0500
>>
>>       PCI: Restore resized BAR state on resume
>>
>>       Resize BARs after resume to the expected size again.
>>
>>       BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=199959
>>       Fixes: d6895ad39f3b ("drm/amdgpu: resize VRAM BAR for CPU access v6")
>>       Fixes: 276b738deb5b ("PCI: Add resizable BAR infrastructure")
>>       Signed-off-by: Christian König <christian.koenig at amd.com>
>>       Signed-off-by: Bjorn Helgaas <bhelgaas at google.com>
>>       CC: stable at vger.kernel.org      # v4.15+
>>
>>
>> It should be trivial to add this to the reset module as well, most
>> likely even in a completely vendor-independent way: I'm not sure what a
>> bus reset does to this configuration, and always restoring it should be
>> the most defensive approach.
> Hmm, this should already be used by the bus/slot reset path:
>
> pci_bus_restore_locked()/pci_slot_restore_locked()
>   pci_dev_restore()
>    pci_restore_state()
>     pci_restore_rebar_state()
>
> VFIO support for resizable BARs has been on my todo list, but I don't
> have access to any systems that have both a capable device and >4G
> decoding enabled in the BIOS.  If we have a consistent view of the BAR
> size after the BARs are expanded, I'm not sure why it doesn't just
> work.  FWIW, QEMU currently hides the REBAR capability from the guest
> because the kernel driver doesn't support emulating it through config
> space (i.e. it's read-only, which the spec doesn't provide for).
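For reference, the restore step at the end of that call chain essentially re-derives the size field of each Resizable BAR Control register from the BAR size the kernel already has on record, and writes it back. Below is a minimal userspace model of just that arithmetic, assuming the PCIe field layout Linux uses (BAR index in bits 2:0, encoded size in bits 13:8, where encoded size n means 1 MiB << n). The helper names are made up for illustration; this is not the kernel's code and it does no config space access.

```c
#include <stdint.h>

/* Resizable BAR Control register fields (same values as Linux's
 * PCI_REBAR_CTRL_BAR_IDX / PCI_REBAR_CTRL_BAR_SIZE / _SHIFT). */
#define REBAR_CTRL_BAR_IDX   0x00000007u /* bits 2:0: BAR this entry controls */
#define REBAR_CTRL_BAR_SIZE  0x00003F00u /* bits 13:8: encoded BAR size */
#define REBAR_CTRL_BAR_SHIFT 8

/* Which BAR a given control register entry refers to. */
static int rebar_ctrl_bar_idx(uint32_t ctrl)
{
    return (int)(ctrl & REBAR_CTRL_BAR_IDX);
}

/* Encoded size n stands for a BAR of (1 MiB << n). */
static uint64_t rebar_size_to_bytes(uint32_t n)
{
    return (1ull << 20) << n;
}

/* Inverse of the above; returns -1 if 'bytes' is not a power of two
 * of at least 1 MiB (such a size cannot be programmed). */
static int rebar_bytes_to_size(uint64_t bytes)
{
    if (bytes < (1ull << 20) || (bytes & (bytes - 1)) != 0)
        return -1;
    int n = 0;
    while (rebar_size_to_bytes((uint32_t)n) < bytes)
        n++;
    return n;
}

/* Model of the restore step: keep the BAR index and all other bits of
 * the control value read back from the device, but overwrite the size
 * field with the size of the resource the kernel has assigned. */
static uint32_t rebar_restore_ctrl(uint32_t ctrl, uint64_t assigned_bytes)
{
    int n = rebar_bytes_to_size(assigned_bytes);
    if (n < 0)
        return ctrl; /* not representable: leave the register as it is */
    ctrl &= ~REBAR_CTRL_BAR_SIZE;
    ctrl |= ((uint32_t)n << REBAR_CTRL_BAR_SHIFT) & REBAR_CTRL_BAR_SIZE;
    return ctrl;
}
```

For example, if a reset left BAR 0 decoding 256MiB (control value 0x0800) while the kernel had assigned it 4GiB, the restored value is 0x0C00.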

In that case the guest shouldn't be able to change the configuration at
all, so I have no idea what's going wrong here.

> AIUI, resource allocation can fail when enabling REBAR support, which
> is a problem if the failure occurs on the host but not the guest since
> we have no means via the hardware protocol to expose such a condition.
> Therefore the model I was considering for vfio-pci would be to simply
> pre-enable REBAR at the max size.

That's a rather bad idea. Our GPUs, for example, advertise far larger
sizes than they actually need.

For example, a Polaris usually advertises 4GiB even when only 2GiB of
VRAM are installed, because 4GiB is simply the maximum amount of memory
that can be put on a board together with that ASIC.

Some devices even return a size mask of all 1s when they need only 2MiB,
which with this approach would waste nearly 1TiB of address space.
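To put numbers on the difference between "pre-enable at the max size" and picking the smallest advertised size that still covers the memory, here is a small sketch. It assumes the supported-size bitmap from the REBAR Capability register has already been shifted right by 4, so bit n set means 1 MiB << n is supported; the function names and the example mask are hypothetical.

```c
#include <stdint.h>

/* Bytes covered by encoded size n (1 MiB << n). */
static uint64_t size_bytes(int n)
{
    return (1ull << 20) << n;
}

/* Largest advertised size: what "pre-enable at the max size" would map. */
static int rebar_max_size(uint32_t sizes)
{
    int n = -1;
    for (int i = 0; i < 32; i++)
        if (sizes & (1u << i))
            n = i;
    return n;
}

/* Smallest advertised size that still covers 'needed' bytes, or -1. */
static int rebar_fit_size(uint32_t sizes, uint64_t needed)
{
    for (int i = 0; i < 32; i++)
        if ((sizes & (1u << i)) && size_bytes(i) >= needed)
            return i;
    return -1;
}

/* Address space that is mapped but not backed by memory for size n. */
static uint64_t wasted_bytes(int n, uint64_t needed)
{
    if (n < 0)
        return 0;
    uint64_t bar = size_bytes(n);
    return bar > needed ? bar - needed : 0;
}
```

With an illustrative mask advertising 256MiB to 4GiB (0x1F00) and 2GiB of VRAM, the max-size policy picks 4GiB and leaves 2GiB of address space unbacked, while the smallest fitting size is exactly 2GiB.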

Regards,
Christian.

>    It might be sufficiently safe to
> test BAR expansion on initialization and then allow user control, but
> I'm concerned that resource availability could change while already in
> use by the user.  Thanks,
>
> Alex
>
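Testing BAR expansion at initialization could look roughly like the sketch below: probe each advertised size once, keep only the sizes the host could actually assign, and later accept user requests only from that set. The allocator hook (try_assign_fn) and the toy fits_window() stand-in are hypothetical; a real vfio-pci implementation would go through the PCI resource allocator, and this sketch does nothing about availability changing after the probe.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side hook: returns true if a BAR of 'bytes' could
 * be assigned right now. */
typedef bool (*try_assign_fn)(uint64_t bytes, void *ctx);

/* Toy stand-in for the allocator (hypothetical): succeeds if the BAR
 * fits into a free window whose size is passed via ctx (a uint64_t). */
static bool fits_window(uint64_t bytes, void *ctx)
{
    return bytes <= *(const uint64_t *)ctx;
}

/* Probe every advertised size once at initialization and return the
 * subset the host could back. Bit n of 'sizes' means (1 MiB << n). */
static uint32_t rebar_probe_usable(uint32_t sizes, try_assign_fn try_assign,
                                   void *ctx)
{
    uint32_t usable = 0;
    for (int n = 0; n < 32; n++) {
        if (!(sizes & (1u << n)))
            continue;
        if (try_assign((1ull << 20) << n, ctx))
            usable |= 1u << n;
    }
    return usable;
}

/* Accept a later user request only if that size passed the probe. */
static bool rebar_request_ok(uint32_t usable, int n)
{
    return n >= 0 && n < 32 && (usable & (1u << n)) != 0;
}
```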



More information about the amd-gfx mailing list