Couple of issues with amdgpu on my WX4100
Christian König
christian.koenig at amd.com
Mon Jan 4 17:39:33 UTC 2021
On 04.01.21 at 17:45, Alex Williamson wrote:
> On Mon, 4 Jan 2021 12:34:34 +0100
> Christian König <christian.koenig at amd.com> wrote:
>
>> Hi Maxim,
>>
>> I can't help with the display related stuff. Probably the best approach
>> to get this fixed would be to open a bug for it on the FDO tracker.
>>
>> But I'm the one who implemented the resizeable BAR support and your
>> analysis of the problem sounds about correct to me.
>>
>> The reason why this works on Linux is most likely because we restore the
>> BAR size on resume (and maybe during initial boot as well).
>>
>> See this patch for reference:
>>
>> commit d3252ace0bc652a1a244455556b6a549f969bf99
>> Author: Christian König <ckoenig.leichtzumerken at gmail.com>
>> Date: Fri Jun 29 19:54:55 2018 -0500
>>
>> PCI: Restore resized BAR state on resume
>>
>> Resize BARs after resume to the expected size again.
>>
>> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=199959
>> Fixes: d6895ad39f3b ("drm/amdgpu: resize VRAM BAR for CPU access v6")
>> Fixes: 276b738deb5b ("PCI: Add resizable BAR infrastructure")
>> Signed-off-by: Christian König <christian.koenig at amd.com>
>> Signed-off-by: Bjorn Helgaas <bhelgaas at google.com>
>> CC: stable at vger.kernel.org # v4.15+
>>
>>
>> It should be trivial to add this to the reset module as well, most
>> likely in a completely vendor-independent way. I'm not sure what a bus
>> reset does to this configuration, so restoring it every time should be
>> the most defensive approach.
> Hmm, this should already be used by the bus/slot reset path:
>
> pci_bus_restore_locked()/pci_slot_restore_locked()
>   pci_dev_restore()
>     pci_restore_state()
>       pci_restore_rebar_state()
>
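For readers following the call chain above, here is a rough userspace
model of what restoring the resized BAR state amounts to. It is a sketch
under assumptions, not the kernel code: the masks are my reading of the
Resizable BAR control register layout, and restore_rebar_ctrl() and
bytes_to_rebar_size() are hypothetical helper names. For each control
register, the size field is rewritten from the BAR size the OS still
expects.

```python
# Illustrative model only, not kernel code.
# Assumed control register layout: bits 2:0 select the BAR, bits 12:8 hold
# the encoded size. Verify these against the PCIe spec before relying on them.
REBAR_CTRL_BAR_IDX = 0x00000007
REBAR_CTRL_BAR_SIZE = 0x00001F00
REBAR_CTRL_BAR_SHIFT = 8


def bytes_to_rebar_size(nbytes):
    """Encode a power-of-two BAR size: 1 MiB -> 0, 2 MiB -> 1, and so on."""
    if nbytes < (1 << 20) or nbytes & (nbytes - 1):
        raise ValueError("BAR size must be a power of two of at least 1 MiB")
    return nbytes.bit_length() - 1 - 20


def restore_rebar_ctrl(ctrl_regs, bar_sizes):
    """Return control register values with the expected sizes written back.

    ctrl_regs: 32-bit control register values as read after the reset.
    bar_sizes: mapping of BAR index to the size in bytes the OS expects.
    """
    restored = []
    for ctrl in ctrl_regs:
        bar_idx = ctrl & REBAR_CTRL_BAR_IDX
        size = bytes_to_rebar_size(bar_sizes[bar_idx])
        # Clear the old size field, then write the expected encoding.
        ctrl = (ctrl & ~REBAR_CTRL_BAR_SIZE) | (size << REBAR_CTRL_BAR_SHIFT)
        restored.append(ctrl)
    return restored


if __name__ == "__main__":
    # Hypothetical device: BAR 0 fell back to 256 MiB, the OS expects 4 GiB.
    after_reset = [0x00000820]
    print([hex(v) for v in restore_rebar_ctrl(after_reset, {0: 4 << 30})])
```

The point of the model is that the restore step depends only on the size
the OS has recorded for the BAR, so it can run unconditionally after any
reset.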
> VFIO support for resizeable BARs has been on my todo list, but I don't
> have access to any systems that have both a capable device and >4G
> decoding enabled in the BIOS. If we have a consistent view of the BAR
> size after the BARs are expanded, I'm not sure why it doesn't just
> work. FWIW, QEMU currently hides the REBAR capability from the guest
> because the kernel driver doesn't support emulation through config
> space (i.e. it's read-only, which the spec doesn't support).
In this case the guest shouldn't be able to change the config at all and
I have no idea what's going wrong here.
> AIUI, resource allocation can fail when enabling REBAR support, which
> is a problem if the failure occurs on the host but not the guest since
> we have no means via the hardware protocol to expose such a condition.
> Therefore the model I was considering for vfio-pci would be to simply
> pre-enable REBAR at the max size.
That's a rather bad idea. Our GPUs, for example, report far larger sizes
than they actually need.

E.g. a Polaris usually reports 4GiB even when only 2GiB are installed,
because 4GiB is just the maximum amount of RAM you can put together with
the ASIC on a board.

Some devices even report a mask of all ones when they need only 2MiB,
resulting in nearly 1TiB of wasted address space with this approach.
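To make the difference concrete, here is a minimal sketch of how a
supported-sizes mask translates into BAR sizes, and how picking the
maximum compares with picking the smallest size that covers what is
actually needed. It assumes that bit n of the mask stands for 2^(n+20)
bytes and that the mask has 20 size bits. The example masks and the
helper names are made up for illustration.

```python
# Illustrative model only. Assumption: bit n of the supported-sizes mask
# (already shifted down to bit 0) advertises a BAR size of 2**(n + 20) bytes.
from typing import List, Optional

MIB = 1 << 20
GIB = 1 << 30


def supported_sizes(mask: int) -> List[int]:
    """Return the BAR sizes in bytes advertised by a supported-sizes mask."""
    return [1 << (bit + 20)
            for bit in range(mask.bit_length())
            if mask & (1 << bit)]


def smallest_size_covering(mask: int, needed: int) -> Optional[int]:
    """Pick the smallest advertised size that still covers `needed` bytes."""
    candidates = [size for size in supported_sizes(mask) if size >= needed]
    return min(candidates) if candidates else None


if __name__ == "__main__":
    # Hypothetical masks: a GPU advertising 256 MiB..4 GiB with 2 GiB of
    # VRAM, and a device setting all 20 size bits that needs only 2 MiB.
    examples = [("gpu", 0x1F00, 2 * GIB), ("all-ones", 0xFFFFF, 2 * MIB)]
    for name, mask, needed in examples:
        largest = max(supported_sizes(mask))
        fitted = smallest_size_covering(mask, needed)
        print(name, "max:", largest // MIB, "MiB",
              "fitted:", fitted // MIB, "MiB")
```

Always choosing the largest advertised size reserves address space based
on what the ASIC could support, not on what the board actually carries.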
Regards,
Christian.
> It might be sufficiently safe to
> test BAR expansion on initialization and then allow user control, but
> I'm concerned that resource availability could change while already in
> use by the user. Thanks,
>
> Alex
>