Couple of issues with amdgpu on my WX4100
Christian König
christian.koenig at amd.com
Fri Jan 15 11:29:26 UTC 2021
Am 06.01.21 um 21:21 schrieb Maxim Levitsky:
> On Mon, 2021-01-04 at 09:45 -0700, Alex Williamson wrote:
>> On Mon, 4 Jan 2021 12:34:34 +0100
>> Christian König <christian.koenig at amd.com> wrote:
>>
>>> Hi Maxim,
>>>
>>> I can't help with the display related stuff. Probably the best approach to
>>> get this fixed would be to open a bug report for this on the FDO tracker.
>>>
>>> But I'm the one who implemented the resizeable BAR support and your
>>> analysis of the problem sounds about correct to me.
>>>
>>> The reason why this works on Linux is most likely because we restore the
>>> BAR size on resume (and maybe during initial boot as well).
>>>
>>> See this patch for reference:
>>>
>>> commit d3252ace0bc652a1a244455556b6a549f969bf99
>>> Author: Christian König <ckoenig.leichtzumerken at gmail.com>
>>> Date: Fri Jun 29 19:54:55 2018 -0500
>>>
>>> PCI: Restore resized BAR state on resume
>>>
>>> Resize BARs after resume to the expected size again.
>>>
>>> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=199959
>>> Fixes: d6895ad39f3b ("drm/amdgpu: resize VRAM BAR for CPU access v6")
>>> Fixes: 276b738deb5b ("PCI: Add resizable BAR infrastructure")
>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>> Signed-off-by: Bjorn Helgaas <bhelgaas at google.com>
>>> CC: stable at vger.kernel.org # v4.15+
>>>
> Hi!
> Thanks for the feedback!
>
> So I went over the QEMU code, and QEMU (as opposed to the kernel,
> where I tried to hide PCI_EXT_CAP_ID_REBAR) does indeed hide this
> PCI capability from the guest.
>
> However exactly as Alex mentioned the kernel does indeed restore
> the rebar state, and even with that code patched out I found out that
> rebar state persists across the reset that the vendor_reset module
> does (BACO I think).
>
> Therefore the Linux guest sees the full 4G bar and happily uses it,
> while the windows guest's driver apparently has a bug when the bar
> is that large.
>
> I patched the amdgpu to resize the bar to various other sizes, and
> the windows driver apparently works up to a 2GB bar.
>
> So pretty much, other than a bug in the windows driver and the fact
> that VFIO doesn't support resizable bars, there is nothing wrong here.
>
> Since my system does support above 4G decoding and I do have a nice
> vfio friendly device that does support a resizable bar, I do volunteer
> to add support for this to VFIO as time and resources permit.
>
> Also it would be nice if it was either possible to make amdgpu
> (or the whole system) optionally avoid resizing bars when a
> kernel command line / module param is given,
> or even better let the amdgpu resize the bar to its original
> size when it is unloaded which IMHO is the best solution
> for this problem.
>
> I think I can prepare a patch to make amdgpu restore
> the bar size on unload if you think that
> this is the right solution.
Coming back to this topic now, sorry been a bit busy over the last few days.
Basically I don't think that amdgpu should do anything when it quits.
What you should rather do is to resize the BAR to the default value of
the BIOS when you trigger the device reset.
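
For reference, a minimal sketch of what "resizing the BAR" means at the register level, assuming the field layout of the Resizable BAR control register from the PCIe spec. The macro and function names below are local to this sketch, not kernel API, and the 5-bit size field is the original layout (newer spec revisions widen it). A reset module would read the control register, swap in the size encoding it wants, and write it back:

```c
#include <assert.h>
#include <stdint.h>

/* Resizable BAR control register fields; names are local to this sketch. */
#define REBAR_CTRL_BAR_IDX    0x00000007u  /* which BAR this entry controls */
#define REBAR_CTRL_BAR_SIZE   0x00001F00u  /* encoded BAR size (5-bit form) */
#define REBAR_CTRL_BAR_SHIFT  8

/* Encoded size n means a BAR of 2^(n + 20) bytes: 0 = 1 MB, 12 = 4 GB. */
static inline uint64_t rebar_size_to_bytes(unsigned int size)
{
	assert(size <= 31);	/* must fit the 5-bit size field */
	return 1ULL << (size + 20);
}

/* Return ctrl with only the size field replaced by the given encoding. */
static inline uint32_t rebar_ctrl_set_size(uint32_t ctrl, unsigned int size)
{
	assert(size <= 31);
	ctrl &= ~REBAR_CTRL_BAR_SIZE;
	ctrl |= (uint32_t)size << REBAR_CTRL_BAR_SHIFT;
	return ctrl;
}
```

With this encoding, going from a 4 GB BAR (size 12) back to a 256 MB BAR (size 8) only changes bits 8-12 of the control register; the BAR index and the other fields are left untouched.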
>>> It should be trivial to add this to the reset module as well. Most
>>> likely even completely vendor independent since I'm not sure what a bus
>>> reset will do to this configuration and restoring it all the time should
>>> be the most defensive approach.
>> Hmm, this should already be used by the bus/slot reset path:
>>
>> pci_bus_restore_locked()/pci_slot_restore_locked()
>> pci_dev_restore()
>> pci_restore_state()
>> pci_restore_rebar_state()
>>
>> VFIO support for resizeable BARs has been on my todo list, but I don't
>> have access to any systems that have both a capable device and >4G
>> decoding enabled in the BIOS. If we have a consistent view of the BAR
>> size after the BARs are expanded, I'm not sure why it doesn't just
>> work. FWIW, QEMU currently hides the REBAR capability from the guest
>> because the kernel driver doesn't support emulation through config
>> space (ie. it's read-only, which the spec doesn't support).
>>
>> AIUI, resource allocation can fail when enabling REBAR support, which
>> is a problem if the failure occurs on the host but not the guest since
>> we have no means via the hardware protocol to expose such a condition.
>> Therefore the model I was considering for vfio-pci would be to simply
>> pre-enable REBAR at the max size. It might be sufficiently safe to
>> test BAR expansion on initialization and then allow user control, but
>> I'm concerned that resource availability could change while already in
>> use by the user. Thanks,
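
A rough sketch of the "pre-enable at the max size" idea, with an optional upper bound such as the 2GB limit observed with the windows driver. It assumes the supported-sizes bitmap of the Resizable BAR capability register sits in bits 4-23 (bit 4 = 1 MB), as in the original spec layout; newer revisions extend the bitmap. The names are hypothetical and not part of vfio-pci or the kernel:

```c
#include <stdint.h>

/* Supported-sizes bitmap in the capability register (original layout). */
#define REBAR_CAP_SIZES  0x00FFFFF0u  /* bit 4 = 1 MB, bit 5 = 2 MB, ... */

/*
 * Return the largest supported size encoding that does not exceed
 * max_size, or -1 if the device supports no size in that range.
 * Encoding n corresponds to a BAR of 2^(n + 20) bytes.
 */
static inline int rebar_pick_size(uint32_t cap, unsigned int max_size)
{
	uint32_t sizes = (cap & REBAR_CAP_SIZES) >> 4;
	int n;

	for (n = 19; n >= 0; n--) {
		if ((unsigned int)n <= max_size && (sizes & (1u << n)))
			return n;
	}
	return -1;
}
```

Passing 19 as max_size selects the largest size the device advertises; passing 11 caps the choice at 2 GB. Whether the host can actually allocate the chosen size is a separate question that this sketch does not address.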
> As mentioned in other replies in this thread, and as was my first
> thought about this, this will indeed break on devices which
> don't accurately report the maximum bar size that they actually need.
> Even the spec itself says that it is vendor specific to determine the
> optimal bar size.
>
> We can also allow the guest to resize the bar and, if that fails,
> expose the error via a virtual AER message on the root port
> where the device is attached?
Sounds like it might work in theory, but I'm not an expert on KVM.
Regards,
Christian.
>
> I personally don't know if this is possible/worth it.
>
>
> Best regards,
> Maxim Levitsky
>
>> Alex
>