[PATCH v6 5/5] drm/amdgpu: resize VRAM BAR for CPU access v2
Bjorn Helgaas
helgaas at kernel.org
Fri Jun 2 20:26:31 UTC 2017
On Fri, Jun 02, 2017 at 11:32:21AM +0200, Christian König wrote:
> Hi Bjorn,
>
> sorry for not responding earlier and thanks for picking this thread
> up again.
>
> Am 01.06.2017 um 22:14 schrieb Bjorn Helgaas:
> >[+cc ADMGPU, DRM folks]
> >
> >On Tue, May 09, 2017 at 06:49:07PM +0200, Christian König wrote:
> >>[SNIP]
> >>+/**
> >>+ * amdgpu_resize_bar0 - try to resize BAR0
> >>+ *
> >>+ * @adev: amdgpu_device pointer
> >>+ *
> >>+ * Try to resize BAR0 to make all VRAM CPU accessible.
> >>+ */
> >>+void amdgpu_resize_bar0(struct amdgpu_device *adev)
> >>+{
> >>+ u64 space_needed = roundup_pow_of_two(adev->mc.real_vram_size);
> >>+ u32 rbar_size = order_base_2(((space_needed >> 20) | 1)) -1;
> >>+ u16 cmd;
> >>+ int r;
> >>+
> >>+ /* Free the doorbell mapping, it most likely needs to move as well */
> >>+ amdgpu_doorbell_fini(adev);
> >>+ pci_release_resource(adev->pdev, 2);
> >>+
> >>+ /* Disable memory decoding while we change the BAR addresses and size */
> >>+ pci_read_config_word(adev->pdev, PCI_COMMAND, &cmd);
> >>+ pci_write_config_word(adev->pdev, PCI_COMMAND,
> >>+ cmd & ~PCI_COMMAND_MEMORY);
> >>+
> >>+ r = pci_resize_resource(adev->pdev, 0, rbar_size);
> >>+ if (r == -ENOSPC)
> >>+ DRM_INFO("Not enough PCI address space for a large BAR.");
> >>+ else if (r && r != -ENOTSUPP)
> >>+ DRM_ERROR("Problem resizing BAR0 (%d).", r);
> >>+
> >>+ pci_write_config_word(adev->pdev, PCI_COMMAND, cmd);
> >>+
> >>+ /* When the doorbell BAR isn't available we have no chance of
> >>+ * using the device.
> >>+ */
> >>+ BUG_ON(amdgpu_doorbell_init(adev));
> >This amdgpu_doorbell_fini()/amdgpu_doorbell_init() thing doesn't look
> >right. amdgpu_device_init() only calls amdgpu_doorbell_init() for
> >"adev->asic_type >= CHIP_BONAIRE", but we call it unconditionally
> >here.
> >
> >This is the call graph:
> >
> > amdgpu_device_init
> > adev->rmmio_base = pci_resource_start(adev->pdev, 5) # 2 for < BONAIRE
> > adev->rmmio = ioremap(adev->rmmio_base, ...)
> > DRM_INFO("register mmio base: 0x%08X\n", (uint32_t)adev->rmmio_base)
> > if (adev->asic_type >= CHIP_BONAIRE) {
> > amdgpu_doorbell_init
> > adev->doorbell.base = pci_resource_start(adev->pdev, 2)
> > adev->doorbell.ptr = ioremap(adev->doorbell.base, ...)
> > }
> > amdgpu_init
> > gmc_v7_0_sw_init # gmc_v7_0_ip_funcs.sw_init
> > gmc_v7_0_mc_init
> > + amdgpu_resize_bar0
> > + amdgpu_doorbell_fini
> > + pci_release_resource(adev->pdev, 2)
> > + pci_resize_resource(adev->pdev, 0, size)
> > + amdgpu_doorbell_init
> > adev->mc.aper_base = pci_resource_start(adev->pdev, 0)
> >
> >If "asic_type < CHIP_BONAIRE", we ioremapped BAR 2 in
> >amdgpu_device_init(), then we released the resource here and never
> >updated the ioremap.
>
> The first hardware with a resizeable BAR I could find is a Tonga,
> and that is even a generation later than Bonaire.
>
> So we are never going to call this code on earlier hardware generations.
The problem with that is that it's impossible for a code reader to
verify that. So adding a check is ugly but I think makes it more
readable.
> But I think I will just move the asic_type check into the function
> to be absolute sure.
>
> > From the PCI core perspective, it would be much cleaner to do the BAR
> >resize before the driver calls pci_enable_device(). If that could be
> >done, there would be no need for this sort of shutdown/reinit stuff
> >and we wouldn't have to worry about issues like these. The amdgpu
> >init path is pretty complicated, so I don't know whether this is
> >possible.
>
> I completely agree on this and it is actually the approach I tried first.
>
> There are just two problems with this approach:
> 1. When the amdgpu driver is loaded there can already be the VGA
> console, Vesa or EFI driver active for the device and displaying the
> splash screen.
>
> When we resize and most likely relocate the BAR while those drivers
> are active it will certainly cause problems.
>
> What amdgpu does before trying to resize the BAR is kicking out
> other driver and making sure it has exclusive access to the
> hardware.
I don't understand the problem here yet. If you need to enable the
device, then disable it, resize, and re-enable it, that's fine.
The important thing I'm looking for is that the resize happens before
a pci_enable_device(), because pci_enable_device() is the sync point
where the PCI core enables resources and makes them available to the
driver. Drivers know that they can't look at the resources before
that point. There's a little bit of text about this in [1].
> 2. Without taking a look at the registers you don't know how much
> memory there actually is on the board.
>
> We could always resize it to the maximum supported, but that would
> mean we could easily waste 128GB of address space while the hardware
> only has 8GB of VRAM.
>
> That would not necessarily hurt us when we have enough address
> space, but at least kind of sucks.
Enable, read regs, disable, kick out other drivers, resize, enable.
Does that solve this problem?
> >I would also like to simplify the driver usage model and get the
> >PCI_COMMAND_MEMORY twiddling into the PCI core instead of the driver.
> >Ideally, the driver would do something like this:
> >
> > pci_resize_resource(adev->pdev, 0, rbar_size);
> > pci_enable_device(adev->pdev);
> >
> >And the PCI core would be something along these lines:
> >
> > int pci_resize_resource(dev, bar, size)
> > {
> > if (pci_is_enabled(dev))
> > return -EBUSY;
> >
> > pci_disable_decoding(dev); # turn off MEM, IO decoding
> > pci_release_resources(dev); # (all of them)
> > err = pci_resize_bar(dev, bar, size); # change BAR size (only)
> > if (err)
> > return err;
> >
> > pci_assign_resources(dev); # reassign all "dev" resources
> > return 0;
> > }
>
> I already tried the approach with releasing all resources, but it
> didn't worked so well.
>
> When resizing fails because we don't have enough address space then
> we at least want to get back to a working config.
>
> Releasing everything makes that rather tricky, since I would then
> need to keep a backup of the old config and try to restore it.
If resizing fails because of lack of address space, I would expect the
PCI core to at least restore to the previous state. If it doesn't, I
think that would be a defect.
Having the driver specify the BARs it thinks might cause issues feels
like a crutch.
> Additional to that I'm not sure if releasing the register BAR and
> relocating it works with amdgpu.
If the BAR can't be relocated, that sounds like a hardware defect. If
that's really the case, you could mark it IORESOURCE_PCI_FIXED so we
don't move it. Or if it's an amdgpu defect, e.g., if amdgpu doesn't
re-read the resource addresses after pci_enable_device(), you should
fix amdgpu.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/PCI/pci.txt#n255
More information about the amd-gfx
mailing list