Regression: drm: Lobotomize set_busid nonsense for !pci drivers (a325725633c2)
Laszlo Ersek
lersek at redhat.com
Mon Oct 3 19:36:31 UTC 2016
On 10/03/16 21:12, Daniel Vetter wrote:
> On Mon, Oct 3, 2016 at 4:15 PM, Laszlo Ersek <lersek at redhat.com> wrote:
>> With kernel commit a325725633c2 applied, the drmGetBusid() call in
>> get_drm_info() [hw/xfree86/os-support/linux/lnx_platform.c] returns
>> "virtio0".
>>
>> Without kernel commit a325725633c2 in place, the same function call
>> produces "pci:0000:00:02.0".
>>
>> I think that the idea of kernel commit a325725633c2:
>>
>> We already have a fallback in place to fill out the unique from
>> dev->unique, which is set to something reasonable in drm_dev_alloc.
>>
>> is inappropriate for virtio devices.
>>
>> While it is true that "virtioN" is unique across the guest system, those
>> identifiers are not stable.
>>
>> # ls -1d /sys/devices/pci*/*/virtio*
>> /sys/devices/pci0000:00/0000:00:02.0/virtio0
>> /sys/devices/pci0000:00/0000:00:03.0/virtio1
>> /sys/devices/pci0000:00/0000:00:05.0/virtio2
>> /sys/devices/pci0000:00/0000:00:06.0/virtio3
>> /sys/devices/pci0000:00/0000:00:09.0/virtio4
>>
>> If you tweak the PCI addresses of other virtio devices on the QEMU
>> command line, while keeping the PCI address of the virtio-vga device
>> intact, the "virtioN" identifier of the virtio-vga device may change in
>> an unspecified / unexpected manner.
>>
>> From the above listing, it seems like higher PCI Device numbers happen
>> to end up with higher "virtioN" identifiers. While this is unspecified /
>> non-guaranteed behavior, in practice it means the following:
>>
>> Moving the "virtio1" device from PCI address 00:03.0 to 00:01.0, while
>> preserving the PCI address of virtio-vga as 00:02.0, will bump
>> virtio-vga to "virtio1" from "virtio0" (and sink that other device from
>> "virtio1" to "virtio0").
>>
>> I think that unstable identifiers don't qualify for BusID use,
>> regardless of the actual format of the IDs in question.
>>
>> Searching the patch quickly, drm_virtio_set_busid() is the only
>> implementation of the "drm_driver.set_busid" callback that delegates the
>> task to drm_pci_set_busid(). All other implementations of this callback
>> were provided by drm_platform_set_busid().
>>
>> drm_platform_set_busid() would ultimately format
>> "dev->platformdev->name" and "dev->platformdev->id" into the bus
>> identifier. Replacing that common logic with the drm_dev_alloc()
>> fallback that is already in place is probably justified: judging from
>> "virtio0", the fallback likely captures the exact same information, just
>> with a different format.
>>
>> However, drm_virtio_set_busid() used to implement a different logic
>> (encoding different information). It intentionally used stable PCI
>> addresses rather than (platformdev->name, platformdev->id).
>>
>> So, I think that removing drm_virtio_set_busid() was an actual
>> regression. I propose that the related hunks of a325725633c2 be
>> reverted. For virtio-vga and virtio-gpu-pci, "virtioN" is an unstable
>> and inappropraite BusID pattern.
>
> Jumping in a bit late here, and maybe I'm not coherent (flu and all
> that), but I'd still like to nuke the set_busid callback for virtio.
> The trouble here seems to be that virtio has a bit a confusion about
> what it considers to be the parent device it wants to expose to
> userspace. And userspace is less than clueful about walking up the
> device hierarchy to figure this out on its own.
>
> Anyway looking at drm_virtio_init in virtgpu_drm_bus.c we already have
> a special case for when we're on a PCI bus. I think the better way to
> fix this issue would be to:
> - again export drm_drv_set_unique to drivers
> - overwrite drm's choice of unique identifier with a call to
>
> drm_dev_set_unique(dev, dev->pdev);
>
> in that if block. With a very big comment that this only exists for
> backwards compat with userspace's expectations.
>
> Since virtio is newer than drm1.1 we don't need to care about the
> backwards compat cruft in the pci set_busid function, and it would be
> great to restrict that as much as possible imo. Something like the
> below essentially (entirely untested):
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_drm_bus.c
> b/drivers/gpu/drm/virtio/virtgpu_drm_bus.c
> index a59d0e309bfc..1fcf739bf509 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_drm_bus.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_drm_bus.c
> @@ -68,6 +68,10 @@ int drm_virtio_init(struct drm_driver *driver,
> struct virtio_device *vdev)
> dev->pdev = pdev;
> if (vga)
> virtio_pci_kick_out_firmware_fb(pdev);
> +
> + ret = drm_dev_set_unique(dev, dev_name(dev->pdev));
> + if (ret)
> + goto err_free;
> }
>
> ret = drm_dev_register(dev, 0);
>
> Cheers, Daniel
On the flu front: thank you for taking the time to think about this and
to respond despite being ill. Get better!
On the patch front: normally I wouldn't oppose experimentation and
shooting untested patches to users for testing -- that's okay in my
opinion. However, in this case the original patch regressed userspace
*first*; we know the culprit hunks, and those hunks can be very easily
reverted without dropping the entire patch.
Considering Emil's feedback, he wouldn't have let the patch
(a325725633c2) through review if he had noticed that hunk -- similarly
to the amdgpu hunk, which he did notice, in v3.
I think that experimentation, and asking users to test on (virtual)
hardware that is not easily available to the patch submitter, are all
fine -- as long as the public tree is in working shape. Currently, it is
not; with the above in mind, I strongly prefer the (partial) revert.
If you'd like to slay the remaining drm_pci_set_busid() calls, which
affect both virtio-gpu and amdgpu, that could be an entirely justified
change, a 100% improvement, and I'll be more than happy to assist with
testing that (the virtio-gpu case at least) -- *after* the tree is
returned to working shape (including stable), now that we're able to do
that.
Personally, I'm not a DRM "expert" by any means -- this is actually the
first thread on dri-devel that I've even read --, and this is about the
time I can squeeze out right now for this issue. I should help with the
testing as stated above, but we also should have a more convenient
schedule for that testing.
Thanks
Laszlo
More information about the dri-devel
mailing list