Regression: drm: Lobotomize set_busid nonsense for !pci drivers (a325725633c2)

Laszlo Ersek lersek at redhat.com
Mon Oct 3 20:44:17 UTC 2016


On 10/03/16 22:34, Daniel Vetter wrote:
> On Mon, Oct 3, 2016 at 9:36 PM, Laszlo Ersek <lersek at redhat.com> wrote:
>> On 10/03/16 21:12, Daniel Vetter wrote:
>>> On Mon, Oct 3, 2016 at 4:15 PM, Laszlo Ersek <lersek at redhat.com> wrote:
>>>> With kernel commit a325725633c2 applied, the drmGetBusid() call in
>>>> get_drm_info() [hw/xfree86/os-support/linux/lnx_platform.c] returns
>>>> "virtio0".
>>>>
>>>> Without kernel commit a325725633c2 in place, the same function call
>>>> produces "pci:0000:00:02.0".
>>>>
>>>> I think that the idea of kernel commit a325725633c2:
>>>>
>>>>     We already have a fallback in place to fill out the unique from
>>>>     dev->unique, which is set to something reasonable in drm_dev_alloc.
>>>>
>>>> is inappropriate for virtio devices.
>>>>
>>>> While it is true that "virtioN" is unique across the guest system, those
>>>> identifiers are not stable.
>>>>
>>>> # ls -1d /sys/devices/pci*/*/virtio*
>>>> /sys/devices/pci0000:00/0000:00:02.0/virtio0
>>>> /sys/devices/pci0000:00/0000:00:03.0/virtio1
>>>> /sys/devices/pci0000:00/0000:00:05.0/virtio2
>>>> /sys/devices/pci0000:00/0000:00:06.0/virtio3
>>>> /sys/devices/pci0000:00/0000:00:09.0/virtio4
>>>>
>>>> If you tweak the PCI addresses of other virtio devices on the QEMU
>>>> command line, while keeping the PCI address of the virtio-vga device
>>>> intact, the "virtioN" identifier of the virtio-vga device may change in
>>>> an unspecified / unexpected manner.
>>>>
>>>> From the above listing, it seems like higher PCI Device numbers happen
>>>> to end up with higher "virtioN" identifiers. While this is unspecified /
>>>> non-guaranteed behavior, in practice it means the following:
>>>>
>>>> Moving the "virtio1" device from PCI address 00:03.0 to 00:01.0, while
>>>> preserving the PCI address of virtio-vga as 00:02.0, will bump
>>>> virtio-vga to "virtio1" from "virtio0" (and sink that other device from
>>>> "virtio1" to "virtio0").
>>>>
>>>> I think that unstable identifiers don't qualify for BusID use,
>>>> regardless of the actual format of the IDs in question.
>>>>
>>>> Searching the patch quickly, drm_virtio_set_busid() is the only
>>>> implementation of the "drm_driver.set_busid" callback that delegates the
>>>> task to drm_pci_set_busid(). All other implementations of this callback
>>>> were provided by drm_platform_set_busid().
>>>>
>>>> drm_platform_set_busid() would ultimately format
>>>> "dev->platformdev->name" and "dev->platformdev->id" into the bus
>>>> identifier. Replacing that common logic with the drm_dev_alloc()
>>>> fallback that is already in place is probably justified: judging from
>>>> "virtio0", the fallback likely captures the exact same information, just
>>>> with a different format.
>>>>
>>>> However, drm_virtio_set_busid() used to implement a different logic
>>>> (encoding different information). It intentionally used stable PCI
>>>> addresses rather than (platformdev->name, platformdev->id).
>>>>
>>>> So, I think that removing drm_virtio_set_busid() was an actual
>>>> regression. I propose that the related hunks of a325725633c2 be
>>>> reverted. For virtio-vga and virtio-gpu-pci, "virtioN" is an unstable
>>>> and inappropraite BusID pattern.
>>>
>>> Jumping in a bit late here, and maybe I'm not coherent (flu and all
>>> that), but I'd still like to nuke the set_busid callback for virtio.
>>> The trouble here seems to be that virtio has a bit a confusion about
>>> what it considers to be the parent device it wants to expose to
>>> userspace. And userspace is less than clueful about walking up the
>>> device hierarchy to figure this out on its own.
>>>
>>> Anyway looking at drm_virtio_init in virtgpu_drm_bus.c we already have
>>> a special case for when we're on a PCI bus. I think the better way to
>>> fix this issue would be to:
>>> - again export drm_drv_set_unique to drivers
>>> - overwrite drm's choice of unique identifier with a call to
>>>
>>> drm_dev_set_unique(dev, dev->pdev);
>>>
>>> in that if block. With a very big comment that this only exists for
>>> backwards compat with userspace's expectations.
>>>
>>> Since virtio is newer than drm1.1 we don't need to care about the
>>> backwards compat cruft in the pci set_busid function, and it would be
>>> great to restrict that as much as possible imo. Something like the
>>> below essentially (entirely untested):
>>>
>>> diff --git a/drivers/gpu/drm/virtio/virtgpu_drm_bus.c
>>> b/drivers/gpu/drm/virtio/virtgpu_drm_bus.c
>>> index a59d0e309bfc..1fcf739bf509 100644
>>> --- a/drivers/gpu/drm/virtio/virtgpu_drm_bus.c
>>> +++ b/drivers/gpu/drm/virtio/virtgpu_drm_bus.c
>>> @@ -68,6 +68,10 @@ int drm_virtio_init(struct drm_driver *driver,
>>> struct virtio_device *vdev)
>>>                 dev->pdev = pdev;
>>>                 if (vga)
>>>                         virtio_pci_kick_out_firmware_fb(pdev);
>>> +
>>> +               ret = drm_dev_set_unique(dev, dev_name(dev->pdev));
>>> +               if (ret)
>>> +                       goto err_free;
>>>         }
>>>
>>>         ret = drm_dev_register(dev, 0);
>>>
>>> Cheers, Daniel
>>
>>
>> On the flu front: thank you for taking the time to think about this and
>> to respond despite being ill. Get better!
>>
>> On the patch front: normally I wouldn't oppose experimentation and
>> shooting untested patches to users for testing -- that's okay in my
>> opinion. However, in this case the original patch regressed userspace
>> *first*; we know the culprit hunks, and those hunks can be very easily
>> reverted without dropping the entire patch.
>>
>> Considering Emil's feedback, he wouldn't have let the patch
>> (a325725633c2) through review if he had noticed that hunk -- similarly
>> to the amdgpu hunk, which he did notice, in v3.
>>
>> I think that experimentation, and asking users to test on (virtual)
>> hardware that is not easily available to the patch submitter, are all
>> fine -- as long as the public tree is in working shape. Currently, it is
>> not; with the above in mind, I strongly prefer the (partial) revert.
>>
>> If you'd like to slay the remaining drm_pci_set_busid() calls, which
>> affect both virtio-gpu and amdgpu, that could be an entirely justified
>> change, a 100% improvement, and I'll be more than happy to assist with
>> testing that (the virtio-gpu case at least) -- *after* the tree is
>> returned to working shape (including stable), now that we're able to do
>> that.
>>
>> Personally, I'm not a DRM "expert" by any means -- this is actually the
>> first thread on dri-devel that I've even read --, and this is about the
>> time I can squeeze out right now for this issue. I should help with the
>> testing as stated above, but we also should have a more convenient
>> schedule for that testing.
> 
> I didn't say I'm blocking your fix for this, nor did I want to imply
> that. It's just been my experience that bug reporters tend to
> disappear fast.

Valid experience. Doesn't apply to me.

> So if you have time, please test the above snippet
> (without your patch of course). If you don't, no harm done, since I've
> been trying to untangle the busid mess since years, so a few more
> won't matter ;-)

Can you please send the patch in a form that I can apply with git-am,
and ensure that it will build? (I assume "entirely untested" implies
"not build tested".) I think a patch attachment on-list, or an inline
patch sent to me privately should be fine (so as not to confuse
subscribers and/or any patch bot / patchwork instance scraping the
list). I guess a threaded message with a subject-prefix different from
[PATCH] might work as well.

(It's not that I'm too lazy to re-code your line-wrapped patch manually
-- instead, you want me to interfere with your patch before testing it
as little as possible. Git-am or even a fetchable remote are best for that.)

Thanks!
Laszlo


More information about the dri-devel mailing list