Regression: drm: Lobotomize set_busid nonsense for !pci drivers (a325725633c2)

Emil Velikov emil.l.velikov at gmail.com
Mon Oct 3 11:34:06 UTC 2016


On 30 September 2016 at 18:26, Laszlo Ersek <lersek at redhat.com> wrote:
> On 09/30/16 18:38, Hans de Goede wrote:
>> Hi,
>>
>> On 30-09-16 17:33, Laszlo Ersek wrote:
>>> On 09/30/16 16:59, Hans de Goede wrote:
>>>> Hi,
>>>>
>>>> On 30-09-16 16:51, Laszlo Ersek wrote:
>>>>> On 09/30/16 12:35, Hans de Goede wrote:
>>>>>
>>>>>> Attached are 2 patches against the xserver which should fix this,
>>>>>> please give them a try.
>>>>>
>>>>> Sorry about the delay.
>>>>>
>>>>> The patches don't seem to fix the issue for me. Please see the Xorg log
>>>>> attached.
>>>>>
>>>>> I tested the patches as follows. Given that my bisection had been done
>>>>> in a Fedora 24 guest, using
>>>>>
>>>>>   xorg-x11-server-1.18.4-4.fc24
>>>>>   http://koji.fedoraproject.org/koji/buildinfo?buildID=794494
>>>>>
>>>>> I now rebuilt the guest kernel exactly at the failing commit (a325725
>>>>> "drm: Lobotomize set_busid nonsense for !pci drivers"), and first
>>>>> reproduced the issue with the above X server.
>>>>>
>>>>> Then, I ported your patches to "xorg-server-1.18.4" (using the upstream
>>>>> xserver tree), and rebuilt the Fedora package with the backport. For the
>>>>> backport, I had to cherry-pick the following two patches from master
>>>>> first:
>>>>>
>>>>> 1 ca8d88e50310 xfree86: recognize primary BUS_PCI device in
>>>>>                xf86IsPrimaryPlatform()
>>>>> 2 ea91db4b8331 config: fix GPUDevice fail when AutoAddGPU off + BusID
>>>>>
>>>>> This way your patches applied cleanly. (Cherry pick #1 above is actually
>>>>> necessary for semantics, while cherry pick #2 is needed for a clean
>>>>> context only, and has no impact for this test.)
>>>>>
>>>>> That is, in total, I added the following four patches to the Fedora 24
>>>>> package:
>>>>>
>>>>> 1 xfree86: recognize primary BUS_PCI device in xf86IsPrimaryPlatform()
>>>>> 2 config: fix GPUDevice fail when AutoAddGPU off + BusID
>>>>> 3 xfree86: Make adding unclaimed devices as GPU devices a separate step
>>>>> 4 xfree86: Try harder to find atleast 1 non GPU Screen
>>>>>
>>>>> You can find the scratch build that I used for testing here:
>>>>>
>>>>>   xorg-x11-server-1.18.4-4.hans_bz1366842_2.fc24
>>>>>   http://koji.fedoraproject.org/koji/taskinfo?taskID=15875087
>>>>>
>>>>> Another reason I used F24's X server as basis, rather than upstream
>>>>> HEAD, is that Fedora 24 is pretty young, and it's already on kernel
>>>>> 4.7.4, and I believe it will soon move to kernel 4.8, without
>>>>> (necessarily) rebasing its X server package to upstream. IOW the kernel
>>>>> upgrade to 4.8 will break X in Fedora 24 too, and then I expect the
>>>>> Fedora X maintainers would have to cherry pick those two patches as
>>>>> dependencies just the same.
>>>>>
>>>>> To summarize, the patches don't seem to help. I shall nonetheless thank
>>>>> you for spending your Friday on this!
>>>>
>>>> Hmm, do you have a xorg.conf file lying around somewhere? The message
>>>> about the xserver not being able to find an entry for screen 0 does
>>>> not make sense ...
>>>
>>> Good catch, I actually had two files under "/etc/X11/xorg.conf.d/":
>>>
>>> * "00-keyboard.conf", from package "systemd-229-13.fc24.x86_64", with
>>> contents
>>>
>>> ------------
>>> # Read and parsed by systemd-localed. It's probably wise not to edit this file
>>> # manually too freely.
>>> Section "InputClass"
>>>         Identifier "system-keyboard"
>>>         MatchIsKeyboard "on"
>>>         Option "XkbLayout" "us"
>>> EndSection
>>> ------------
>>>
>>> * "01-resolution.conf", which I had created, in order to set the
>>> preferred display resolution:
>>>
>>> ------------
>>> Section "Screen"
>>>   Identifier "Default Screen"
>>>   Device     "Default Device"
>>>   Monitor    "Default Monitor"
>>> EndSection
>>>
>>> Section "Device"
>>>   Identifier "Default Device"
>>>   Driver     "modesetting"
>>> EndSection
>>>
>>> Section "Monitor"
>>>   Identifier "Default Monitor"
>>>   Option     "PreferredMode"   "640x480"
>>> # Option     "PreferredMode"   "1440x900"
>>> EndSection
>>> ------------
>>>
>>> I removed these files now, and repeated the test. Again, the X server
>>> wouldn't start, but I think the log file looks a bit different now.
>>> Attached.
>>
>> Ah, ok so it seems that my initial analysis is wrong. The problem
>> is not a recurrence of the device getting identified as a GPU screen;
>> libdrm sorta depends on bus-ids and the lack of one is causing the
>> server to misbehave. I guess that even with a xorg.conf things
>> will fail with the troublesome kernel version (might be worth
>> trying).
>>
>> Emil's analysis seems to be spot on. This does not seem easily
>> fixable in userspace / does seem like a real regression as it
>> even breaks things when specifying the device through xorg.conf
>> (or so I believe), which is something that used to work ...
>
> In order to check this hypothesis, I did the following:
> - I downgraded my xorg-x11-server installation to the most recent
> official F24 packages, that is, "1.18.4-4.fc24",
> - I kept the kernel that I built exactly at the regressive commit
> (a325725633c2)
> - I modified "01-resolution.conf" (see it above in the context) like this:
>
> ----
> Section "Device"
>   Identifier "Default Device"
>   Driver     "modesetting"
>   BusID      "PCI:00:02:0" <------------ new option added
> EndSection
> ----
>
> where BusID matches the B/D/F of the virtio-vga device from "lspci".
>
> This setup (modulo the kernel of course) was known to work, but now the
> X server actually segfaults (apparently in the
> xf86PlatformDeviceCheckBusID() function). Please find the logfile attached.
>
> (NB: this is unrelated to upstream commit de9ce6757c2e -- which the
> pristine FC24 build lacks -- because I don't set AutoAddGPU to "off" --
> it is left at its default "on" value.)
>
Where is this upstream commit again? It shows as unknown in the
kernel, xserver and libdrm trees.

So my theory was a bit off - SetVersion is the one responsible for
setting the "BusID", as retrieved by drmGetBusid, regardless of whether
drmOpen or open is used.
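
To make the BusID matching concrete, here is a minimal Python sketch of
the comparison userspace effectively has to perform. It rests on two
assumptions: that the kernel-side string for a PCI device has the form
"pci:DDDD:BB:dd.f" with hexadecimal fields, and that xorg.conf uses
"PCI:bus[@domain]:device:function" with decimal fields. The function
names are made up for illustration and are not part of libdrm or the
xserver.

```python
import re

# Assumed kernel-style PCI busid: "pci:<domain>:<bus>:<dev>.<func>", hex fields.
_KERNEL_BUSID = re.compile(
    r"^pci:([0-9a-fA-F]{4,}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def parse_kernel_busid(busid):
    """Return (domain, bus, dev, func), or None if not PCI-style."""
    m = _KERNEL_BUSID.match(busid)
    if m is None:
        return None
    domain, bus, dev, func = m.groups()
    return (int(domain, 16), int(bus, 16), int(dev, 16), int(func))


def parse_xorg_busid(busid):
    """Parse an xorg.conf style "PCI:bus[@domain]:device:function" string.

    Fields are assumed to be decimal. Returns None on any parse failure.
    """
    if not busid.upper().startswith("PCI:"):
        return None
    parts = busid[4:].split(":")
    if len(parts) != 3:
        return None
    bus_part, dev, func = parts
    bus, _, domain = bus_part.partition("@")
    try:
        return (int(domain) if domain else 0, int(bus), int(dev), int(func))
    except ValueError:
        return None


def same_device(xorg_busid, kernel_busid):
    """True only if both strings parse and name the same PCI function."""
    wanted = parse_xorg_busid(xorg_busid)
    return wanted is not None and wanted == parse_kernel_busid(kernel_busid)
```

With the BusID from the xorg.conf above,
same_device("PCI:00:02:0", "pci:0000:00:02.0") holds, while any kernel
string that does not parse as PCI-style (an empty one, for instance)
can never match.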

Here's a bit of a brain dump from the other day:

 - The commit mentioned 'affects' the drmSetBusid/DRM_IOCTL_SET_UNIQUE
userspace codepaths.
 - The latter itself is dri1/legacy (xserver hw/xfree86/dri/), which is
not functional for platform devices. Judging by the kernel module,
virtio-gpu seems to be one of those.
 - The modesetting driver should not/cannot reach the above xserver
codepath.

That said, it seems that (at least some) userspace expects a PCI
device despite the kernel module 'advertising' itself as a platform one
:-\
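
One quick way to see which bus the kernel attaches a DRM node to is the
sysfs "subsystem" symlink of its parent device. The Python sketch below
assumes the usual /sys/class/drm/<card>/device/subsystem layout; the
helper name and its sysfs_root parameter are hypothetical and only
there to keep the example testable.

```python
import os


def drm_device_bus(card="card0", sysfs_root="/sys"):
    """Return the bus name (e.g. "pci", "platform", "virtio") for a DRM
    card, or None if the sysfs entry does not exist."""
    link = os.path.join(sysfs_root, "class", "drm", card, "device", "subsystem")
    if not os.path.exists(link):
        return None
    # The symlink points into <sysfs_root>/bus/<name>; keep the last component.
    return os.path.basename(os.path.realpath(link))


if __name__ == "__main__":
    print(drm_device_bus("card0"))
```

Anything other than "pci" here means that userspace which insists on a
PCI-style BusID has nothing to match against directly.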

Going through the xserver layers is a bit inspiring. I'm wondering if
we could get an strace before/after the xserver commit
ca8d88e50310a0d440a127c22a0a383cc149f408? It would help us track
things down a lot quicker/easier.


Thanks
Emil


More information about the dri-devel mailing list