[PATCH libdrm] libdrm: Fix issue about different domainID but same BDF

Deng, Emily Emily.Deng at amd.com
Wed Apr 24 09:19:44 UTC 2019


Hi Emil,
    I don't clearly understand your suggestion below. What about the case where there is only one GPU and it doesn't support pci_domain? In that case, it still needs to fall back to pci_domain_ok=0.
>That aside, I think we can do a slightly better fix. Have you tried:
> - resetting the pci_domain_ok=1 on each iteration, and
> - continuing to the next device when the second
>drmSetInterfaceVersion() call fails

Best wishes
Emily Deng


>-----Original Message-----
>From: Emil Velikov <emil.l.velikov at gmail.com>
>Sent: Friday, February 15, 2019 11:02 PM
>To: Deng, Emily <Emily.Deng at amd.com>
>Cc: amd-gfx mailing list <amd-gfx at lists.freedesktop.org>
>Subject: Re: [PATCH libdrm] libdrm: Fix issue about different domainID but
>same BDF
>
>Hi Emily,
>
>Please note that code outside of amdgpu/ is used by all open source drivers.
>Thus patches should have dri-devel@ in to/cc as mentioned in CONTRIBUTING.
>
>On Thu, 14 Feb 2019 at 07:53, Emily Deng <Emily.Deng at amd.com> wrote:
>>
>> For multiple GPUs which have the same BDF but different domain IDs,
>> drmOpenByBusid will return the wrong fd when running startx.
>>
>> Steps to reproduce:
>> 1. Call drmOpenByBusid to open Card0. It returns the right fd0, and
>> fd0 has master privilege.
>> 2. Call drmOpenByBusid to open Card1. drmOpenByBusid opens Card0
>> first; this time the fd1 obtained for Card0 does not have master
>> privilege. drmSetInterfaceVersion is then called to identify the
>> domain ID feature, but because fd1 is not master the call fails, so
>> the domain ID is not compared and the wrong fd is returned for Card1.
>>
>> Solution:
>> In a first loop, search for the best-matching fd using drm 1.4.
>>
>First and foremost, I wish we could stop using these legacy APIs.
>They're fairly fragile and, as you can see, there are strange things happening.
>We could instead use drmGetDevices2() to gather a list of devices and pick the
>one we're interested in.
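
A minimal sketch of the "pick the one we're interested in" step, assuming the device list has already been gathered. The types `pci_addr` and `gpu_entry` and the function `find_node` are hypothetical stand-ins, not libdrm definitions; in real code the list would come from drmGetDevices2() and the address from each device's PCI bus info. The point is only that the domain takes part in the comparison, so two cards with the same BDF are told apart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the PCI address reported per device. */
struct pci_addr {
    uint16_t domain;
    uint8_t bus;
    uint8_t dev;
    uint8_t func;
};

/* Hypothetical device record: an address plus its device node path. */
struct gpu_entry {
    struct pci_addr addr;
    const char *node;
};

/* Return the node whose full address (domain included) matches, or NULL. */
const char *find_node(const struct gpu_entry *list, size_t count,
                      struct pci_addr want)
{
    assert(list != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        const struct pci_addr *a = &list[i].addr;
        if (a->domain == want.domain && a->bus == want.bus &&
            a->dev == want.dev && a->func == want.func)
            return list[i].node;
    }
    return NULL;
}
```

Because the lookup works on an enumerated list rather than on freshly opened fds, it does not depend on which fd currently holds master privilege.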
>
>That aside, I think we can do a slightly better fix. Have you tried:
> - resetting the pci_domain_ok=1 on each iteration, and
> - continuing to the next device when the second
>drmSetInterfaceVersion() call fails
>
>AFAICT it should produce the same result, while being shorter and faster.
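
The two changes proposed above can be modelled without real hardware. This is only a sketch under stated assumptions: `sim_card`, `sim_match` and `sim_open_by_busid` are hypothetical names, and the `v14_ok` / `v11_ok` flags stand in for whether the first and second drmSetInterfaceVersion() calls would succeed on that device. It is not the libdrm implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one card as seen by the open-by-busid loop. */
struct sim_card {
    int domain, bus, dev, func;
    bool v14_ok; /* would the first (1.4) version call succeed? */
    bool v11_ok; /* would the second (1.1 fallback) call succeed? */
};

/* Compare addresses; the domain is only compared when domain_ok is set. */
bool sim_match(const struct sim_card *c, int domain, int bus, int dev,
               int func, bool domain_ok)
{
    if (domain_ok && c->domain != domain)
        return false;
    return c->bus == bus && c->dev == dev && c->func == func;
}

/* Return the index of the first matching card, or -1 if none matches. */
int sim_open_by_busid(const struct sim_card *cards, size_t count,
                      int domain, int bus, int dev, int func)
{
    assert(cards != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        bool pci_domain_ok = true;   /* reset on each iteration */
        if (!cards[i].v14_ok) {
            pci_domain_ok = false;   /* fall back to ignoring the domain */
            if (!cards[i].v11_ok)
                continue;            /* second call failed too: next device */
        }
        if (sim_match(&cards[i], domain, bus, dev, func, pci_domain_ok))
            return (int)i;
    }
    return -1;
}
```

In this model, a non-master fd for Card0 (both calls fail) is skipped, so a request for Card1 in another domain reaches Card1. A device where only the second call succeeds is still matched with the domain ignored, which is one reading of how a single GPU without pci_domain support would keep falling back to pci_domain_ok=0.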
>
>Thanks
>-Emil

