[PATCH] drm/xe: Fix fault on fd close when wedged
Lucas De Marchi
lucas.demarchi at intel.com
Tue Dec 17 22:20:51 UTC 2024
On Thu, Dec 12, 2024 at 08:30:03AM -0800, Matthew Brost wrote:
>On Wed, Dec 11, 2024 at 10:26:15PM -0600, Lucas De Marchi wrote:
>> On Wed, Dec 11, 2024 at 07:51:57PM -0800, Matthew Brost wrote:
>> > On Wed, Dec 11, 2024 at 02:53:32PM -0800, Lucas De Marchi wrote:
>> > > If device is wedged, the final run ticks update for the client should be
>> > > skipped as it's already unmapped. Fix this pagefault when forcing a
>> >
>> > Where does exec queue get unmapped on wedging a device?
>>
>> it's the lrc we are trying to read: lrc->bo, with offset == timestamp.
>>
>> I thought it was part of the xe_gt_declare_wedged(), but I'm not
>> following what triggers it - the only thing that should trigger that
>> would be the xe_lrc_put() and the refcount reaching 0. Something is not
>> adding up here, I will have to trace the destroy to see what's going on.
>>
>
>Quickly reached the same conclusion - accessing the LRC BO should be safe
>until the final put, hence my question. It does appear something else
>weird is going on here.

Finally had some time to analyze this again. So the issue is actually an
interaction between unbind and close, because the test in question here does:

	fd = drm_open_driver(DRIVER_XE);
	...
	fd = xe_sysfs_driver_do(fd, pci_slot, XE_SYSFS_DRIVER_REBIND);
	drm_close_driver(fd);

Note that the application is "leaking" the first fd when it opens the device
again. When that fd is finally closed on termination, the device has already
gone through unbind (i.e. xe_pci_remove()).
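
For illustration only, a minimal userspace sketch of that leak pattern. It
assumes /dev/null as a stand-in for the DRM device node and a plain second
open() in place of the rebind helper; fd_is_open() and leak_first_fd() are
hypothetical names and are not part of IGT or xe.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd still refers to an open file. */
static int fd_is_open(int fd)
{
	return fcntl(fd, F_GETFD) != -1;
}

/*
 * Stand-in for the test flow. /dev/null replaces the DRM device node and
 * the second open() replaces the rebind step that hands back a fresh fd.
 * Returns the descriptor number that was leaked.
 */
static int leak_first_fd(void)
{
	int fd, first;

	fd = open("/dev/null", O_RDWR);
	assert(fd >= 0);
	first = fd;	/* remembered only so the leak can be observed */

	fd = open("/dev/null", O_RDWR);	/* overwrites fd, first is never closed */
	assert(fd >= 0);
	close(fd);	/* closes only the second descriptor */

	return first;
}
```

Calling leak_first_fd() and then fd_is_open() on the returned value should
report the first descriptor as still open. In the real test that descriptor
is only released when the process exits, which is after the unbind.
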
I will send a new patch to fix that.
Lucas De Marchi
>
>Matt
>
>> Lucas De Marchi