[PATCH v5 3/4] drm/xe: Use device wedged event

Aravind Iddamsetty aravind.iddamsetty at linux.intel.com
Tue Sep 17 08:18:52 UTC 2024


On 17/09/24 13:33, Ghimiray, Himal Prasad wrote:
>
>
> On 17-09-2024 12:08, Raag Jadav wrote:
>> On Tue, Sep 17, 2024 at 10:11:05AM +0530, Ghimiray, Himal Prasad wrote:
>>> On 17-09-2024 09:32, Raag Jadav wrote:
>>>> This was previously attempted as xe specific reset uevent but dropped
>>>> in commit 77a0d4d1cea2 ("drm/xe/uapi: Remove reset uevent for now")
>>>> as part of refactoring.
>>>>
>>>> Now that we have device wedged event supported by DRM core, make use
>>>> of it. With this in place userspace will be notified of wedged device,
>>>> on the basis of which, userspace may take respective action to recover
>>>> the device.
>>>
>>>
>>> As per earlier discussions, the UAPI was also supposed to provide the reason
>>> for wedging (which is supposedly used by L0). Is that requirement no longer in
>>> place?
>>
>> Wondering how that contributes to the use case?
>
>
> ZES_EVENT_TYPE_FLAG_DEVICE_RESET_REQUIRED uses zesDeviceGetState
>
> "Get information about the state of the device - if a reset is required, reasons for the reset and if the device has been repaired. "
>
> https://spec.oneapi.io/level-zero/latest/sysman/api.html#zes__api_8h_1aec73230b938f08ad632d0b7817b66183

L0 doesn't read this uevent to learn the reason. The uevent only tells L0 that a reset is required:
https://spec.oneapi.io/level-zero/latest/sysman/api.html#_CPPv4N21zes_event_type_flag_t41ZES_EVENT_TYPE_FLAG_DEVICE_RESET_REQUIREDE


The reason is reported through a different API, zesDeviceGetState
(https://spec.oneapi.io/level-zero/latest/sysman/api.html#zesdevicegetstate).
For that they can issue any IOCTL, which will fail with -ECANCELED when the
device is wedged, and from that they can know the reason.
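
As a rough sketch of what that check could look like from userspace: the helper names below are made up for illustration and are not part of xe or L0, and which ioctl to issue is left to the caller. The only behaviour it relies on is the one described above, i.e. the ioctl failing with ECANCELED in errno when the device is wedged.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper: decide whether an ioctl() result indicates a
 * wedged device. Userspace sees the kernel's -ECANCELED as a return
 * value of -1 with errno set to ECANCELED.
 */
static bool ioctl_result_means_wedged(int ret, int err)
{
	return ret == -1 && err == ECANCELED;
}

/*
 * Hypothetical probe: issue a caller-supplied ioctl on an already open
 * device fd. Returns 1 if the device looks wedged, 0 if the ioctl
 * succeeded, or a negative errno value for any other failure.
 */
static int probe_wedged(int fd, unsigned long request, void *arg)
{
	int ret = ioctl(fd, request, arg);
	int err = (ret == -1) ? errno : 0;

	if (ioctl_result_means_wedged(ret, err))
		return 1;

	return ret == -1 ? -err : 0;
}
```

A caller would react to the wedged uevent (or to a failing submission) by calling probe_wedged() on its open fd and only report "reset required" when it returns 1. Other errors are passed through unchanged so they are not mistaken for a wedged device.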


Thanks,
Aravind.
>
>>
>> Raag

