[PATCH] accel/ivpu: Trigger device recovery on engine reset/resume failure
Jacek Lawrynowicz
jacek.lawrynowicz at linux.intel.com
Mon Jun 2 13:05:26 UTC 2025
Hi,
On 5/28/2025 7:53 PM, Lizhi Hou wrote:
>
> On 5/28/25 08:42, Jacek Lawrynowicz wrote:
>> From: Karol Wachowski <karol.wachowski at intel.com>
>>
>> Trigger full device recovery when the driver fails to restore device state
>> via engine reset and resume operations. This is necessary because, even if
>> submissions from a faulty context are blocked, the NPU may still process
>> previously submitted faulty jobs if the engine reset fails to abort them.
>> Such jobs can continue to generate faults and occupy device resources.
>> When engine reset is ineffective, the only way to recover is to perform
>> a full device recovery.
>>
>> Fixes: dad945c27a42 ("accel/ivpu: Add handling of VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW")
>> Cc: <stable at vger.kernel.org> # v6.15+
>> Signed-off-by: Karol Wachowski <karol.wachowski at intel.com>
>> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz at linux.intel.com>
>> ---
>> drivers/accel/ivpu/ivpu_job.c | 6 ++++--
>> drivers/accel/ivpu/ivpu_jsm_msg.c | 9 +++++++--
>> 2 files changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/accel/ivpu/ivpu_job.c b/drivers/accel/ivpu/ivpu_job.c
>> index 1c8e283ad9854..fae8351aa3309 100644
>> --- a/drivers/accel/ivpu/ivpu_job.c
>> +++ b/drivers/accel/ivpu/ivpu_job.c
>> @@ -986,7 +986,8 @@ void ivpu_context_abort_work_fn(struct work_struct *work)
>> return;
>> if (vdev->fw->sched_mode == VPU_SCHEDULING_MODE_HW)
>> - ivpu_jsm_reset_engine(vdev, 0);
>> + if (ivpu_jsm_reset_engine(vdev, 0))
>> + return;
>
> Is it possible the context aborting is entered again before the full device recovery work is executed?
Good point, but ivpu_context_abort_work_fn() is triggered by an IRQ, and the first thing we do when triggering recovery is disable IRQs.
The recovery work also flushes context_abort_work before starting to tear everything down, so we should be safe.
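To make the ordering concrete, here is a minimal userspace model of the argument above. It is only a sketch: every name in it (model_dev, model_trigger_recovery() and so on) is hypothetical and not the ivpu API, and it assumes that a full recovery clears the fault and re-enables IRQs once the device is back up.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: every name below is hypothetical, not the ivpu API. */
struct model_dev {
	bool irq_enabled;        /* device interrupts can be delivered */
	bool reset_fails;        /* injected result of the engine reset */
	bool abort_work_pending; /* context abort work is queued */
	bool recovery_pending;   /* recovery work is queued */
	int abort_runs;          /* how many times the abort work ran */
	int teardowns;           /* how many times recovery tore the device down */
};

/* IRQ path: abort work can only be queued while interrupts are enabled. */
static bool model_irq_context_abort(struct model_dev *d)
{
	if (!d->irq_enabled)
		return false;
	d->abort_work_pending = true;
	return true;
}

/* Triggering recovery disables IRQs first, then queues the recovery work. */
static void model_trigger_recovery(struct model_dev *d)
{
	d->irq_enabled = false;
	d->recovery_pending = true;
}

/* Abort work: a failed engine reset triggers recovery and returns early. */
static void model_context_abort_work(struct model_dev *d)
{
	d->abort_work_pending = false;
	d->abort_runs++;
	if (d->reset_fails) {
		model_trigger_recovery(d);
		return;
	}
	/* The success path is not modeled here. */
}

/* Recovery work: flush queued abort work, then tear down and restart. */
static void model_recovery_work(struct model_dev *d)
{
	if (d->abort_work_pending)
		model_context_abort_work(d); /* "flush": run it to completion */
	assert(!d->abort_work_pending);      /* IRQs are off, nothing new is queued */
	d->teardowns++;
	d->recovery_pending = false;
	d->reset_fails = false;              /* assumption: full recovery clears the fault */
	d->irq_enabled = true;               /* assumption: IRQs return after restart */
}
```

In this model, once model_trigger_recovery() has run, model_irq_context_abort() returns false, so the abort work cannot be queued again before model_recovery_work() executes. Abort work that was queued earlier is run to completion by the flush before the teardown starts.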
Regards,
Jacek