[PATCH 2/3] accel/habanalabs: reset device if scrubbing failed

Ofir Bitton obitton at habana.ai
Tue Jun 13 07:54:20 UTC 2023


On 12/06/2023 15:07, Oded Gabbay wrote:

If scrubbing memory after the user has released the device fails, the
device is in a bad state and should be reset.

Signed-off-by: Oded Gabbay <ogabbay at kernel.org>
---
 drivers/accel/habanalabs/common/device.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/accel/habanalabs/common/device.c b/drivers/accel/habanalabs/common/device.c
index 5e61761b8c11..d7d9198b2103 100644
--- a/drivers/accel/habanalabs/common/device.c
+++ b/drivers/accel/habanalabs/common/device.c
@@ -454,8 +454,10 @@ static void hpriv_release(struct kref *ref)
                /* Scrubbing is handled within hl_device_reset(), so here need to do it directly */
                int rc = hdev->asic_funcs->scrub_device_mem(hdev);

-               if (rc)
+               if (rc) {
                        dev_err(hdev->dev, "failed to scrub memory from hpriv release (%d)\n", rc);
+                       hl_device_reset(hdev, HL_DRV_RESET_HARD);
+               }
        }

        /* Now we can mark the compute_ctx as not active. Even if a reset is running in a different


Reviewed-by: Ofir Bitton <obitton at habana.ai>
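
For readers following the thread outside the kernel tree, the control flow the patch introduces can be sketched as a small user-space mock. This is only an illustration under stated assumptions: `struct mock_device`, `mock_scrub_device_mem()`, `mock_device_reset_hard()` and `mock_release()` are hypothetical stand-ins and are not part of the habanalabs driver; only the "log the error, then request a hard reset when scrubbing fails" logic mirrors the diff above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver's device structure. */
struct mock_device {
	int scrub_result;	/* value the mock scrub returns (0 means success) */
	int hard_resets;	/* number of hard resets requested so far */
};

/* Hypothetical stand-in for asic_funcs->scrub_device_mem(). */
static int mock_scrub_device_mem(struct mock_device *dev)
{
	return dev->scrub_result;
}

/* Hypothetical stand-in for hl_device_reset(hdev, HL_DRV_RESET_HARD). */
static void mock_device_reset_hard(struct mock_device *dev)
{
	dev->hard_resets++;
}

/* Mirrors the patched branch: on scrub failure, log and request a hard reset. */
static void mock_release(struct mock_device *dev)
{
	int rc;

	assert(dev != NULL);

	rc = mock_scrub_device_mem(dev);
	if (rc) {
		fprintf(stderr, "failed to scrub memory from release (%d)\n", rc);
		mock_device_reset_hard(dev);
	}
}
```

Calling `mock_release()` on a device whose `scrub_result` is 0 leaves `hard_resets` untouched, while a non-zero `scrub_result` increments it once, which is the behavioural change the patch describes.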