[PATCH v3] drm/amdgpu: Check fence emitted count to identify bad jobs
Christian König
christian.koenig at amd.com
Thu Nov 21 08:12:52 UTC 2024
Yeah, I just wanted to point out the unused variable as well.
With that fixed, the patch is Reviewed-by: Christian König <christian.koenig at amd.com>
Regards,
Christian.
On 21.11.24 at 07:49, Fan, Shikang wrote:
> I forgot to delete the unused counter "j" from the patch; I'll remove
> it when I submit the patch to the branch.
>
> Thanks,
> Shikang
>
> ------------------------------------------------------------------------
> *From:* Fan, Shikang <Shikang.Fan at amd.com>
> *Sent:* Thursday, November 21, 2024 2:47 PM
> *To:* amd-gfx at lists.freedesktop.org <amd-gfx at lists.freedesktop.org>;
> Koenig, Christian <Christian.Koenig at amd.com>
> *Cc:* Deng, Emily <Emily.Deng at amd.com>
> *Subject:* Re: [PATCH v3] drm/amdgpu: Check fence emitted count to
> identify bad jobs
> + at Koenig, Christian <mailto:Christian.Koenig at amd.com>
>
> Hi Christian,
> Could you please help review this patch? I removed the timeout wait in
> the function.
>
> Thanks,
> Shikang
>
> ------------------------------------------------------------------------
> *From:* Shikang Fan <shikang.fan at amd.com>
> *Sent:* Thursday, November 21, 2024 11:48 AM
> *To:* amd-gfx at lists.freedesktop.org <amd-gfx at lists.freedesktop.org>
> *Cc:* Fan, Shikang <Shikang.Fan at amd.com>; Deng, Emily <Emily.Deng at amd.com>
> *Subject:* [PATCH v3] drm/amdgpu: Check fence emitted count to
> identify bad jobs
> In SRIOV, when the host driver performs a MODE 1 reset and notifies the
> guest driver of the FLR, there is a small chance that no job is running on
> the hw but the driver has not yet updated the pending list, so the driver
> does not respond to the FLR request. Modify the has_job_running function
> to check whether a job is actually still running.
>
> v2: Use amdgpu_fence_count_emitted to determine job running status.
> v3: Remove the timeout wait in has_job_running
>
> Signed-off-by: Emily Deng <Emily.Deng at amd.com>
> Signed-off-by: Shikang Fan <shikang.fan at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 15 +++++++--------
> 1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index b3ca911e55d6..f53889ce71a8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -5222,15 +5222,18 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev,
> }
>
> /**
> - * amdgpu_device_has_job_running - check if there is any job in mirror list
> + * amdgpu_device_has_job_running - check if there is any unfinished job
> *
> * @adev: amdgpu_device pointer
> *
> - * check if there is any job in mirror list
> + * check if there is any job running on the device when guest driver receives
> + * FLR notification from host driver. If there are still jobs running, then
> + * the guest driver will not respond the FLR reset. Instead, let the job hit
> + * the timeout and guest driver then issue the reset request.
> */
> bool amdgpu_device_has_job_running(struct amdgpu_device *adev)
> {
> - int i;
> + int i, j;
> struct drm_sched_job *job;
>
> for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
> @@ -5239,11 +5242,7 @@ bool amdgpu_device_has_job_running(struct amdgpu_device *adev)
> if (!amdgpu_ring_sched_ready(ring))
> continue;
>
> - spin_lock(&ring->sched.job_list_lock);
> - job = list_first_entry_or_null(&ring->sched.pending_list,
> - struct drm_sched_job, list);
> - spin_unlock(&ring->sched.job_list_lock);
> - if (job)
> + if (amdgpu_fence_count_emitted(ring))
> return true;
> }
> return false;
> --
> 2.34.1
>
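
For readers following the thread, the idea behind the quoted patch can be
sketched in user space. This is only a simplified model, not the kernel
code: struct mock_ring, mock_fence_count_emitted() and
mock_has_job_running() are hypothetical names made up for illustration, and
the sketch assumes that "emitted" means fences that were emitted on a ring
but have not signaled yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for per-ring fence state; not the kernel struct. */
struct mock_ring {
	bool sched_ready;   /* models the amdgpu_ring_sched_ready() check */
	uint32_t sync_seq;  /* sequence number of the last fence emitted */
	uint32_t last_seq;  /* sequence number of the last fence signaled */
};

/*
 * Number of fences emitted but not yet signaled. The result is truncated
 * to 32 bits, so the count stays correct across sequence wraparound.
 */
static uint32_t mock_fence_count_emitted(const struct mock_ring *ring)
{
	return ring->sync_seq - ring->last_seq;
}

/*
 * Mirrors the shape of the patched loop: skip rings whose scheduler is not
 * ready, and report a running job as soon as one ring has outstanding
 * fences. No pending-list lookup and no unused counter are needed.
 */
static bool mock_has_job_running(const struct mock_ring *rings, size_t n)
{
	size_t i;

	assert(rings != NULL || n == 0);

	for (i = 0; i < n; ++i) {
		if (!rings[i].sched_ready)
			continue;

		if (mock_fence_count_emitted(&rings[i]))
			return true;
	}
	return false;
}
```

In this model a ring whose scheduler is not ready is ignored even if its
sequence numbers differ, and a ready ring counts as busy only while
sync_seq is ahead of last_seq. That is the behavior the commit message
asks for: a stale pending list no longer makes the device look busy.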