[PATCH] drm/amdgpu: change the fence ring wait timeout
Paul Menzel
pmenzel at molgen.mpg.de
Wed Jan 13 07:57:28 UTC 2021
Dear Roy,
Am 13.01.21 um 07:36 schrieb Roy Sun:
> This fix bug where when the engine hang, the fence ring will wait without quit and cause kernel crash
Please wrap commit message body after 75 characters. (checkpatch.pl
should have shown you that.)
Some typos:
1. fix*es*
2. hang*s*
3. Please add a dot/period at the end of sentences.
If there is a Linux kernel crash, please add the trace.
Please elaborate also. Where and why does the Linux kernel crash, and
how does your patch fix this.
How can the bug be reproduced? On what hardware did you see it?
> Signed-off-by: Roy Sun <Roy.Sun at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 48 ++++++++++++++++++++---
> 1 file changed, 43 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index 6b0aeee61b8b..738ea65077ea 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -41,6 +41,8 @@
> #include "amdgpu.h"
> #include "amdgpu_trace.h"
>
> +#define AMDGPU_FENCE_TIMEOUT msecs_to_jiffies(1000)
> +#define AMDGPU_FENCE_GFX_XGMI_TIMEOUT msecs_to_jiffies(2000)
Where did you get the values from, or why did you choose them.
> /*
> * Fences
> * Fences mark an event in the GPUs pipeline and are used
> @@ -104,6 +106,38 @@ static void amdgpu_fence_write(struct amdgpu_ring *ring, u32 seq)
> *drv->cpu_addr = cpu_to_le32(seq);
> }
>
> +/**
> + * amdgpu_fence_wait_timeout - get the fence wait timeout
> + *
> + * @ring: ring the fence is associated with
> + *
> + * Returns the value of the fence wait timeout.
> + */
> +long amdgpu_fence_wait_timeout(struct amdgpu_ring *ring)
See below, but it should be `signed long`.
> +{
> + long tmo_gfx, tmo_mm, tmo;
> + struct amdgpu_device *adev = ring->adev;
> + tmo_mm = tmo_gfx = AMDGPU_FENCE_TIMEOUT;
> + if (amdgpu_sriov_vf(adev)) {
> + tmo_mm = 8 * AMDGPU_FENCE_TIMEOUT;
> + }
See coding style. No {} needed for one line statements.
> + if (amdgpu_sriov_runtime(adev)) {
> + tmo_gfx = 8 * AMDGPU_FENCE_TIMEOUT;
So why not define a special SRIOV macro too. But add a comment, why it
should be eight times that.
> + } else if (adev->gmc.xgmi.hive_id) {
> + tmo_gfx = AMDGPU_FENCE_GFX_XGMI_TIMEOUT;
> + }
Ditto.
> + if (ring->funcs->type == AMDGPU_RING_TYPE_UVD ||
> + ring->funcs->type == AMDGPU_RING_TYPE_VCE ||
> + ring->funcs->type == AMDGPU_RING_TYPE_UVD_ENC ||
> + ring->funcs->type == AMDGPU_RING_TYPE_VCN_DEC ||
> + ring->funcs->type == AMDGPU_RING_TYPE_VCN_ENC ||
> + ring->funcs->type == AMDGPU_RING_TYPE_VCN_JPEG)
Align on `ring`?
> + tmo = tmo_mm;
> + else
> + tmo = tmo_gfx;
> + return tmo;
The return could happen in the branches.
> +}
> +
> /**
> * amdgpu_fence_read - read a fence value
> *
> @@ -166,10 +200,12 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f,
> rcu_read_unlock();
>
> if (old) {
> - r = dma_fence_wait(old, false);
> + long timeout;
> + timeout = amdgpu_fence_wait_timeout(ring);
> + r = dma_fence_wait_timeout(old, false, timeout);
The signature is `signed long` as far as I see.
> dma_fence_put(old);
> if (r)
> - return r;
> + return r < 0 ? r : 0;
> }
> }
>
> @@ -343,10 +379,12 @@ int amdgpu_fence_wait_empty(struct amdgpu_ring *ring)
> return 0;
> }
> rcu_read_unlock();
> -
> - r = dma_fence_wait(fence, false);
> +
> + long timeout;
> + timeout = amdgpu_fence_wait_timeout(ring);
> + r = dma_fence_wait_timeout(fence, false, timeout);
> dma_fence_put(fence);
> - return r;
> + return r < 0 ? r : 0;
> }
>
> /**
>
It looks like this is your first patch submission, so welcome to the
Linux kernel community. Please keep in mind, that reviewer time is
precious and scarce, and that formal and simple mistakes should not
happen. No idea, if AMD has some kind of “onboarding” process for their
newly employed Linux kernel contributors.
I am looking forward to your the patch iteration and your next
contributions.
Kind regards,
Paul
More information about the amd-gfx
mailing list