[PATCH 1/2] drm/amdgpu: Reset IH OVERFLOW_CLEAR bit after writing rptr

Wed Jan 17 12:27:42 UTC 2024

Am 16.01.24 um 11:31 schrieb Friedrich Vock:
> On 16.01.24 08:03, Christian König wrote:
>> Am 15.01.24 um 12:18 schrieb Friedrich Vock:
>>> [SNIP]
>>>>> +    if (ih->overflow) {
>>>>> +        tmp = RREG32(mmIH_RB_CNTL);
>>>>> +        tmp &= ~IH_RB_CNTL__WPTR_OVERFLOW_CLEAR_MASK;
>>>>> +        WREG32(mmIH_RB_CNTL, tmp);
>>>>> +        ih->overflow = false;
>>>>> +    }
>>>>
>>>> Well that is an extremely bad idea. We already reset the overflow
>>>> after reading the WPTR.
>>>
>>> This is not resetting the overflow bit. This is resetting a "clear
>>> overflow" bit. I don't have the hardware docs, but the name (and my
>>> observations) strongly suggest that setting this bit actually prevents
>>> the hardware from setting the overflow bit ever again.
>>
>> Well that doesn't make any sense at all. The hardware documentation
>> clearly states that this bit is write only and should always read as
>> zero.
>>
>> Setting this bit will clear the overflow flag in the WPTR register and
>> clearing it has no effect at all.
>>
>> I could only ping the hw engineer responsible for this block to double
>> check if the documentation is somehow outdated, but I really doubt so.
>>
> I see. I wish I had access to the documentation,

Well, doesn't Valve has an NDA in place?

> but I don't, so all I
> can do is tell you what I observe the hardware doing. I've tested this
> on both a Steam Deck (OSSYS 5.2.0) and an RX 6700 XT (OSSYS 5.0.3). On
> both systems, launching a bunch of shaders that cause page faults leads
> to lots of "[gfxhub] page fault" messages in dmesg, followed by an
> "amdgpu: IH ring buffer overflow".

Well that is certainly a bug, maybe even the same thing we have seen on 
Vega and MI.

What we could do is to try to apply the same workaround to re-route the 
page faults to a different IH ring.

See those patches here as well:

commit 516bc3d8dd7965f1a8a3ea453857f14d95971e62
Author: Christian König <christian.koenig at amd.com>
Date:   Fri Nov 2 15:00:16 2018 +0100

     drm/amdgpu: reroute VMC and UMD to IH ring 1

     Page faults can easily overwhelm the interrupt handler.

     So to make sure that we never lose valuable interrupts on the 
primary ring
     we re-route page faults to IH ring 1.

commit b849aaa41c914a0fd88003f88cb04420a873c624
Author: Christian König <christian.koenig at amd.com>
Date:   Mon Mar 4 19:34:34 2019 +0100

     drm/amdgpu: also reroute VMC and UMD to IH ring 1 on Vega 20

     Same patch we alredy did for Vega10. Just re-route page faults to a 
separate
     ring to avoid drowning in interrupts.

>
> If I re-launch the same set of shaders after the GPU has soft-recovered,
> the "amdgpu: IH ring buffer overflow" message is missing, even though
> the same amount of page faults should've been triggered at roughly the
> same rate. Running with this patch applied makes more "amdgpu: IH ring
> buffer overflow" messages appear after relaunching the faulting shaders
> (but not when processing any non-faulting work).

That is actually the expected behavior. There should be a limit on the 
number of faults written to the ring so that the ring never overflows.

>
> The only possible conclusion I can draw from this is that clearing that
> bit *does* have an effect, and I don't think it's far-fetched to assume
> the IH ring buffer overflows still happen after re-launching the
> faulting shaders but go undetected so far.

Well that can only mean that the hw documentation is incorrect.

Either the value is not write only trigger bit as documented or we need 
an additional read of the register for it to take effect or something 
like this.

>>> Right now, IH overflows, even if they occur repeatedly, only get
>>> registered once. If not registering IH overflows can trivially lead to
>>> system crashes, it's amdgpu's current handling that is broken.
>>
>> It's years that we last tested this but according to the HW
>> documentation this should work fine.
>>
>> What could potentially happen is that the IH has silenced the source
>> of the overflow. We never implemented resetting those, but in this
>> case that here won't help either.
>>
> If the IH silenced the page faults (which quite clearly cause the
> overflow here), then how are the page faults still logged in dmesg?

There should be a hardware rate limit for the page faults, e.g. there 
can only be X faults reported in N clock cycles and then a delay is 
inserted.

>>>
>>> The possibility of a repeated IH overflow in between reading the wptr
>>> and updating the rptr is a good point, but how can we detect that at
>>> all? It seems to me like we can't set the OVERFLOW_CLEAR bit at all
>>> then, because we're guaranteed to miss any overflows that happen while
>>> the bit is set.
>>
>> When an IH overflow is signaled we clear that flag by writing 1 into
>> the OVERFLOW_CLEAR bit and skip one entry in the IH ring buffer.
>>
>> What can of course happen is that the IH ring buffer overflows more
>> than this single entry and we process IVs which are potentially
>> corrupted, but we won't miss any additional overflows since we only
>> start processing after resetting the flag.
>>
>> An IH overflow is also something you should *never* see in a
>> production system. This is purely for driver bringup and as fallback
>> when there is a severe incorrect programming of the HW.
>>
>> The only exception of that is page fault handling on MI products
>> because of a hardware bug, to mitigate this we are processing page
>> faults on a separate IH ring on those parts.
>>
>> On all other hw generations the IH should have some rate limit for the
>> number of faults generated per second, so that the CPU is always able
>> to catch up.
>
> I'm wondering if there is another bug in here somewhere. Your
> explanation of how it's supposed to work makes a lot of sense, but from
> what I can tell it doesn't work that way when I test it.
>
> From the printk_ratelimit stats it would seem like >2000 faults arrive
> in less than a second, so perhaps your theory about fault interrupt
> ratelimiting not working is correct (but it's hard for me to verify what
> is going on without the documentation).

I'm going to ping the relevant engineer and putting someone on the task 
to take a look.

Thanks,
Christian.

>
> Regards,
> Friedrich
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> Regards,
>>> Friedrich
>>>
>>>>
>>>> When you clear the overflow again when updating the RPTR you could
>>>> loose another overflow which might have happened in between and so
>>>> potentially process corrupted IVs.
>>>>
>>>> That can trivially crash the system.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>>   }
>>>>>
>>>>>   static int cik_ih_early_init(void *handle)
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/cz_ih.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/cz_ih.c
>>>>> index b8c47e0cf37a..076559668573 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/cz_ih.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/cz_ih.c
>>>>> @@ -215,7 +215,7 @@ static u32 cz_ih_get_wptr(struct amdgpu_device
>>>>> *adev,
>>>>>       tmp = RREG32(mmIH_RB_CNTL);
>>>>>       tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
>>>>>       WREG32(mmIH_RB_CNTL, tmp);
>>>>> -
>>>>> +    ih->overflow = true;
>>>>>
>>>>>   out:
>>>>>       return (wptr & ih->ptr_mask);
>>>>> @@ -266,7 +266,19 @@ static void cz_ih_decode_iv(struct amdgpu_device
>>>>> *adev,
>>>>>   static void cz_ih_set_rptr(struct amdgpu_device *adev,
>>>>>                  struct amdgpu_ih_ring *ih)
>>>>>   {
>>>>> +    u32 tmp;
>>>>> +
>>>>>       WREG32(mmIH_RB_RPTR, ih->rptr);
>>>>> +
>>>>> +    /* If we overflowed previously (and thus set the OVERFLOW_CLEAR
>>>>> bit),
>>>>> +     * reset it here to detect more overflows if they occur.
>>>>> +     */
>>>>> +    if (ih->overflow) {
>>>>> +        tmp = RREG32(mmIH_RB_CNTL);
>>>>> +        tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 
>>>>> 0);
>>>>> +        WREG32(mmIH_RB_CNTL, tmp);
>>>>> +        ih->overflow = false;
>>>>> +    }
>>>>>   }
>>>>>
>>>>>   static int cz_ih_early_init(void *handle)
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/iceland_ih.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/iceland_ih.c
>>>>> index aecad530b10a..1a5e668643d1 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/iceland_ih.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/iceland_ih.c
>>>>> @@ -214,7 +214,7 @@ static u32 iceland_ih_get_wptr(struct
>>>>> amdgpu_device *adev,
>>>>>       tmp = RREG32(mmIH_RB_CNTL);
>>>>>       tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
>>>>>       WREG32(mmIH_RB_CNTL, tmp);
>>>>> -
>>>>> +    ih->overflow = true;
>>>>>
>>>>>   out:
>>>>>       return (wptr & ih->ptr_mask);
>>>>> @@ -265,7 +265,19 @@ static void iceland_ih_decode_iv(struct
>>>>> amdgpu_device *adev,
>>>>>   static void iceland_ih_set_rptr(struct amdgpu_device *adev,
>>>>>                   struct amdgpu_ih_ring *ih)
>>>>>   {
>>>>> +    u32 tmp;
>>>>> +
>>>>>       WREG32(mmIH_RB_RPTR, ih->rptr);
>>>>> +
>>>>> +    /* If we overflowed previously (and thus set the OVERFLOW_CLEAR
>>>>> bit),
>>>>> +     * reset it here to detect more overflows if they occur.
>>>>> +     */
>>>>> +    if (ih->overflow) {
>>>>> +        tmp = RREG32(mmIH_RB_CNTL);
>>>>> +        tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 
>>>>> 0);
>>>>> +        WREG32(mmIH_RB_CNTL, tmp);
>>>>> +        ih->overflow = false;
>>>>> +    }
>>>>>   }
>>>>>
>>>>>   static int iceland_ih_early_init(void *handle)
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c
>>>>> index d9ed7332d805..ce8f7feec713 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c
>>>>> @@ -418,6 +418,8 @@ static u32 ih_v6_0_get_wptr(struct amdgpu_device
>>>>> *adev,
>>>>>       tmp = RREG32_NO_KIQ(ih_regs->ih_rb_cntl);
>>>>>       tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
>>>>>       WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp);
>>>>> +    ih->overflow = true;
>>>>> +
>>>>>   out:
>>>>>       return (wptr & ih->ptr_mask);
>>>>>   }
>>>>> @@ -459,6 +461,7 @@ static void ih_v6_0_irq_rearm(struct
>>>>> amdgpu_device *adev,
>>>>>   static void ih_v6_0_set_rptr(struct amdgpu_device *adev,
>>>>>                      struct amdgpu_ih_ring *ih)
>>>>>   {
>>>>> +    u32 tmp;
>>>>>       struct amdgpu_ih_regs *ih_regs;
>>>>>
>>>>>       if (ih->use_doorbell) {
>>>>> @@ -472,6 +475,16 @@ static void ih_v6_0_set_rptr(struct
>>>>> amdgpu_device *adev,
>>>>>           ih_regs = &ih->ih_regs;
>>>>>           WREG32(ih_regs->ih_rb_rptr, ih->rptr);
>>>>>       }
>>>>> +
>>>>> +    /* If we overflowed previously (and thus set the OVERFLOW_CLEAR
>>>>> bit),
>>>>> +     * reset it here to detect more overflows if they occur.
>>>>> +     */
>>>>> +    if (ih->overflow) {
>>>>> +        tmp = RREG32_NO_KIQ(ih->ih_regs.ih_rb_cntl);
>>>>> +        tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 
>>>>> 0);
>>>>> +        WREG32_NO_KIQ(ih->ih_regs.ih_rb_cntl, tmp);
>>>>> +        ih->overflow = false;
>>>>> +    }
>>>>>   }
>>>>>
>>>>>   /**
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/ih_v6_1.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/ih_v6_1.c
>>>>> index 8fb05eae340a..668788ad34d9 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/ih_v6_1.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/ih_v6_1.c
>>>>> @@ -418,6 +418,8 @@ static u32 ih_v6_1_get_wptr(struct amdgpu_device
>>>>> *adev,
>>>>>       tmp = RREG32_NO_KIQ(ih_regs->ih_rb_cntl);
>>>>>       tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
>>>>>       WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp);
>>>>> +    ih->overflow = true;
>>>>> +
>>>>>   out:
>>>>>       return (wptr & ih->ptr_mask);
>>>>>   }
>>>>> @@ -459,6 +461,7 @@ static void ih_v6_1_irq_rearm(struct
>>>>> amdgpu_device *adev,
>>>>>   static void ih_v6_1_set_rptr(struct amdgpu_device *adev,
>>>>>                      struct amdgpu_ih_ring *ih)
>>>>>   {
>>>>> +    u32 tmp;
>>>>>       struct amdgpu_ih_regs *ih_regs;
>>>>>
>>>>>       if (ih->use_doorbell) {
>>>>> @@ -472,6 +475,16 @@ static void ih_v6_1_set_rptr(struct
>>>>> amdgpu_device *adev,
>>>>>           ih_regs = &ih->ih_regs;
>>>>>           WREG32(ih_regs->ih_rb_rptr, ih->rptr);
>>>>>       }
>>>>> +
>>>>> +    /* If we overflowed previously (and thus set the OVERFLOW_CLEAR
>>>>> bit),
>>>>> +     * reset it here to detect more overflows if they occur.
>>>>> +     */
>>>>> +    if (ih->overflow) {
>>>>> +        tmp = RREG32_NO_KIQ(ih->ih_regs.ih_rb_cntl);
>>>>> +        tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 
>>>>> 0);
>>>>> +        WREG32_NO_KIQ(ih->ih_regs.ih_rb_cntl, tmp);
>>>>> +        ih->overflow = false;
>>>>> +    }
>>>>>   }
>>>>>
>>>>>   /**
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
>>>>> index e64b33115848..0bdac923cb4d 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
>>>>> @@ -442,6 +442,7 @@ static u32 navi10_ih_get_wptr(struct
>>>>> amdgpu_device *adev,
>>>>>       tmp = RREG32_NO_KIQ(ih_regs->ih_rb_cntl);
>>>>>       tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
>>>>>       WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp);
>>>>> +    ih->overflow = true;
>>>>>   out:
>>>>>       return (wptr & ih->ptr_mask);
>>>>>   }
>>>>> @@ -483,6 +484,7 @@ static void navi10_ih_irq_rearm(struct
>>>>> amdgpu_device *adev,
>>>>>   static void navi10_ih_set_rptr(struct amdgpu_device *adev,
>>>>>                      struct amdgpu_ih_ring *ih)
>>>>>   {
>>>>> +    u32 tmp;
>>>>>       struct amdgpu_ih_regs *ih_regs;
>>>>>
>>>>>       if (ih == &adev->irq.ih_soft)
>>>>> @@ -499,6 +501,16 @@ static void navi10_ih_set_rptr(struct
>>>>> amdgpu_device *adev,
>>>>>           ih_regs = &ih->ih_regs;
>>>>>           WREG32(ih_regs->ih_rb_rptr, ih->rptr);
>>>>>       }
>>>>> +
>>>>> +    /* If we overflowed previously (and thus set the OVERFLOW_CLEAR
>>>>> bit),
>>>>> +     * reset it here to detect more overflows if they occur.
>>>>> +     */
>>>>> +    if (ih->overflow) {
>>>>> +        tmp = RREG32_NO_KIQ(ih->ih_regs.ih_rb_cntl);
>>>>> +        tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 
>>>>> 0);
>>>>> +        WREG32_NO_KIQ(ih->ih_regs.ih_rb_cntl, tmp);
>>>>> +        ih->overflow = false;
>>>>> +    }
>>>>>   }
>>>>>
>>>>>   /**
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/si_ih.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/si_ih.c
>>>>> index 9a24f17a5750..ff35056d2b54 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/si_ih.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/si_ih.c
>>>>> @@ -119,6 +119,7 @@ static u32 si_ih_get_wptr(struct amdgpu_device
>>>>> *adev,
>>>>>           tmp = RREG32(IH_RB_CNTL);
>>>>>           tmp |= IH_RB_CNTL__WPTR_OVERFLOW_CLEAR_MASK;
>>>>>           WREG32(IH_RB_CNTL, tmp);
>>>>> +        ih->overflow = true;
>>>>>       }
>>>>>       return (wptr & ih->ptr_mask);
>>>>>   }
>>>>> @@ -147,7 +148,18 @@ static void si_ih_decode_iv(struct amdgpu_device
>>>>> *adev,
>>>>>   static void si_ih_set_rptr(struct amdgpu_device *adev,
>>>>>                  struct amdgpu_ih_ring *ih)
>>>>>   {
>>>>> +    u32 tmp;
>>>>> +
>>>>>       WREG32(IH_RB_RPTR, ih->rptr);
>>>>> +
>>>>> +    /* If we overflowed previously (and thus set the OVERFLOW_CLEAR
>>>>> bit),
>>>>> +     * reset it here to detect more overflows if they occur.
>>>>> +     */
>>>>> +    if (ih->overflow) {
>>>>> +        tmp = RREG32(IH_RB_CNTL);
>>>>> +        tmp &= ~IH_RB_CNTL__WPTR_OVERFLOW_CLEAR_MASK;
>>>>> +        WREG32(IH_RB_CNTL, tmp);
>>>>> +    }
>>>>>   }
>>>>>
>>>>>   static int si_ih_early_init(void *handle)
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/tonga_ih.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/tonga_ih.c
>>>>> index 917707bba7f3..6f5090d3db48 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/tonga_ih.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/tonga_ih.c
>>>>> @@ -218,6 +218,7 @@ static u32 tonga_ih_get_wptr(struct amdgpu_device
>>>>> *adev,
>>>>>       tmp = RREG32(mmIH_RB_CNTL);
>>>>>       tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
>>>>>       WREG32(mmIH_RB_CNTL, tmp);
>>>>> +    ih->overflow = true;
>>>>>
>>>>>   out:
>>>>>       return (wptr & ih->ptr_mask);
>>>>> @@ -268,6 +269,8 @@ static void tonga_ih_decode_iv(struct
>>>>> amdgpu_device *adev,
>>>>>   static void tonga_ih_set_rptr(struct amdgpu_device *adev,
>>>>>                     struct amdgpu_ih_ring *ih)
>>>>>   {
>>>>> +    u32 tmp;
>>>>> +
>>>>>       if (ih->use_doorbell) {
>>>>>           /* XXX check if swapping is necessary on BE */
>>>>>           *ih->rptr_cpu = ih->rptr;
>>>>> @@ -275,6 +278,16 @@ static void tonga_ih_set_rptr(struct
>>>>> amdgpu_device *adev,
>>>>>       } else {
>>>>>           WREG32(mmIH_RB_RPTR, ih->rptr);
>>>>>       }
>>>>> +
>>>>> +    /* If we overflowed previously (and thus set the OVERFLOW_CLEAR
>>>>> bit),
>>>>> +     * reset it here to detect more overflows if they occur.
>>>>> +     */
>>>>> +    if (ih->overflow) {
>>>>> +        tmp = RREG32(mmIH_RB_CNTL);
>>>>> +        tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 
>>>>> 0);
>>>>> +        WREG32(mmIH_RB_CNTL, tmp);
>>>>> +        ih->overflow = false;
>>>>> +    }
>>>>>   }
>>>>>
>>>>>   static int tonga_ih_early_init(void *handle)
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
>>>>> index d364c6dd152c..bb005924f194 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
>>>>> @@ -372,6 +372,7 @@ static u32 vega10_ih_get_wptr(struct
>>>>> amdgpu_device *adev,
>>>>>       tmp = RREG32_NO_KIQ(ih_regs->ih_rb_cntl);
>>>>>       tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
>>>>>       WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp);
>>>>> +    ih->overflow = true;
>>>>>
>>>>>   out:
>>>>>       return (wptr & ih->ptr_mask);
>>>>> @@ -413,6 +414,7 @@ static void vega10_ih_irq_rearm(struct
>>>>> amdgpu_device *adev,
>>>>>   static void vega10_ih_set_rptr(struct amdgpu_device *adev,
>>>>>                      struct amdgpu_ih_ring *ih)
>>>>>   {
>>>>> +    u32 tmp;
>>>>>       struct amdgpu_ih_regs *ih_regs;
>>>>>
>>>>>       if (ih == &adev->irq.ih_soft)
>>>>> @@ -429,6 +431,16 @@ static void vega10_ih_set_rptr(struct
>>>>> amdgpu_device *adev,
>>>>>           ih_regs = &ih->ih_regs;
>>>>>           WREG32(ih_regs->ih_rb_rptr, ih->rptr);
>>>>>       }
>>>>> +
>>>>> +    /* If we overflowed previously (and thus set the OVERFLOW_CLEAR
>>>>> bit),
>>>>> +     * reset it here to detect more overflows if they occur.
>>>>> +     */
>>>>> +    if (ih->overflow) {
>>>>> +        tmp = RREG32_NO_KIQ(ih->ih_regs.ih_rb_cntl);
>>>>> +        tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 
>>>>> 0);
>>>>> +        WREG32_NO_KIQ(ih->ih_regs.ih_rb_cntl, tmp);
>>>>> +        ih->overflow = false;
>>>>> +    }
>>>>>   }
>>>>>
>>>>>   /**
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
>>>>> index ddfc6941f9d5..bb725a970697 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
>>>>> @@ -420,6 +420,7 @@ static u32 vega20_ih_get_wptr(struct
>>>>> amdgpu_device *adev,
>>>>>       tmp = RREG32_NO_KIQ(ih_regs->ih_rb_cntl);
>>>>>       tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
>>>>>       WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp);
>>>>> +    ih->overflow = true;
>>>>>
>>>>>   out:
>>>>>       return (wptr & ih->ptr_mask);
>>>>> @@ -462,6 +463,7 @@ static void vega20_ih_irq_rearm(struct
>>>>> amdgpu_device *adev,
>>>>>   static void vega20_ih_set_rptr(struct amdgpu_device *adev,
>>>>>                      struct amdgpu_ih_ring *ih)
>>>>>   {
>>>>> +    u32 tmp;
>>>>>       struct amdgpu_ih_regs *ih_regs;
>>>>>
>>>>>       if (ih == &adev->irq.ih_soft)
>>>>> @@ -478,6 +480,16 @@ static void vega20_ih_set_rptr(struct
>>>>> amdgpu_device *adev,
>>>>>           ih_regs = &ih->ih_regs;
>>>>>           WREG32(ih_regs->ih_rb_rptr, ih->rptr);
>>>>>       }
>>>>> +
>>>>> +    /* If we overflowed previously (and thus set the OVERFLOW_CLEAR
>>>>> bit),
>>>>> +     * reset it here to detect more overflows if they occur.
>>>>> +     */
>>>>> +    if (ih->overflow) {
>>>>> +        tmp = RREG32_NO_KIQ(ih->ih_regs.ih_rb_cntl);
>>>>> +        tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 
>>>>> 0);
>>>>> +        WREG32_NO_KIQ(ih->ih_regs.ih_rb_cntl, tmp);
>>>>> +        ih->overflow = false;
>>>>> +    }
>>>>>   }
>>>>>
>>>>>   /**
>>>>> -- 
>>>>> 2.43.0
>>>>>
>>>>
>>