[PATCH 1/2] drm/amdgpu: Reset IH OVERFLOW_CLEAR bit after writing rptr

Friedrich Vock friedrich.vock at gmx.de
Tue Jan 16 10:31:24 UTC 2024


On 16.01.24 08:03, Christian König wrote:
> Am 15.01.24 um 12:18 schrieb Friedrich Vock:
>> Adding the original Ccs from the thread since they seemed to be missing
>> in the reply.
>>
>> On 15.01.24 11:55, Christian König wrote:
>>> Am 14.01.24 um 14:00 schrieb Friedrich Vock:
>>>> Allows us to detect subsequent IH ring buffer overflows as well.
>>>
>>> Well that suggested handling here is certainly broken, see below.
>>>
>>>>
>>>> Cc: Joshua Ashton <joshua at froggi.es>
>>>> Cc: Alex Deucher <alexander.deucher at amd.com>
>>>> Cc: stable at vger.kernel.org
>>>>
>>>> Signed-off-by: Friedrich Vock <friedrich.vock at gmx.de>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h  |  2 ++
>>>>   drivers/gpu/drm/amd/amdgpu/cik_ih.c     | 13 +++++++++++++
>>>>   drivers/gpu/drm/amd/amdgpu/cz_ih.c      | 14 +++++++++++++-
>>>>   drivers/gpu/drm/amd/amdgpu/iceland_ih.c | 14 +++++++++++++-
>>>>   drivers/gpu/drm/amd/amdgpu/ih_v6_0.c    | 13 +++++++++++++
>>>>   drivers/gpu/drm/amd/amdgpu/ih_v6_1.c    | 13 +++++++++++++
>>>>   drivers/gpu/drm/amd/amdgpu/navi10_ih.c  | 12 ++++++++++++
>>>>   drivers/gpu/drm/amd/amdgpu/si_ih.c      | 12 ++++++++++++
>>>>   drivers/gpu/drm/amd/amdgpu/tonga_ih.c   | 13 +++++++++++++
>>>>   drivers/gpu/drm/amd/amdgpu/vega10_ih.c  | 12 ++++++++++++
>>>>   drivers/gpu/drm/amd/amdgpu/vega20_ih.c  | 12 ++++++++++++
>>>>   11 files changed, 128 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
>>>> index 508f02eb0cf8..6041ec727f06 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
>>>> @@ -69,6 +69,8 @@ struct amdgpu_ih_ring {
>>>>       unsigned        rptr;
>>>>       struct amdgpu_ih_regs    ih_regs;
>>>>
>>>> +    bool overflow;
>>>> +
>>>>       /* For waiting on IH processing at checkpoint. */
>>>>       wait_queue_head_t wait_process;
>>>>       uint64_t        processed_timestamp;
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/cik_ih.c
>>>> b/drivers/gpu/drm/amd/amdgpu/cik_ih.c
>>>> index 6f7c031dd197..807cc30c9e33 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/cik_ih.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/cik_ih.c
>>>> @@ -204,6 +204,7 @@ static u32 cik_ih_get_wptr(struct amdgpu_device
>>>> *adev,
>>>>           tmp = RREG32(mmIH_RB_CNTL);
>>>>           tmp |= IH_RB_CNTL__WPTR_OVERFLOW_CLEAR_MASK;
>>>>           WREG32(mmIH_RB_CNTL, tmp);
>>>> +        ih->overflow = true;
>>>>       }
>>>>       return (wptr & ih->ptr_mask);
>>>>   }
>>>> @@ -274,7 +275,19 @@ static void cik_ih_decode_iv(struct
>>>> amdgpu_device *adev,
>>>>   static void cik_ih_set_rptr(struct amdgpu_device *adev,
>>>>                   struct amdgpu_ih_ring *ih)
>>>>   {
>>>> +    u32 tmp;
>>>> +
>>>>       WREG32(mmIH_RB_RPTR, ih->rptr);
>>>> +
>>>> +    /* If we overflowed previously (and thus set the OVERFLOW_CLEAR
>>>> bit),
>>>> +     * reset it here to detect more overflows if they occur.
>>>> +     */
>>>> +    if (ih->overflow) {
>>>> +        tmp = RREG32(mmIH_RB_CNTL);
>>>> +        tmp &= ~IH_RB_CNTL__WPTR_OVERFLOW_CLEAR_MASK;
>>>> +        WREG32(mmIH_RB_CNTL, tmp);
>>>> +        ih->overflow = false;
>>>> +    }
>>>
>>> Well that is an extremely bad idea. We already reset the overflow
>>> after reading the WPTR.
>>
>> This is not resetting the overflow bit. This is resetting a "clear
>> overflow" bit. I don't have the hardware docs, but the name (and my
>> observations) strongly suggest that setting this bit actually prevents
>> the hardware from setting the overflow bit ever again.
>
> Well that doesn't make any sense at all. The hardware documentation
> clearly states that this bit is write only and should always read as
> zero.
>
> Setting this bit will clear the overflow flag in the WPTR register and
> clearing it has no effect at all.
>
> I could only ping the hw engineer responsible for this block to double
> check if the documentation is somehow outdated, but I really doubt so.
>
I see. I wish I had access to the documentation, but I don't, so all I
can do is tell you what I observe the hardware doing. I've tested this
on both a Steam Deck (OSSYS 5.2.0) and an RX 6700 XT (OSSYS 5.0.3). On
both systems, launching a bunch of shaders that cause page faults leads
to lots of "[gfxhub] page fault" messages in dmesg, followed by an
"amdgpu: IH ring buffer overflow".

If I re-launch the same set of shaders after the GPU has soft-recovered,
the "amdgpu: IH ring buffer overflow" message is missing, even though
the same amount of page faults should've been triggered at roughly the
same rate. Running with this patch applied makes more "amdgpu: IH ring
buffer overflow" messages appear after relaunching the faulting shaders
(but not when processing any non-faulting work).

The only possible conclusion I can draw from this is that clearing that
bit *does* have an effect, and I don't think it's far-fetched to assume
the IH ring buffer overflows still happen after re-launching the
faulting shaders but go undetected so far.

>> Right now, IH overflows, even if they occur repeatedly, only get
>> registered once. If not registering IH overflows can trivially lead to
>> system crashes, it's amdgpu's current handling that is broken.
>
> It's years that we last tested this but according to the HW
> documentation this should work fine.
>
> What could potentially happen is that the IH has silenced the source
> of the overflow. We never implemented resetting those, but in this
> case that here won't help either.
>
If the IH silenced the page faults (which quite clearly cause the
overflow here), then how are the page faults still logged in dmesg?
>>
>> The possibility of a repeated IH overflow in between reading the wptr
>> and updating the rptr is a good point, but how can we detect that at
>> all? It seems to me like we can't set the OVERFLOW_CLEAR bit at all
>> then, because we're guaranteed to miss any overflows that happen while
>> the bit is set.
>
> When an IH overflow is signaled we clear that flag by writing 1 into
> the OVERFLOW_CLEAR bit and skip one entry in the IH ring buffer.
>
> What can of course happen is that the IH ring buffer overflows more
> than this single entry and we process IVs which are potentially
> corrupted, but we won't miss any additional overflows since we only
> start processing after resetting the flag.
>
> An IH overflow is also something you should *never* see in a
> production system. This is purely for driver bringup and as fallback
> when there is a severe incorrect programming of the HW.
>
> The only exception of that is page fault handling on MI products
> because of a hardware bug, to mitigate this we are processing page
> faults on a separate IH ring on those parts.
>
> On all other hw generations the IH should have some rate limit for the
> number of faults generated per second, so that the CPU is always able
> to catch up.

I'm wondering if there is another bug in here somewhere. Your
explanation of how it's supposed to work makes a lot of sense, but from
what I can tell it doesn't work that way when I test it.

 From the printk_ratelimit stats it would seem like >2000 faults arrive
in less than a second, so perhaps your theory about fault interrupt
ratelimiting not working is correct (but it's hard for me to verify what
is going on without the documentation).

Regards,
Friedrich

>
> Regards,
> Christian.
>
>>
>> Regards,
>> Friedrich
>>
>>>
>>> When you clear the overflow again when updating the RPTR you could
>>> loose another overflow which might have happened in between and so
>>> potentially process corrupted IVs.
>>>
>>> That can trivially crash the system.
>>>
>>> Regards,
>>> Christian.
>>>
>>>>   }
>>>>
>>>>   static int cik_ih_early_init(void *handle)
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/cz_ih.c
>>>> b/drivers/gpu/drm/amd/amdgpu/cz_ih.c
>>>> index b8c47e0cf37a..076559668573 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/cz_ih.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/cz_ih.c
>>>> @@ -215,7 +215,7 @@ static u32 cz_ih_get_wptr(struct amdgpu_device
>>>> *adev,
>>>>       tmp = RREG32(mmIH_RB_CNTL);
>>>>       tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
>>>>       WREG32(mmIH_RB_CNTL, tmp);
>>>> -
>>>> +    ih->overflow = true;
>>>>
>>>>   out:
>>>>       return (wptr & ih->ptr_mask);
>>>> @@ -266,7 +266,19 @@ static void cz_ih_decode_iv(struct amdgpu_device
>>>> *adev,
>>>>   static void cz_ih_set_rptr(struct amdgpu_device *adev,
>>>>                  struct amdgpu_ih_ring *ih)
>>>>   {
>>>> +    u32 tmp;
>>>> +
>>>>       WREG32(mmIH_RB_RPTR, ih->rptr);
>>>> +
>>>> +    /* If we overflowed previously (and thus set the OVERFLOW_CLEAR
>>>> bit),
>>>> +     * reset it here to detect more overflows if they occur.
>>>> +     */
>>>> +    if (ih->overflow) {
>>>> +        tmp = RREG32(mmIH_RB_CNTL);
>>>> +        tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 0);
>>>> +        WREG32(mmIH_RB_CNTL, tmp);
>>>> +        ih->overflow = false;
>>>> +    }
>>>>   }
>>>>
>>>>   static int cz_ih_early_init(void *handle)
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/iceland_ih.c
>>>> b/drivers/gpu/drm/amd/amdgpu/iceland_ih.c
>>>> index aecad530b10a..1a5e668643d1 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/iceland_ih.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/iceland_ih.c
>>>> @@ -214,7 +214,7 @@ static u32 iceland_ih_get_wptr(struct
>>>> amdgpu_device *adev,
>>>>       tmp = RREG32(mmIH_RB_CNTL);
>>>>       tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
>>>>       WREG32(mmIH_RB_CNTL, tmp);
>>>> -
>>>> +    ih->overflow = true;
>>>>
>>>>   out:
>>>>       return (wptr & ih->ptr_mask);
>>>> @@ -265,7 +265,19 @@ static void iceland_ih_decode_iv(struct
>>>> amdgpu_device *adev,
>>>>   static void iceland_ih_set_rptr(struct amdgpu_device *adev,
>>>>                   struct amdgpu_ih_ring *ih)
>>>>   {
>>>> +    u32 tmp;
>>>> +
>>>>       WREG32(mmIH_RB_RPTR, ih->rptr);
>>>> +
>>>> +    /* If we overflowed previously (and thus set the OVERFLOW_CLEAR
>>>> bit),
>>>> +     * reset it here to detect more overflows if they occur.
>>>> +     */
>>>> +    if (ih->overflow) {
>>>> +        tmp = RREG32(mmIH_RB_CNTL);
>>>> +        tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 0);
>>>> +        WREG32(mmIH_RB_CNTL, tmp);
>>>> +        ih->overflow = false;
>>>> +    }
>>>>   }
>>>>
>>>>   static int iceland_ih_early_init(void *handle)
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c
>>>> b/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c
>>>> index d9ed7332d805..ce8f7feec713 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c
>>>> @@ -418,6 +418,8 @@ static u32 ih_v6_0_get_wptr(struct amdgpu_device
>>>> *adev,
>>>>       tmp = RREG32_NO_KIQ(ih_regs->ih_rb_cntl);
>>>>       tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
>>>>       WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp);
>>>> +    ih->overflow = true;
>>>> +
>>>>   out:
>>>>       return (wptr & ih->ptr_mask);
>>>>   }
>>>> @@ -459,6 +461,7 @@ static void ih_v6_0_irq_rearm(struct
>>>> amdgpu_device *adev,
>>>>   static void ih_v6_0_set_rptr(struct amdgpu_device *adev,
>>>>                      struct amdgpu_ih_ring *ih)
>>>>   {
>>>> +    u32 tmp;
>>>>       struct amdgpu_ih_regs *ih_regs;
>>>>
>>>>       if (ih->use_doorbell) {
>>>> @@ -472,6 +475,16 @@ static void ih_v6_0_set_rptr(struct
>>>> amdgpu_device *adev,
>>>>           ih_regs = &ih->ih_regs;
>>>>           WREG32(ih_regs->ih_rb_rptr, ih->rptr);
>>>>       }
>>>> +
>>>> +    /* If we overflowed previously (and thus set the OVERFLOW_CLEAR
>>>> bit),
>>>> +     * reset it here to detect more overflows if they occur.
>>>> +     */
>>>> +    if (ih->overflow) {
>>>> +        tmp = RREG32_NO_KIQ(ih->ih_regs.ih_rb_cntl);
>>>> +        tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 0);
>>>> +        WREG32_NO_KIQ(ih->ih_regs.ih_rb_cntl, tmp);
>>>> +        ih->overflow = false;
>>>> +    }
>>>>   }
>>>>
>>>>   /**
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/ih_v6_1.c
>>>> b/drivers/gpu/drm/amd/amdgpu/ih_v6_1.c
>>>> index 8fb05eae340a..668788ad34d9 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/ih_v6_1.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/ih_v6_1.c
>>>> @@ -418,6 +418,8 @@ static u32 ih_v6_1_get_wptr(struct amdgpu_device
>>>> *adev,
>>>>       tmp = RREG32_NO_KIQ(ih_regs->ih_rb_cntl);
>>>>       tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
>>>>       WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp);
>>>> +    ih->overflow = true;
>>>> +
>>>>   out:
>>>>       return (wptr & ih->ptr_mask);
>>>>   }
>>>> @@ -459,6 +461,7 @@ static void ih_v6_1_irq_rearm(struct
>>>> amdgpu_device *adev,
>>>>   static void ih_v6_1_set_rptr(struct amdgpu_device *adev,
>>>>                      struct amdgpu_ih_ring *ih)
>>>>   {
>>>> +    u32 tmp;
>>>>       struct amdgpu_ih_regs *ih_regs;
>>>>
>>>>       if (ih->use_doorbell) {
>>>> @@ -472,6 +475,16 @@ static void ih_v6_1_set_rptr(struct
>>>> amdgpu_device *adev,
>>>>           ih_regs = &ih->ih_regs;
>>>>           WREG32(ih_regs->ih_rb_rptr, ih->rptr);
>>>>       }
>>>> +
>>>> +    /* If we overflowed previously (and thus set the OVERFLOW_CLEAR
>>>> bit),
>>>> +     * reset it here to detect more overflows if they occur.
>>>> +     */
>>>> +    if (ih->overflow) {
>>>> +        tmp = RREG32_NO_KIQ(ih->ih_regs.ih_rb_cntl);
>>>> +        tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 0);
>>>> +        WREG32_NO_KIQ(ih->ih_regs.ih_rb_cntl, tmp);
>>>> +        ih->overflow = false;
>>>> +    }
>>>>   }
>>>>
>>>>   /**
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
>>>> b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
>>>> index e64b33115848..0bdac923cb4d 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
>>>> @@ -442,6 +442,7 @@ static u32 navi10_ih_get_wptr(struct
>>>> amdgpu_device *adev,
>>>>       tmp = RREG32_NO_KIQ(ih_regs->ih_rb_cntl);
>>>>       tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
>>>>       WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp);
>>>> +    ih->overflow = true;
>>>>   out:
>>>>       return (wptr & ih->ptr_mask);
>>>>   }
>>>> @@ -483,6 +484,7 @@ static void navi10_ih_irq_rearm(struct
>>>> amdgpu_device *adev,
>>>>   static void navi10_ih_set_rptr(struct amdgpu_device *adev,
>>>>                      struct amdgpu_ih_ring *ih)
>>>>   {
>>>> +    u32 tmp;
>>>>       struct amdgpu_ih_regs *ih_regs;
>>>>
>>>>       if (ih == &adev->irq.ih_soft)
>>>> @@ -499,6 +501,16 @@ static void navi10_ih_set_rptr(struct
>>>> amdgpu_device *adev,
>>>>           ih_regs = &ih->ih_regs;
>>>>           WREG32(ih_regs->ih_rb_rptr, ih->rptr);
>>>>       }
>>>> +
>>>> +    /* If we overflowed previously (and thus set the OVERFLOW_CLEAR
>>>> bit),
>>>> +     * reset it here to detect more overflows if they occur.
>>>> +     */
>>>> +    if (ih->overflow) {
>>>> +        tmp = RREG32_NO_KIQ(ih->ih_regs.ih_rb_cntl);
>>>> +        tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 0);
>>>> +        WREG32_NO_KIQ(ih->ih_regs.ih_rb_cntl, tmp);
>>>> +        ih->overflow = false;
>>>> +    }
>>>>   }
>>>>
>>>>   /**
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/si_ih.c
>>>> b/drivers/gpu/drm/amd/amdgpu/si_ih.c
>>>> index 9a24f17a5750..ff35056d2b54 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/si_ih.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/si_ih.c
>>>> @@ -119,6 +119,7 @@ static u32 si_ih_get_wptr(struct amdgpu_device
>>>> *adev,
>>>>           tmp = RREG32(IH_RB_CNTL);
>>>>           tmp |= IH_RB_CNTL__WPTR_OVERFLOW_CLEAR_MASK;
>>>>           WREG32(IH_RB_CNTL, tmp);
>>>> +        ih->overflow = true;
>>>>       }
>>>>       return (wptr & ih->ptr_mask);
>>>>   }
>>>> @@ -147,7 +148,18 @@ static void si_ih_decode_iv(struct amdgpu_device
>>>> *adev,
>>>>   static void si_ih_set_rptr(struct amdgpu_device *adev,
>>>>                  struct amdgpu_ih_ring *ih)
>>>>   {
>>>> +    u32 tmp;
>>>> +
>>>>       WREG32(IH_RB_RPTR, ih->rptr);
>>>> +
>>>> +    /* If we overflowed previously (and thus set the OVERFLOW_CLEAR
>>>> bit),
>>>> +     * reset it here to detect more overflows if they occur.
>>>> +     */
>>>> +    if (ih->overflow) {
>>>> +        tmp = RREG32(IH_RB_CNTL);
>>>> +        tmp &= ~IH_RB_CNTL__WPTR_OVERFLOW_CLEAR_MASK;
>>>> +        WREG32(IH_RB_CNTL, tmp);
>>>> +    }
>>>>   }
>>>>
>>>>   static int si_ih_early_init(void *handle)
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/tonga_ih.c
>>>> b/drivers/gpu/drm/amd/amdgpu/tonga_ih.c
>>>> index 917707bba7f3..6f5090d3db48 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/tonga_ih.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/tonga_ih.c
>>>> @@ -218,6 +218,7 @@ static u32 tonga_ih_get_wptr(struct amdgpu_device
>>>> *adev,
>>>>       tmp = RREG32(mmIH_RB_CNTL);
>>>>       tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
>>>>       WREG32(mmIH_RB_CNTL, tmp);
>>>> +    ih->overflow = true;
>>>>
>>>>   out:
>>>>       return (wptr & ih->ptr_mask);
>>>> @@ -268,6 +269,8 @@ static void tonga_ih_decode_iv(struct
>>>> amdgpu_device *adev,
>>>>   static void tonga_ih_set_rptr(struct amdgpu_device *adev,
>>>>                     struct amdgpu_ih_ring *ih)
>>>>   {
>>>> +    u32 tmp;
>>>> +
>>>>       if (ih->use_doorbell) {
>>>>           /* XXX check if swapping is necessary on BE */
>>>>           *ih->rptr_cpu = ih->rptr;
>>>> @@ -275,6 +278,16 @@ static void tonga_ih_set_rptr(struct
>>>> amdgpu_device *adev,
>>>>       } else {
>>>>           WREG32(mmIH_RB_RPTR, ih->rptr);
>>>>       }
>>>> +
>>>> +    /* If we overflowed previously (and thus set the OVERFLOW_CLEAR
>>>> bit),
>>>> +     * reset it here to detect more overflows if they occur.
>>>> +     */
>>>> +    if (ih->overflow) {
>>>> +        tmp = RREG32(mmIH_RB_CNTL);
>>>> +        tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 0);
>>>> +        WREG32(mmIH_RB_CNTL, tmp);
>>>> +        ih->overflow = false;
>>>> +    }
>>>>   }
>>>>
>>>>   static int tonga_ih_early_init(void *handle)
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
>>>> b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
>>>> index d364c6dd152c..bb005924f194 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
>>>> @@ -372,6 +372,7 @@ static u32 vega10_ih_get_wptr(struct
>>>> amdgpu_device *adev,
>>>>       tmp = RREG32_NO_KIQ(ih_regs->ih_rb_cntl);
>>>>       tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
>>>>       WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp);
>>>> +    ih->overflow = true;
>>>>
>>>>   out:
>>>>       return (wptr & ih->ptr_mask);
>>>> @@ -413,6 +414,7 @@ static void vega10_ih_irq_rearm(struct
>>>> amdgpu_device *adev,
>>>>   static void vega10_ih_set_rptr(struct amdgpu_device *adev,
>>>>                      struct amdgpu_ih_ring *ih)
>>>>   {
>>>> +    u32 tmp;
>>>>       struct amdgpu_ih_regs *ih_regs;
>>>>
>>>>       if (ih == &adev->irq.ih_soft)
>>>> @@ -429,6 +431,16 @@ static void vega10_ih_set_rptr(struct
>>>> amdgpu_device *adev,
>>>>           ih_regs = &ih->ih_regs;
>>>>           WREG32(ih_regs->ih_rb_rptr, ih->rptr);
>>>>       }
>>>> +
>>>> +    /* If we overflowed previously (and thus set the OVERFLOW_CLEAR
>>>> bit),
>>>> +     * reset it here to detect more overflows if they occur.
>>>> +     */
>>>> +    if (ih->overflow) {
>>>> +        tmp = RREG32_NO_KIQ(ih->ih_regs.ih_rb_cntl);
>>>> +        tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 0);
>>>> +        WREG32_NO_KIQ(ih->ih_regs.ih_rb_cntl, tmp);
>>>> +        ih->overflow = false;
>>>> +    }
>>>>   }
>>>>
>>>>   /**
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
>>>> b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
>>>> index ddfc6941f9d5..bb725a970697 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
>>>> @@ -420,6 +420,7 @@ static u32 vega20_ih_get_wptr(struct
>>>> amdgpu_device *adev,
>>>>       tmp = RREG32_NO_KIQ(ih_regs->ih_rb_cntl);
>>>>       tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
>>>>       WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp);
>>>> +    ih->overflow = true;
>>>>
>>>>   out:
>>>>       return (wptr & ih->ptr_mask);
>>>> @@ -462,6 +463,7 @@ static void vega20_ih_irq_rearm(struct
>>>> amdgpu_device *adev,
>>>>   static void vega20_ih_set_rptr(struct amdgpu_device *adev,
>>>>                      struct amdgpu_ih_ring *ih)
>>>>   {
>>>> +    u32 tmp;
>>>>       struct amdgpu_ih_regs *ih_regs;
>>>>
>>>>       if (ih == &adev->irq.ih_soft)
>>>> @@ -478,6 +480,16 @@ static void vega20_ih_set_rptr(struct
>>>> amdgpu_device *adev,
>>>>           ih_regs = &ih->ih_regs;
>>>>           WREG32(ih_regs->ih_rb_rptr, ih->rptr);
>>>>       }
>>>> +
>>>> +    /* If we overflowed previously (and thus set the OVERFLOW_CLEAR
>>>> bit),
>>>> +     * reset it here to detect more overflows if they occur.
>>>> +     */
>>>> +    if (ih->overflow) {
>>>> +        tmp = RREG32_NO_KIQ(ih->ih_regs.ih_rb_cntl);
>>>> +        tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 0);
>>>> +        WREG32_NO_KIQ(ih->ih_regs.ih_rb_cntl, tmp);
>>>> +        ih->overflow = false;
>>>> +    }
>>>>   }
>>>>
>>>>   /**
>>>> --
>>>> 2.43.0
>>>>
>>>
>


More information about the amd-gfx mailing list