Re: Re: [REGRESSION] amdgpu: async system error exception from hdp_v5_0_flush_hdp()

Alex Deucher alexdeucher at gmail.com
Tue Apr 22 13:00:49 UTC 2025


On Mon, Apr 21, 2025 at 10:21 PM Alexey Klimov <alexey.klimov at linaro.org> wrote:
>
> On Thu Apr 17, 2025 at 2:08 PM BST, Alex Deucher wrote:
> > On Wed, Apr 16, 2025 at 8:43 PM Fugang Duan <fugang.duan at cixtech.com> wrote:
> >>
> >> From: Alex Deucher <alexdeucher at gmail.com> Sent: April 16, 2025 22:49
> >> >To: Alexey Klimov <alexey.klimov at linaro.org>
> >> >On Wed, Apr 16, 2025 at 9:48 AM Alexey Klimov <alexey.klimov at linaro.org> wrote:
> >> >>
> >> >> On Wed Apr 16, 2025 at 4:12 AM BST, Fugang Duan wrote:
> >> >> > From: Alexey Klimov <alexey.klimov at linaro.org> Sent: April 16, 2025 2:28
> >> >> >>#regzbot introduced: v6.12..v6.13
> >> >>
> >> >> [..]
> >> >>
> >> >> >>The only change related to hdp_v5_0_flush_hdp() was
> >> >> >>cf424020e040 drm/amdgpu/hdp5.0: do a posting read when flushing HDP
> >> >> >>
> >> >> >>Reverting that commit ^^ did help and resolved that problem. Before
> >> >> >>sending the revert as-is, I wanted to know if there is supposed to
> >> >> >>be a proper fix for this, or whether someone is interested in
> >> >> >>debugging this or has any suggestions.
> >> >> >>
> >> >> > Can you revert the change and try again?
> >> >> > https://gitlab.com/linux-kernel/linux/-/commit/cf424020e040be35df05b682b546b255e74a420f
> >> >>
> >> >> Please read my email in the first place.
> >> >> Let me quote just in case:
> >> >>
> >> >> >The only change related to hdp_v5_0_flush_hdp() was
> >> >> >cf424020e040 drm/amdgpu/hdp5.0: do a posting read when flushing HDP
> >> >>
> >> >> >Reverting that commit ^^ did help and resolved that problem.
> >> >
> >> >We can't really revert the change as that will lead to coherency problems.  What
> >> >is the page size on your system?  Does the attached patch fix it?
> >> >
> >> >Alex
> >> >
> >> 4K page size.  We can try the fix once we have the environment.
> >
> > OK.  That patch won't change anything then.  Can you try this patch instead?
>
> The config I am using is basically defconfig wrt memory parameters; yes, I use 4K pages.
>
> So I tested that patch, thank you, and some other different configurations --
> nothing helped. Exactly the same behaviour with the same backtrace.

Did you test the first (4k check) or the second (don't remap on ARM) patch?

>
> So it seems that it is a firmware problem after all?

There is no GPU firmware involved in this operation.  It's just a
posted write.  E.g., we write to a register to flush the HDP write
queue and then read the register back to make sure the write posted.
If the second patch didn't help, then perhaps there is some issue with
MMIO access on your platform?
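
As a rough illustration of the write-then-read-back pattern described
above (this is not the actual amdgpu code; the helper names are
hypothetical, and an ordinary variable stands in for the MMIO register):

```c
#include <stdint.h>

/*
 * Hypothetical stand-ins for MMIO accessors. On real hardware the pointer
 * would refer to a mapped device register, not to ordinary memory.
 */
static inline void reg_write32(volatile uint32_t *reg, uint32_t val)
{
    *reg = val;
}

static inline uint32_t reg_read32(volatile uint32_t *reg)
{
    return *reg;
}

/*
 * Write to the flush register, then read the same register back.
 * On a real bus the read cannot complete until the earlier posted write
 * has reached the device, so the read-back confirms the write posted.
 */
static inline uint32_t flush_with_posting_read(volatile uint32_t *flush_reg)
{
    reg_write32(flush_reg, 0);     /* posted write: may sit in a queue */
    return reg_read32(flush_reg);  /* read-back: waits for the write   */
}
```

If the read-back itself faults, as in the reported backtrace, that
points at the register access path rather than at anything the GPU
does in response to the write.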

Alex

>
> Thanks,
> Alexey

