Re: Re: [REGRESSION] amdgpu: async system error exception from hdp_v5_0_flush_hdp()
Alex Deucher
alexdeucher at gmail.com
Thu Apr 24 15:44:14 UTC 2025
On Tue, Apr 22, 2025 at 11:59 AM Alexey Klimov <alexey.klimov at linaro.org> wrote:
>
> On Tue Apr 22, 2025 at 2:00 PM BST, Alex Deucher wrote:
> > On Mon, Apr 21, 2025 at 10:21 PM Alexey Klimov <alexey.klimov at linaro.org> wrote:
> >>
> >> On Thu Apr 17, 2025 at 2:08 PM BST, Alex Deucher wrote:
> >> > On Wed, Apr 16, 2025 at 8:43 PM Fugang Duan <fugang.duan at cixtech.com> wrote:
> >> >>
> >> >> From: Alex Deucher <alexdeucher at gmail.com> Sent: April 16, 2025 22:49
> >> >> >To: Alexey Klimov <alexey.klimov at linaro.org>
> >> >> >On Wed, Apr 16, 2025 at 9:48 AM Alexey Klimov <alexey.klimov at linaro.org> wrote:
> >> >> >>
> >> >> >> On Wed Apr 16, 2025 at 4:12 AM BST, Fugang Duan wrote:
> >> >> >> > From: Alexey Klimov <alexey.klimov at linaro.org> Sent: April 16, 2025 2:28
> >> >> >> >>#regzbot introduced: v6.12..v6.13
> >> >> >> >>The only change related to hdp_v5_0_flush_hdp() was
> >> >> >> >>cf424020e040 drm/amdgpu/hdp5.0: do a posting read when flushing HDP
> >> >> >> >>
> >> >> >> >>Reverting that commit ^^ did help and resolved that problem. Before
>
> [..]
>
> >> > OK. that patch won't change anything then. Can you try this patch instead?
> >>
> >> The config I am using is basically defconfig wrt memory parameters, and yes, I use 4k pages.
> >>
> >> So I tested that patch, thank you, and some other different configurations --
> >> nothing helped. Exactly the same behaviour with the same backtrace.
> >
> > Did you test the first (4k check) or the second (don't remap on ARM) patch?
>
> The second one. I think you mentioned that the first one won't help for 4k pages.
>
>
> >> So it seems that it is a firmware problem after all?
> >
> > There is no GPU firmware involved in this operation. It's just a
> > posted write. E.g., we write to a register to flush the HDP write
> > queue and then read the register back to make sure the write posted.
> > If the second patch didn't help, then perhaps there is some issue with
> > MMIO access on your platform?
>
> I didn't mean GPU firmware at all. I only had UEFI/EL3 firmware in mind.
>
> Completely out of the blue, based on nothing: do you think that
> adding a delay or a memory barrier between the write and the read might help?
> If the host data path code is executed during ordinary desktop use, I wonder
> why it doesn't break later. But yeah, I also think this is a problem with
> this motherboard. Thank you.
I think I found the problem. The previous patch wasn't doing what I
expected. Please try this patch instead.
Thanks,
Alex
>
> Thanks,
> Alexey
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-drm-amdgpu-only-remap-HDP-registers-on-X86_64.patch
Type: text/x-patch
Size: 19025 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20250424/9b95f429/attachment-0001.bin>