WARNING in amdgpu_sync_keep_later / dma_fence_is_later should be rate limited

Alex Deucher alexdeucher at gmail.com
Thu Sep 21 21:30:47 UTC 2023


On Thu, Sep 21, 2023 at 4:21 PM Rafał Miłecki <zajec5 at gmail.com> wrote:
>
> On 21.09.2023 21:52, Deucher, Alexander wrote:
> >> backporting commit 187916e6ed9d ("drm/amdgpu: install stub fence into
> >> potential unused fence pointers") to stable kernels resulted in lots of
> >> WARNINGs on some devices. In my case I was getting 3 WARNINGs per
> >> second (~150 lines logged every second). Commit ended up being reverted for
> >> stable but it exposed a potential problem. My messages log size was reaching
> >> gigabytes and was running my /tmp/ out of space.
> >>
> >> Could someone take a look at amdgpu_sync_keep_later / dma_fence_is_later
> >> and make sure its logging is rate limited to avoid such situations in the future,
> >> please?
> >>
> >> Revert in linux-5.15.x:
> >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.15.y&id=fae2d591f3cb31f722c7f065acf586830eab8c2a
> >>
> >> openSUSE bug report:
> >> https://bugzilla.opensuse.org/show_bug.cgi?id=1215523
> >
> > These patches were never intended for stable.  They were picked up by Sasha's stable autoselect tools and automatically applied to stable kernels.
>
> Are you saying massive WARNINGs in dma_fence_is_later() can't happen
> in any other case? I understand it was an incorrect backport action but
> I thought we may learn from it and still add some rate limit.

All of the current callers of that function check the fence contexts
before calling it, so it should be safe as is in the tree.
That said, something like this could potentially happen again.  I
don't think using WARN_ON_RATELIMIT() would be a problem.

Alex
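
For illustration only, here is a minimal userspace sketch of what rate
limiting the context-mismatch warning would do. It is not kernel code and
not a proposed patch. In the kernel, WARN_ON_RATELIMIT(condition, state)
takes a pointer to a struct ratelimit_state; the names rl_state, rl_allow,
fake_fence and fake_fence_is_later below are hypothetical stand-ins, the
tick values are arbitrary, and the seqno comparison is simplified (no
wraparound handling).

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's struct ratelimit_state. */
struct rl_state {
	uint64_t interval;	/* window length, in caller-supplied ticks */
	unsigned int burst;	/* reports allowed per window */
	uint64_t begin;		/* tick at which the current window started */
	unsigned int printed;	/* reports emitted in the current window */
	unsigned int missed;	/* reports suppressed so far */
	bool started;
};

/* Return true if the caller may emit a report at tick 'now'. */
static bool rl_allow(struct rl_state *rs, uint64_t now)
{
	/* Open a new window on first use or once the interval has elapsed. */
	if (!rs->started || now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->started = true;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}

/* Hypothetical stand-in carrying only the fields relevant here. */
struct fake_fence {
	uint64_t context;
	uint64_t seqno;
};

static unsigned int warnings_emitted;

/*
 * Simplified model of the check discussed in this thread: fences from
 * different contexts cannot be ordered, so warn (rate limited) and
 * return false. The result is the same whether or not the warning is
 * actually printed.
 */
static bool fake_fence_is_later(const struct fake_fence *f1,
				const struct fake_fence *f2,
				struct rl_state *rs, uint64_t now)
{
	if (f1->context != f2->context) {
		if (rl_allow(rs, now)) {
			warnings_emitted++;
			fprintf(stderr, "WARNING: fence context mismatch\n");
		}
		return false;
	}
	return f1->seqno > f2->seqno;
}
```

With interval = 1000 and burst = 2, ten mismatching calls inside one
window emit two warnings and suppress the other eight, while every call
still returns false. How much this helps depends on the chosen interval
and burst relative to the rate at which the condition fires.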

