[Bug 204241] amdgpu fails to resume from suspend

bugzilla-daemon at bugzilla.kernel.org bugzilla-daemon at bugzilla.kernel.org
Sat Oct 5 00:08:04 UTC 2019


Ahzo at tutanota.com changed:

           What    |Removed                     |Added
                 CC|                            |Ahzo at tutanota.com

--- Comment #14 from Ahzo at tutanota.com ---
Created attachment 285349
  --> https://bugzilla.kernel.org/attachment.cgi?id=285349&action=edit
Patch to prevent frequent resume failures

While this issue happens rather randomly, it can be quite reliably reproduced
on Linux 5.2 and later by performing successive suspend-resume cycles.
Usually the error occurs after fewer than 10 cycles, but occasionally only after
more than 20. Thus the following command reproduces it almost always:
$ for i in $(seq 30); do sudo rtcwake -m mem -s 5; sleep 15; done
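When a resume failure forces a reboot, it is not obvious which cycle of the one-liner above triggered it. Here is a minimal sketch, assuming a POSIX shell. It wraps the same loop in a hypothetical `suspend_cycles` function that appends the current cycle number to a log file and runs `sync` before each suspend, so the last line of the log should survive a reboot. The function name, the `LOG`, `SUSPEND_CMD` and `PAUSE` variables, and the log path are illustrative assumptions, not part of the report. The defaults mirror the original command.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): run N suspend/resume cycles
# and record progress so the failing cycle is known after a reboot.
# Assumed knobs: LOG (log file), SUSPEND_CMD (suspend command),
# PAUSE (seconds to wait after each resume). Defaults mirror the report.
suspend_cycles() {
    cycles=${1:-30}
    log=${LOG:-./suspend-cycles.log}
    : > "$log"                      # start a fresh log for this run
    i=1
    while [ "$i" -le "$cycles" ]; do
        echo "cycle $i" >> "$log"   # record the cycle before suspending
        sync                        # flush the entry to disk in case resume hangs
        # Unquoted on purpose so the default splits into command + arguments.
        ${SUSPEND_CMD:-sudo rtcwake -m mem -s 5} || return 1
        sleep "${PAUSE:-15}"
        i=$((i + 1))
    done
}
```

Usage: run `suspend_cycles 30`. If the machine has to be rebooted, `tail -n 1 suspend-cycles.log` shows the cycle during which resume failed.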

A bisection using this method led to:
commit 533aed278afeaa68bb5d0600856ab02268cfa3b8
Author: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
Date:   Wed Mar 6 16:16:28 2019 -0500

    drm/amdgpu: Move IB pool init and fini v2

    Using SDMA for TLB invalidation in certain ASICs exposed a problem
    of IB pool not being ready while SDMA already up on Init and already
    shutt down while SDMA still running on Fini. This caused
    IB allocation failure. Temproary fix was commited into a
    bringup branch but this is the generic fix.

    Init IB pool rigth after GMC is ready but before SDMA is ready.
    Do th opposite for Fini.

    v2: Remove restriction on SDMA early init and move amdgpu_ib_pool_fini

    Reviewed-by: Christian König <christian.koenig at amd.com>
    Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher at amd.com>

Reverting this commit makes the problem unreproducible with the above command.

Another way to prevent these frequent resume failures, while preserving the
intention of this commit, is to call amdgpu_ib_pool_init directly after
amdgpu_ucode_create_bo instead of directly before it. The attached
patch does exactly that.


More information about the dri-devel mailing list