[Bug 204241] amdgpu fails to resume from suspend

Fri Oct 11 18:33:10 UTC 2019

https://bugzilla.kernel.org/show_bug.cgi?id=204241

Ahzo at tutanota.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #285349|0                           |1
        is obsolete|                            |

--- Comment #20 from Ahzo at tutanota.com ---
Created attachment 285469
  --> https://bugzilla.kernel.org/attachment.cgi?id=285469&action=edit
Patch to fix the resume failures

(In reply to Alex Deucher from comment #17)
> I'm not sure I understand why the patch helps.  You are just changing the
> order of two memory allocations.  The order shouldn't matter.

My hypothesis is that the order here is not the root cause of the problem, but
rather affects the likelihood of the problem manifesting itself.
This is based on the fact that I have seen a resume failure typical for this
bug on linux 5.0 once, but I'm unable to reproduce it with that version.

As commit 533aed278afe apparently makes the failures much more likely to
happen, it provides an opportunity to debug this further by backporting it to
older linux versions.
Doing so exposes the resume failures on versions down to linux 4.15, but not
on linux 4.14.

A bisection between these two, backporting 533aed278afe at every step,
led to commit 2a91f272e34c, which failed to boot and thus had to be skipped,
and:
commit e0128efb08b3d628d767ec8578e77cdd7ecc8f81
Author: James Zhu <James.Zhu at amd.com>
Date:   Fri Sep 29 16:42:27 2017 -0400

    drm/amdgpu: add uvd enc ib test

    Generate create/destroy messages to test UVD encode indirect buffer function.
    And enable UVD encode IB test during device initialization.

    Signed-off-by: James Zhu <James.Zhu at amd.com>
    Reviewed-and-Tested-by: Leo Liu <leo.liu at amd.com>
    Reviewed-by: Christian König <christian.koenig at amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher at amd.com>

This looks like a likely root cause. Indeed, adding 'return 0;' at the
beginning of uvd_v6_0_enc_ring_test_ib makes the problem unreproducible, even
on the latest linux 5.4-rc2.
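
For illustration, the experiment above can be sketched as a userspace
stand-in. This is not the driver source: the struct is an opaque placeholder,
the signature is assumed to match the driver's function, and the counter
enc_ib_test_runs is hypothetical, added only so the skipped body is
observable.

```c
/* Opaque placeholder for the kernel's ring type (assumption for this sketch). */
struct amdgpu_ring;

/* Hypothetical counter: incremented only if the test body is reached. */
int enc_ib_test_runs;

/*
 * Debugging aid only: report success without running the UVD encode IB test.
 * The signature is assumed to match the driver's function.
 */
int uvd_v6_0_enc_ring_test_ib(struct amdgpu_ring *ring, long timeout)
{
	(void)ring;
	(void)timeout;

	return 0;	/* early return added for the experiment */

	/* The original body (create/destroy messages, fence wait) would follow. */
	enc_ib_test_runs++;
	return -1;
}
```

Skipping the test this way only narrows down where the failure comes from; it
does not explain why the test breaks resume.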

Comparing with amdgpu_uvd_get_{create,destroy}_msg shows that these use 0 as
dummy GPU pointer, while uvd_v6_0_enc_get_{create,destroy}_msg use a real GPU
memory address.
Changing them to also use 0 as dummy pointer, as is done in the attached patch,
actually fixes the resume failures.
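
The difference between the two approaches can be sketched as follows. This is
a minimal userspace model, not the attached patch: enc_msg_put_addr and
enc_build_test_msg are hypothetical helpers, and the layout of the address
field (two 32-bit words, upper half first) is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: append a 64-bit GPU address to a message buffer
 * as two 32-bit words, upper half first (assumed layout).
 * Returns the new write position.
 */
size_t enc_msg_put_addr(uint32_t *buf, size_t pos, uint64_t addr)
{
	buf[pos++] = (uint32_t)(addr >> 32);         /* upper 32 bits */
	buf[pos++] = (uint32_t)(addr & 0xffffffffu); /* lower 32 bits */
	return pos;
}

/*
 * Hypothetical test-message builder. With use_dummy set, the address field
 * is 0, the approach described for amdgpu_uvd_get_{create,destroy}_msg;
 * otherwise real_addr is embedded, as described for the UVD 6 encode test.
 */
size_t enc_build_test_msg(uint32_t *buf, uint64_t real_addr, int use_dummy)
{
	uint64_t addr = use_dummy ? 0 : real_addr;

	return enc_msg_put_addr(buf, 0, addr);
}
```

With use_dummy set, both address words in the message are zero regardless of
where the test buffer actually lives in GPU memory.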

Maybe a similar change should also be made for UVD 7.
