[PATCH] drm/amdgpu: Fix S3 resume failre.

Fri Jul 20 13:52:21 UTC 2018

On 07/19/2018 02:30 PM, Alex Deucher wrote:
> On Thu, Jul 19, 2018 at 1:07 PM, Andrey Grodzovsky
> <Andrey.Grodzovsky at amd.com> wrote:
>>
>> On 07/19/2018 12:59 PM, Michel Dänzer wrote:
>>> On 2018-07-19 06:53 PM, Andrey Grodzovsky wrote:
>>>>
>>>> On 07/19/2018 12:47 PM, Michel Dänzer wrote:
>>>>> On 2018-07-19 06:33 PM, Andrey Grodzovsky wrote:
>>>>>> On 07/19/2018 11:39 AM, Ville Syrjälä wrote:
>>>>>>> On Thu, Jul 19, 2018 at 11:19:56AM -0400, Andrey Grodzovsky wrote:
>>>>>>>> Problem:
>>>>>>>> FB is still not unpinned during the first run of amdgpu_bo_evict_vram
>>>>>>>> and so it's left for the second run, but during second run the SDMA
>>>>>>>> for
>>>>>>>> moving buffer around already disabled and you have to do
>>>>>>>> it with CPU, but FB is not in visible VRAM and hence the eviction
>>>>>>>> failure
>>>>>>>> leading later to resume failure.
>>>>>>>>
>>>>>>>> Fix:
>>>>>>>> When DAL in use get a pointer to FB from crtc->primary->state rather
>>>>>>>> then from crtc->primary which is not set for DAL since it supports
>>>>>>>> atomic KMS.
>>>>>>>>
>>>>>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107065
>>>>>>>> Fixes e00fb85 drm: Stop updating plane->crtc/fb/old_fb on atomic
>>>>>>>> drivers
>>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>>>>>>>> ---
>>>>>>>>      drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++-
>>>>>>>>      1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>> index 709e4a3..dd9ebf7 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>> @@ -2642,7 +2642,8 @@ int amdgpu_device_suspend(struct drm_device
>>>>>>>> *dev, bool suspend, bool fbcon)
>>>>>>>>          /* unpin the front buffers and cursors */
>>>>>>>>          list_for_each_entry(crtc, &dev->mode_config.crtc_list, head)
>>>>>>>> {
>>>>>>>>              struct amdgpu_crtc *amdgpu_crtc = to_amdgpu_crtc(crtc);
>>>>>>>> -        struct drm_framebuffer *fb = crtc->primary->fb;
>>>>>>>> +         struct drm_framebuffer *fb =
>>>>>>>> amdgpu_device_has_dc_support(adev) ?
>>>>>>>> +                 crtc->primary->state->fb : crtc->primary->fb;
>>>>>>> So apparently you haven't yet turned off the planes here. If I'm
>>>>>>> reading things right amdgpu_device_ip_suspend() should end up doing
>>>>>>> that through drm_atomic_helper_suspend(). So it looks like like now
>>>>>>> you'll end up unpinning the same bos twice. Doesn't that mess up
>>>>>>> some kind of refcount or something?
>>>>>> amdgpu_bo_unpin has a guard against that, amdgpu_bo_unreserve is less
>>>>>> clear.
>>>>> BO reservation shouldn't an issue here, BOs are only reserved for a
>>>>> short time around (un)pinning them.
>>>>>
>>>>>
>>>>>>> To me it would seem better to susped the display before trying
>>>>>>> to evict the bos.
>>>>>> Yea, i was aware of that and indeed DAL shouldn't rely on the code in
>>>>>> amdgpu_device_suspend to unpin
>>>>>> front buffer and cursor since the atomic helper should do it. Problem
>>>>>> is
>>>>>> that during amdgpu_device_ip_suspend
>>>>>> the SDMA engine gets suspended too, so you have to embed another
>>>>>> eviction in between, after display is suspended but before
>>>>>> SDMA and this forces ordering between them which kind of already in
>>>>>> place (amd_ip_block_type) but still it's an extra constrain.
>>>>> Ville's point (which I basically agree with) is that the display
>>>>> hardware should be turned off before evicting VRAM the first time, in
>>>>> which case no second eviction should be necessary (for this purpose).
>>>> Display HW is turned off as part of all IPs in a loop inside
>>>> amdgpu_device_ip_suspend.
>>>> Are you suggesting to extract the  display HW turn off from inside
>>>> amdgpu_device_ip_suspend and place it
>>>> before the first call to amdgpu_bo_evict_vram ?
>>> In a nutshell, yes.
>>>
>>> Or maybe it would be easier to move the amdgpu_bo_evict_vram call down
>>> to somewhere called from amdgpu_device_ip_suspend?
>>
>> I can move the BEFORE and AFTER calls to amdgpu_bo_evict_vram inside
>> amdgpu_device_ip_suspend
>> such that the first one is called AFTER display is shut off, while the
>> second in the very end of the function.
>> I am just not sure what's gonna be the side effect of evicting after bunch
>> of blocks (such as GMC) are already disabled.
> How about something like the attached patches?  Basically split the ip
> suspend sequence in two like we do for resume.
>
> Alex

Please carry over the regression  info and the related Bugzilla ticket 
into the comments here -

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107065
Fixes e00fb85 drm: Stop updating plane->crtc/fb/old_fb on atomic drivers

Since the warning I observed is not specific to your change please push 
this and I will handle the warning
separately.

Reviewed-and-tested-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>

Andrey

Andrey