[PATCH] drm/amdgpu: Fix S3 resume failre.

Thu Jul 19 16:47:15 UTC 2018

On 2018-07-19 06:33 PM, Andrey Grodzovsky wrote:
> On 07/19/2018 11:39 AM, Ville Syrjälä wrote:
>> On Thu, Jul 19, 2018 at 11:19:56AM -0400, Andrey Grodzovsky wrote:
>>> Problem:
>>> FB is still not unpinned during the first run of amdgpu_bo_evict_vram
>>> and so it's left for the second run, but during second run the SDMA for
>>> moving buffer around already disabled and you have to do
>>> it with CPU, but FB is not in visible VRAM and hence the eviction
>>> failure
>>> leading later to resume failure.
>>>
>>> Fix:
>>> When DAL in use get a pointer to FB from crtc->primary->state rather
>>> then from crtc->primary which is not set for DAL since it supports
>>> atomic KMS.
>>>
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107065
>>> Fixes e00fb85 drm: Stop updating plane->crtc/fb/old_fb on atomic drivers
>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++-
>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index 709e4a3..dd9ebf7 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -2642,7 +2642,8 @@ int amdgpu_device_suspend(struct drm_device
>>> *dev, bool suspend, bool fbcon)
>>>       /* unpin the front buffers and cursors */
>>>       list_for_each_entry(crtc, &dev->mode_config.crtc_list, head) {
>>>           struct amdgpu_crtc *amdgpu_crtc = to_amdgpu_crtc(crtc);
>>> -        struct drm_framebuffer *fb = crtc->primary->fb;
>>> +         struct drm_framebuffer *fb =
>>> amdgpu_device_has_dc_support(adev) ?
>>> +                 crtc->primary->state->fb : crtc->primary->fb;
>> So apparently you haven't yet turned off the planes here. If I'm
>> reading things right amdgpu_device_ip_suspend() should end up doing
>> that through drm_atomic_helper_suspend(). So it looks like like now
>> you'll end up unpinning the same bos twice. Doesn't that mess up
>> some kind of refcount or something?
> amdgpu_bo_unpin has a guard against that, amdgpu_bo_unreserve is less
> clear.

BO reservation shouldn't an issue here, BOs are only reserved for a
short time around (un)pinning them.

>> To me it would seem better to susped the display before trying
>> to evict the bos.
> 
> Yea, i was aware of that and indeed DAL shouldn't rely on the code in 
> amdgpu_device_suspend to unpin
> front buffer and cursor since the atomic helper should do it. Problem is
> that during amdgpu_device_ip_suspend
> the SDMA engine gets suspended too, so you have to embed another
> eviction in between, after display is suspended but before
> SDMA and this forces ordering between them which kind of already in
> place (amd_ip_block_type) but still it's an extra constrain.

Ville's point (which I basically agree with) is that the display
hardware should be turned off before evicting VRAM the first time, in
which case no second eviction should be necessary (for this purpose).

-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer