[PATCH] drm/amdgpu: Fix S3 resume failure.
Andrey Grodzovsky
Andrey.Grodzovsky at amd.com
Thu Jul 19 22:45:55 UTC 2018
On 07/19/2018 04:17 PM, Andrey Grodzovsky wrote:
>
>
> On 07/19/2018 03:37 PM, Harry Wentland wrote:
>> On 2018-07-19 02:30 PM, Alex Deucher wrote:
>>> On Thu, Jul 19, 2018 at 1:07 PM, Andrey Grodzovsky
>>> <Andrey.Grodzovsky at amd.com> wrote:
>>>>
>>>> On 07/19/2018 12:59 PM, Michel Dänzer wrote:
>>>>> On 2018-07-19 06:53 PM, Andrey Grodzovsky wrote:
>>>>>>
>>>>>> On 07/19/2018 12:47 PM, Michel Dänzer wrote:
>>>>>>> On 2018-07-19 06:33 PM, Andrey Grodzovsky wrote:
>>>>>>>> On 07/19/2018 11:39 AM, Ville Syrjälä wrote:
>>>>>>>>> On Thu, Jul 19, 2018 at 11:19:56AM -0400, Andrey Grodzovsky
>>>>>>>>> wrote:
>>>>>>>>>> Problem:
>>>>>>>>>> The FB is still pinned during the first run of
>>>>>>>>>> amdgpu_bo_evict_vram and so it is left for the second run,
>>>>>>>>>> but by then the SDMA engine used for moving buffers around
>>>>>>>>>> is already disabled and the move has to be done with the
>>>>>>>>>> CPU. Since the FB is not in CPU-visible VRAM, the eviction
>>>>>>>>>> fails, which later leads to the resume failure.
>>>>>>>>>>
>>>>>>>>>> Fix:
>>>>>>>>>> When DAL is in use, get the pointer to the FB from
>>>>>>>>>> crtc->primary->state rather than from crtc->primary, which
>>>>>>>>>> is no longer set for DAL since it supports atomic KMS.
>>>>>>>>>>
>>>>>>>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107065
>>>>>>>>>> Fixes: e00fb85 ("drm: Stop updating plane->crtc/fb/old_fb on
>>>>>>>>>> atomic drivers")
>>>>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>>>>>>>>>> ---
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++-
>>>>>>>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>>> index 709e4a3..dd9ebf7 100644
>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>>> @@ -2642,7 +2642,8 @@ int amdgpu_device_suspend(struct
>>>>>>>>>> drm_device
>>>>>>>>>> *dev, bool suspend, bool fbcon)
>>>>>>>>>> /* unpin the front buffers and cursors */
>>>>>>>>>> list_for_each_entry(crtc,
>>>>>>>>>> &dev->mode_config.crtc_list, head)
>>>>>>>>>> {
>>>>>>>>>> struct amdgpu_crtc *amdgpu_crtc =
>>>>>>>>>> to_amdgpu_crtc(crtc);
>>>>>>>>>> - struct drm_framebuffer *fb = crtc->primary->fb;
>>>>>>>>>> + struct drm_framebuffer *fb =
>>>>>>>>>> amdgpu_device_has_dc_support(adev) ?
>>>>>>>>>> + crtc->primary->state->fb : crtc->primary->fb;
>>>>>>>>> So apparently you haven't yet turned off the planes here. If I'm
>>>>>>>>> reading things right amdgpu_device_ip_suspend() should end up
>>>>>>>>> doing
>>>>>>>>> that through drm_atomic_helper_suspend(). So it looks like now
>>>>>>>>> you'll end up unpinning the same bos twice. Doesn't that mess up
>>>>>>>>> some kind of refcount or something?
>>>>>>>> amdgpu_bo_unpin has a guard against that, amdgpu_bo_unreserve
>>>>>>>> is less
>>>>>>>> clear.
>>>>>>> BO reservation shouldn't be an issue here, BOs are only reserved for a
>>>>>>> short time around (un)pinning them.
>>>>>>>
>>>>>>>
>>>>>>>>> To me it would seem better to suspend the display before trying
>>>>>>>>> to evict the bos.
>>>>>>>> Yeah, I was aware of that, and indeed DAL shouldn't rely on the
>>>>>>>> code in amdgpu_device_suspend to unpin the front buffer and
>>>>>>>> cursor, since the atomic helper should do it. The problem is
>>>>>>>> that during amdgpu_device_ip_suspend the SDMA engine gets
>>>>>>>> suspended too, so you have to embed another eviction in
>>>>>>>> between, after the display is suspended but before SDMA. This
>>>>>>>> forces an ordering between them which is kind of already in
>>>>>>>> place (amd_ip_block_type), but it's still an extra constraint.
>>>>>>> Ville's point (which I basically agree with) is that the display
>>>>>>> hardware should be turned off before evicting VRAM the first
>>>>>>> time, in
>>>>>>> which case no second eviction should be necessary (for this
>>>>>>> purpose).
>>>>>> Display HW is turned off as part of all IPs in a loop inside
>>>>>> amdgpu_device_ip_suspend.
>>>>>> Are you suggesting to extract the display HW turn off from inside
>>>>>> amdgpu_device_ip_suspend and place it
>>>>>> before the first call to amdgpu_bo_evict_vram ?
>>>>> In a nutshell, yes.
>>>>>
>>>>> Or maybe it would be easier to move the amdgpu_bo_evict_vram call
>>>>> down
>>>>> to somewhere called from amdgpu_device_ip_suspend?
>>>>
>>>> I can move the BEFORE and AFTER calls to amdgpu_bo_evict_vram inside
>>>> amdgpu_device_ip_suspend
>>>> such that the first one is called AFTER display is shut off, while the
>>>> second in the very end of the function.
>>>> I am just not sure what the side effects of evicting will be after a
>>>> bunch of blocks (such as GMC) are already disabled.
>>> How about something like the attached patches? Basically split the ip
>>> suspend sequence in two like we do for resume.
>>>
>> Patches are
>> Acked-by: Harry Wentland <harry.wentland at amd.com>
>>
>> Harry
>>
>>> Alex
>
> Patches look good indeed, but on the second S3 in a row I get a warning
> in dma_fence_is_later about fence contexts not being equal. I will have
> to take a look at why that is.
>
> Andrey
Seems like amdgpu_ttm_set_buffer_funcs_status destroys adev->mman.entity
on suspend without releasing the adev->mman.bdev.man[TTM_PL_VRAM].move
fence, so on resume the new drm_sched_entity.fence_context causes the
warning against the old fence context, which is different.
BTW, this also happens with my original change, I just hadn't noticed.
Andrey
>
>>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>