Questions about KMS flip

Lang Yu Lang.Yu at amd.com
Tue Nov 16 03:27:21 UTC 2021


On Mon, Nov 15, 2021 at 01:04:15PM +0100, Michel DDDnzer wrote:
> On 2021-11-15 12:31, Lang Yu wrote:
> > On Mon, Nov 15, 2021 at 10:49:39AM +0100, Michel DDDnzer wrote:
> >> On 2021-11-15 10:04, Lang Yu wrote:
> >>> On Mon, Nov 15, 2021 at 09:38:47AM +0100, Michel DDDnzer wrote:
> >>>> On 2021-11-15 07:41, Lang Yu wrote:
> >>>>> On Fri, Nov 12, 2021 at 05:10:27PM +0100, Michel DDDnzer wrote:
> >>>>>> On 2021-11-12 16:03, Christian König wrote:
> >>>>>>> Am 12.11.21 um 15:30 schrieb Michel Dänzer:
> >>>>>>>> On 2021-11-12 15:29, Michel Dänzer wrote:
> >>>>>>>>> On 2021-11-12 13:47, Christian König wrote:
> >>>>>>>>>> Anyway this unfortunately turned out to be work for Harray and Nicholas. In detail it's about this bug report here: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D214621&data=04%7C01%7CLang.Yu%40amd.com%7C07b2a4c13d2f4eba9ebb08d9a8300cd3%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637725746610767105%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=Qz2VKDE0%2BcIuXZMy0IL3NLZAjeXLimwl8PPZh4Cp4iE%3D&reserved=0
> >>>>>>>>>>
> >>>>>>>>>> Lang was able to reproduce the issue and narrow it down to the pin in amdgpu_display_crtc_page_flip_target().
> >>>>>>>>>>
> >>>>>>>>>> In other words we somehow have an unbalanced pinning of the scanout buffer in DC.
> >>>>>>>>> DC doesn't use amdgpu_display_crtc_page_flip_target AFAICT. The corresponding pin with DC would be in dm_plane_helper_prepare_fb, paired with the unpin in
> >>>>>>>>> dm_plane_helper_cleanup_fb.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> With non-DC, the pin in amdgpu_display_crtc_page_flip_target is paired with the unpin in dm_plane_helper_cleanup_fb
> >>>>>>>> This should say amdgpu_display_unpin_work_func.
> >>>>>>>
> >>>>>>> Ah! So that is the classic (e.g. non atomic) path?
> >>>>>>
> >>>>>> Presumably.
> >>>>>>
> >>>>>>
> >>>>>>>>> & dce_v*_crtc_disable. One thing I notice is that the pin is guarded by if (!adev->enable_virtual_display), but the unpins seem unconditional. So could this be about virtual display, and the problem is actually trying to unpin a BO that was never pinned?
> >>>>>>>
> >>>>>>> Nope, my educated guess is rather that we free up the BO before amdgpu_display_unpin_work_func is called.
> >>>>>>>
> >>>>>>> E.g. not pin unbalance, but rather use after free.
> >>>>>>
> >>>>>> amdgpu_display_crtc_page_flip_target calls amdgpu_bo_ref(work->old_abo), and amdgpu_display_unpin_work_func calls amdgpu_bo_unref(&work->old_abo) only after amdgpu_bo_unpin. So what you describe could only happen if there's an imbalance elsewhere such that amdgpu_bo_unref is called more often than amdgpu_bo_ref, or maybe if amdgpu_bo_reserve fails in amdgpu_display_unpin_work_func (in which case the "failed to reserve buffer after flip" error message should appear in dmesg).
> >>>>>
> >>>>>
> >>>>> Actually, each call to amdgpu_display_crtc_page_flip_target() will
> >>>>>
> >>>>> 1, init a work(amdgpu_display_unpin_work_func) to unpin an old buffer
> >>>>>    (crtc->primary->fb), the work will be queued in dce_vX_0_pageflip_irq().
> >>>>>
> >>>>> 2, pin a new buffer, assign it to crtc->primary->fb. But how to unpin it?
> >>>>>    Next call.
> >>>>>
> >>>>> The problem is the pinned buffer of last call to 
> >>>>> amdgpu_display_crtc_page_flip_target() is not unpinned.
> >>>>
> >>>> It's unpinned in dce_v*_0_crtc_disable.
> >>>
> >>> I just found crtc->primary->fb is NULL when came in dce_v*_0_crtc_disable().
> >>> So it's not unpinned...
> >>
> >> __drm_helper_disable_unused_functions sets crtc->primary->fb = NULL only after calling crtc_funcs->disable. Maybe this path can get hit for a CRTC which was already disabled, in which case crtc->primary->fb == NULL in dce_v*_0_crtc_disable is harmless.
> >>
> >> Have you checked for the issue I described below? Should be pretty easy to catch.
> >>
> >>
> >>>> I think I've found the problem though: dce_v*_0_crtc_do_set_base pin the BO from target_fb unconditionally, but unpin the BO from the fb parameter only if it's different from the former. So if they're the same, the BO's pin count is incremented by 1.
> > 
> > Form my observations, amdgpu_bo_unpin() in dce_v*_0_crtc_disable() is
> > never called.
> 
> It would be expected to happen when the screen turns off, e.g. due to DPMS.
> 
> 
> > Though a single call to dce_v*_0_crtc_do_set_base() will 
> > only pin the BO, I found it will be unpinned in next call to 
> > dce_v*_0_crtc_do_set_base().
> 
> Yeah, that's the normal case when the new BO is different from the old one.
> 
> To catch the case I described, try something like
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
> 
> index 18a7b3bd633b..5726bd87a355 100644
> 
> --- a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
> 
> +++ b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
> 
> @@ -1926,6 +1926,7 @@ static int dce_v11_0_crtc_do_set_base(struct drm_crtc *crtc,
> 
>                 return r;
> 
> 
> 
>         if (!atomic) {
> 
> +               WARN_ON_ONCE(target_fb == fb);
> 
>                 r = amdgpu_bo_pin(abo, AMDGPU_GEM_DOMAIN_VRAM);
> 
>                 if (unlikely(r != 0)) {
> 
>                         amdgpu_bo_unreserve(abo);
>

I did some tests, the warning can be triggered.

pin/unpin operations in *_crtc_do_set_base() and
amdgpu_display_crtc_page_flip_target() are mixed.

Regards,
Lang

> 
> -- 
> Earthling Michel Dänzer            |                  https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fredhat.com%2F&data=04%7C01%7CLang.Yu%40amd.com%7C07b2a4c13d2f4eba9ebb08d9a8300cd3%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637725746610767105%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=jct%2F2NG7LamVRzzpkSy9XjQIDZRBIutSeK3VQ2jh524%3D&reserved=0
> Libre software enthusiast          |         Mesa and Xwayland developer


More information about the dri-devel mailing list