[igt-dev] [PATCH i-g-t v2] tests: Add variable refresh rate tests

Wed Apr 15 15:09:19 UTC 2020

On 2020-04-15 9:00 a.m., Daniel Vetter wrote:
> On Tue, Apr 14, 2020 at 03:17:20PM -0400, Harry Wentland wrote:
>>
>>
>> On 2020-04-14 2:50 p.m., Daniel Vetter wrote:
>>> On Tue, Apr 14, 2020 at 3:00 PM Kazlauskas, Nicholas
>>> <nicholas.kazlauskas at amd.com> wrote:
>>>>
>>>> On 2020-04-14 6:52 a.m., Daniel Vetter wrote:
>>>>>> +/* Performs an asynchronous non-blocking page-flip on a pipe. */
>>>>>> +static int
>>>>>> +do_flip(data_t *data, enum pipe pipe_id, igt_fb_t *fb)
>>>>>> +{
>>>>>> +       igt_pipe_t *pipe = &data->display.pipes[pipe_id];
>>>>>> +       int ret;
>>>>>> +
>>>>>> +       igt_set_timeout(1, "Scheduling page flip\n");
>>>>>> +
>>>>>> +       /*
>>>>>> +        * Only the legacy flip ioctl supports async flips.
>>>>>> +        * It's also non-blocking, but returns -EBUSY if flipping too fast.
>>>>>> +        * 2x monitor tests will need async flips in the atomic API.
>>>>>> +        */
>>>>>
>>>>> Uh, if this is also how your amdgpu userspace works we've just fucked
>>>>> up the uapi for good :-/
>>>>>
>>>>> FLIP_ASYNC = please tear
>>>>>
>>>>> VRR = please don't strictly obey the vrefresh, but very much dont tear
>>>>>
>>>>> Tying them together means we're deeply mixing things up.
>>>>>
>>>>> Also amdgpu is still using it's own flip implementation, which makes
>>>>> me wonder whether VRR would even work with atomic, or whether that's
>>>>> also butchered ...
>>>>>
>>>>> How I thought this stuff was supposed to work:
>>>>>
>>>>> - VRR_ENABLED controls whether we do VRR
>>>>> - since atomic is awesome you can change that on every frame
>>>>> - VRR has nothing to do with ASYNC
>>>>>
>>>>> So a) do I read this correctly b) how do we get out of this hole (and
>>>>> maybe c) amdgpu really needs to remove
>>>>> amdgpu_display_crtc_page_flip_target asap).
>>>>>
>>>>> Manasi pointed this out to me, so adding a few more people here.
>>>>> -Daniel
>>>>
>>>> Adaptive sync is an extended front porch that ends upon some signaling
>>>> from the driver is almost always tied to a page flip.
>>>>
>>>> FLIP_ASYNC is a mechanism to flip immediately without waiting for any
>>>> double buffering from the driver or hardware, potentially causing tearing.
>>>>
>>>> This test uses FLIP_ASYNC to precisely position when the flip occurs
>>>> from userspace within the VBLANK to end the frame at the midpoint
>>>> between the min and max range.
>>>>
>>>> But it doesn't really need to be FLIP_ASYNC to make this work, explicit
>>>> fencing is probably the better solution here for all drivers.
>>>
>>> Ok, I freaked out for a bit there, and reading amdgpu DC code didn't
>>> really help with convincing me it's not needed - you're still using
>>> your own page_flip_target handler instead of the one in the helpers
>>> (written by amd people even, and for a few features before VRR), so
>>> biggest worry I had was that VRR doesn't work without FLIP_ASYNC set.
>>> I also looked at -amdgpu code, but I couldn't figure this out one way
>>> or another there either (since that's not using atomic either).
>>>
>>>> This test only really worked by accident for testing amdgpu. We have
>>>> internal stalls in the driver to prevent an immediate flip from
>>>> occurring more than once per frame, so if you try to flip in the
>>>> extended front porch you end up hitting that stall and the returned
>>>> timestamp is a frame ahead.
>>>>
>>>> The other part of this test that's sort of an issue is that this is
>>>> effectively a measurement for how fast you can immediate flip - which is
>>>> good for catching performance regressions in the driver, and good for
>>>> consistent VRR behavior, but not exactly related to the test.
>>>
>>> Hm yeah I think that's another reason for why this test should maybe
>>> use atomic with vgem dma_fences that we signal exactly when needed.
>>> Probably should be a 2nd subtest, so that we can keep the other one
>>> working.
>>>
>>> Plus update the comment on why exactly FLIP_ASYNC is needed with your
>>> story above. Btw instead of FLIP_ASYNC could we use the target frame
>>> number, set to the next frame? That should also allow us to do an
>>> immediate flip while in the vblank.
>>>
>>> Also I think we should have a normal flip which is somewhat late-ish
>>> in the vblank, and make sure VRR can still hit that frame. If that
>>> doesn't exist yet. I think with that set of tests we should be able to
>>> exercise all the timings we want for VRR?
>>>
>>>> The way FLIP_ASYNC should probably work in amdgpu is:
>>>>
>>>> 1) No stalls
>>>> 2) Return -EBUSY if we're in the middle of a frame already, or even
>>>> allow the commit to happen in the same frame since the hardware supports it
>>>>
>>>> Not sure how userspace is equipped to handle that though.
>>>
>>> I think aside from -amdgpu there's 0 userspace out there using
>>> FLIP_ASYNC. Plus it's not standardized for atomic drivers, so really
>>> the semantics are super wobbly. I'm not worried about FLIP_ASYNC
>>> itself (not the first piece of kms uapi that might be fun to unify
>>> across drivers), the only thing that worried me is tying VRR into
>>> FLIP_ASYNC. Looks I worried without reasons since it's just an
>>> igt/test hack that we can change.
> 
> Aside: Big part of my confusion was that I ended up looking at non-DC
> code. Would be real nice if we could garbage collect at least some of that
> stuff ...
> 
>> Thanks, Nick, and Daniel, for hashing this out a bit.
>>
>> I still don't fully understand flip semantics in DRM or where they're
>> defined.
>>
>> One thing that is problematic for some use-cases, and I'm not sure if
>> this is unique to amdgpu, is that a normal atomic flip (not ASYNC) will
>> wait until HW latches (starts scanning out) the new address before it
>> returns. This will stall the renderer.
> 
> So for the blocking case we ... block.
> 
> For the non-blocking case the work item indeeds blocks until scanout has
> started. And we need to, because the next thing we do afterwards is unpin
> the buffers. But, and this is important in how atomic works, at least with
> the nonblocking helpers: These workers are overlapping, so the next one
> will already start before that's all finished. Furthermore if you block
> you rendering on a flip, that's definitely not something atomic core nor
> helpers does.
> 
> So not sure what you're doing, and what exactly is getting blocked.
> 
>> A better approach, and an approach that I understand Windows is taking,
>> is to return immediately from the driver call after programming the new
>> address. AMD HW at least will guarantee that this address is scanned out
>> only on the following frame and won't tear (unless an immediate flip is
>> explicitly requested). This will let usermode render and flip at a rate
>> it desires and ensures that only the flip programmed closest to the
>> start of scanout will be actually scanned out.
> 
> Ah that one, that's mailbox mode. Simply not yet implemented. There have
> been patches floating around about this idea since forever, some even
> landed (but not all). The generic cursor implementation at least works
> like this, and would be fairly easy to extend. It's really hard to extend
> this to all of atomic (Ville had some proof-of-concepts, but even that was
> limited), but just mailbox flip mode shouldn't be hard to roll out.
> 
> It's just no one has yet done the work (uapi + igt + userspace) to make it
> happen.

Thanks for teaching me a new term.

Good to hear there has been some work on this before. I gotta look for
Ville's patches sometime.

Harry

> -Daniel
> 
>>
>> Cheers,
>> Harry
>>
>>> Thanks for your explanations and all.
>>>
>>> Cheers, Daniel
>>>
>>>>
>>>> Regards,
>>>> Nicholas Kazlauskas
>>>>
>>>>>
>>>>>> +       do {
>>>>>> +               ret = drmModePageFlip(data->drm_fd, pipe->crtc_id,
>>>>>> +                                     fb->fb_id,
>>>>>> +                                     DRM_MODE_PAGE_FLIP_EVENT |
>>>>>> +                                     DRM_MODE_PAGE_FLIP_ASYNC,
>>>>>> +                                     data);
>>>>>> +       } while (ret == -EBUSY);
>>>>>> +
>>>>>> +       igt_assert_eq(ret, 0);
>>>>>> +       igt_reset_timeout();
>>>>>> +
>>>>>> +       return 0;
>>>>>> +}
>>>>
>>>
>>>
>