[PATCH 2/2] present: Fix use of vsynced pageflips and honor PresentOptionAsync. (v3)

Thu Dec 4 12:28:21 PST 2014

On 12/04/2014 11:04 AM, Axel Davy wrote:
> Hello,
>
> I'm not sure this is the right fix.
>
> vblank->sync_flip seems to have the meaning "this driver only supports 
> sync flips"
> not "we would like a sync flip ideally".
>

Hi, thanks for the response.

In my understanding - after digging through all the Present code-pathes 
of intel-ddx and nouveau-ddx and then the intel-kms and nouveau-kms 
drivers in the kernel - vblank->sync_flip == TRUE does really mean "we 
would like a sync flip ideally". The one-liner check that my patch 
replaces seems to be a hack introduced in the initial dri3/Present 
enablement by Keith from 2013, as far as i can see to allow bringup of 
the extension before the intel-ddx was completely ready for it. It 
worked - for the intel-ddx - but i think it was accidentally left there 
for the public release and not noticed, because there wasn't any other 
driver than intels actually implementing Present.

So here's my understanding of the logic and the evidence for it (and 
sorry for the wall of text, brain hasn't reached proper caffeine level yet):

1. PresentOptionAsync means "please present as soon as possible, don't 
wait for vblank (because that would defeat the "as soon as possible" 
part), let it tear. I think the Async refers to asynchronous to the 
vblank. This makes sense because it is the fastest way to get something 
on the screen, and there are legitimate use cases for tearing swaps. 
Also, under drm/kms, non-vsynced immediate pageflips are called async 
flips and the drm/kms drivers expose DRM_CAP_ASYNC_FLIP if they can do 
tearing flips.

This interpretation is consistent with how the servers current Present 
implementation treats PresentOptionAsync wherever it already takes the 
option into account: The target_msc computation/adjustment logic adjusts 
the target_msc to try to make sure the Present completes within the 
currently active scanout cycle. For a divisor + remainder controlled 
Present it does the closest equivalent to "hurry, do it now regardless 
if it tears" that is possible while staying consistent with the 
divisor+remainder semantic.

So for a "typical" (divisor=remainder=0) Present, the vblank event for a 
copy-present is queued in the kernel for immediate dispatch back to the 
server, without the kernel waiting for any vblank, so it will 
immediately return from the kernel and immediately trigger 
present_execute(), which in case of our copyswap means an immediate, not 
vsynced ddx->CopyArea() call to start the blit asap - and tearing. For a 
flip present, it queues the vblank event which will trigger the pageflip 
ioctl() for the user requested msc (or the current scanout cycle if we 
are already past that), something that only gives the correct behaviour 
iff the kernels kms driver executes the pageflip ioctl() in a 
non-vsynced, tearing manner. For a vsynced flip instead, the ioctl() 
would need to be called one vblank before the target vblank to get 
correct timing, and that is indeed what is done in the current 
implementation "if (vblank->sync_flip == TRUE)".

-> PresentOptionAsync requests "immediate" tearing Present by all means. 
Ideally by a non-vsynced asyncflip for fullscreen windows, but if that 
is not possible due to driver/hw limitations, by falling back to a 
non-vsynced ddx->CopyArea() blit.

2. vblank->sync_flip == FALSE vs. TRUE is used by all parts of the 
Present implementation to request a non-vsynced vs. vsynced pageflip:

It is passed into the drivers ddx->check_flip() function as a parameter 
to tell the driver if you want a sync flip or an async tearing flip, and 
the function is supposed to answer if the driver is able to perform the 
requested type of pageflip under the given conditions (drawable size, 
pixelformat, hw constraints, kms driver DRM_CAP_ASYNC_FLIP capable). 
Both intel-ddx and nouveau-ddx implement the check in that way - with 
the exception of that omission in intels uxa backend, fixed by that 
other patch i sent out. The sna backend checks, based on the passed 
vblank->sync_flip, if the intel driver can provide a vsynced/non-vsynced 
flip and answers accordingly. The nouveau-ddx does the same. Now it 
turns out that nouveau can always provide async pageflips on top of all 
Linux kernels where it supports or uses Present at all (Maxwell gpu's 
for the glamor backend, kernel with rendernodes support for exa 
backend). intel-kms otoh can't support async flips at all atm. (checked 
it against all currently released Linux kernels and drm-next), although 
the intel-ddx implements all the logic which would allow to request an 
async flip if the kms driver would actually support it - Future proofing 
i guess.

Depending on the result of the ddx->check_flip() the servers Present 
extension either goes ahead with a pageflipped present (vblank->flip == 
TRUE) either sync or async (vblank->sync_flip) or it falls back to a 
copy-present if the ddx can't execute a flip as requested.

If it is decided to execute a pageflip, then the Present implementations 
present_execute() function passes vblank->sync_flip as a flag to the 
ddx->present_flip() function to request a vsynced or non-vsynced flip, 
and the intel-ddx and nouveau-ddx use that flag to request a sync vs. 
async kms-pageflip from the kernel drm/kms driver.

-> vblank->sync_flip decides between sync flip or async/tearing flip.

So why does the current Present implementation seem to use 
vblank->sync_flip to decide if the driver is sync_flip / async_flip 
capable? What the line i replaced in that patch did is this: Iff the ddx 
driver doesn't have async flip capability, it forces vblank->sync_flip = 
TRUE, so the immediately following ddx->check_flip() query doesn't 
reject a fullscreen pageflip, and later on we ask the driver to perform 
a vsynced pageflip. So all is good, and usually pageflips are used for 
fullscreen windows.

Except for some problems:

1. PresentOptionAsync will not work correctly for fullscreen Presents on 
drivers without asyncflip support (= the intel-ddx): It will always 
execute a vsynced pageflip, but possibly for the wrong target vblank as 
the logic at the top of present_pixmap() assumes that PresentOptionAsync 
is correctly treated by the logic at the bottom of present_pixmap(). 
This problem was probably not noticed in practice because of the 
bugs/omissions in Mesa's dri3/present backend.

2. You will always get massive tearing on drivers which do support 
PresentCapabilityAsync (= the nouveau-ddx and possibly future intel-ddx 
etc.), because vblank->sync_flip doesn't get initialized at all in this 
case, which means it stays at its vblank = calloc() default of FALSE, 
which means to request non-vsynced tearing async flips from drivers 
which can do that. That's why i see massive tearing all the time on 
nouveau, regardless if PresentOptionAsync is set or not.

So the current code was imho a workaround for intel-ddx which worked ok 
for that driver in its current form for the common case of vsynced 
swaps, but it backfires on any other driver. The nouveau ddx 
implementation of present is very recent, so not surprising nobody 
noticed this.

My patch fixes the problem in the way i think Keith intended the 
mechanism to work. It initializes vblank->sync_flip depending on 
PresentOptionAsync to a vsynced or non-vsynced flip request. Then it 
checks via ddx->check_flip() if the driver can actually support the 
required vblank->sync_flip request type. If the driver says "Yes" then 
it goes ahead and executes the flip accordingly. If it say "No, i can't 
do that", it falls back to a copy-present with properly adjusted 
parameters to get a present that honors PresentOptionAsync and all the 
other target_msc, divisor, remainder constraints properly, just in a 
less efficient way than a pageflip.

> I'm not sure exactly what's the expected behaviour when Async flips 
> are available.
> Do we really want to have tearings in that case ? Likely the goal is 
> to avoid a fullscreen
> copy, and thus you want tearings, but only when PresentOptionAsync is 
> used.

Yes. But that's what the code does with the patch applied. Tearfree sync 
flip by default, tearing flip for PresentOptionAsync.
Tested on intel (sna or bug-fixed uxa) and nouveau (glamor and exa) with 
single and dual-display, windowed or fullscreen windows, swapinterval 
zero vs. 1, etc.

>
>
> If you have tearing issues when not using PresentOptionAsync, then 
> there is likely
> a bug somewhere in the current logic, but I don't think your patch is 
> the right one.
>

That is because of problem 2. above - failure to initialize 
vblank->sync_flip properly unless it gets "accidentally" initialized as 
a side effect of running on intel-ddx.

I think with the patch applied, and the other fix applied to the 
intel-ddx uxa present backend, the way it is implemented in the server 
is quite elegant, and consistent with how the drivers present backends 
are implemented (= how the driver writers interpreted the way it is 
supposed to work) and it passes all my correctness tests and stress 
tests just as well as it did under DRI2.

Otherwise passing the vblank->sync_flip parameter to ddx->check_flip() 
also wouldn't make any sense: It would basically be: Find out if the ddx 
can do async flips by some capability check, assign the result to 
vblank->sync_flip, then ask the driver again via ddx->check_flip() "Do 
you actually support flips in the way you just told me you do, or did 
you lie to me when i asked you exactly the same question with a slightly 
different phrasing just a microsecond ago?". Which was a question 
avoided by the hack because the intel-ddx was known to lie in 
ddx->check_flip() due to some missing implementation.

The patch went through three revisions by me so far, each time i was 
done i realized that a much simpler solution already exists, because 
that was already taken care of in the current implementation. But i 
think this is the minimal fix to cover all cases.

Thoughts? Keith, maybe an opinion if my story here is correct?

thanks,
-mario

> Axel Davy
>
> On 02/12/2014 20:08, Mario Kleiner wrote :
>> Pageflips for Pixmap presents were not synchronized to vblank on
>> drivers with support for PresentCapabilityAsync, due to some
>> missing init for vblank->sync_flips. The PresentOptionAsync
>> flag was completely ignored for pageflipped presents.
>>
>> Vsynced flips only worked by accident on the intel-ddx, as that
>> driver doesn't have PresentCapabilityAsync support.
>>
>> On nouveau-ddx, which supports PresentCapabilityAsync, this
>> always caused non-vsynced pageflips with pretty ugly tearing.
>>
>> This patch fixes the problem, as tested on top of XOrg 1.16.2
>> on nouveau and intel.
>>
>> Please also apply to XOrg 1.17 and XOrg 1.16.2 stable.
>>
>> Applying on top of XOrg 1.16.2 may require cherry-picking
>> commit 2051514652481a83bd7cf22e57cb0fcd40333f33
>> which trivially fixes lack of support for protocol option
>> PresentOptionCopy - get two bug fixes for the price of one!
>>
>> Signed-off-by: Mario Kleiner <mario.kleiner.de at gmail.com>
>> ---
>>   present/present.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/present/present.c b/present/present.c
>> index e5d3fd5..be1c9f1 100644
>> --- a/present/present.c
>> +++ b/present/present.c
>> @@ -834,7 +834,7 @@ present_pixmap(WindowPtr window,
>>       vblank->notifies = notifies;
>>       vblank->num_notifies = num_notifies;
>>   -    if (!screen_priv->info || !(screen_priv->info->capabilities & 
>> PresentCapabilityAsync))
>> +    if (!(options & PresentOptionAsync))
>>           vblank->sync_flip = TRUE;
>>         if (!(options & PresentOptionCopy) &&
>