[PATCH] present: Queue flips for later execution. Begging for review.

Wed Jun 11 05:35:20 PDT 2014

On 04.06.2014 22:11, Keith Packard wrote:
> Christian König <deathsimple at vodafone.de> writes:
>
>> I agree totally; even today we have a number of problems with MSCs.
>>
>> For example, try moving a video from one monitor to another. Since the
>> state tracker doesn't know about the window move, we send out MSCs for
>> the wrong display device. In the best case this just results in a frame
>> being displayed way too early; in the worst case we stall for quite a
>> long time until the other device reaches the MSC value of the first one.
> Present constructs a logical MSC value for each window; move the window
> among CRTCs and the MSC value continues in the correct sequence,
> although the time between frames may change. In fact, move the window
> off the screen entirely and the MSC interval rises to a full second,
> slowing hidden applications down to reduce resource utilization.
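
The per-window logical MSC described above can be pictured as an offset that is recomputed whenever the window changes CRTC. This is a minimal sketch, assuming one offset per window; `struct window_msc`, `window_logical_msc()` and `window_move_crtc()` are hypothetical names, not Present's actual implementation.

```c
#include <stdint.h>

/* Hypothetical per-window state: logical MSC = hardware CRTC MSC + offset. */
struct window_msc {
    int64_t offset;        /* added to the current CRTC's hardware MSC */
    uint64_t last_logical; /* last logical MSC reported for this window */
};

/* Logical MSC for the window, given the hardware MSC of its current CRTC. */
uint64_t window_logical_msc(struct window_msc *w, uint64_t crtc_msc)
{
    w->last_logical = (uint64_t)((int64_t)crtc_msc + w->offset);
    return w->last_logical;
}

/* The window moved to a CRTC whose hardware MSC is new_crtc_msc: pick the
 * offset so the logical sequence continues from where it left off instead
 * of jumping to the new CRTC's unrelated counter value. */
void window_move_crtc(struct window_msc *w, uint64_t new_crtc_msc)
{
    w->offset = (int64_t)w->last_logical - (int64_t)new_crtc_msc;
}
```

For example, a window at logical MSC 1000 that moves to a CRTC whose counter reads 50 keeps reporting 1000, then 1001 at that CRTC's next vblank. Only the time between increments changes.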

That still doesn't sound as flexible and reliable as a simple
nanosecond-based timestamp, so why not just use that instead of an MSC?

>
>> I'm not sure if we want to queue things up in the kernel.
> We already queue one flip in the kernel; the question is what to do when
> a second flip call is made.

Exactly! What I would rather avoid is queuing up more than one flip in
the kernel. Calling into the kernel while a flip is pending should either
result in an error or replace the pending flip.
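
The replace-or-error policy for a single pending flip could look like the following sketch. `FLIP_FLAG_REPLACE`, `struct crtc_flip_state` and both helpers are hypothetical, and returning `-EBUSY` is an illustrative choice, not a description of the existing DRM page-flip ioctl.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag: replace a pending flip instead of failing. */
#define FLIP_FLAG_REPLACE 0x1u

struct crtc_flip_state {
    bool pending;        /* at most one flip is ever queued */
    uint32_t pending_fb; /* framebuffer the pending flip will scan out */
};

/* Queue a flip to fb. Returns 0 on success, or -EBUSY if a flip is already
 * pending and the caller did not ask for replacement. */
int queue_flip(struct crtc_flip_state *s, uint32_t fb, uint32_t flags)
{
    if (s->pending && !(flags & FLIP_FLAG_REPLACE))
        return -EBUSY;
    s->pending = true;  /* a fresh flip, or the one that was replaced */
    s->pending_fb = fb; /* a replaced buffer is simply never shown */
    return 0;
}

/* Called at vblank: complete the pending flip, if any. Returns the fb now
 * being scanned out (unchanged if nothing was pending). */
uint32_t complete_flip(struct crtc_flip_state *s, uint32_t current_fb)
{
    if (!s->pending)
        return current_fb;
    s->pending = false;
    return s->pending_fb;
}
```

With replacement, triple or quad buffering falls out naturally: the client keeps rendering and the kernel always shows the newest complete frame at the next vblank.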

>
>> I think just providing a timestamp when the frame should be first
>> visible like VDPAU does is the right way to go.
> DRI2 and Present both provide that information.

Unfortunately, DRI2 at least doesn't. It provides the MSC at which the
frame should become visible, not a timestamp.

As noted before, the crux with the MSC is that it is always expressed in
a unit that depends on the display device instead of an independent time
source.
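
To illustrate the unit problem: "three vblanks from now" is about 50 ms on a 60 Hz display but 60 ms on a 50 Hz one, whereas a nanosecond target can be converted into a vblank on whichever display currently shows the window. This sketch assumes a fixed refresh period and a known reference (MSC, timestamp) pair; the function names are hypothetical.

```c
#include <stdint.h>

/* Time (ns) at which vblank number target_msc occurs, assuming a constant
 * refresh period and a reference vblank (ref_msc at ref_ns). */
uint64_t msc_to_ns(uint64_t ref_msc, uint64_t ref_ns, uint64_t period_ns,
                   uint64_t target_msc)
{
    return ref_ns + (target_msc - ref_msc) * period_ns;
}

/* First vblank at or after target_ns on a display with the given period.
 * A timestamp-based interface would do this conversion for the display
 * the window is on at the time, so a window move cannot confuse it. */
uint64_t ns_to_msc(uint64_t ref_msc, uint64_t ref_ns, uint64_t period_ns,
                   uint64_t target_ns)
{
    if (target_ns <= ref_ns)
        return ref_msc;
    /* Round up so the frame is never shown before target_ns. */
    return ref_msc + (target_ns - ref_ns + period_ns - 1) / period_ns;
}
```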

>
>> Adding a flag that a waiting flip request should be replaced by another
>> one instead of running into an error should be enough to handle the
>> triple (or quad) buffered case as well.
> I don't think this is sufficient to implement the swap_control_tear
> extension though. For that, if the request comes before the vblank
> event, it should wait, but if the request comes after the vblank event,
> it should swap immediately. Only by telling the kernel which vblank
> interval to flip at can you allow the kernel to correctly choose between
> waiting and flipping immediately.
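
The swap_control_tear rule as Keith describes it reduces to comparing the current vblank count with the interval the flip was aimed at. A minimal sketch, assuming the client supplies `target_msc`; the enum and function are hypothetical.

```c
#include <stdint.h>

enum flip_action { FLIP_WAIT_FOR_VBLANK, FLIP_IMMEDIATE_TEAR };

/* current_msc counts vblanks that have already occurred; target_msc is the
 * vblank the flip was meant for. If that vblank is still ahead, wait for
 * it; if it has already passed, the frame is late and is flipped at once,
 * accepting a tear. */
enum flip_action choose_flip(uint64_t current_msc, uint64_t target_msc)
{
    if (current_msc < target_msc)
        return FLIP_WAIT_FOR_VBLANK;
    return FLIP_IMMEDIATE_TEAR;
}
```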

I disagree; we should make the whole thing timestamp-based instead of
relying on the continuity of vblank intervals.

For example, you can easily implement this with two timestamps: the
first denotes when the frame should first become visible, and the second
is essentially a timeout after which we do the flip unsynchronized to
the vblank.
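
The two-timestamp scheme can be sketched as a decision the driver takes each time it re-evaluates a queued frame. This assumes `timeout_ns >= visible_ns` and that the driver knows when the next vblank will occur on the display currently showing the window; all names are hypothetical.

```c
#include <stdint.h>

enum flip_decision {
    FLIP_HOLD,      /* keep the frame queued and re-evaluate later */
    FLIP_AT_VBLANK, /* flip synchronized to the upcoming vblank */
    FLIP_NOW_TEAR   /* timeout reached: flip unsynchronized */
};

enum flip_decision decide_flip(uint64_t now_ns, uint64_t next_vblank_ns,
                               uint64_t visible_ns, uint64_t timeout_ns)
{
    if (now_ns >= timeout_ns)
        return FLIP_NOW_TEAR;   /* late: show it now, tearing if need be */
    if (next_vblank_ns < visible_ns)
        return FLIP_HOLD;       /* next vblank would show it too early */
    if (next_vblank_ns <= timeout_ns)
        return FLIP_AT_VBLANK;  /* first vblank at or after visible_ns */
    return FLIP_HOLD;           /* vblank misses the timeout; tear at timeout_ns */
}
```

Note that neither timestamp refers to a particular display, so moving the window between monitors only changes `next_vblank_ns`.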

> The alternative is to eliminate delayed flips in the kernel entirely and
> rely on user space to make the request when the vblank interrupt
> happens. I don't know if this will work reliably enough; you need to
> deliver the vblank event out to user space, and the window system needs
> to respond by getting the flip to the kernel in the space of a few
> scanlines of time.
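
To put "a few scanlines" into numbers: one scanline lasts htotal / pixel clock. For the standard CEA-861 1920x1080@60 mode (148.5 MHz pixel clock, 2200 pixels per line including blanking) that is about 14.8 µs, so a five-line budget is roughly 74 µs for the whole vblank-event-to-flip round trip. The helpers are an illustrative sketch; the pixel clock is taken in kHz, the unit DRM display modes use.

```c
#include <stdint.h>

/* Duration of one scanline in ns, from the horizontal total (pixels per
 * line, including blanking) and the pixel clock in kHz. */
uint64_t scanline_ns(uint32_t htotal, uint32_t clock_khz)
{
    return (uint64_t)htotal * 1000000u / clock_khz;
}

/* Time available if the flip must reach the hardware within `lines`
 * scanlines of the vblank interrupt. */
uint64_t reaction_budget_ns(uint32_t htotal, uint32_t clock_khz,
                            uint32_t lines)
{
    return scanline_ns(htotal, clock_khz) * lines;
}
```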

As already explained in the other mail, that's undesirable at least for
Radeon hardware, because we have hardware double-buffered scan-out
addresses and so actually need to program the new frame buffer address
into the registers long before the vblank occurs.
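
A toy model of a double-buffered scan-out address register: the driver may write the pending address at any point during the frame, and the hardware latches it into the active address only when vblank begins. A write issued in response to the vblank event therefore only takes effect one frame later. Field and function names are illustrative, not actual Radeon register names.

```c
#include <stdint.h>

struct scanout_regs {
    uint64_t active_addr;  /* address currently being scanned out */
    uint64_t pending_addr; /* address last written by the driver */
};

/* Driver side: program the next frame buffer address, typically well
 * before the vblank at which it should take effect. */
void program_flip(struct scanout_regs *r, uint64_t fb_addr)
{
    r->pending_addr = fb_addr;
}

/* Hardware side: at the start of vblank the pending address is latched,
 * so a mid-frame write never tears. */
void hw_vblank(struct scanout_regs *r)
{
    r->active_addr = r->pending_addr;
}
```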

> I've implemented this with the current in-ring flip stuff on
> Intel and it "works" when things are not busy, correctly flipping at
> vblank when possible. When the GPU becomes busy, the flip is delayed by
> sitting in the ring and you end up with a tear.
>
> However, this completely ignores the problem of scheduling the flip
> while rendering is still queued for the new buffer. Placing the flip in
> the ring ensures that any related rendering will be done before the flip
> occurs. Using MMIO to flip means that an explicit wait is required; for
> correct vblank-synchronized rendering, we must delay until the buffer is
> idle *and* vblank is occurring before updating the scanout
> register. Present isn't doing this idle check, and so even if we have
> MMIO writes in the kernel, I suspect we'll still have tearing...
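
Keith's point about MMIO flips can be made concrete. Without the ring ordering the flip behind rendering, software must first wait for the new buffer to go idle and then for a vblank window. The sketch below assumes fixed-length vblank periods at a constant refresh interval and returns the index of the earliest vblank during which the flip may be written; the name and timing model are hypothetical.

```c
#include <stdint.h>

/* vblank k starts at first_vblank_ns + k * period_ns and lasts
 * vblank_len_ns. idle_ns is when rendering to the new buffer completes. */
uint64_t earliest_flip_vblank(uint64_t idle_ns, uint64_t first_vblank_ns,
                              uint64_t period_ns, uint64_t vblank_len_ns)
{
    if (idle_ns <= first_vblank_ns)
        return 0;
    /* Last vblank that started at or before the buffer went idle. */
    uint64_t k = (idle_ns - first_vblank_ns) / period_ns;
    uint64_t start = first_vblank_ns + k * period_ns;
    if (idle_ns < start + vblank_len_ns)
        return k;     /* still inside vblank k: flip now */
    return k + 1;     /* missed it: wait for the next vblank */
}
```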

While it is possible to program the flip through the ring buffer on
current Radeon hardware as well, it's not a feature we can assume is
always available.

Regards,
Christian.