Enhancements for Render composite request

Soeren Sandmann sandmann at daimi.au.dk
Thu Aug 27 08:05:10 PDT 2009


Keith Packard <keithp at keithp.com> writes:

[I have reordered your mail a bit]

> The render composite request has a couple of glaring failures:
> 
>  1) Only one rectangle per request. Apps generate a lot of protocol,
>     the server spends a lot of time decoding requests and the driver
>     has to merge requests back together to hand more than one polygon
>     to the hardware. It's interesting that exa (and hence uxa by
>     derivation) have a poly-rectangle composite operation in their
>     driver interface.

> 
> As operation 1) is already supported by the EXA API, and can be emulated
> in DIX by executing multiple one-rectangle composite requests, this
> seems easy to add to the protocol in a completely compatible
> fashion:

> COMPOSITERECT	[
> 			src-x, src-y:	INT16
> 			msk-x, msk-y:	INT16
> 			dst-x, dst-y:	INT16
> 			width, height:	CARD16
> 		]
> 
> CompositeRectangles
> 
> 	op:		PICTOP
> 	src:		PICTURE
> 	mask:		PICTURE or None
> 	dst:		PICTURE
> 	rects:		LISTofCOMPOSITERECT
> 
> 	This request is equivalent to a sequence of Composite requests
> 	using the same op/src/mask/dst values and stepping through
> 	rects.
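
For reference, the DIX emulation mentioned above (executing multiple
one-rectangle Composite requests) would be little more than a loop
around the server's CompositePicture() entry point. A rough sketch;
the request struct and function name here are illustrative, not from
an actual patch:

        #include "picturestr.h" /* PicturePtr, CompositePicture() */

        typedef struct {
            INT16  src_x, src_y;
            INT16  msk_x, msk_y;
            INT16  dst_x, dst_y;
            CARD16 width, height;
        } xCompositeRect;

        static void
        composite_rects(CARD8 op, PicturePtr src, PicturePtr mask,
                        PicturePtr dst, int nrects,
                        const xCompositeRect *r)
        {
            int i;

            /* One CompositePicture() call per rectangle, sharing
             * op/src/mask/dst across the whole list. */
            for (i = 0; i < nrects; i++)
                CompositePicture(op, src, mask, dst,
                                 r[i].src_x, r[i].src_y,
                                 r[i].msk_x, r[i].msk_y,
                                 r[i].dst_x, r[i].dst_y,
                                 r[i].width, r[i].height);
        }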

Is there any data to suggest that applications could actually benefit
from this? I don't see anything in cairo that would generate this kind
of pattern, but I am not very familiar with cairo's X backends.

I could see it happening in applications that would use the same shape
as a "stamp" many times, but I'm not sure how common that is, or how
easy it would be for applications to extract such a pattern. (Except
for text, but there is already support for that).

Naively, I'd expect applications to also switch source and mask
almost as often as they switch rectangles, but keep the same
destination for long periods of time, so maybe add "src" and "mask"
fields to COMPOSITERECT as well?
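
That per-rectangle variant might look something like the following
sketch; the struct name and field layout are illustrative only:

        /* Each rectangle carries its own source and mask picture,
         * while the destination stays in the request header.
         * Hypothetical layout, not a concrete proposal. */
        typedef struct {
            CARD32 src;             /* PICTURE         */
            CARD32 mask;            /* PICTURE or None */
            INT16  src_x, src_y;
            INT16  msk_x, msk_y;
            INT16  dst_x, dst_y;
            CARD16 width, height;
        } xCompositeRectFull;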

>  2) No vblank synchronization. Anyone wanting to double buffer 2D apps
>     has no way of avoiding tearing. I'd like this inside the X server
>     to make updates under a RandR transform sync to vblank.
>
> It seems like operation 2) should be an option on the picture object;
> set a sync mode on the picture and all operations would be covered by
> that mode. It would be 'best effort', so that drivers not supporting the
> sync mode would simply skip it. The question is how fancy this option
> should be; in the simple case, we'd make it just avoid tearing, more
> complex cases could involve having sequential operations to the same
> picture wait for a specific frame number. I'd love to have comments on
> precisely which 'swap modes' would be useful here.

For an application to do good-looking animations, it needs to know,
at any given time, when rendering initiated at that moment will turn
into photons, so that it can compute the frame that the user will
actually see. For this to work, it needs to know:

        - An estimate of the latency until pixels hit the framebuffer

        - The screen's update frequency and phase

The first requires time-stamped fence events and, if a compositing
manager is present, a protocol that allows it to relay those events
back to the application. The application can then:

        Ask for a fence event
        Render things offscreen
        Inform the compositing manager that it should update and
           send a fence event when it has submitted its rendering
        Wait for the fence event to come back.

Subtracting the two time stamps then gives it an estimate of the
latency as a function of the amount of rendering it did.
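
As a sketch, the handshake from the application's side might look
like this; every one of these calls is hypothetical, standing in for
whatever fence protocol would actually get specified:

        #include <stdint.h>
        #include <X11/Xlib.h>

        /* Hypothetical fence/compositor interfaces -- no such
         * protocol exists today. */
        extern uint64_t get_time_us(void);
        extern void     x_fence_request(Display *dpy);
        extern void     render_frame_offscreen(void);
        extern void     cm_request_update(Display *dpy);
        extern uint64_t x_fence_wait(Display *dpy); /* event's stamp */

        static uint64_t
        measure_latency(Display *dpy)
        {
            uint64_t t_start = get_time_us();

            x_fence_request(dpy);     /* ask for a fence event      */
            render_frame_offscreen(); /* render the next frame      */
            cm_request_update(dpy);   /* CM updates, then sends the
                                         fence event once its own
                                         rendering is submitted     */

            /* Fence time stamp minus start time estimates the
             * latency for this amount of rendering. */
            return x_fence_wait(dpy) - t_start;
        }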

The update frequency is easy to get. The phase will hopefully be
provided by the time stamp in the DRI2 page flip event. Given this
information, the application can estimate when rendering done at that
moment will turn into photons ("now" + the app's own latency + the
pipe latency, rounded up to the end of the following vblank; a sketch
of this computation follows the list below). This has two benefits:

        - The application can keep reading input events until the last
          moment before rendering. This minimizes input latency.

        - The application can render things as they are supposed to
          appear at the moment the image is actually displayed. This
          reduces visible jitter.
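
Assuming the page flip event supplies the last vblank time stamp and
the mode gives the refresh period, the estimate itself is simple
arithmetic. A sketch (all parameters are assumed to come from the
mechanisms above):

        #include <stdint.h>

        /* When does rendering submitted "now" turn into photons?
         * Add the application's own latency and the pipe latency,
         * then round up to the next vblank boundary. */
        static uint64_t
        photon_time(uint64_t now, uint64_t app_latency,
                    uint64_t pipe_latency, uint64_t last_vblank,
                    uint64_t refresh_period)
        {
            uint64_t ready = now + app_latency + pipe_latency;
            uint64_t phase = (ready - last_vblank) % refresh_period;

            return ready + (refresh_period - phase);
        }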

I'd imagine similar information is necessary for good audio
synchronization with video playback.

How this turns into protocol, I'm not sure. If DbeSwapBuffers()
could be executed asynchronously and generate an event, that might be
good enough.

I am not sure how this would work with a 'sync mode' on a
picture. Would each and every operation be synchronized, or would
several operations somehow be buffered and then executed at vblank?
Are other requests delayed behind the synchronized request?



Soren

