[Intel-gfx] [RFC] drm/i915: Add sync framework support to execbuff IOCTL

Thu Jul 2 06:22:38 PDT 2015

On Thu, Jul 02, 2015 at 02:01:56PM +0100, John Harrison wrote:
> On 02/07/2015 12:54, Chris Wilson wrote:
> >On Thu, Jul 02, 2015 at 12:09:59PM +0100, John.C.Harrison at Intel.com wrote:
> >>From: John Harrison <John.C.Harrison at Intel.com>
> >>
> >>Various projects desire a mechanism for managing dependencies between
> >>work items asynchronously. This can also include work items across
> >>completely different and independent systems. For example, an
> >>application wants to retrieve a frame from a video in device,
> >>use it for rendering on a GPU, then send it to the video out device
> >>for display all without having to stall waiting for completion along
> >>the way. The sync framework allows this. It encapsulates
> >>synchronisation events in file descriptors. The application can
> >>request a sync point for the completion of each piece of work. Drivers
> >>should also take sync points in with each new work request and not
> >>schedule the work to start until the sync has been signalled.
> >>
> >>This patch adds sync framework support to the exec buffer IOCTL. A
> >>sync point can be passed in to stall execution of the batch buffer
> >>until signalled. And a sync point can be returned after each batch
> >>buffer submission which will be signalled upon that batch buffer's
> >>completion.
> >>
> >>At present, the input sync point is simply waited on synchronously
> >>inside the exec buffer IOCTL call. Once the GPU scheduler arrives,
> >>this will be handled asynchronously inside the scheduler and the IOCTL
> >>can return without having to wait.
> >>
> >>Note also that the scheduler will re-order the execution of batch
> >>buffers, e.g. because a batch buffer is stalled on a sync point and
> >>cannot be submitted yet but other, independent, batch buffers are
> >>being presented to the driver. This means that the timeline within the
> >>sync points returned cannot be global to the engine. Instead they must
> >>be kept per context per engine (the scheduler may not re-order batches
> >>within a context). Hence the timeline cannot be based on the existing
> >>seqno values but must be a new implementation.
> >But there is nothing preventing assignment of the sync value on
> >submission. Other than the debug .fence_value_str it's a private
> >implementation detail, and the interface is solely through the fd and
> >signalling.
> No, it needs to be public from the moment of creation. The sync
> framework API allows sync points to be combined together to create
> fences that either merge multiple points on the same timeline or
> amalgamate points across differing timelines. The merging part means
> that the sync point must be capable of doing arithmetic comparisons
> with other sync points from the instant it is returned to user land.
> And those comparisons must not change in the future due to scheduler
> re-ordering because by then it is too late to redo the test.

You know that's not documented at all. The only information userspace
gets is, afaict:

struct sync_pt_info {
	__u32   len;
	char    obj_name[32];
	char    driver_name[32];
	__s32   status;
	__u64   timestamp_ns;

	__u8    driver_data[0];
};

There is a merge operation done by combining two fences into a new one.
Merging is done by ordering the fences based on the context pointers and
then by sync_pt->fence.seqno, not the private sync value.
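
As a rough sketch of that ordering (not the actual staging code: struct
demo_pt, demo_seqno_later() and demo_merge() are made-up names, and the
context pointer is reduced to an integer key), merging two lists that are
already sorted by context might look like:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a sync point: only the fields used for ordering. */
struct demo_pt {
	uintptr_t context;	/* stands in for the context pointer */
	uint32_t  seqno;	/* stands in for sync_pt->fence.seqno */
};

/* Nonzero if seqno a is at or after b, tolerating 32-bit wraparound. */
static int demo_seqno_later(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

/*
 * Merge two lists that are each sorted by context, with one point per
 * context. Points on different contexts are all kept; for a shared
 * context only the later seqno survives. 'out' needs room for na + nb
 * entries. Returns the number of points written.
 */
static size_t demo_merge(const struct demo_pt *a, size_t na,
			 const struct demo_pt *b, size_t nb,
			 struct demo_pt *out)
{
	size_t i = 0, j = 0, n = 0;

	assert(out != NULL);

	while (i < na && j < nb) {
		if (a[i].context < b[j].context) {
			out[n++] = a[i++];
		} else if (a[i].context > b[j].context) {
			out[n++] = b[j++];
		} else {
			/* Same context: keep whichever point is later. */
			out[n++] = demo_seqno_later(a[i].seqno, b[j].seqno) ? a[i] : b[j];
			i++;
			j++;
		}
	}
	while (i < na)
		out[n++] = a[i++];
	while (j < nb)
		out[n++] = b[j++];

	return n;
}
```

Nothing in this sketch consults a driver's private sync value; the only
keys are the context and the seqno.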

How does userspace try to order the fences other than as opaque fd? You
actually mean driver_data is undefined ABI...
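
To illustrate what "opaque fd" means in practice, here is a minimal sketch
of a userspace wait. It assumes the fence fd reports completion as
readable through poll(); that behaviour and the helper name demo_fd_wait()
are assumptions for illustration, not something stated in uapi/sync.h.

```c
#include <errno.h>
#include <poll.h>

/*
 * Wait for an fd to become readable. Returns 1 if signalled, 0 on
 * timeout, -1 on error. A negative timeout_ms waits indefinitely.
 * Note: if poll() is interrupted, the full timeout is restarted.
 */
static int demo_fd_wait(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret;

	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret <= 0)
		return ret;	/* 0: timed out, -1: poll() failed */
	if (pfd.revents & (POLLERR | POLLNVAL))
		return -1;
	return (pfd.revents & POLLIN) ? 1 : 0;
}
```

A caller written this way never sees a timeline value at all, only whether
the fd has signalled.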

> >  You could implement this as a secondary write to the HWS,
> >assigning the sync_value to the sync_pt on submission and
> >remove the request tracking, as when signalled you only need to compare
> >the sync_value against the timeline value in the HWS.
> >
> >However, that equally applies to the existing request->seqno. That can
> >also be assigned on submission so that it is always an ordered timeline, and
> >so can be used internally or externally.
> 
> One of the scheduler patches is to defer seqno assignment until
> batch submission rather than do it at request creation (for
> execbuffer requests). You still have a problem with pre-emption
> though. A request that is pre-empted will get a new seqno assigned
> when it is resubmitted so that the HWS page always sees ordered
> values popping out. For internal requests, this is fine but for
> external sync points that breaks the assumptions made by the
> framework.

I fail to see how. Nothing in uapi/sync.h says anything about the order
of fences or gives any such guarantees. If the external callers only
have access through the fd, there is no restriction that the timeline
sync_pt->value must be set prior to submission.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre