[Intel-gfx] [RFC] drm/i915: Add sync framework support to execbuff IOCTL

Chris Wilson chris at chris-wilson.co.uk
Thu Jul 2 08:55:54 PDT 2015


On Thu, Jul 02, 2015 at 04:43:12PM +0100, John Harrison wrote:
> On 02/07/2015 14:22, Chris Wilson wrote:
> >On Thu, Jul 02, 2015 at 02:01:56PM +0100, John Harrison wrote:
> >>On 02/07/2015 12:54, Chris Wilson wrote:
> >>>On Thu, Jul 02, 2015 at 12:09:59PM +0100, John.C.Harrison at Intel.com wrote:
> >>>>From: John Harrison <John.C.Harrison at Intel.com>
> >>>>
> >>>>Various projects desire a mechanism for managing dependencies between
> >>>>work items asynchronously. This can also include work items across
> >>>>completely different and independent systems. For example, an
> >>>>application may want to retrieve a frame from a video-in device,
> >>>>use it for rendering on a GPU, then send it to the video-out device
> >>>>for display, all without having to stall waiting for completion along
> >>>>the way. The sync framework allows this. It encapsulates
> >>>>synchronisation events in file descriptors. The application can
> >>>>request a sync point for the completion of each piece of work.
> >>>>Drivers should also accept sync points with each new work request
> >>>>and not schedule the work to start until the sync has been signalled.
> >>>>
> >>>>This patch adds sync framework support to the exec buffer IOCTL. A
> >>>>sync point can be passed in to stall execution of the batch buffer
> >>>>until it is signalled, and a sync point can be returned for each
> >>>>batch buffer submission, to be signalled upon that batch buffer's
> >>>>completion.
> >>>>
> >>>>At present, the input sync point is simply waited on synchronously
> >>>>inside the exec buffer IOCTL call. Once the GPU scheduler arrives,
> >>>>this will be handled asynchronously inside the scheduler and the IOCTL
> >>>>can return without having to wait.
> >>>>
> >>>>Note also that the scheduler will re-order the execution of batch
> >>>>buffers, e.g. because a batch buffer is stalled on a sync point and
> >>>>cannot be submitted yet but other, independent, batch buffers are
> >>>>being presented to the driver. This means that the timeline within the
> >>>>sync points returned cannot be global to the engine. Instead, it must
> >>>>be kept per context, per engine (the scheduler may not re-order batches
> >>>>within a context). Hence the timeline cannot be based on the existing
> >>>>seqno values but must be a new implementation.
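
For illustration, a minimal userspace sketch of the in/out fence flow the
commit message describes. The flag names (I915_EXEC_FENCE_IN,
I915_EXEC_FENCE_OUT), the packing of fence fds into rsvd2 and the _WR
ioctl variant follow the interface that eventually landed upstream; they
are stand-ins here, not necessarily the uapi this RFC defines:

    #include <stdint.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <drm/i915_drm.h>

    /* Submit one batch, waiting on in_fence_fd before it runs and
     * returning a new fence fd that signals on its completion. */
    static int submit_with_fences(int drm_fd, uint32_t batch_handle,
                                  int in_fence_fd, int *out_fence_fd)
    {
            struct drm_i915_gem_exec_object2 obj;
            struct drm_i915_gem_execbuffer2 eb;

            memset(&obj, 0, sizeof(obj));
            obj.handle = batch_handle;

            memset(&eb, 0, sizeof(eb));
            eb.buffers_ptr = (uintptr_t)&obj;
            eb.buffer_count = 1;
            eb.flags = I915_EXEC_FENCE_IN | I915_EXEC_FENCE_OUT;
            eb.rsvd2 = (uint32_t)in_fence_fd;       /* fence to wait on */

            if (ioctl(drm_fd, DRM_IOCTL_I915_GEM_EXECBUFFER2_WR, &eb))
                    return -1;

            *out_fence_fd = eb.rsvd2 >> 32;         /* completion fence */
            return 0;
    }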
> >>>But there is nothing preventing assignment of the sync value on
> >>>submission. Other than the debug .fence_value_str, it's a private
> >>>implementation detail, and the interface is solely through the fd and
> >>>signalling.
> >>No, it needs to be public from the moment of creation. The sync
> >>framework API allows sync points to be combined together to create
> >>fences that either merge multiple points on the same timeline or
> >>amalgamate points across differing timelines. The merging part means
> >>that the sync point must be capable of doing arithmetic comparisons
> >>with other sync points from the instant it is returned to userland.
> >>And those comparisons must not change in the future due to scheduler
> >>re-ordering because by then it is too late to redo the test.
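
The merge operation John refers to is the one the staging sync uapi of
the time exposes through SYNC_IOC_MERGE; roughly, as a sketch against
drivers/staging/android/uapi/sync.h of this era:

    #include <linux/ioctl.h>
    #include <linux/types.h>
    #include <string.h>
    #include <sys/ioctl.h>

    struct sync_merge_data {
            __s32   fd2;            /* fd of the second fence */
            char    name[32];       /* name for the new fence */
            __s32   fence;          /* out: fd of the merged fence */
    };

    #define SYNC_IOC_MAGIC  '>'
    #define SYNC_IOC_MERGE  _IOWR(SYNC_IOC_MAGIC, 1, struct sync_merge_data)

    /* Return a new fence fd that signals only when both inputs have. */
    static int sync_merge(const char *name, int fd1, int fd2)
    {
            struct sync_merge_data data;

            memset(&data, 0, sizeof(data));
            data.fd2 = fd2;
            strncpy(data.name, name, sizeof(data.name) - 1);

            if (ioctl(fd1, SYNC_IOC_MERGE, &data))
                    return -1;
            return data.fence;
    }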
> >You know that's not documented at all. The only information userspace
> >gets is afaict
> >
> >struct sync_pt_info {
> >	__u32   len;
> >	char    obj_name[32];
> >	char    driver_name[32];
> >	__s32   status;
> >	__u64   timestamp_ns;
> >
> >	__u8    driver_data[0];
> >};
> >
> >There is a merge operation done by combining two fences into a new one.
> >Merging is done by ordering the fences based on the context pointers and
> >then by sync_pt->fence.seqno, not the private sync value.
> >
> >How does userspace try to order the fences other than as opaque fds? You
> >actually mean driver_data is undefined ABI...
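
For reference, the sync_pt_info records above come back through
SYNC_IOC_FENCE_INFO; a sketch of walking them (buffer size chosen
arbitrarily, struct sync_pt_info as quoted above):

    #include <stdio.h>

    struct sync_fence_info_data {
            __u32   len;            /* in: buffer size; out: bytes used */
            char    name[32];
            __s32   status;         /* 1 signalled, 0 active, <0 error */
            __u8    pt_info[0];     /* packed array of sync_pt_info */
    };

    #define SYNC_IOC_FENCE_INFO _IOWR(SYNC_IOC_MAGIC, 2, \
                                      struct sync_fence_info_data)

    /* Dump every sync_pt_info attached to a fence fd. */
    static void dump_fence(int fence_fd)
    {
            char buf[4096];
            struct sync_fence_info_data *info = (void *)buf;
            __u8 *p, *end;

            info->len = sizeof(buf);
            if (ioctl(fence_fd, SYNC_IOC_FENCE_INFO, info))
                    return;

            p = info->pt_info;
            end = (__u8 *)info + info->len;
            while (p < end) {
                    struct sync_pt_info *pt = (struct sync_pt_info *)p;

                    printf("%s/%s: status %d @ %llu ns\n",
                           pt->obj_name, pt->driver_name, pt->status,
                           (unsigned long long)pt->timestamp_ns);
                    p += pt->len;   /* each record carries its own length */
            }
    }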
> Hmm, something looks confused. Way back when (i.e. the shipping
> Android tree), the 'private' seqno value was very definitely being
> exposed in various ways but it looks like it has actually been
> superseded by the seqno value inside the (new) fence object that is
> inside the sync_pt. Despite that, there is still a 'compare'
> callback in the timeline_ops for doing comparisons between sync_pts
> based on their private, implementation-specific seqno. Although this
> is marked as 'required', it is not actually called anywhere anymore!
> So it looks like we can drop the 'private' seqno value completely
> and just use the fence version.
> 
> In my defense, this code is all coming from the Android tree and the
> port to the nightly was done by someone else. I hadn't realised that
> nightly had changed quite so significantly.

It would be nice if we could reuse one seqno for both internal and external
fences. If you need to expose a fence ordering within a timeline that is
based on the creation stamp rather than execution stamp, it seems like
we could just add such a stamp when creating the sync_pt and not worry
about its relationship to the execution seqno.
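
A hypothetical sketch of that suggestion, with invented names (struct
fence being the cross-driver fence type of this era):

    /* All names here are invented for illustration only. */
    struct i915_sync_pt {
            struct fence base;
            u32 create_stamp;       /* per-context counter assigned at
                                     * sync_pt creation; defines the
                                     * ordering userspace may compare */
            u32 exec_seqno;         /* assigned at actual submission;
                                     * the scheduler may reorder this
                                     * relative to other contexts */
    };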

Doing so does expose that requests are reordered to userspace since the
signalling timeline is not the same as userspace's ordered timeline. Not
sure if that is a problem or not.

Afaict the sync uapi is based on waiting for all of a set of fences to
retire. It doesn't seem to rely on fence ordering (that is, knowing that
fence A will signal before fence B, so a waiter need only wait on fence B).
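
That model is just SYNC_IOC_WAIT: block until every point in the fence
has signalled, with a millisecond timeout (-1 to wait forever). A sketch:

    #define SYNC_IOC_WAIT   _IOW(SYNC_IOC_MAGIC, 0, __s32)

    /* Wait for all points in the fence; timeout in ms, -1 = forever. */
    static int sync_wait(int fence_fd, int timeout_ms)
    {
            __s32 to = timeout_ms;

            return ioctl(fence_fd, SYNC_IOC_WAIT, &to);
    }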

Here's hoping that we can have both simplicity and efficiency...
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

