[Intel-gfx] [RFC] drm/i915: Add sync framework support to execbuff IOCTL

Daniel Vetter daniel at ffwll.ch
Mon Jul 6 07:41:55 PDT 2015


On Mon, Jul 06, 2015 at 03:26:12PM +0100, John Harrison wrote:
> On 06/07/2015 14:59, Daniel Vetter wrote:
> >On Mon, Jul 06, 2015 at 01:58:25PM +0100, John Harrison wrote:
> >>On 06/07/2015 10:29, Daniel Vetter wrote:
> >>>On Fri, Jul 03, 2015 at 12:17:33PM +0100, Tvrtko Ursulin wrote:
> >>>>On 07/02/2015 04:55 PM, Chris Wilson wrote:
> >>>>>It would be nice if we could reuse one seqno both for internal/external
> >>>>>fences. If you need to expose a fence ordering within a timeline that is
> >>>>>based on the creation stamp rather than execution stamp, it seems like
> >>>>>we could just add such a stamp when creating the sync_pt and not worry
> >>>>>about its relationship to the execution seqno.
> >>>>>
> >>>>>Doing so does expose that requests are reordered to userspace since the
> >>>>>signalling timeline is not the same as userspace's ordered timeline. Not
> >>>>>sure if that is a problem or not.
> >>>>>
> >>>>>Afaict the sync uapi is based on waiting for all of a set of fences to
> >>>>>retire. It doesn't seem to rely on fence ordering (that is knowing that
> >>>>>fence A will signal before fence B so it need only wait on fence B).
> >>>>>
> >>>>>Here's hoping that we can have both simplicity and efficiency...
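(Concretely, a minimal sketch of that creation-stamp idea - every name here
is invented for illustration, not taken from actual code:

	struct i915_sync_pt {
		struct fence base;	/* embedded fence, as today */
		u32 create_stamp;	/* assigned when the sync_pt is
					 * created, i.e. in userspace
					 * submission order */
		u32 exec_seqno;		/* execution seqno, filled in when
					 * the request actually runs */
	};

create_stamp would preserve the ordered timeline userspace sees, while
exec_seqno stays free to reflect whatever reordering the scheduler does.)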
> >>>>Jumping in with a less than perfect understanding of everything here - but
> >>>>the timeline business has always confused me. There is nothing in the
> >>>>uapi which needs it afaics, and iirc there was some discussion at the time
> >>>>Jesse floated his patches that it could be removed. Based on that, when I
> >>>>squashed his patches and ported them on top of John's request-to-fence
> >>>>conversion, it ended up something like the below (manually edited a bit to
> >>>>be less noisy, and with some prep patches omitted):
> >>>>
> >>>>This implements the ioctl-based uapi, and indeed seqnos are not actually
> >>>>used in waits. So is this insufficient for some reason? (Other than that
> >>>>it does not implement the input fence side of things.)
> >>>Yeah, an android sync_pt on top of a struct fence embedded in the i915
> >>>request is what I'd have expected.
> >>The thing I'm not happy with in that plan is that it leaves the kernel
> >>driver at the mercy of userland applications. If we return a fence object
> >>to userland via a file descriptor (or indeed any other mechanism), then that
> >>fence object must be kept alive until userland closes the file. If the fence
> >>object is the one embedded within our request structure, that means userland
> >>is effectively pinning our request structure. Given that more and more
> >>stuff is being attached to the request, that could be a fair bit of memory
> >>tied up that we can do nothing about, e.g. if a rogue/buggy application
> >>requests a fence be returned for every batch buffer submitted but never
> >>closes them. Whereas, if we go the route of a separate fence object
> >>specifically for userland, they can leak them like a sieve and we won't
> >>really care so much.
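(For scale, the separate userland fence John describes could be as small as
this - again, names invented for illustration:

	struct i915_user_fence {
		struct fence base;	/* all that the fd keeps pinned */
		u32 seqno;		/* snapshot compared against the
					 * hw status page for signalling */
	};

i.e. a few dozen bytes with no pointer back to the request, versus pinning
the whole request, its context and everything else hanging off it when the
embedded fence is exported directly.)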
> >Userspace can exhaust kernel allocations, that's nothing new. And if we
> >keep the fence object small, userspace simply needs to leak a few more
> >fence fds to tie up the same amount of memory as when there's a bit more
> >data attached to each one.
> >
> >The solution to this problem is to have a mem cgroup limit set. No need to
> >complicate our kernel code.
> 
> There is still the extra complication that request unreferencing cannot
> require any kind of mutex lock if we are allowing it to happen from outside
> of the driver. That means the unreference callback must move the request to
> a 'please clean me later' list, schedule a worker thread to run, and thus do
> the cleanup asynchronously.

Yeah, the struct_mutex locking design is terrible, and we'll pay the price
for that dearly until it's eventually fixed up. We can at least optimize it
with a mutex_trylock().
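Roughly like this - sketch only, the helpers and fields here are made up:

	static void i915_fence_release(struct fence *f)
	{
		struct drm_i915_gem_request *req =
			container_of(f, struct drm_i915_gem_request, fence);
		struct drm_i915_private *dev_priv = req->i915;

		if (mutex_trylock(&dev_priv->dev->struct_mutex)) {
			/* Fast path: lock is free, clean up directly. */
			__i915_gem_request_free(req);
			mutex_unlock(&dev_priv->dev->struct_mutex);
		} else {
			/* Contended: punt the request onto John's 'please
			 * clean me later' list and let a worker grab the
			 * mutex and free it asynchronously. */
			llist_add(&req->free_link, &dev_priv->free_requests);
			schedule_work(&dev_priv->free_work);
		}
	}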

Or we just fix up request tracking to not require struct_mutex, that might
be better. All the references involving a request should point one way, with
no weak references going the other direction, so this should be possible.
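Something like this as the invariant, simplified and with made-up names:

	struct drm_i915_gem_request {
		struct fence fence;		/* what userspace pins */
		struct intel_context *ctx;	/* strong ref, dropped only
						 * when the request is freed */
		struct intel_engine_cs *engine;	/* borrowed: engines outlive
						 * all their requests */
		struct list_head list;		/* on the engine's list; the
						 * engine holds its own ref
						 * and drops it on retire */
	};

With all strong references pointing outward from the request like that, the
final fence_put() can tear everything down without taking struct_mutex at
all.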
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

