Fence, timeline and android sync points

Thu Aug 14 12:15:06 PDT 2014

On Thu, Aug 14, 2014 at 08:47:16PM +0200, Daniel Vetter wrote:
> On Thu, Aug 14, 2014 at 8:18 PM, Jerome Glisse <j.glisse at gmail.com> wrote:
> > Sucks because you can not do weird synchronization like the one I depicted in
> > another mail in this thread, and for as long as cmdbuf_ioctl does not give you a
> > fence|syncpt you can not do such a thing cleanly in a non-hackish way.
> 
> Actually i915 will soon be able to do that.

So you will return a fence|syncpoint with each cmdbuf_ioctl?

> 
> > Sucks because you have a fence object per buffer object and thus overhead grows
> > with the number of objects. Not even mentioning fence lifetime issues.
> >
> > Sucks because sub-buffer allocation is just one of many tricks that can not be
> > achieved properly and cleanly with implicit sync.
> >
> > ...
> 
> Well I heard all those reasons and I'm well aware of them. The
> problem is that with current hardware the kernel needs to know for
> each buffer how long it needs to be kept around since hw just can't do
> page faulting. Yeah you can pin them but for an uma design that
> doesn't go down well with folks.

I am not thinking with fancy hw in mind; on the contrary, I thought about
all this with the crappiest hw I could think of in mind.

Yes, you can get rid of fences and not have to pin memory with current hw.
What matters for unpinning is to know that all hw blocks are done using the
memory. This is easily achievable with your beloved seqno. Have one seqno
per driver (one driver can have different blocks: 3d, video decoding, crtc,
...). Each time a buffer is used as part of a command on one block, inc the
common seqno and tag the buffer with that number. Have each hw block write
the latest seqno that is done to a per-block location. Now, to determine
if a buffer is done, compare the buffer seqno with the max of all the
signaled seqnos of all blocks.

Cost: one uint32 per buffer, and a simple if, without locking, to check
the status of a buffer.
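
A minimal userspace sketch of the scheme above, in C11, might look as
follows. Every name here (sketch_driver, sketch_buffer, NUM_BLOCKS, and the
helper functions) is hypothetical and not taken from any driver. The
wrap-safe comparison is an added assumption, since a uint32 counter
eventually wraps. The sketch takes the "max of all signaled seqnos" rule
literally, which assumes that a seqno reported by any block implies all
lower seqnos have retired; if blocks can run ahead of each other, the check
would presumably have to compare against the block that last used the buffer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_BLOCKS 3 /* hypothetical: e.g. 3d, video decoding, crtc */

struct sketch_driver {
    _Atomic uint32_t next_seqno;           /* one counter shared by all blocks */
    _Atomic uint32_t signaled[NUM_BLOCKS]; /* last seqno each block reported done */
};

struct sketch_buffer {
    uint32_t seqno; /* the one uint32 of per-buffer cost */
};

static inline void sketch_driver_init(struct sketch_driver *drv)
{
    atomic_init(&drv->next_seqno, 0);
    for (unsigned i = 0; i < NUM_BLOCKS; i++)
        atomic_init(&drv->signaled[i], 0);
}

/* Called when a buffer is used by a command on any block:
 * bump the common seqno and tag the buffer with it. */
static inline uint32_t sketch_use_buffer(struct sketch_driver *drv,
                                         struct sketch_buffer *bo)
{
    uint32_t seqno = atomic_fetch_add(&drv->next_seqno, 1) + 1;

    bo->seqno = seqno;
    return seqno;
}

/* Stands in for a hw block writing its latest completed seqno
 * to its per-block location. */
static inline void sketch_block_signal(struct sketch_driver *drv,
                                       unsigned block, uint32_t seqno)
{
    assert(block < NUM_BLOCKS);
    atomic_store(&drv->signaled[block], seqno);
}

/* Wrap-safe "a is at or after b" (assumption: the two values are
 * less than 2^31 apart, and the usual two's complement conversion). */
static inline bool sketch_seqno_passed(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) >= 0;
}

/* Lock-free status check: compare the buffer tag with the newest
 * seqno signaled by any block. */
static inline bool sketch_buffer_idle(struct sketch_driver *drv,
                                      const struct sketch_buffer *bo)
{
    uint32_t newest = atomic_load(&drv->signaled[0]);

    for (unsigned i = 1; i < NUM_BLOCKS; i++) {
        uint32_t s = atomic_load(&drv->signaled[i]);

        if (sketch_seqno_passed(s, newest))
            newest = s;
    }
    return sketch_seqno_passed(newest, bo->seqno);
}
```

A caller would tag a buffer with sketch_use_buffer() at submission time and
poll sketch_buffer_idle() before unpinning it; no lock is taken on either
path, only atomic loads and one atomic increment.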

Yes, preemption and gpu scheduling would break such a scheme, but my point
is that when you have such a gpu you want to implement a proper solution,
which of course requires quite some work across the stack. So the past can
live on, but the future needs to get its act together.

> The other problem is that the Linux Desktop I don't seem to care about
> any more kinda relies on implicit syncing right now, so we better keep
> that working fairly well. Of course we could dream up a shiny new
> world where all of the Linux desktop does explicit syncing, but that
> world doesn't exist right now. I mean really if you want to right away
> throw implicit syncing overboard that doesn't bode well for the
> current linux desktop.

Again I fail at expressing myself. I am not saying throw things overboard,
I am well aware of the current reliance on implicit fencing. I am saying
that if fence wants to be this new thing that should allow explicit
fencing in the future, then it had better be done correctly in the first
place.

> So I don't understand all the implicit syncing bashing. It's here, and
> it'll stay for a while longer whether you like it or not ...

I am saying this is where we are and it sucks for a number of reasons.
Then, looking at fence, I am saying it tries to go in the right direction
but does crazy things that I am convinced we will regret. In other words,
if we ever get to explicit fencing, we had better start on the right path
with the right tool. Moreover, I am saying that this can be done without
breaking the implicit sync we have today.

> Of course that doesn't mean we (intel here) won't support explicit
> syncing too, and I really don't see a conflict here when mixing and
> matching these two approaches.

Again I fail to express myself. I am not saying there is a conflict. I
am saying we had better take a path that allows going the full way with
explicit fencing while still allowing a less optimal use for an implicit
sync model.

My point is that the fence code proposed here keeps the worst thing about
the implicit fencing we have today. This can be done differently, in what
I believe to be a better way. And this different approach still allows
implicit sync for existing userspace.

Cheers,
Jérôme

> -Daniel
> 
> > Having code that works around or alleviates some of this is in no way a testimony
> > that it's the best solution. I do believe explicit sync to be superior in use
> > cases, as it allows way more freedom, while the only drawback I see is that you
> > have to put some trust into userspace.
> >
> > So yes implicit sync sucks and it does map to i915 reality as well.
> 
> 
> 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch