Fence, timeline and android sync points

Thu Aug 14 16:03:44 PDT 2014

On Thu, Aug 14, 2014 at 11:23:01PM +0200, Daniel Vetter wrote:
> On Thu, Aug 14, 2014 at 9:15 PM, Jerome Glisse <j.glisse at gmail.com> wrote:
> > Cost 1 uint32 per buffer and simple if without locking to check status of
> > a buffer.
> 
> Yeah well except it doesn't and that's why we switch to full blown
> fence objects internally instead of smacking seqno values all over the
> place. At least in i915 because I have seen what this "simple"
> solution looks like if you throw all the complexities into the bin and
> mix it well.

I am kind of curious on what can go wrong here ? Unless you have thousands
of different hw block inside your gpu all of them with different cmd queue
then i do not see any issue.

Note that this global seqno i am talking is only useful for bind/unbind of
buffer in ideal world of explicit sync, not for synchronization btw command
buffer. So pseudo code would be :

// if buffer_can_unbind(buf, dev) return true then buffer is no longer in
// use and can be unbind. if false you can wait on the device wait queue.
bool buffer_can_unbind(buf, dev)
{
  // Is the last gseqno associated with buffer is smaller than current
  // smallest signaled seqno ?
  if (buf->gseqno <= dev->gseqno_cmin)
    return true;
  hw_update_gseqno_min(dev);
  if (buf->gseqno <= dev->gseqno_cmin)
    return true;
  for_each_hw_block(hwblock, dev) {
    // If that hwblock has not signaled a gseqno bigger than the
    // buffer one's and also has work scheduled that might be using
    // the buffer (ie the last scheduled gseqno is greater than
    // buffer gseqno). If that's true than buffer might still be
    // in use so assume the worst.
    if (hwblock->gseqno < buf->gseqno &&
        hwblock->gseqno_last_scheduled >= buf->gseqno)
      return false;
    // This means either this block is already past the buf->gseqno
    // or that this block have nothing in its pipeline that will ever
    // use buf.
    // As an extra optimization one might add a hwblock mask to buf
    // and unmask wait for that hwblock so further wait will not wait
    // on this block as we know for sure it's not involve.
  }
  // Well none of the hwblock is still using that buffer.
  return true;
}

hw_update_gseqno_min(dev)
{
   unsigned nmin = -1;

   for_each_hw_block(hwblock, dev) {
     nmin = min(nmin, hwblock->gseqno);
   }
   // atomic exchange and compage set new min only if it's bigger then
   // current min
   atomic_cmp_xchg(dev->gseqno_cmin, nmin)
}

So this is how it looks in my mind, each hwblock write to there dedicated
>gseqno and for each hwblock you store the last gseqno that was scheduled.
There is no need for any lock and this is outside any other sync mecanism.

For hw with preemption, i assume you have then have a hwcontext, well you
use same code except that now you have a gseqno per context which means
you need to store a seqno per context per buffer. With some trickery this
might be avoided especialy if hw can do atomic cmp_xchg.

Cheers,
Jérôme

> -Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch