[PATCH 02/11] dma-buf: add new dma_fence_chain container v4

Jason Ekstrand jason at jlekstrand.net
Fri Feb 15 19:11:30 UTC 2019


On Fri, Feb 15, 2019 at 12:33 PM Koenig, Christian <Christian.Koenig at amd.com>
wrote:

> Am 15.02.19 um 19:16 schrieb Jason Ekstrand:
>
> On Fri, Feb 15, 2019 at 11:51 AM Christian König <
> ckoenig.leichtzumerken at gmail.com> wrote:
>
>> Am 15.02.19 um 17:49 schrieb Jason Ekstrand:
>>
>> On Fri, Feb 15, 2019 at 9:52 AM Lionel Landwerlin via dri-devel <
>> dri-devel at lists.freedesktop.org> wrote:
>>
>>> On 15/02/2019 14:32, Koenig, Christian wrote:
>>> > Am 15.02.19 um 15:23 schrieb Lionel Landwerlin:
>>> >> Hi Christian, David,
>>> >>
>>> >> For timeline semaphores we need points to be signaled in order.
>>> >> I'm struggling to understand how this fence-chain implementation
>>> >> preserves the ordering of the seqnos.
>>> >>
>>> >> One of the scenarios where I can see an issue happening is when you
>>> >> have a timeline with points 1 & 2 and userspace submits to 2 different
>>> >> engines:
>>> >>      - first with let's say a blitter style engine on point 2
>>> >>      - then a 3d style engine on point 1
>>> > Yeah, and where exactly is the problem?
>>> >
>>> > Seqno 1 will signal when the 3d style engine finishes work.
>>> >
>>> > And seqno 2 will signal when both seqno 1 is signaled and the blitter
>>> > style engine has finished its work.
>>>
>>
>> That's an interesting interpretation of the spec.  I think it's legal, and
>> I could see how that behavior may be desirable in some ways.
>>
>>
>> Well, we have actually had this discussion multiple times now, both
>> internally and on the mailing list. Please also see the previous mails
>> with Daniel on this topic.
>>
>
> I dug through dri-devel and read everything I could find with a search for
> "timeline semaphore".  I didn't find all that much, but this did come up once.
>
>
> I need to dig through my mails as well; that was back in November/December
> last year.
>
>
>
>> My initial suggestion was actually exactly what Lionel suggested as
>> well.
>>
>> Following this I used a rather simple container for the
>> implementation, e.g. just a ring buffer indexed by the sequence number. In
>> this scenario userspace can specify at syncobj creation time how big the
>> window for sequence numbers should be, i.e. in this implementation how big
>> the ring buffer would be.
>>
>> This was rejected by our guys who actually wrote a good part of the
>> Vulkan specification. Daniel then went in the same direction during
>> the public discussion.
>>
>
> I agree with whoever said that specifying a ringbuffer size is
> unacceptable.  I'm not really sure how that's relevant though.  Is a
> ringbuffer required to implement the behavior that is being suggested
> here?  Genuine question; I'm trying to get back up to speed.
>
>
> Using a ring buffer was just an example of how we could do it if we follow
> my and Lionel's suggestion.
>
> The key point is that we could simplify the implementation massively if
> sequence numbers don't need to depend on each other.
>
> In other words, we just see the syncobj as a container that fences are added
> to and retrieved from, instead of something actively involved in the signaling.
>

In principle, I think this is a reasonable argument.  Having it involved in
signalling doesn't seem terrible to me, but it would mean that a driver
wouldn't be able to detect that the fence it's waiting on actually belongs
to itself and optimize accordingly.
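
For what it's worth, here is a rough sketch of the kind of check I mean,
written against the helpers this series adds (dma_fence_chain_for_each()
and to_dma_fence_chain()); my_driver_fence_ops and
convert_to_hw_semaphore_wait() are placeholders of mine, not real API:

    /* 'fence' is whatever drm_syncobj handed us back for the point we
     * are about to wait on; with the chain approach it may be a chain
     * node rather than the producing driver's own fence. */
    struct dma_fence *iter;

    dma_fence_chain_for_each(iter, fence) {
            struct dma_fence_chain *chain = to_dma_fence_chain(iter);
            struct dma_fence *f = chain ? chain->fence : iter;

            /* If this point is backed by one of our own fences, we could
             * turn the dependency into a HW semaphore wait instead of
             * blocking submission on the CPU. */
            if (f->ops == &my_driver_fence_ops)
                    convert_to_hw_semaphore_wait(f);
    }

With the plain container model, the fence retrieved for a point would just
be the producing driver's own fence, so no walk like this would be needed.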


> The main reason we didn't do it this way is that the AMD Vulkan team
> rejected this approach.
>

Clearly, there's not quite as much agreement as I'd thought there was.  Oh,
well, that's why we have these discussions.


> Additionally, chaining sequence numbers together is really the more
> defensive approach, i.e. it is less likely that applications can shoot
> themselves in the foot.
>

Yeah, I can see how the "everything prior to n must be signalled" behaviour
could be safer.  I think both wait-any and wait-all have their ups and
downs.  It just took me by surprise.
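
To spell the difference out against Lionel's scenario above (point 2 handed
to a blitter-style engine, point 1 to a 3D-style engine), here is a rough
sketch of the two models; point_fence() is a placeholder for however the
implementation actually looks up the fence attached to a point:

    /* Chain / wait-all model: point n is complete only once every point
     * <= n has a signaled fence, so point 2 waits on the blitter job
     * *and* on point 1 from the 3D engine. */
    static bool point_done_chain(u64 n)
    {
            for (u64 i = 1; i <= n; i++)
                    if (!dma_fence_is_signaled(point_fence(i)))
                            return false;
            return true;
    }

    /* Container / wait-any model: each point only depends on the fence
     * attached to it, so point 2 may signal before point 1. */
    static bool point_done_container(u64 n)
    {
            return dma_fence_is_signaled(point_fence(n));
    }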


>
>  4. If you do get into a sticky situation, you can unblock an entire
>> timeline by using the CPU signal ioctl to set it to a high value.
>>
>>
>> Well, I think that this could be problematic as well. Keep in mind that
>> the main use case for this is sharing timelines between processes.
>>
>> In other words you don't want applications to be able to mess with it too
>> much.
>>
>
> Cross-process is exactly why you want it.  Suppose you're a compositor and
> you have a timeline shared with another application and you've submitted
> work which waits on it.  Then you get a notification somehow (SIGHUP?) that
> the client has died, leaving you hanging.  What do you do?  You take the
> semaphore that's shared with you and the client and whack it to UINT64_MAX
> to unblock yourself.  Of course, this can be abused and that's always the
> risk you take with timelines.
>
>
> My latest understanding is that basically everybody now agrees that wait
> before signal in the kernel is forbidden.
>

Agreed.  I'm not saying that wait before signal in the kernel should be a
thing.  I think we're all agreed that wait-before-signal with the current
GEM infrastructure is utter insanity.

However, timeline syncobjs are both a kernel wait mechanism (only for time
points that already have a known-to-signal dma_fence) and a userspace wait
mechanism (which can wait for things that haven't materialized yet).  The
whacking to UINT64_MAX would be to unblock waiting userspace threads, as you
mentioned below.
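
Concretely, I'm thinking of something like the following for the two
flavors; this assumes the timeline wait entry point that goes along with
this series plus the existing WAIT_FOR_SUBMIT flag, so treat the exact
names as guesses on my part:

    uint64_t point = 2;

    /* Kernel-style wait: only completes once a dma_fence has actually
     * been attached to point 2 and that fence signals. */
    drmSyncobjTimelineWait(fd, &handle, &point, 1, timeout_ns, 0, NULL);

    /* Userspace-style wait: also waits for the fence to materialize in
     * the first place, which is where wait-before-signal lives. */
    drmSyncobjTimelineWait(fd, &handle, &point, 1, timeout_ns,
                           DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT, NULL);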

That said, in the case that I suggested here, if the client process died,
got kicked off the GPU, or whatever, then the kernel has likely declared
its context lost and signalled all dma_fences associated with it.  If
this is true, then whacking it to UINT64_MAX still works to unblock the
timeline because there would be nothing else pending to signal time
points.  Worst case, the process crashed and left valid GPU work pending in
the kernel, and the compositor ends up waiting a little while longer for
said work to complete.
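
In code, the recovery path I'm describing is basically just this (again
assuming the CPU-side timeline signal wrapper that accompanies this series;
the exact name is a guess on my part):

    /* Force the shared timeline to its maximum value so any of our own
     * waiters stuck on points the dead client never submitted get
     * unblocked. */
    uint64_t value = UINT64_MAX;
    drmSyncobjTimelineSignal(fd, &handle, &value, 1);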

Ok, I think I'm reasonably convinced that the wait-all behaviour implied by
the chaining approach, while unexpected, isn't harmful.

--Jason