Question on UAPI for fences

Fri Sep 12 09:38:24 PDT 2014

On Fri, Sep 12, 2014 at 05:58:09PM +0200, Christian König wrote:
 > pass in a list of fences to wait for before beginning a command 
submission.

The Android implementation has a mechanism for combining multiple sync 
points into a brand new single sync pt. Thus APIs only ever need to take 
in a single fd not a list of them. If the user wants an operation to 
wait for multiple events to occur then it is up to them to request the 
combined version first. They can then happily close the individual fds 
that have been combined and only keep the one big one around. Indeed, 
even that fd can be closed once it has been passed on to some other API.

Doing such combining and cleaning up fds as soon as they have been 
passed on should keep each application's fd usage fairly small.

On 12/09/2014 17:08, Christian König wrote:
>> As Daniel said using fd is most likely the way we want to do it but this
>> remains vague.
> Separating the discussion if it should be an fd or not. Using an fd 
> sounds fine to me in general, but I have some concerns as well.
>
> For example what was the maximum number of opened FDs per process 
> again? Could that become a problem? etc...
>
> Please comment,
> Christian.
>
> Am 12.09.2014 um 18:03 schrieb Jerome Glisse:
>> On Fri, Sep 12, 2014 at 05:58:09PM +0200, Christian König wrote:
>>> Am 12.09.2014 um 17:48 schrieb Jerome Glisse:
>>>> On Fri, Sep 12, 2014 at 05:42:57PM +0200, Christian König wrote:
>>>>> Am 12.09.2014 um 17:33 schrieb Jerome Glisse:
>>>>>> On Fri, Sep 12, 2014 at 11:25:12AM -0400, Alex Deucher wrote:
>>>>>>> On Fri, Sep 12, 2014 at 10:50 AM, Jerome Glisse 
>>>>>>> <j.glisse at gmail.com> wrote:
>>>>>>>> On Fri, Sep 12, 2014 at 04:43:44PM +0200, Daniel Vetter wrote:
>>>>>>>>> On Fri, Sep 12, 2014 at 4:09 PM, Daniel Vetter 
>>>>>>>>> <daniel at ffwll.ch> wrote:
>>>>>>>>>> On Fri, Sep 12, 2014 at 03:23:22PM +0200, Christian König wrote:
>>>>>>>>>>> Hello everyone,
>>>>>>>>>>>
>>>>>>>>>>> to allow concurrent buffer access by different engines 
>>>>>>>>>>> beyond the multiple
>>>>>>>>>>> readers/single writer model that we currently use in radeon 
>>>>>>>>>>> and other
>>>>>>>>>>> drivers we need some kind of synchonization object exposed 
>>>>>>>>>>> to userspace.
>>>>>>>>>>>
>>>>>>>>>>> My initial patch set for this used (or rather abused) zero 
>>>>>>>>>>> sized GEM buffers
>>>>>>>>>>> as fence handles. This is obviously isn't the best way of 
>>>>>>>>>>> doing this (to
>>>>>>>>>>> much overhead, rather ugly etc...), Jerome commented on this 
>>>>>>>>>>> accordingly.
>>>>>>>>>>>
>>>>>>>>>>> So what should a driver expose instead? Android sync points? 
>>>>>>>>>>> Something else?
>>>>>>>>>> I think actually exposing the struct fence objects as a fd, 
>>>>>>>>>> using android
>>>>>>>>>> syncpts (or at least something compatible to it) is the way 
>>>>>>>>>> to go. Problem
>>>>>>>>>> is that it's super-hard to get the android guys out of hiding 
>>>>>>>>>> for this :(
>>>>>>>>>>
>>>>>>>>>> Adding a bunch of people in the hopes that something sticks.
>>>>>>>>> More people.
>>>>>>>> Just to re-iterate, exposing such thing while still using 
>>>>>>>> command stream
>>>>>>>> ioctl that use implicit synchronization is a waste and you can 
>>>>>>>> only get
>>>>>>>> the lowest common denominator which is implicit 
>>>>>>>> synchronization. So i do
>>>>>>>> not see the point of such api if you are not also adding a new 
>>>>>>>> cs ioctl
>>>>>>>> with explicit contract that it does not do any kind of 
>>>>>>>> synchronization
>>>>>>>> (it could be almost the exact same code modulo the do not wait for
>>>>>>>> previous cmd to complete).
>>>>>>> Our thinking was to allow explicit sync from a single process, but
>>>>>>> implicitly sync between processes.
>>>>>> This is a BIG NAK if you are using the same ioctl as it would 
>>>>>> mean you are
>>>>>> changing userspace API, well at least userspace expectation. 
>>>>>> Adding a new
>>>>>> cs flag might do the trick but it should not be about 
>>>>>> inter-process, or any
>>>>>> thing special, it's just implicit sync or no synchronization. 
>>>>>> Converting
>>>>>> userspace is not that much of a big deal either, it can be broken 
>>>>>> into
>>>>>> several step. Like mesa use explicit synchronization all time but 
>>>>>> ddx use
>>>>>> implicit.
>>>>> The thinking here is that we need to be backward compatible for 
>>>>> DRI2/3 and
>>>>> support all kind of different use cases like old DDX and new Mesa, 
>>>>> or old
>>>>> Mesa and new DDX etc...
>>>>>
>>>>> So for my prototype if the kernel sees any access of a BO from two 
>>>>> different
>>>>> clients it falls back to the old behavior of implicit 
>>>>> synchronization of
>>>>> access to the same buffer object. That might not be the fastest 
>>>>> approach,
>>>>> but is as far as I can see conservative and so should work under all
>>>>> conditions.
>>>>>
>>>>> Apart from that the planning so far was that we just hide this 
>>>>> feature
>>>>> behind a couple of command submission flags and new chunks.
>>>> Just to reproduce IRC discussion, i think it's a lot simpler and 
>>>> not that
>>>> complex. For explicit cs ioctl you do not wait for any previous 
>>>> fence of
>>>> any of the buffer referenced in the cs ioctl, but you still 
>>>> associate a
>>>> new fence with all the buffer object referenced in the cs ioctl. So 
>>>> if the
>>>> next ioctl is an implicit sync ioctl it will wait properly and 
>>>> synchronize
>>>> properly with previous explicit cs ioctl. Hence you can easily have 
>>>> a mix
>>>> in userspace thing is you only get benefit once enough of your 
>>>> userspace
>>>> is using explicit.
>>> Yes, that's exactly what my patches currently implement.
>>>
>>> The only difference is that by current planning I implemented it as 
>>> a per BO
>>> flag for the command submission, but that was just for testing. 
>>> Having a
>>> single flag to switch between implicit and explicit synchronization for
>>> whole CS IOCTL would do equally well.
>> Doing it per BO sounds bogus to me. But otherwise yes we are in 
>> agreement.
>> As Daniel said using fd is most likely the way we want to do it but this
>> remains vague.
>>
>>>> Note that you still need a way to have explicit cs ioctl to wait on a
>>>> previos "explicit" fence so you need some api to expose fence per cs
>>>> submission.
>>> Exactly, that's what this mail thread is all about.
>>>
>>> As Daniel correctly noted you need something like a functionality to 
>>> get a
>>> fence as the result of a command submission as well as pass in a 
>>> list of
>>> fences to wait for before beginning a command submission.
>>>
>>> At least it looks like we are all on the same general line here, its 
>>> just
>>> nobody has a good idea how the details should look like.
>>>
>>> Regards,
>>> Christian.
>>>
>>>> Cheers,
>>>> Jérôme
>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>> Cheers,
>>>>>> Jérôme
>>>>>>
>>>>>>> Alex
>>>>>>>
>>>>>>>> Also one thing that the Android sync point does not have, 
>>>>>>>> AFAICT, is a
>>>>>>>> way to schedule synchronization as part of a cs ioctl so cpu 
>>>>>>>> never have
>>>>>>>> to be involve for cmd stream that deal only one gpu (assuming 
>>>>>>>> the driver
>>>>>>>> and hw can do such trick).
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Jérôme
>>>>>>>>
>>>>>>>>> -Daniel
>>>>>>>>> -- 
>>>>>>>>> Daniel Vetter
>>>>>>>>> Software Engineer, Intel Corporation
>>>>>>>>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
>>>>>>>> _______________________________________________
>>>>>>>> dri-devel mailing list
>>>>>>>> dri-devel at lists.freedesktop.org
>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>