[Mesa-dev] [PATCH 09/17] mesa/st: add support for waiting for semaphore objects

Fri Nov 3 18:15:12 UTC 2017

FLUSH_VERTICES should be called by mesa/main if it's inside a GL call.

st/mesa should only flush bitmaps.

Marek

On Fri, Nov 3, 2017 at 6:36 PM, Andres Rodriguez <andresx7 at gmail.com> wrote:
>
>
> On 2017-11-03 05:17 AM, Nicolai Hähnle wrote:
>>
>> On 02.11.2017 04:57, Andres Rodriguez wrote:
>>>
>>> Bits to implement ServerWaitSemaphoreObject/ServerSignalSemaphoreObject
>>>
>>> Signed-off-by: Andres Rodriguez <andresx7 at gmail.com>
>>> ---
>>>   src/mesa/state_tracker/st_cb_semaphoreobjects.c | 28
>>> +++++++++++++++++++++++++
>>>   1 file changed, 28 insertions(+)
>>>
>>> diff --git a/src/mesa/state_tracker/st_cb_semaphoreobjects.c
>>> b/src/mesa/state_tracker/st_cb_semaphoreobjects.c
>>> index 47ece47..4cff3fd 100644
>>> --- a/src/mesa/state_tracker/st_cb_semaphoreobjects.c
>>> +++ b/src/mesa/state_tracker/st_cb_semaphoreobjects.c
>>> @@ -1,5 +1,7 @@
>>> +
>>>   #include "main/imports.h"
>>>   #include "main/mtypes.h"
>>> +#include "main/context.h"
>>>   #include "main/externalobjects.h"
>>> @@ -47,10 +49,36 @@ st_import_semaphoreobj_fd(struct gl_context *ctx,
>>>   #endif
>>>   }
>>> +static void
>>> +st_server_wait_semaphore(struct gl_context *ctx,
>>> +                         struct gl_semaphore_object *semObj)
>>> +{
>>> +   struct st_semaphore_object *st_obj = st_semaphore_object(semObj);
>>> +   struct st_context *st = st_context(ctx);
>>> +   struct pipe_context *pipe = st->pipe;
>>> +
>>> +   _mesa_flush(ctx);
>>
>>
>> What's the justification of this flush?
>
>
> Thanks for all the review feedback :)
>
> Technically according to the spec we shouldn't need to flush here. But the
> flush is required due to how amdgpu handles syncobjs.
>
> For amdgpu, the wait and signal operations happens on a submission boundary.
> I.e, we can say "wait for this syncobj before executing this submission", or
> "signal this syncobj after this submission completes". But we can't really
> introduce a wait/signal in the middle of a commandstream as the spec
> requires us to do.
>
> This flush (and the one in st_server_signal_semaphore) exist to adjust the
> submission boundaries. If we align the submission boundary with the
> semaphore operation boundary we can accurately emulate the timing of the
> semaphore operations.
>
> In this case, when st_server_wait_semaphore() is called, the user expects
> that any work queued up until this point will not wait for the semaphore.
> Any work submitted after the wait call returns should wait for the semaphore
> to signal. Hence, we call flush so that the amdgpu winsys will submit all
> the current work. Then we queue up the syncobj for future submissions to
> wait on.
>
> In the signal case, we flush so that the syncobj will signal when the
> current work finishes (instead of at "current work + N"). That way we don't
> have any unnecessary latency.
>
> However, I'm not 100% certain my approach here is the best solution. My main
> concern is that an amdgpu specific detail is surfaced in what should be a
> pipe agnostic layer. If you have any recommendations on how to move this
> code into the pipe layer, or achieve the same results with a different
> strategy I'd love to take the advice.
>
> From the performance point of view, I'm not as concerned. Wait/signal tend
> to be aligned at frame boundaries so flushing there shouldn't be a big deal.
> But I do think that improving the code structure from my current iteration
> would have value.
>
>
>>
>>
>>> +   pipe->semobj_wait(pipe, st_obj->semaphore);
>>> +}
>>> +
>>> +static void
>>> +st_server_signal_semaphore(struct gl_context *ctx,
>>> +                           struct gl_semaphore_object *semObj)
>>> +{
>>> +   struct st_semaphore_object *st_obj = st_semaphore_object(semObj);
>>> +   struct st_context *st = st_context(ctx);
>>> +   struct pipe_context *pipe = st->pipe;
>>> +
>>
>>
>> You have to flush state-tracker internal state here. This means
>> FLUSH_VERTICES, st_flush_bitmap, possibly others. I don't think we have this
>> factored out well yet, but see below.
>
>
> Thanks for pointing this out, this actually gives me an idea for getting rid
> of the full flushes, see below.
>
>>
>>
>>> +   pipe->semobj_signal(pipe, st_obj->semaphore);
>>> +   _mesa_flush(ctx);
>>
>>
>> What's the justification of this flush? Does EXT_external_objects contain
>> any language to guarantee that SignalSemaphoreEXT will "finish in finite
>> time"? If no, the flush should probably not be there. If yes, please add a
>> spec quote.
>
>
> See above for the flush reasoning.
>
>>
>> Furthermore, this whole code section is a strong hint that merging fences
>> and semaphores at the Gallium level is indeed a good idea. Basically, the
>> idea is that you'd end up calling
>>
>>      pipe->flush(pipe, &st_obj->semaphore, flags);
>>
>> where flags is PIPE_FLUSH_DEFERRED | PIPE_FLUSH_REUSE_FENCE or
>> PIPE_FLUSH_ASYNC | PIPE_FLUSH_REUSE_FENCE, depending on how the previous
>> paragraph is resolved. In one of my recent series that includes deferred
>> fences for threaded contexts, I'm already doing the equivalent of adding a
>> "PIPE_FLUSH_REUSE_FENCE". We could formalize this as a more broadly
>> applicable feature in the Gallium interface.
>>
>
> Thanks for pointing out PIPE_FLUSH_REUSE_FENCE, I wasn't aware of this flag.
>
> My reasoning for keeping semaphore operations separate is that a different
> pipe implementation might be able to emit a wait/signal packet into the
> middle of the commandstream, with no flushing required.
>
> However, that argument has a giant hole in it. I have a _mesa_flush() which
> completely negates any benefits of a "pipe agnostic" implementation. So in
> this case, as you suggested, I think merging semaphores with re-usable
> fences would be a good idea.
>
> I do have one alternative proposal try to keep the option of
> mid-commandstream semaphores open. How about something like:
>
> flush_state_tracker_internal_state() {
>         flush_vertices();
>         flush_bitmaps();
> }
>
> st_server_wait_semaphore() {
>         flush_state_tracker_internal_state();
>         pipe->semobj_wait(pipe, st_obj->semaphore);
>
>         // The amdgpu semobj_wait() implementation would
>         // then be responsible for adjusting the submission
>         // windows internally, i.e. calling flush()
> }
>
>
> st_server_signal_semaphore() {
>         flush_state_tracker_internal_state();
>         pipe->semobj_signal(pipe, st_obj->semaphore);
>
>         // The amdgpu semobj_signal() implementation would
>         // then be responsible for adjusting the submission
>         // windows internally, i.e. calling flush()
> }
>
> Let me know what you think of this slightly more agnostic implementation.
>
> Sorry for the long reply!
>
> Regards,
> Andres
>
>
>> Cheers,
>> Nicolai
>>
>>
>>> +}
>>> +
>>>   void
>>>   st_init_semaphoreobject_functions(struct dd_function_table *functions)
>>>   {
>>>      functions->NewSemaphoreObject = st_semaphoreobj_alloc;
>>>      functions->DeleteSemaphoreObject = st_semaphoreobj_free;
>>>      functions->ImportSemaphoreFd = st_import_semaphoreobj_fd;
>>> +   functions->ServerWaitSemaphoreObject = st_server_wait_semaphore;
>>> +   functions->ServerSignalSemaphoreObject = st_server_signal_semaphore;
>>>   }
>>>
>>
>>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev