[Mesa-dev] [PATCH 09/17] mesa/st: add support for waiting for semaphore objects

Fri Nov 3 17:36:46 UTC 2017

On 2017-11-03 05:17 AM, Nicolai Hähnle wrote:
> On 02.11.2017 04:57, Andres Rodriguez wrote:
>> Bits to implement ServerWaitSemaphoreObject/ServerSignalSemaphoreObject
>>
>> Signed-off-by: Andres Rodriguez <andresx7 at gmail.com>
>> ---
>>   src/mesa/state_tracker/st_cb_semaphoreobjects.c | 28 
>> +++++++++++++++++++++++++
>>   1 file changed, 28 insertions(+)
>>
>> diff --git a/src/mesa/state_tracker/st_cb_semaphoreobjects.c 
>> b/src/mesa/state_tracker/st_cb_semaphoreobjects.c
>> index 47ece47..4cff3fd 100644
>> --- a/src/mesa/state_tracker/st_cb_semaphoreobjects.c
>> +++ b/src/mesa/state_tracker/st_cb_semaphoreobjects.c
>> @@ -1,5 +1,7 @@
>> +
>>   #include "main/imports.h"
>>   #include "main/mtypes.h"
>> +#include "main/context.h"
>>   #include "main/externalobjects.h"
>> @@ -47,10 +49,36 @@ st_import_semaphoreobj_fd(struct gl_context *ctx,
>>   #endif
>>   }
>> +static void
>> +st_server_wait_semaphore(struct gl_context *ctx,
>> +                         struct gl_semaphore_object *semObj)
>> +{
>> +   struct st_semaphore_object *st_obj = st_semaphore_object(semObj);
>> +   struct st_context *st = st_context(ctx);
>> +   struct pipe_context *pipe = st->pipe;
>> +
>> +   _mesa_flush(ctx);
> 
> What's the justification of this flush?

Thanks for all the review feedback :)

Technically according to the spec we shouldn't need to flush here. But 
the flush is required due to how amdgpu handles syncobjs.

For amdgpu, the wait and signal operations happens on a submission 
boundary. I.e, we can say "wait for this syncobj before executing this 
submission", or "signal this syncobj after this submission completes". 
But we can't really introduce a wait/signal in the middle of a 
commandstream as the spec requires us to do.

This flush (and the one in st_server_signal_semaphore) exist to adjust 
the submission boundaries. If we align the submission boundary with the 
semaphore operation boundary we can accurately emulate the timing of the 
semaphore operations.

In this case, when st_server_wait_semaphore() is called, the user 
expects that any work queued up until this point will not wait for the 
semaphore. Any work submitted after the wait call returns should wait 
for the semaphore to signal. Hence, we call flush so that the amdgpu 
winsys will submit all the current work. Then we queue up the syncobj 
for future submissions to wait on.

In the signal case, we flush so that the syncobj will signal when the 
current work finishes (instead of at "current work + N"). That way we 
don't have any unnecessary latency.

However, I'm not 100% certain my approach here is the best solution. My 
main concern is that an amdgpu specific detail is surfaced in what 
should be a pipe agnostic layer. If you have any recommendations on how 
to move this code into the pipe layer, or achieve the same results with 
a different strategy I'd love to take the advice.

 From the performance point of view, I'm not as concerned. Wait/signal 
tend to be aligned at frame boundaries so flushing there shouldn't be a 
big deal. But I do think that improving the code structure from my 
current iteration would have value.

> 
> 
>> +   pipe->semobj_wait(pipe, st_obj->semaphore);
>> +}
>> +
>> +static void
>> +st_server_signal_semaphore(struct gl_context *ctx,
>> +                           struct gl_semaphore_object *semObj)
>> +{
>> +   struct st_semaphore_object *st_obj = st_semaphore_object(semObj);
>> +   struct st_context *st = st_context(ctx);
>> +   struct pipe_context *pipe = st->pipe;
>> +
> 
> You have to flush state-tracker internal state here. This means 
> FLUSH_VERTICES, st_flush_bitmap, possibly others. I don't think we have 
> this factored out well yet, but see below.

Thanks for pointing this out, this actually gives me an idea for getting 
rid of the full flushes, see below.

> 
> 
>> +   pipe->semobj_signal(pipe, st_obj->semaphore);
>> +   _mesa_flush(ctx);
> 
> What's the justification of this flush? Does EXT_external_objects 
> contain any language to guarantee that SignalSemaphoreEXT will "finish 
> in finite time"? If no, the flush should probably not be there. If yes, 
> please add a spec quote.

See above for the flush reasoning.

> 
> Furthermore, this whole code section is a strong hint that merging 
> fences and semaphores at the Gallium level is indeed a good idea. 
> Basically, the idea is that you'd end up calling
> 
>      pipe->flush(pipe, &st_obj->semaphore, flags);
> 
> where flags is PIPE_FLUSH_DEFERRED | PIPE_FLUSH_REUSE_FENCE or 
> PIPE_FLUSH_ASYNC | PIPE_FLUSH_REUSE_FENCE, depending on how the previous 
> paragraph is resolved. In one of my recent series that includes deferred 
> fences for threaded contexts, I'm already doing the equivalent of adding 
> a "PIPE_FLUSH_REUSE_FENCE". We could formalize this as a more broadly 
> applicable feature in the Gallium interface.
> 

Thanks for pointing out PIPE_FLUSH_REUSE_FENCE, I wasn't aware of this flag.

My reasoning for keeping semaphore operations separate is that a 
different pipe implementation might be able to emit a wait/signal packet 
into the middle of the commandstream, with no flushing required.

However, that argument has a giant hole in it. I have a _mesa_flush() 
which completely negates any benefits of a "pipe agnostic" 
implementation. So in this case, as you suggested, I think merging 
semaphores with re-usable fences would be a good idea.

I do have one alternative proposal try to keep the option of 
mid-commandstream semaphores open. How about something like:

flush_state_tracker_internal_state() {
	flush_vertices();
	flush_bitmaps();
}

st_server_wait_semaphore() {
	flush_state_tracker_internal_state();
	pipe->semobj_wait(pipe, st_obj->semaphore);

	// The amdgpu semobj_wait() implementation would
	// then be responsible for adjusting the submission
	// windows internally, i.e. calling flush()
}

st_server_signal_semaphore() {
	flush_state_tracker_internal_state();
	pipe->semobj_signal(pipe, st_obj->semaphore);

	// The amdgpu semobj_signal() implementation would
	// then be responsible for adjusting the submission
	// windows internally, i.e. calling flush()
}

Let me know what you think of this slightly more agnostic implementation.

Sorry for the long reply!

Regards,
Andres

> Cheers,
> Nicolai
> 
> 
>> +}
>> +
>>   void
>>   st_init_semaphoreobject_functions(struct dd_function_table *functions)
>>   {
>>      functions->NewSemaphoreObject = st_semaphoreobj_alloc;
>>      functions->DeleteSemaphoreObject = st_semaphoreobj_free;
>>      functions->ImportSemaphoreFd = st_import_semaphoreobj_fd;
>> +   functions->ServerWaitSemaphoreObject = st_server_wait_semaphore;
>> +   functions->ServerSignalSemaphoreObject = st_server_signal_semaphore;
>>   }
>>
> 
>