[Mesa-dev] [PATCH 17/20] radeonsi: do not do two full flushes on every compute dispatch

Marek Olšák maraeo at gmail.com
Mon Apr 4 23:54:18 UTC 2016


On Tue, Apr 5, 2016 at 1:30 AM, Bas Nieuwenhuizen
<bas at basnieuwenhuizen.nl> wrote:
> On Tue, Apr 5, 2016 at 1:18 AM, Marek Olšák <maraeo at gmail.com> wrote:
>> On Sat, Apr 2, 2016 at 3:11 PM, Bas Nieuwenhuizen
>> <bas at basnieuwenhuizen.nl> wrote:
>>> Signed-off-by: Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>
>>> ---
>>>  src/gallium/drivers/radeonsi/si_compute.c | 17 ++---------------
>>>  src/gallium/drivers/radeonsi/si_state.c   |  6 ++++--
>>>  2 files changed, 6 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/src/gallium/drivers/radeonsi/si_compute.c b/src/gallium/drivers/radeonsi/si_compute.c
>>> index 6549396..8fe4fe7 100644
>>> --- a/src/gallium/drivers/radeonsi/si_compute.c
>>> +++ b/src/gallium/drivers/radeonsi/si_compute.c
>>> @@ -433,13 +433,8 @@ static void si_launch_grid(
>>>         if (!sctx->cs_shader_state.initialized)
>>>                 si_initialize_compute(sctx);
>>>
>>> -       sctx->b.flags |= SI_CONTEXT_INV_VMEM_L1 |
>>> -                        SI_CONTEXT_INV_GLOBAL_L2 |
>>> -                        SI_CONTEXT_INV_ICACHE |
>>> -                        SI_CONTEXT_INV_SMEM_L1 |
>>> -                        SI_CONTEXT_FLUSH_WITH_INV_L2 |
>>> -                        SI_CONTEXT_FLAG_COMPUTE;
>>> -       si_emit_cache_flush(sctx, NULL);
>>> +       if(sctx->b.flags)
>>
>> "if (" please
>>
>>> +               si_emit_cache_flush(sctx, NULL);
>>>
>>>         if (!si_switch_compute_shader(sctx, program, &program->shader, info->pc))
>>>                 return;
>>> @@ -466,14 +461,6 @@ static void si_launch_grid(
>>>                 si_setup_tgsi_grid(sctx, info);
>>>
>>>         si_emit_dispatch_packets(sctx, info);
>>> -
>>> -       sctx->b.flags |= SI_CONTEXT_CS_PARTIAL_FLUSH |
>>> -                        SI_CONTEXT_INV_VMEM_L1 |
>>> -                        SI_CONTEXT_INV_GLOBAL_L2 |
>>> -                        SI_CONTEXT_INV_ICACHE |
>>> -                        SI_CONTEXT_INV_SMEM_L1 |
>>> -                        SI_CONTEXT_FLAG_COMPUTE;
>>> -       si_emit_cache_flush(sctx, NULL);
>>>  }
>>>
>>>
>>> diff --git a/src/gallium/drivers/radeonsi/si_state.c b/src/gallium/drivers/radeonsi/si_state.c
>>> index 10d691a..95d2589 100644
>>> --- a/src/gallium/drivers/radeonsi/si_state.c
>>> +++ b/src/gallium/drivers/radeonsi/si_state.c
>>> @@ -3533,7 +3533,8 @@ static void si_memory_barrier(struct pipe_context *ctx, unsigned flags)
>>>
>>>         /* Subsequent commands must wait for all shader invocations to
>>>          * complete. */
>>> -       sctx->b.flags |= SI_CONTEXT_PS_PARTIAL_FLUSH;
>>> +       sctx->b.flags |= SI_CONTEXT_PS_PARTIAL_FLUSH |
>>> +                        SI_CONTEXT_CS_PARTIAL_FLUSH;
>>>
>>>         if (flags & PIPE_BARRIER_CONSTANT_BUFFER)
>>>                 sctx->b.flags |= SI_CONTEXT_INV_SMEM_L1 |
>>> @@ -3543,7 +3544,8 @@ static void si_memory_barrier(struct pipe_context *ctx, unsigned flags)
>>>                      PIPE_BARRIER_SHADER_BUFFER |
>>>                      PIPE_BARRIER_TEXTURE |
>>>                      PIPE_BARRIER_IMAGE |
>>> -                    PIPE_BARRIER_STREAMOUT_BUFFER)) {
>>> +                    PIPE_BARRIER_STREAMOUT_BUFFER |
>>> +                    PIPE_BARRIER_GLOBAL_BUFFER)) {
>>>                 /* As far as I can tell, L1 contents are written back to L2
>>>                  * automatically at end of shader, but the contents of other
>>>                  * L1 caches might still be stale. */
>>
>> Do we have a test that writes to a buffer via a PS and then reads from
>> it via a CS? And the same with a CS followed by a VS.
>>
>> I'm not sure whether the hardware can execute compute and draw jobs in
>> parallel within one IB (it can certainly do that with draw calls).
>
> Nothing in piglit as far as I know. I tested with a local test with
> two compute dispatches: without CS_PARTIAL_FLUSH the second dispatch
> missed writes from the first, and with CS_PARTIAL_FLUSH the
> synchronization was fixed.
>
> So I think the hardware is able to execute multiple compute dispatches
> at the same time in a single IB and that this adds proper
> synchronization. I have not tried any compute<->graphics interaction.

Thanks. It looks like we do need partial flushes between dependent
draw and dispatch calls. We also need piglit tests for this. Streamout
does VS_PARTIAL_FLUSH so that written buffers are immediately visible
to the next draw call. If there are no dependencies, no partial
flushes are needed. If we can't track the dependencies, we should
always do a partial flush. A partial flush just waits until the given
shader stage is idle. Without one, the hardware launches draw and
dispatch calls without waiting for previous ones to finish.
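
A minimal sketch of the pattern the patch moves to: flush requests are
accumulated by the barrier and emitted before a dispatch only when
something is pending. All names here (toy_ctx, the toy_* functions, the
FLUSH_* bits) are hypothetical and only mirror the names in the diff;
they are not the radeonsi API, and the flag values are made up.

```c
#include <assert.h>

/* Hypothetical flag bits, loosely named after the SI_CONTEXT_* flags
 * in the patch. The values are illustrative only. */
enum {
    FLUSH_PS_PARTIAL = 1u << 0, /* wait for pixel shaders to go idle */
    FLUSH_CS_PARTIAL = 1u << 1, /* wait for compute shaders to go idle */
    FLUSH_INV_L1     = 1u << 2, /* invalidate L1 caches */
};

/* Toy stand-in for the driver context. */
struct toy_ctx {
    unsigned flags;           /* pending flush/invalidate requests */
    unsigned flushes_emitted; /* counts flushes, for illustration */
};

/* Stand-in for emitting the flush packets: consumes the pending flags. */
static void toy_emit_cache_flush(struct toy_ctx *ctx)
{
    assert(ctx->flags != 0); /* callers only flush when work is pending */
    ctx->flushes_emitted++;
    ctx->flags = 0;
}

/* The barrier only records what has to happen before the next command. */
static void toy_memory_barrier(struct toy_ctx *ctx)
{
    ctx->flags |= FLUSH_PS_PARTIAL | FLUSH_CS_PARTIAL | FLUSH_INV_L1;
}

/* A dispatch flushes only if a barrier (or anything else) asked for it. */
static void toy_launch_grid(struct toy_ctx *ctx)
{
    if (ctx->flags)
        toy_emit_cache_flush(ctx);
    /* the dispatch packets would be emitted here */
}
```

With this shape, two independent dispatches back to back cost no flush
at all, while a barrier between them costs exactly one.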

Marek

