[Mesa-dev] [PATCH] anv/cmd_buffer: emit binding tables always if push constants are dirty

Thu Jun 28 20:30:30 UTC 2018

On Thu, Jun 28, 2018 at 3:58 AM, Iago Toral <itoral at igalia.com> wrote:

> On Thu, 2018-06-28 at 08:47 +0200, Iago Toral wrote:
>
> On Wed, 2018-06-27 at 09:13 -0700, Jason Ekstrand wrote:
>
> On Wed, Jun 27, 2018 at 2:25 AM, Iago Toral <itoral at igalia.com> wrote:
>
> On Tue, 2018-06-26 at 10:59 -0700, Jason Ekstrand wrote:
>
> On Tue, Jun 26, 2018 at 4:08 AM, Iago Toral Quiroga <itoral at igalia.com>
> wrote:
>
> Storage images require to patch push constant stateto work, which happens
> during
> binding table emision. In the scenario where our pipeline and descriptors
> are
> not dirty, we don't re-emit the binding table, however, if our push
> constant
> state is dirty, we will re-emit the push constant state, trashing storage
> image setup.
>
> While that scenario is probably not very likely to happen in practice,
> there
> are some CTS tests that trigger this by clearing storage images and buffers
> and dispatching a compute shader in a loop. The clearing of the images
> and buffers will trigger a blorp execution which will dirty our push
> constant
> state, however, because  we don't alter the descriptors or the compute
> dispatch
> at all in the loop (we are basically execution the same program in a loop),
> our pipeline and descriptor state is not dirty. If the shader uses a
> storage
> image, then any iteration after the first will re-emit push constant state
> without re-emitting binding tables and the storage image will not be
> properly
> setup any more.
>
>
> I don't see why that is a problem.  The only thing flush_descriptor_sets
> does is fill out the binding and sampler tables and fill in the push
> constant data for storage images/buffers.  The actual HW packets are filled
> out by flush_push_constants and emit_descriptor_pointers.  Yes, blorp
> trashes our descriptor pointers but the descriptor sets should be fine.
> For push constants, it does emit 3DSTATE_CONSTANT_* but it doesn't actually
> modify anv_cmd_state::push_constants.
>
> Are secondary command buffers involved?  I could see something funny going
> on with those.
>
>
> No, no secondaries are involved. I did some more investigation and I think
> my explanation of the problem was not good, this is what is really
> happening:
>
> First, I found the problem in the compute pipeline and I only extended the
> fix to the graphics pipeline because it looked like the same rationale
> would apply, so I'll explain what happens in compute and then we can
> discuss whether the same problem applies to graphics.
>
> The test does something like this:
>
> for (...) {
> clear ssbos / storage images
> dispatch compute
> }
>
> The first iteration of this loop will find that the compute pipeline and
> descriptors are dirty and proceed to emit binding tables. We have storage
> images, so during that process the push constant buffer is amended to
> include storage images. Specifically, we call anv_cmd_buffer_ensure_push_constants_size()
> for the images field. This gives us a size of 624.
>
> We move on to the second iteration of the loop. When we clear images and
> ssbos via blorp, we again mark the push constant buffer as dirty. Now we
> execute the compute dispatch and the first thing we do there is
> anv_cmd_buffer_push_base_group_id() which calls
> anv_cmd_buffer_ensure_push_constants_size() for the base group id, which
> gives as a size of 144. This is smaller than what we computed in the
> previous iteration, because we haven't called the same function for the
> images field yet. Unfortunately, we will never call that again, because we
> only do that during binding table emission and we only do that if the
> compute pipeline is dirty (it is not) or our descriptors are dirty (they
> are not). So we don't re-emit binding table and we don't ensure push
> constant space for the image data, but because we come from a blorp
> execution our push constant dirty flag is true, so we re-emit push constant
> data, only that this time we won't emit the push constant data we need for
> the storage images, which leads to the problem.
>
>
> The intention has always been that anv_cmd_buffer_ensure_push_constants_size
> would only ever grow the push constants and never shrink them.  The most
> obvious bug is in anv_cmd_buffer_ensure_push_constants_size.
>
>
> I thought that maybe making anv_cmd_buffer_ensure_push_constants_size()
> only update the size if we alloc or realloc would fix this, but that can
> cause GPU hangs in some cases when I run multiple tests in parallel, so I
> guess it isn't that simple.
>
>
> Ugh...  that makes things more interesting.  That does look like the right
> fix and now I'm wondering why it leads to a hang.
>
> In the compute case, flush_compute_descriptor_sets emits
> MEDIA_INTERFACE_DESCRIPTOR_LOAD.  My feeling is that not emitting that
> packet is the real bug.  In GL, we just re-emit all 4 compute packets all
> the time and don't try to track dirty bits.  I had patches to do that in
> Vulkan somewhere.  I rebased them and pushed them here:
>
> https://gitlab.freedesktop.org/jekstrand/mesa/tree/wip/anv-cs-packets
>
> Is that plus your patch to fix ensure_push_constants_size() enough to fix
> the bug?  If so, then I think what's going on is that we have to re-emit
> all the compute packets after a switch from 3D to compute and we're not
> doing that.
>
>
> No, with this we still run into a GPU hang some times and a bunch of tests
> keep failing consistently. The annoying part is that all this only happens
> when we run the tests in group (i.e. deqp-vk -n dEQP-VK.path.to.tests.*).
> If I run any of the failing tests alone (with our without your patches, but
> with the patch for ensure_push_constants_size()), they always pass. I'll
> see if I can find out more about what is happening.
>
>
> Ok, I think I know what the problem is. The loop I mention above has 200
> iterations and the problem starts showing up only after iteration 31. With
> 31 or less the tests pss just fine (with or without your patches but with
> the patch for ensure_push_constants_size()). So that got me thinking that
> something was growing too big and that was causing our binding table to be
> invalidated.
>
> I think the problem is that we allocate too many binding tables (each
> iteration of the loop emits blorp clears, and each blorp clear allocates
> binding tables), so there is a moment where blorp tries to allocate a new
> binding table and we run out of space in the block, so we hit the fallback
> where we need to allocate a new block *and* we call
> anv_cmd_buffer_emit_state_base_address(), which I understand is going to
> make the binding table used with the compute pipeline (which we only emit
> once in the first iteration of the loop), invalid, since it is was
> allocated in the other block.
>
> So what I think we need to do is mark descriptors as dirty every time we
> grab a new binding table block and emit a new state base address. FWIW,
> adding "cmd_buffer->state.descriptors_dirty |= ~0" to
> anv_cmd_buffer_alloc_blorp_binding_table() when we need to allocate a new
> block fixes the problem. I think we would need to do this every time we
> allocate a new binding table block, not just for blorp allocations.
>
> Does this make sense?
>

Yes, much more so.  Thanks for digging deeper!  We should flag
descriptors_dirty for everything whenever we re-emit STATE_BASE_ADDRESS or
something like that.

--Jason

> Iago
>
> --Jason
>
>
>
> I hope I am making more sense now.
>
> Iago
>
> --Jason
>
>
>
> Fixes multiple failures in some new CTS tests.
> ---
>  src/intel/vulkan/genX_cmd_buffer.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/src/intel/vulkan/genX_cmd_buffer.c
> b/src/intel/vulkan/genX_cmd_buffer.c
> index 97b321ccaeb..6e48aaedb9b 100644
> --- a/src/intel/vulkan/genX_cmd_buffer.c
> +++ b/src/intel/vulkan/genX_cmd_buffer.c
> @@ -2554,7 +2554,8 @@ genX(cmd_buffer_flush_state)(struct anv_cmd_buffer
> *cmd_buffer)
>      * 3DSTATE_BINDING_TABLE_POINTER_* for the push constants to take
> effect.
>      */
>     uint32_t dirty = 0;
> -   if (cmd_buffer->state.descriptors_dirty)
> +   if (cmd_buffer->state.descriptors_dirty ||
> +       cmd_buffer->state.push_constants_dirty)
>        dirty = flush_descriptor_sets(cmd_buffer);
>
>     if (dirty || cmd_buffer->state.push_constants_dirty) {
> @@ -2988,7 +2989,13 @@ genX(cmd_buffer_flush_compute_state)(struct
> anv_cmd_buffer *cmd_buffer)
>        anv_batch_emit_batch(&cmd_buffer->batch, &pipeline->batch);
>     }
>
> +   /* Storage images require push constant data, which is setup during the
> +    * binding table emision. If we have dirty push constants, we need to
> +    * re-emit the binding table so we get the push constant storage image
> setup
> +    * done, otherwise we trash it when we emit push constants below.
> +    */
>     if ((cmd_buffer->state.descriptors_dirty &
> VK_SHADER_STAGE_COMPUTE_BIT) ||
> +       (cmd_buffer->state.push_constants_dirty &
> VK_SHADER_STAGE_COMPUTE_BIT) ||
>         cmd_buffer->state.compute.pipeline_dirty) {
>        /* FIXME: figure out descriptors for gen7 */
>        result = flush_compute_descriptor_set(cmd_buffer);
> --
> 2.14.1
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20180628/80c32f3f/attachment-0001.html>