[Mesa-dev] [PATCH] anv/cmd_buffer: emit binding tables always if push constants are dirty
Iago Toral
itoral at igalia.com
Thu Jun 28 10:58:28 UTC 2018
On Thu, 2018-06-28 at 08:47 +0200, Iago Toral wrote:
> On Wed, 2018-06-27 at 09:13 -0700, Jason Ekstrand wrote:
> > On Wed, Jun 27, 2018 at 2:25 AM, Iago Toral <itoral at igalia.com>
> > wrote:
> > > On Tue, 2018-06-26 at 10:59 -0700, Jason Ekstrand wrote:
> > > > On Tue, Jun 26, 2018 at 4:08 AM, Iago Toral Quiroga <itoral at iga
> > > > lia.com> wrote:
> > > > > Storage images require to patch push constant stateto work,
> > > > > which happens during
> > > > >
> > > > > binding table emision. In the scenario where our pipeline and
> > > > > descriptors are
> > > > >
> > > > > not dirty, we don't re-emit the binding table, however, if
> > > > > our push constant
> > > > >
> > > > > state is dirty, we will re-emit the push constant state,
> > > > > trashing storage
> > > > >
> > > > > image setup.
> > > > >
> > > > >
> > > > >
> > > > > While that scenario is probably not very likely to happen in
> > > > > practice, there
> > > > >
> > > > > are some CTS tests that trigger this by clearing storage
> > > > > images and buffers
> > > > >
> > > > > and dispatching a compute shader in a loop. The clearing of
> > > > > the images
> > > > >
> > > > > and buffers will trigger a blorp execution which will dirty
> > > > > our push constant
> > > > >
> > > > > state, however, because we don't alter the descriptors or
> > > > > the compute dispatch
> > > > >
> > > > > at all in the loop (we are basically execution the same
> > > > > program in a loop),
> > > > >
> > > > > our pipeline and descriptor state is not dirty. If the shader
> > > > > uses a storage
> > > > >
> > > > > image, then any iteration after the first will re-emit push
> > > > > constant state
> > > > >
> > > > > without re-emitting binding tables and the storage image will
> > > > > not be properly
> > > > >
> > > > > setup any more.
> > > >
> > > > I don't see why that is a problem. The only thing
> > > > flush_descriptor_sets does is fill out the binding and sampler
> > > > tables and fill in the push constant data for storage
> > > > images/buffers. The actual HW packets are filled out by
> > > > flush_push_constants and emit_descriptor_pointers. Yes, blorp
> > > > trashes our descriptor pointers but the descriptor sets should
> > > > be fine. For push constants, it does emit 3DSTATE_CONSTANT_*
> > > > but it doesn't actually modify anv_cmd_state::push_constants.
> > > >
> > > > Are secondary command buffers involved? I could see something
> > > > funny going on with those.
> > >
> > > No, no secondaries are involved. I did some more investigation
> > > and I think my explanation of the problem was not good, this is
> > > what is really happening:
> > >
> > > First, I found the problem in the compute pipeline and I only
> > > extended the fix to the graphics pipeline because it looked like
> > > the same rationale would apply, so I'll explain what happens in
> > > compute and then we can discuss whether the same problem applies
> > > to graphics.
> > >
> > > The test does something like this:
> > >
> > > for (...) {
> > > clear ssbos / storage images
> > > dispatch compute
> > > }
> > >
> > > The first iteration of this loop will find that the compute
> > > pipeline and descriptors are dirty and proceed to emit binding
> > > tables. We have storage images, so during that process the push
> > > constant buffer is amended to include storage images.
> > > Specifically, we call anv_cmd_buffer_ensure_push_constants_size()
> > > for the images field. This gives us a size of 624.
> > >
> > > We move on to the second iteration of the loop. When we clear
> > > images and ssbos via blorp, we again mark the push constant
> > > buffer as dirty. Now we execute the compute dispatch and the
> > > first thing we do there is anv_cmd_buffer_push_base_group_id()
> > > which calls anv_cmd_buffer_ensure_push_constants_size() for the
> > > base group id, which gives as a size of 144. This is smaller than
> > > what we computed in the previous iteration, because we haven't
> > > called the same function for the images field yet. Unfortunately,
> > > we will never call that again, because we only do that during
> > > binding table emission and we only do that if the compute
> > > pipeline is dirty (it is not) or our descriptors are dirty (they
> > > are not). So we don't re-emit binding table and we don't ensure
> > > push constant space for the image data, but because we come from
> > > a blorp execution our push constant dirty flag is true, so we re-
> > > emit push constant data, only that this time we won't emit the
> > > push constant data we need for the storage images, which leads to
> > > the problem.
> >
> > The intention has always been that
> > anv_cmd_buffer_ensure_push_constants_size would only ever grow the
> > push constants and never shrink them. The most obvious bug is in
> > anv_cmd_buffer_ensure_push_constants_size.
> >
> > > I thought that maybe making
> > > anv_cmd_buffer_ensure_push_constants_size() only update the size
> > > if we alloc or realloc would fix this, but that can cause GPU
> > > hangs in some cases when I run multiple tests in parallel, so I
> > > guess it isn't that simple.
> > >
> >
> > Ugh... that makes things more interesting. That does look like
> > the right fix and now I'm wondering why it leads to a hang.
> >
> > In the compute case, flush_compute_descriptor_sets emits
> > MEDIA_INTERFACE_DESCRIPTOR_LOAD. My feeling is that not emitting
> > that packet is the real bug. In GL, we just re-emit all 4 compute
> > packets all the time and don't try to track dirty bits. I had
> > patches to do that in Vulkan somewhere. I rebased them and pushed
> > them here:
> >
> > https://gitlab.freedesktop.org/jekstrand/mesa/tree/wip/anv-cs-packe
> > ts
> >
> > Is that plus your patch to fix ensure_push_constants_size() enough
> > to fix the bug? If so, then I think what's going on is that we
> > have to re-emit all the compute packets after a switch from 3D to
> > compute and we're not doing that.
>
> No, with this we still run into a GPU hang some times and a bunch of
> tests keep failing consistently. The annoying part is that all this
> only happens when we run the tests in group (i.e. deqp-vk -n dEQP-
> VK.path.to.tests.*). If I run any of the failing tests alone (with
> our without your patches, but with the patch for
> ensure_push_constants_size()), they always pass. I'll see if I can
> find out more about what is happening.
>
Ok, I think I know what the problem is. The loop I mention above has
200 iterations and the problem starts showing up only after iteration
31. With 31 or less the tests pss just fine (with or without your
patches but with the patch for ensure_push_constants_size()). So that
got me thinking that something was growing too big and that was causing
our binding table to be invalidated.
I think the problem is that we allocate too many binding tables (each
iteration of the loop emits blorp clears, and each blorp clear
allocates binding tables), so there is a moment where blorp tries to
allocate a new binding table and we run out of space in the block, so
we hit the fallback where we need to allocate a new block *and* we call
anv_cmd_buffer_emit_state_base_address(), which I understand is going
to make the binding table used with the compute pipeline (which we only
emit once in the first iteration of the loop), invalid, since it is was
allocated in the other block.
So what I think we need to do is mark descriptors as dirty every time
we grab a new binding table block and emit a new state base address.
FWIW, adding "cmd_buffer->state.descriptors_dirty |= ~0"
to anv_cmd_buffer_alloc_blorp_binding_table() when we need to allocate
a new block fixes the problem. I think we would need to do this every
time we allocate a new binding table block, not just for blorp
allocations.
Does this make sense?
Iago
> > --Jason
> >
> > > I hope I am making more sense now.
> > >
> > > Iago
> > > > --Jason
> > > >
> > > > > Fixes multiple failures in some new CTS tests.
> > > > >
> > > > > ---
> > > > >
> > > > > src/intel/vulkan/genX_cmd_buffer.c | 9 ++++++++-
> > > > >
> > > > > 1 file changed, 8 insertions(+), 1 deletion(-)
> > > > >
> > > > >
> > > > >
> > > > > diff --git a/src/intel/vulkan/genX_cmd_buffer.c
> > > > > b/src/intel/vulkan/genX_cmd_buffer.c
> > > > >
> > > > > index 97b321ccaeb..6e48aaedb9b 100644
> > > > >
> > > > > --- a/src/intel/vulkan/genX_cmd_buffer.c
> > > > >
> > > > > +++ b/src/intel/vulkan/genX_cmd_buffer.c
> > > > >
> > > > > @@ -2554,7 +2554,8 @@ genX(cmd_buffer_flush_state)(struct
> > > > > anv_cmd_buffer *cmd_buffer)
> > > > >
> > > > > * 3DSTATE_BINDING_TABLE_POINTER_* for the push constants
> > > > > to take effect.
> > > > >
> > > > > */
> > > > >
> > > > > uint32_t dirty = 0;
> > > > >
> > > > > - if (cmd_buffer->state.descriptors_dirty)
> > > > >
> > > > > + if (cmd_buffer->state.descriptors_dirty ||
> > > > >
> > > > > + cmd_buffer->state.push_constants_dirty)
> > > > >
> > > > > dirty = flush_descriptor_sets(cmd_buffer);
> > > > >
> > > > >
> > > > >
> > > > > if (dirty || cmd_buffer->state.push_constants_dirty) {
> > > > >
> > > > > @@ -2988,7 +2989,13 @@
> > > > > genX(cmd_buffer_flush_compute_state)(struct anv_cmd_buffer
> > > > > *cmd_buffer)
> > > > >
> > > > > anv_batch_emit_batch(&cmd_buffer->batch, &pipeline-
> > > > > >batch);
> > > > >
> > > > > }
> > > > >
> > > > >
> > > > >
> > > > > + /* Storage images require push constant data, which is
> > > > > setup during the
> > > > >
> > > > > + * binding table emision. If we have dirty push
> > > > > constants, we need to
> > > > >
> > > > > + * re-emit the binding table so we get the push constant
> > > > > storage image setup
> > > > >
> > > > > + * done, otherwise we trash it when we emit push
> > > > > constants below.
> > > > >
> > > > > + */
> > > > >
> > > > > if ((cmd_buffer->state.descriptors_dirty &
> > > > > VK_SHADER_STAGE_COMPUTE_BIT) ||
> > > > >
> > > > > + (cmd_buffer->state.push_constants_dirty &
> > > > > VK_SHADER_STAGE_COMPUTE_BIT) ||
> > > > >
> > > > > cmd_buffer->state.compute.pipeline_dirty) {
> > > > >
> > > > > /* FIXME: figure out descriptors for gen7 */
> > > > >
> > > > > result = flush_compute_descriptor_set(cmd_buffer);
> > > > >
> > > > > --
> > > > >
> > > > > 2.14.1
> > > > >
> > > > >
> > > > >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20180628/3a7fb2b0/attachment-0001.html>
More information about the mesa-dev
mailing list