<html><head></head><body><div>On Thu, 2018-06-28 at 08:47 +0200, Iago Toral wrote:</div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><div>On Wed, 2018-06-27 at 09:13 -0700, Jason Ekstrand wrote:</div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Jun 27, 2018 at 2:25 AM, Iago Toral <span dir="ltr"><<a href="mailto:itoral@igalia.com" target="_blank">itoral@igalia.com</a>></span> wrote:<br><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><div><span class="gmail-"><div>On Tue, 2018-06-26 at 10:59 -0700, Jason Ekstrand wrote:</div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Tue, Jun 26, 2018 at 4:08 AM, Iago Toral Quiroga <span dir="ltr"><<a href="mailto:itoral@igalia.com" target="_blank">itoral@igalia.com</a>></span> wrote:<br><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex">Storage images require to patch push constant stateto work, which happens during<br>
binding table emision. In the scenario where our pipeline and descriptors are<br>
not dirty, we don't re-emit the binding table, however, if our push constant<br>
state is dirty, we will re-emit the push constant state, trashing storage<br>
image setup.<br>
<br>
While that scenario is probably not very likely to happen in practice, there<br>
are some CTS tests that trigger this by clearing storage images and buffers<br>
and dispatching a compute shader in a loop. The clearing of the images<br>
and buffers will trigger a blorp execution which will dirty our push constant<br>
state, however, because we don't alter the descriptors or the compute dispatch<br>
at all in the loop (we are basically execution the same program in a loop),<br>
our pipeline and descriptor state is not dirty. If the shader uses a storage<br>
image, then any iteration after the first will re-emit push constant state<br>
without re-emitting binding tables and the storage image will not be properly<br>
setup any more.<br></blockquote><div><br></div><div>I don't see why that is a problem. The only thing flush_descriptor_sets does is fill out the binding and sampler tables and fill in the push constant data for storage images/buffers. The actual HW packets are filled out by flush_push_constants and emit_descriptor_pointers. Yes, blorp trashes our descriptor pointers but the descriptor sets should be fine. For push constants, it does emit 3DSTATE_CONSTANT_* but it doesn't actually modify anv_cmd_state::push_constants.</div><div><br></div><div>Are secondary command buffers involved? I could see something funny going on with those.</div></div></div></div></blockquote><div><br></div></span><div>No, no secondaries are involved. I did some more investigation and I think my explanation of the problem was not good, this is what is really happening:</div><div><br></div><div>First, I found the problem in the compute pipeline and I only extended the fix to the graphics pipeline because it looked like the same rationale would apply, so I'll explain what happens in compute and then we can discuss whether the same problem applies to graphics.</div><div><br></div><div>The test does something like this:</div><div><br></div><div>for (...) {</div><div> clear ssbos / storage images</div><div> dispatch compute</div><div>}</div><div><br></div><div>The first iteration of this loop will find that the compute pipeline and descriptors are dirty and proceed to emit binding tables. We have storage images, so during that process the push constant buffer is amended to include storage images. Specifically, we call anv_cmd_buffer_ensure_push_<wbr>constants_size() for the images field. This gives us a size of 624.</div><div><br></div><div>We move on to the second iteration of the loop. When we clear images and ssbos via blorp, we again mark the push constant buffer as dirty. Now we execute the compute dispatch and the first thing we do there is anv_cmd_buffer_push_base_<wbr>group_id() which calls anv_cmd_buffer_ensure_push_<wbr>constants_size() for the base group id, which gives as a size of 144. This is smaller than what we computed in the previous iteration, because we haven't called the same function for the images field yet. Unfortunately, we will never call that again, because we only do that during binding table emission and we only do that if the compute pipeline is dirty (it is not) or our descriptors are dirty (they are not). So we don't re-emit binding table and we don't ensure push constant space for the image data, but because we come from a blorp execution our push constant dirty flag is true, so we re-emit push constant data, only that this time we won't emit the push constant data we need for the storage images, which leads to the problem.</div></div><br></blockquote><div><br></div><div>The intention has always been that anv_cmd_buffer_ensure_push_constants_size would only ever grow the push constants and never shrink them. The most obvious bug is in anv_cmd_buffer_ensure_push_constants_size.<br></div><div> </div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><div><div>I thought that maybe making anv_cmd_buffer_ensure_push_<wbr>constants_size() only update the size if we alloc or realloc would fix this, but that can cause GPU hangs in some cases when I run multiple tests in parallel, so I guess it isn't that simple.<br></div></div><br></blockquote><div><br></div><div>Ugh... that makes things more interesting. That does look like the right fix and now I'm wondering why it leads to a hang.</div><div><br></div><div>In the compute case, flush_compute_descriptor_sets emits MEDIA_INTERFACE_DESCRIPTOR_LOAD. My feeling is that not emitting that packet is the real bug. In GL, we just re-emit all 4 compute packets all the time and don't try to track dirty bits. I had patches to do that in Vulkan somewhere. I rebased them and pushed them here:</div><div><br></div><div><a href="https://gitlab.freedesktop.org/jekstrand/mesa/tree/wip/anv-cs-packets">https://gitlab.freedesktop.org/jekstrand/mesa/tree/wip/anv-cs-packets</a></div><div><br></div><div>Is that plus your patch to fix ensure_push_constants_size() enough to fix the bug? If so, then I think what's going on is that we have to re-emit all the compute packets after a switch from 3D to compute and we're not doing that.</div></div></div></div></blockquote><div><br></div><div>No, with this we still run into a GPU hang some times and a bunch of tests keep failing consistently. The annoying part is that all this only happens when we run the tests in group (i.e. deqp-vk -n dEQP-VK.path.to.tests.*). If I run any of the failing tests alone (with our without your patches, but with the patch for ensure_push_constants_size()), they always pass. I'll see if I can find out more about what is happening.</div><div><br></div></blockquote><div><br></div><div>Ok, I think I know what the problem is. The loop I mention above has 200 iterations and the problem starts showing up only after iteration 31. With 31 or less the tests pss just fine (with or without your patches but with the patch for ensure_push_constants_size()). So that got me thinking that something was growing too big and that was causing our binding table to be invalidated.</div><div><br></div><div>I think the problem is that we allocate too many binding tables (each iteration of the loop emits blorp clears, and each blorp clear allocates binding tables), so there is a moment where blorp tries to allocate a new binding table and we run out of space in the block, so we hit the fallback where we need to allocate a new block *and* we call anv_cmd_buffer_emit_state_base_address(), which I understand is going to make the binding table used with the compute pipeline (which we only emit once in the first iteration of the loop), invalid, since it is was allocated in the other block.</div><div><br></div><div>So what I think we need to do is mark descriptors as dirty every time we grab a new binding table block and emit a new state base address. FWIW, adding "cmd_buffer->state.descriptors_dirty |= ~0" to anv_cmd_buffer_alloc_blorp_binding_table() when we need to allocate a new block fixes the problem. I think we would need to do this every time we allocate a new binding table block, not just for blorp allocations.</div><div><br></div><div>Does this make sense?</div><div><br></div><div>Iago</div><div><br></div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>--Jason</div><div><br></div><div> </div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><div><div></div><div>I hope I am making more sense now.</div><span class="gmail-HOEnZb"><font color="#888888"><div><br></div><div>Iago</div></font></span><div><div class="gmail-h5"><div><br></div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>--Jason</div><div><br></div><div> </div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex">
Fixes multiple failures in some new CTS tests.<br>
---<br>
src/intel/vulkan/genX_cmd_buffer.c | 9 ++++++++-<br>
1 file changed, 8 insertions(+), 1 deletion(-)<br>
<br>
diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c<br>
index 97b321ccaeb..6e48aaedb9b 100644<br>
--- a/src/intel/vulkan/genX_cmd_buffer.c<br>
+++ b/src/intel/vulkan/genX_cmd_buffer.c<br>
@@ -2554,7 +2554,8 @@ genX(cmd_buffer_flush_state)(struct anv_cmd_buffer *cmd_buffer)<br>
* 3DSTATE_BINDING_TABLE_POINTER_* for the push constants to take effect.<br>
*/<br>
uint32_t dirty = 0;<br>
- if (cmd_buffer->state.descriptors_dirty)<br>
+ if (cmd_buffer->state.descriptors_dirty ||<br>
+ cmd_buffer->state.push_constants_dirty)<br>
dirty = flush_descriptor_sets(cmd_buffer);<br>
<br>
if (dirty || cmd_buffer->state.push_constants_dirty) {<br>
@@ -2988,7 +2989,13 @@ genX(cmd_buffer_flush_compute_state)(struct anv_cmd_buffer *cmd_buffer)<br>
anv_batch_emit_batch(&cmd_buffer->batch, &pipeline->batch);<br>
}<br>
<br>
+ /* Storage images require push constant data, which is setup during the<br>
+ * binding table emision. If we have dirty push constants, we need to<br>
+ * re-emit the binding table so we get the push constant storage image setup<br>
+ * done, otherwise we trash it when we emit push constants below.<br>
+ */<br>
if ((cmd_buffer->state.descriptors_dirty & VK_SHADER_STAGE_COMPUTE_BIT) ||<br>
+ (cmd_buffer->state.push_constants_dirty & VK_SHADER_STAGE_COMPUTE_BIT) ||<br>
cmd_buffer->state.compute.pipeline_dirty) {<br>
/* FIXME: figure out descriptors for gen7 */<br>
result = flush_compute_descriptor_set(cmd_buffer);<br>
<span class="gmail-m_-1675126762955026495HOEnZb"><font color="#888888">-- <br>
2.14.1<br>
<br>
</font></span></blockquote></div><br></div></div>
</blockquote></div></div></div></blockquote></div><br></div></div>
</blockquote></blockquote></body></html>