<div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Tue, 4 Dec 2018 at 21:57, Bas Nieuwenhuizen <<a href="mailto:bas@basnieuwenhuizen.nl" target="_blank">bas@basnieuwenhuizen.nl</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Tue, Dec 4, 2018 at 4:52 PM Samuel Pitoiset<br> <<a href="mailto:samuel.pitoiset@gmail.com" target="_blank">samuel.pitoiset@gmail.com</a>> wrote:<br> ><br> > Because WAIT_REG_MEM can only wait for a 32-bit value, it's not<br> > safe to use it for timestamp queries. If we only wait on the low<br> > 32 bits of a timestamp query we could be unlucky and the GPU<br> > might hang.<br> ><br> > One possible fix is to emit a full end of pipe event and wait<br> > on a 32-bit value which is actually an availability bit. This<br> > bit is allocated at creation time and always cleared before<br> > emitting the EOP event.<br> ><br> > Bugzilla: <a href="https://bugs.freedesktop.org/show_bug.cgi?id=108925" rel="noreferrer" target="_blank">https://bugs.freedesktop.org/show_bug.cgi?id=108925</a><br> > Fixes: 5d6a560a29 ("radv: do not use the availability bit for timestamp queries")<br> > Signed-off-by: Samuel Pitoiset <<a href="mailto:samuel.pitoiset@gmail.com" target="_blank">samuel.pitoiset@gmail.com</a>><br> > ---<br> > src/amd/vulkan/radv_query.c | 49 +++++++++++++++++++++++++++++++------<br> > 1 file changed, 41 insertions(+), 8 deletions(-)<br> ><br> > diff --git a/src/amd/vulkan/radv_query.c b/src/amd/vulkan/radv_query.c<br> > index 550abe307a1..9bb6b660add 100644<br> > --- a/src/amd/vulkan/radv_query.c<br> > +++ b/src/amd/vulkan/radv_query.c<br> > @@ -1056,8 +1056,15 @@ VkResult radv_CreateQueryPool(<br> > pool->pipeline_stats_mask = pCreateInfo->pipelineStatistics;<br> > pool->availability_offset = pool->stride * pCreateInfo->queryCount;<br> > pool->size = pool->availability_offset;<br> > - if (pCreateInfo->queryType == VK_QUERY_TYPE_PIPELINE_STATISTICS)<br> > + if (pCreateInfo->queryType == VK_QUERY_TYPE_PIPELINE_STATISTICS) {<br> > pool->size += 4 * pCreateInfo->queryCount;<br> > + } else if (pCreateInfo->queryType == VK_QUERY_TYPE_TIMESTAMP) {<br> > + /* Allocate one DWORD for the availability bit which is needed<br> > + * for vkCmdCopyQueryPoolResults() because we can't perform a<br> > + * WAIT_REG_MEM on a 64-bit value.<br> > + */<br> > + pool->size += 4;<br> > + }<br> ><br> > pool->bo = device->ws->buffer_create(device->ws, pool->size,<br> > 64, RADEON_DOMAIN_GTT, RADEON_FLAG_NO_INTERPROCESS_SHARING);<br> > @@ -1328,19 +1335,45 @@ void radv_CmdCopyQueryPoolResults(<br> > pool->availability_offset + 4 * firstQuery);<br> > break;<br> > case VK_QUERY_TYPE_TIMESTAMP:<br> > + if (flags & VK_QUERY_RESULT_WAIT_BIT) {<br> > + /* Emit a full end of pipe event because we can't<br> > + * perform a WAIT_REG_MEM on a 64-bit value. If we only<br> > + * do a WAIT_REG_MEM on the low 32 bits of a timestamp<br> > + * query we could be unlucky and the GPU might hang.<br> > + */<br> > + enum chip_class chip = cmd_buffer->device->physical_device->rad_info.chip_class;<br> > + bool is_mec = radv_cmd_buffer_uses_mec(cmd_buffer);<br> > + uint64_t avail_va = va + pool->availability_offset;<br> > +<br> > + /* Clear the availability bit before waiting on the end<br> > + * of pipe event.<br> > + */<br> > + radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 3, 0));<br> > + radeon_emit(cs, S_370_DST_SEL(V_370_MEM_ASYNC) |<br> > + S_370_WR_CONFIRM(1) |<br> > + S_370_ENGINE_SEL(V_370_ME));<br> > + radeon_emit(cs, avail_va);<br> > + radeon_emit(cs, avail_va >> 32);<br> > + radeon_emit(cs, 0xdeadbeef);<br> > +<br> > + /* Wait for all prior GPU work. */<br> > + si_cs_emit_write_event_eop(cs, chip, is_mec,<br> > + V_028A90_BOTTOM_OF_PIPE_TS, 0,<br> > + EOP_DATA_SEL_VALUE_32BIT,<br> > + avail_va, 0, 1,<br> > + cmd_buffer->gfx9_eop_bug_va);<br> > +<br> > + /* Wait on the timestamp value. */<br> > + radv_cp_wait_mem(cs, WAIT_REG_MEM_EQUAL, avail_va,<br> > + 1, 0xffffffff);<br> > + }<br> > +<br> <br> Can we put this in a separate function? Also, you'll want to allocate<br> the availability bit in the upload buffer, in case there are multiple<br> concurrent command buffers using the same query pool.<br> <br> Alternative solution: look at the upper 32 bits, those definitely<br> should not be 0xfffffff until a far away point in the future.<br></blockquote><div><br></div><div>I just looked into this a bit more, since if the cause of the hang is that the low 32 bits on a valid timestamp are 0xffffffff, it seemed a bit suspicious that it's 100% repro.<div><br></div><div>What's actually happening is that some of the timestamps are being written before vkCmdResetQueryPool completes, so the reset ends up overwriting them back to TIMESTAMP_NOT_READY. I've updated the test case on the bug to map the buffer and check if any results have the low 32 bits as 0xffffffff. When there's a high enough query count to hang without this patch, the results for the first few queries in the pool are coming out as TIMESTAMP_NOT_READY. That doesn't happen with less than 512 queries, and that happens to be the threshold for using the shader path in radv_fill_buffer for the reset.</div></div><div><br></div><div>That's fixed if I add a flush at the end of radv_CmdResetQueryPool. I think that's needed since as far as I can see from the spec the app shouldn't need to do any explicit sync between reset and later query commands. I'll send a patch for that as well.</div><div><br></div><div>Thanks,</div><div>Alex</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <br> > for(unsigned i = 0; i < queryCount; ++i, dest_va += stride) {<br> > unsigned query = firstQuery + i;<br> > uint64_t local_src_va = va + query * pool->stride;<br> ><br> > MAYBE_UNUSED unsigned cdw_max = radeon_check_space(cmd_buffer->device->ws, cs, 19);<br> ><br> > -<br> > - if (flags & VK_QUERY_RESULT_WAIT_BIT) {<br> > - radv_cp_wait_mem(cs, WAIT_REG_MEM_NOT_EQUAL,<br> > - local_src_va,<br> > - TIMESTAMP_NOT_READY >> 32,<br> > - 0xffffffff);<br> > - }<br> > if (flags & VK_QUERY_RESULT_WITH_AVAILABILITY_BIT) {<br> > uint64_t avail_dest_va = dest_va + elem_size;<br> ><br> > --<br> > 2.19.2<br> ><br> > _______________________________________________<br> > mesa-dev mailing list<br> > <a href="mailto:mesa-dev@lists.freedesktop.org" target="_blank">mesa-dev@lists.freedesktop.org</a><br> > <a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/mailman/listinfo/mesa-dev</a><br> _______________________________________________<br> mesa-dev mailing list<br> <a href="mailto:mesa-dev@lists.freedesktop.org" target="_blank">mesa-dev@lists.freedesktop.org</a><br> <a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/mailman/listinfo/mesa-dev<br></a><br> </blockquote></div></div></div></div></div>