[Bug 105695] [PERF] Updating ubo offset via vkCmdBindDescriptorSets is causing flush that is taking 50% of rendering time

bugzilla-daemon at freedesktop.org
Sat Mar 24 00:35:35 UTC 2018


https://bugs.freedesktop.org/show_bug.cgi?id=105695

Jason Ekstrand <jason at jlekstrand.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO

--- Comment #1 from Jason Ekstrand <jason at jlekstrand.net> ---
(In reply to Vyacheslav from comment #0)
> I do vkCmdBindDescriptorSets per draw call to change transformation matrix
> of an object. Basically I change only one number - offset into dynamic ubo.
> Then I call vkCmdDrawIndexed and it calls gen9_cmd_buffer_flush_state that
> amounts to 50% of rendering time (in terms of instr fetch metric). I dig
> deeper and find two culprits: flush_descriptor_sets,
> cmd_buffer_flush_push_constants.

From your cachegrind profile, it appears that you have a build of the driver
with asserts enabled.  Please try with a fully optimized driver build and see
how the performance looks.  We have a *lot* of asserts in our driver (every
field of every hardware packet has bounds-checking for instance) and enabling
them will kill performance.
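
To illustrate why per-field asserts are so costly in a debug build and free in
an optimized one, here is a minimal sketch in C.  The names pack_uint and
pack_example_dword and the field layout are hypothetical and are not taken
from the driver; the only assumption is that the checks use the standard
assert() macro, which compiles to nothing when NDEBUG is defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, NOT the driver's actual code: place `value` into
 * bits [start, end] of a 32-bit packet dword. */
static inline uint32_t pack_uint(uint32_t value, unsigned start, unsigned end)
{
    const unsigned width = end - start + 1;

    /* Bounds checks. With NDEBUG defined these two lines vanish entirely;
     * without it they run for every field of every packet. */
    assert(start <= end && end < 32);
    assert(width == 32 || value < (UINT32_C(1) << width));

    return value << start;
}

/* Hypothetical packet dword made of three fields (layout is an assumption). */
static inline uint32_t pack_example_dword(uint32_t opcode, uint32_t length,
                                          uint32_t flags)
{
    return pack_uint(opcode, 24, 31) |
           pack_uint(flags, 8, 15) |
           pack_uint(length, 0, 7);
}
```

Compiling the same file with and without -DNDEBUG shows the difference: in the
first case each pack_uint call reduces to a shift, in the second it carries two
comparisons and a possible abort path.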

> And I don't even use push constants, I
> prefer dynamic ubos. Emitting binding tables is huge amount of work (23% of
> total rendering time). I don't understand why so much work has to be done
> just to change offset in memory.

As far as push constants go, you are getting them whether you meant to or not. 
We push chunks of UBOs when we can and it significantly helps UBO performance
in most cases.

> And this is the most popular usecase - everyone wants to change matrix per
> object.

Yes and no.  If you really need to be changing matrices that often, there are
other mechanisms that are more efficient.  Frequently engines will do a large
draw (many vertices) with a single UBO with an array of matrices and index into
the array using a value they pass in as a vertex attribute.  Doing thousands of
back-to-back draws with descriptor set re-binds in between is basically a CPU
overhead micro-benchmark.
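
The matrix-array approach described above can be sketched as follows.  This is
a CPU-side illustration in C under stated assumptions, not code from any
engine or from the driver: the types mat4, object_ubo and vertex, the
MAX_OBJECTS capacity, and both functions are hypothetical.  transform_vertex
stands in for what a vertex shader would do with the per-vertex index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OBJECTS 256                 /* assumed array capacity */

typedef struct { float m[16]; } mat4;   /* column-major 4x4 matrix */

/* Contents of the single UBO: one model matrix per object. */
typedef struct { mat4 model[MAX_OBJECTS]; } object_ubo;

/* Vertex layout with one extra attribute selecting the matrix. */
typedef struct {
    float    pos[3];
    uint32_t object_index;
} vertex;

/* Append one object's positions to a merged vertex stream, tagging each
 * vertex with the object's index.  Returns the number of vertices written,
 * or 0 if the index is out of range or the stream lacks space. */
static size_t append_object(vertex *dst, size_t dst_cap, size_t dst_count,
                            const float (*positions)[3], size_t n,
                            uint32_t object_index)
{
    if (object_index >= MAX_OBJECTS || dst_count > dst_cap ||
        n > dst_cap - dst_count)
        return 0;

    for (size_t i = 0; i < n; i++) {
        vertex *v = &dst[dst_count + i];
        v->pos[0] = positions[i][0];
        v->pos[1] = positions[i][1];
        v->pos[2] = positions[i][2];
        v->object_index = object_index;
    }
    return n;
}

/* Stand-in for the vertex shader: fetch the matrix selected by the vertex
 * attribute and compute model * vec4(pos, 1). */
static void transform_vertex(const object_ubo *ubo, const vertex *v,
                             float out[4])
{
    assert(v->object_index < MAX_OBJECTS);

    const float *m = ubo->model[v->object_index].m;
    const float in[4] = { v->pos[0], v->pos[1], v->pos[2], 1.0f };

    for (int row = 0; row < 4; row++) {
        out[row] = 0.0f;
        for (int col = 0; col < 4; col++)
            out[row] += m[col * 4 + row] * in[col];
    }
}
```

With this layout the descriptor set holding the UBO is bound once and every
object in the merged stream is drawn by a single call, so the per-object cost
moves from CPU-side state changes to one array lookup per vertex.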

> My opengl implementation is 2
> times faster than this. Are there any plans on improving performance in that
> area?

That definitely shouldn't be the case. :-)  I suspect this would change if you
ran with a properly optimized build.
