[Bug 108787] [BSW] Mesa "total_needs <= urb_chunks" abort in GfxBench CarChase startup

bugzilla-daemon at freedesktop.org
Fri Feb 15 01:47:18 UTC 2019


https://bugs.freedesktop.org/show_bug.cgi?id=108787

--- Comment #6 from Kenneth Graunke <kenneth at whitecape.org> ---
I'm not positive about that - it may have regressed.  There's a fundamental bug
which isn't a regression, but this app may have gotten worse.

So, what's going on is that the program enables SLM, which drops the URB size
to 64kB on Cherryview or Broadwell GT1 systems.  We also always reserve 32kB
of space for push constants, which comes out of the URB area, leaving us with
only 32kB of URB, which is very small.  The app also uses tessellation at the
same time, which on Gen8 bumps the minimum number of VS URB entries to 192, due
to a HW workaround.  And, the VUE size is fairly large, at 128 (bytes?).  With
the 8kB granularity, we end up needing 9 chunks of URB space, and we can only
offer 8 chunks.  So we are unable to meet the minimum requirements for the
program.
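
To make the numbers concrete, here is a rough sketch of the chunk arithmetic.
It assumes the VUE size really is 128 bytes and that the push constant
reservation is counted in the same 8kB chunks as the stage allocations; the
function and variable names are made up for illustration and are not Mesa's.

```python
KB = 1024
CHUNK_BYTES = 8 * KB          # allocation granularity quoted above


def chunks_for(entries, entry_bytes, chunk_bytes=CHUNK_BYTES):
    """Round a stage's minimum URB footprint up to whole chunks."""
    total_bytes = entries * entry_bytes
    return (total_bytes + chunk_bytes - 1) // chunk_bytes


urb_chunks = (64 * KB) // CHUNK_BYTES     # URB size with SLM enabled
push_chunks = (32 * KB) // CHUNK_BYTES    # push constant reservation
vs_chunks = chunks_for(192, 128)          # assumption: VUE size is in bytes

# Whatever is left has to cover the minimums of every other enabled stage.
left_for_other_stages = urb_chunks - push_chunks - vs_chunks
print(urb_chunks, push_chunks, vs_chunks, left_for_other_stages)
```

Under those assumptions the VS minimum alone takes 3 of the 8 chunks and push
constants take 4, leaving a single chunk for the tessellation stages.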

It may make sense to reduce the push constant space when the URB is small, or
possibly just for CHV/BDW-GT1 parts.  But, we also have VS, HS, DS, and PS
competing for that space, so reducing that space could hurt as well...
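
A rough sketch of that trade-off, using the figures from this comment.  It
assumes the 9 chunks needed include 4 chunks of push constants, so the stage
minimums come to 5 chunks; the 16kB alternative is just an example value, and
the helper name is hypothetical.

```python
CHUNK_KB = 8  # URB allocation granularity quoted in this comment


def meets_minimum(urb_kb, push_constant_kb, stage_min_chunks):
    """True if push constants plus per-stage minimums fit in the URB."""
    available = urb_kb // CHUNK_KB
    needed = push_constant_kb // CHUNK_KB + stage_min_chunks
    return needed <= available


# Assumption: 5 chunks of stage minimums, i.e. the reported 9 minus the
# 4 chunks that a 32kB push constant reservation occupies.
STAGE_MIN_CHUNKS = 5

current = meets_minimum(64, 32, STAGE_MIN_CHUNKS)   # 4 + 5 = 9 > 8
reduced = meets_minimum(64, 16, STAGE_MIN_CHUNKS)   # 2 + 5 = 7 <= 8
print(current, reduced)
```

This only checks the minimum requirements; it says nothing about how much
performance the smaller push constant space would cost.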

As to why this might be a regression: it's possible that some compiler changes
ended up increasing the VUE sizes, due to extra varyings between stages.  This
would push it over the limit, when the app would have worked before.  So, that
would be a regression.  But optimizing that again would not be a full fix,
because we can certainly write a test case to hit this path.

One last thought.  There's no way that SLM and tessellation can both be
required at once.  SLM is only needed for compute shaders.  We try to avoid
changing the L3 configuration mid-batch because it's expensive, so we're
probably sticking with the SLM-enabled config when we ought to just switch back
to one with the full URB size.  This means that any i965 patch which affects
the command stream may move flush points such that compute lands in the same
batch as this expensive tessellation draw when it didn't before, triggering
the issue.

It may make sense to consider tessellation being enabled on BDW-GT1/CHV as a
good enough reason to do the expensive transition, because of the extra
192-entry workaround.  (Apollolake doesn't have that limitation, and it appears
to fit just fine.)
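
The heuristic could look roughly like the sketch below.  This is not i965
code: the platform strings, the function name, and its parameters are all
hypothetical, and it only models the decision itself, not the actual L3
reprogramming.

```python
# Hypothetical platform labels for the parts with the small SLM-enabled URB.
SMALL_URB_PLATFORMS = {"BDW-GT1", "CHV"}


def should_switch_l3_config(platform, slm_config_active, draw_uses_tessellation):
    """Decide whether to pay for a mid-batch L3 transition.

    Switch away from the SLM-enabled config only when a tessellation draw is
    about to run on a part where the small URB cannot meet the minimums.
    """
    if not slm_config_active:
        return False  # already on a config with the full URB size
    if not draw_uses_tessellation:
        return False  # keep avoiding the expensive transition
    return platform in SMALL_URB_PLATFORMS


print(should_switch_l3_config("CHV", True, True))
print(should_switch_l3_config("Apollolake", True, True))
```

Other platforms keep the existing behaviour of staying on the current config.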
