[Bug 89580] Implement a NIR -> vec4 pass

Thu Jun 18 01:43:19 PDT 2015

https://bugs.freedesktop.org/show_bug.cgi?id=89580

--- Comment #23 from Iago Toral <itoral at igalia.com> ---
Jason, about the stuff with uniforms that has been bugging me:

The thing is that the vec4 backend and the nir_lower_io pass do not seem to
work very well together for uniforms. The reason for this is that the vec4
backend assumes that uniform elements are vec4-sized while nir_lower_io
(specifically, the get_io_offset function) doesn't. This creates a problem
because the offsets we compute in nir are not in the same units than what the
vec4 backend expects.

The get_io_offset function accumulates the constant part of the offset into the
uniform we are accessing (the base offset) and then emits code to compute the
variable indexing part (if we have non-constant array access). Both use a
function type_size() that does not measure sizes in units of vec4 like the
current vec4 implementation does. This creates a problem, specifically when we
have uniform arrays with indirect accesses, since the way the vec4 backend
handles these assumes that each uniform element is vec4-sized, padding it if it
isn't.

Initially, I fixed this by patching get_io_offset so that as soon as it was
computing an offset into something that had indirect indexing in a vertex
shader, it would use vec4 units to compute sizes all over get_io_offset. That
solution worked fine, but it did not look very clean to me (I imagine we do not
want NIR to deal with this sort of things, since the decision to pad uniform
elements to vec4 sizes seems like a backend thing to me and other drivers could
decide to not do that), so I was trying to find a different solution.

The current implementation manages to get away without patching get_io_offset
(for the most part) by handling the difference between what it needs and what
nir_lower_io delivers in the nir_vec4 backend. The only situation where we
can't do this is when we have indirect indexing into an array (more on that
later). This solution is cleaner in the sense that it does not have to modify
nir as much (only when indirect indexing is involved), but at the same time it
feels like the nir_vec4 backend needs to do extra work only because
nir_lower_io is not doing exactly what it needs, so that does not feel
completely right to me, but maybe it is unavoidable.

If indirect indexing into arrays is present, however, we still need to patch
get_io_offset, because in that case this function emits code directly to
multiply the indirect index by the size of the array elements, which we need in
units of vec4.

I guess this diff with the changes we have right now to nir_lower_io.c should
make the issue we are dealing with here clear:

http://pastebin.com/iqGSabsv

Therefore, lately I have been thinking that the only way to avoid this is to
have the vec4 backend work without assuming that uniform elements are
vec4-sized.

I started a quick experiment to see what this would involve. Basically, I
modified things so that uniform_size[] does not compute sizes in vec4 units,
then modified the uniform setup code so that we don't pad uniforms elements
smaller than a vec4 size with 0s. This also required small changes in
move_uniform_array_access_to_pull_constants because that also assumes uniform 
values padded to a vec4 size of course, and I think we would also need to
modify brw_wm_surface_state.c so that we don't create constant surfaces for
uniforms with BRW_SURFACEFORMAT_R32G32B32A32_FLOAT. Then I noticed that the
sampler messages we use to pull uniform array data with variable index also
expect uniform data padded to a vec4 size since they use oword offsets (so I
hacked this to use unaligned messages instead that take a byte offset), then I
realized that uniforms loaded as push constants also need to be padded to a
vec4 size, I am still not sure about the implications of changing this... and I
am sure there will be plenty of other things that would need to be fixed. The
bottom line is that going for this looks like a rather big change and a
significant departure from what the current vec4_visitor does, which means that
even if we do this successfully, maintaining both implementations until we can
get rid of the old vec4_visitor would probably be a pain.

So at this point I am not sure about the best approach to follow, I think our
current patch for nir_lower_io with indirect array accesses is a bit ugly, but
at the same  time it is small, quite self-contained and it works, the
alternative to this seems to be a large change that would make the nir-vec4
backend quite different to the current vec4_visitor and that also looks
troublesome (and I guess it would also take some time to get working), so I am
not sure if it is a good idea right now or if there is a better solution to
this problem that I am missing.

What do you think?

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/intel-3d-bugs/attachments/20150618/e9516511/attachment-0001.html>