<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Sun, Apr 22, 2018 at 11:31 AM, Iago Toral <span dir="ltr"><<a href="mailto:itoral@igalia.com" target="_blank">itoral@igalia.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><span class=""><div>On Fri, 2018-04-20 at 17:16 -0700, Jason Ekstrand wrote:</div><blockquote type="cite" style="margin:0 0 0 .8ex;border-left:2px #729fcf solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Fri, Apr 20, 2018 at 5:16 AM, Nicolai Hähnle <span dir="ltr"><<a href="mailto:nhaehnle@gmail.com" target="_blank">nhaehnle@gmail.com</a>></span> wrote:<br><blockquote type="cite" style="margin:0 0 0 .8ex;border-left:2px #729fcf solid;padding-left:1ex"><div class="m_-2936274396533937738HOEnZb"><div class="m_-2936274396533937738h5">On 20.04.2018 10:21, Iago Toral wrote:<br>
<blockquote type="cite" style="margin:0 0 0 .8ex;border-left:2px #729fcf solid;padding-left:1ex">
Hi,<br>
<br>
while developing support for Vulkan shaderInt16 on Anvil I came across<br>
a feature of NIR that was a bit inconvenient: bools are always 32-bit<br>
by design, but the Intel hardware produces 16-bit bool results for 16-<br>
bit comparisons, so that creates a problem that manifests like this:<br>
<br>
vec1 32 ssa_21 = fge ssa_20, ssa_16<br>
vec1 16 ssa_22 = b2f ssa_21<br></blockquote></div></div><br></blockquote><div><br></div><div>I was thinking about this a bit this morning and it gets even more sticky. What happens if you have<br><br></div><div>bool e = (a < b) && (c < d);<br><br></div><div>where a and b are 16-bit and c and d are 32-bit? In this case, one comprison has a 32-bit value and one has a 16-bit value and you have to pick one for the &&.<br></div></div></div></div></blockquote><div><br></div></span><div>Good point, yes, it seems we would need to handle this somehow. Topi seems to try and do something about this in the patch he referenced in his reply. Alternatively, we can also run a specific pass to fix this up...</div></div></blockquote><div><br></div><div>Yeah. For now, just making the 16-bit comparisons return 32 bits will get things working. If we want to write a pass to assign "optimal" bit sizes later, that wouldn't be too hard. However, it would be tricky since you have to take both sources and destinations into account and handle mixed bit sizes.<br><br></div><div>--Jason<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div><div class="h5"><blockquote type="cite" style="margin:0 0 0 .8ex;border-left:2px #729fcf solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote type="cite" style="margin:0 0 0 .8ex;border-left:2px #729fcf solid;padding-left:1ex"><div class="m_-2936274396533937738HOEnZb"><div class="m_-2936274396533937738h5"><blockquote type="cite" style="margin:0 0 0 .8ex;border-left:2px #729fcf solid;padding-left:1ex">
Our CMP instruction will produce a 16-bit boolean result for the first<br>
NIR instruction (where NIR expects it to be 32-bit), so by the time we<br>
emit the second instruction in the driver the bit-size for the operand<br>
of b2f provided by NIR no longer matches the reality and we emit<br>
incorrect code.<br>
<br>
This seems to have been a consicious design choice in NIR, and while<br>
discussing this with Jason he was unsure how much we wanted to change<br>
this or how to do it, given how thoroughly 32-bit bools are baked into<br>
NIR and the complexities that modifying this would also bring to our<br>
bit-size validation code.<br>
<br>
I have been considering alternatives that didn't involve changing NIR<br>
to support multiple bit-sizes for booleans:<br>
<br>
1) Drivers that need to emit smaller booleans could try to fix the<br>
generated NIR by correcting the expected bit-sizes for CMP<br>
instructions. This would be rather trivial to implement in drivers (and<br>
maybe we could even make a generic pass for other drivers to use if<br>
they need it) but this will make the validator complain because it<br>
won't recognize comparisons with 16-bit bool outputs as valid NIR<br>
opcodes. I also found instances where nir_search would complain about<br>
mismatching bit-sizes. I haven't looked any further into it yet though,<br>
so maybe we can reasonably work around these issues.<br>
<br>
2) Drivers could handle this specially when they emit code from NIR.<br>
Specifically, when they see a 32-bit boolean source in an instruction,<br>
they would have to search for the instruction that produced that source<br>
value and check whether it is a 16-bit or a 32-bit boolean to emit<br>
proper code for the instruction.<br>
<br>
3) Drivers can just convert the 16-bit bool result they generate for<br>
16-bit cmp to the 32-bit bool that NIR expects, and then possibly run<br>
an optimization pass to eliminate these extra conversions and fix up<br>
the code accordingly.<br>
<br></blockquote>
<br></div></div>
radeonsi(NIR) and radv already use option 3, since GCN hardware really wants to treat bools as 1-bit value, so that's what I'd suggest. The optimizations that cleanup the conversions happen in LLVM for us.<br></blockquote><div><br></div><div>Is this a GCN thing or an LLVM thing? It would be neat if your hardware had 1-bit registers. :-) We sort-of do but they're special flag registers and we have very few of them.<br></div><br></div><div class="gmail_quote">--Jason<br></div></div></div>
</blockquote></div></div></div></blockquote></div><br></div></div>