<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Fri, Apr 20, 2018 at 5:16 AM, Nicolai Hähnle <span dir="ltr"><<a href="mailto:nhaehnle@gmail.com" target="_blank">nhaehnle@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On 20.04.2018 10:21, Iago Toral wrote:<br> <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Hi,<br> <br> while developing support for Vulkan shaderInt16 on Anvil I came across<br> a feature of NIR that was a bit inconvenient: bools are always 32-bit<br> by design, but the Intel hardware produces 16-bit bool results for 16-<br> bit comparisons, so that creates a problem that manifests like this:<br> <br> vec1 32 ssa_21 = fge ssa_20, ssa_16<br> vec1 16 ssa_22 = b2f ssa_21<br></blockquote></div></div></blockquote><div><br></div><div>I was thinking about this a bit this morning and it gets even more sticky. What happens if you have<br><br></div><div>bool e = (a < b) && (c < d);<br><br></div><div>where a and b are 16-bit and c and d are 32-bit? In this case, one comprison has a 32-bit value and one has a 16-bit value and you have to pick one for the &&.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Our CMP instruction will produce a 16-bit boolean result for the first<br> NIR instruction (where NIR expects it to be 32-bit), so by the time we<br> emit the second instruction in the driver the bit-size for the operand<br> of b2f provided by NIR no longer matches the reality and we emit<br> incorrect code.<br> <br> This seems to have been a consicious design choice in NIR, and while<br> discussing this with Jason he was unsure how much we wanted to change<br> this or how to do it, given how thoroughly 32-bit bools are baked into<br> NIR and the complexities that modifying this would also bring to our<br> bit-size validation code.<br> <br> I have been considering alternatives that didn't involve changing NIR<br> to support multiple bit-sizes for booleans:<br> <br> 1) Drivers that need to emit smaller booleans could try to fix the<br> generated NIR by correcting the expected bit-sizes for CMP<br> instructions. This would be rather trivial to implement in drivers (and<br> maybe we could even make a generic pass for other drivers to use if<br> they need it) but this will make the validator complain because it<br> won't recognize comparisons with 16-bit bool outputs as valid NIR<br> opcodes. I also found instances where nir_search would complain about<br> mismatching bit-sizes. I haven't looked any further into it yet though,<br> so maybe we can reasonably work around these issues.<br> <br> 2) Drivers could handle this specially when they emit code from NIR.<br> Specifically, when they see a 32-bit boolean source in an instruction,<br> they would have to search for the instruction that produced that source<br> value and check whether it is a 16-bit or a 32-bit boolean to emit<br> proper code for the instruction.<br> <br> 3) Drivers can just convert the 16-bit bool result they generate for<br> 16-bit cmp to the 32-bit bool that NIR expects, and then possibly run<br> an optimization pass to eliminate these extra conversions and fix up<br> the code accordingly.<br> </blockquote> <br></div></div> radeonsi(NIR) and radv already use option 3, since GCN hardware really wants to treat bools as 1-bit value, so that's what I'd suggest. The optimizations that cleanup the conversions happen in LLVM for us.<br></blockquote><div><br></div><div>Is this a GCN thing or an LLVM thing? It would be neat if your hardware had 1-bit registers. :-) We sort-of do but they're special flag registers and we have very few of them.<br></div><br></div><div class="gmail_quote">--Jason<br></div></div></div>