<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Mon, Jan 11, 2016 at 10:04 PM, Connor Abbott <span dir="ltr"><<a href="mailto:cwabbott0@gmail.com" target="_blank">cwabbott0@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Tue, Jan 12, 2016 at 12:54 AM, Matt Turner <<a href="mailto:mattst88@gmail.com">mattst88@gmail.com</a>> wrote:<br>
> On Mon, Jan 11, 2016 at 6:58 PM, Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>> wrote:<br>
>><br>
>> On Jan 11, 2016 2:47 PM, "Matt Turner" <<a href="mailto:mattst88@gmail.com">mattst88@gmail.com</a>> wrote:<br>
>>><br>
>>> The OpenGL specifications for bitfieldInsert() and bitfieldExtract() says:<br>
>>><br>
>>> The result will be undefined if <offset> or <bits> is negative, or if<br>
>>> the sum of <offset> and <bits> is greater than the number of bits<br>
>>> used to store the operand.<br>
>>><br>
>>> Therefore passing bits=32, offset=0 is legal and defined in GLSL.<br>
>>><br>
>>> But the earlier DX11/SM5 bfi/ibfe/ubfe opcodes are specified to accept a<br>
>>> bitfield width ranging from 0-31. As such, Intel and AMD instructions<br>
>>> read only the low 5 bits of the width operand, making them not able to<br>
>>> implement the GLSL-specified behavior directly.<br>
>>><br>
>>> This commit adds a pass that inserts code to implement the trivial cases<br>
>>> of <bits> = 32 as<br>
>>><br>
>>> bitfieldInsert:<br>
>>> bits > 31 ? insert : bitfieldInsert(base, insert, offset, bits)<br>
>>><br>
>>> bitfieldExtract:<br>
>>> bits > 31 ? value : bitfieldExtract(value, offset, bits)<br>
>><br>
>> Why not just add this to nir_opt_algebraic with a flag in<br>
>> nir_compiler_options? That's the way we've done a bunch of other lowering.<br>
><br>
> Note that these are replacing the expression with an expression<br>
> containing the original expression.<br>
><br>
> I think there's a distinction to be made between lowering (decomposing<br>
> an operation into a number of simpler operations) and what I've called<br>
> a "fixup" (inserting some additional code needed for correctness or<br>
> other reasons).<br>
><br>
> I claim that doing this fixup once before optimizations is the best<br>
> solution, with the alternatives suffering from problems:<br>
><br>
> - do it in nir_opt_algebraic. Problem: how to avoid adding the fixup<br>
> code multiple times? The comparison could be optimized away...<br>
> - do it after optimizations. Problem: might have been able to<br>
> optimize the comparison away...<br>
<br>
</div></div>The problem with doing it this way is that now bitfield_insert and<br>
bitfield_extract now have two different semantics, which seems a<br>
little ugly. For example, it's impossible to know in an optimization<br>
pass whether it's legal to do the inverse transform: bits > 31 ? value<br>
: bitfield_extract(value, offset, bits) -> bitfield_extract, which for<br>
HW implementing the GL semantics, might be a useful thing. I think the<br>
cleanest way to do it would be:<br>
<br>
- introduce a new bitfield_extract opcode so that we have both<br>
bitfield_extract and bfe<br>
- clarify that bfe, bfi, and bfm have the D3D semantics, while<br>
bitfield_extract and bitfield_insert are equivalent to the GLSL<br>
builtins<br>
- add a flag to lower the GLSL opcodes into the D3D opcodes, inserting<br>
the fixup code along the way<br>
</blockquote></div><br></div><div class="gmail_extra">This last one is what I'm getting at. If the backend wants it lowered to bfe/bfi/bfm, then it asks for the lowering and it gets lowered, selects and all. If it doesn't want it lowered, it gets to deal with the full GLSL opcode in whatever way it wants.<br><br></div></div>