[Mesa-dev] [PATCH 2/3] i965/fs: Make emit_uniformize a no-op for immediates
Kristian Høgsberg
krh at bitplanet.net
Fri Oct 23 09:45:41 PDT 2015
On Fri, Oct 23, 2015 at 5:38 AM, Francisco Jerez <currojerez at riseup.net> wrote:
> Kristian Høgsberg <krh at bitplanet.net> writes:
>
>> On Tue, Oct 20, 2015 at 11:56 AM, Francisco Jerez <currojerez at riseup.net> wrote:
>>> Kristian Høgsberg <krh at bitplanet.net> writes:
>>>
>>>> On Tue, Oct 20, 2015 at 3:16 AM, Francisco Jerez <currojerez at riseup.net> wrote:
>>>>> Kristian Høgsberg <krh at bitplanet.net> writes:
>>>>>
>>>>>> On Mon, Oct 19, 2015 at 4:19 AM, Francisco Jerez <currojerez at riseup.net> wrote:
>>>>>>> Neil Roberts <neil at linux.intel.com> writes:
>>>>>>>
>>>>>>>> Just a thought, would it be better to move this check into the
>>>>>>>> eliminate_find_live_channel optimisation? That way it could catch
>>>>>>>> sources that become immediates through later optimisations. One problem
>>>>>>>> with this that I've seen before is that eliminating the
>>>>>>>> FIND_LIVE_CHANNEL doesn't cause the subsequent BROADCAST to be
>>>>>>>> eliminated because the copy propagation doesn't work.
>>>>>>>
>>>>>>> I believe in this particular case the BROADCAST instruction should
>>>>>>> already be eliminated (by opt_algebraic), because its first source is an
>>>>>>> immediate, so this optimization would in fact be redundant with
>>>>>>> opt_algebraic+constant propagation if it weren't because the latter is
>>>>>>> unable to handle scalar copies.
>>>>>>
>>>>>> Yes, is happens as you say: first the FIND_LIVE_CHANNEL gets
>>>>>> eliminated because it's outside control flow
>>>>>> (fs_visitor::eliminate_find_live_channel). Then the broadcast gets
>>>>>> reduced to a MOV in opt_algebraic because src0 is uniform (immediate
>>>>>> constant). The problem then is the dst of the MOV has stride 0 which
>>>>>> can_propagate_from() then bails on.
>>>>>>
>>>>>>> Another possibility that would likely
>>>>>>> be more general than this (because it would handle cases in which, as
>>>>>>> Neil said, the argument becomes an immediate only after optimization,
>>>>>>> and because it would also avoid the issue with liveness analysis
>>>>>>> extending the live ranges of scalar values incorrectly [1]) would be to
>>>>>>> extend the destination register of the BROADCAST instructions to a
>>>>>>> vector [2] and then add a case to the switch statement of
>>>>>>> try_constant_propagate() so that the immediate MOV resulting from
>>>>>>> opt_algebraic is propagated into surface instructions (see attachment).
>>>>>>
>>>>>> I wrote exactly this code to make constant propagate work, plus adding
>>>>>> the extra opcodes to the switch statement. It works and I could
>>>>>> certainly send that out after this, but
>>>>>>
>>>>>> 1) This doesn't mean we shouldn't do the if (src.file == IMM)
>>>>>> shortcut. It saves the compiler a bit of work in the very common case
>>>>>> of
>>>>>> non-indirect buffer access.
>>>>>>
>>>>>> 2) I'm not even sure it makes sense to extend copy-propagation to do
>>>>>> this (which is why I went back to just the IMM test). Anything that
>>>>>> would be an immediate at this point should be an immediate, if not
>>>>>> we're missing something in nir.
>>>>>>
>>>>> Still this doesn't address the root of the problem, which is that
>>>>> emit_uniformize() emits scalar code that the rest of the compiler is not
>>>>> able to handle properly.
>>>>
>>>> This is not a case of papering over the root cause, it's about not
>>>> creating the root cause in the first place. The output of
>>>> emit_uniformize() always ends up as either a surface or a sampler
>>>> index, which we only look at later in the generator. There are no
>>>> other cases where the result of emit_uniformize() might be part of an
>>>> expression that we can copy propagate or otherwise optimize. If the
>>>> input to emit_uniformize() isn't an immediate where it could be, nir
>>>> optimization needs fixing. So if we add these two lines to
>>>> emit_uniformize() to pass immediates straight through, we avoid
>>>> generating code that we have to extend the copy prop pass to handle.
>>>>
>>>
>>> Kristian, there are legitimate uses of emit_uniformize() in which the
>>> argument is not an immediate but still can be optimized out later on --
>>> E.g. for images it will frequently be a uniform register, or a
>>> non-constant expression calculated within uniform control flow (in which
>>> case the FIND_LIVE_CHANNEL instruction emitted here will be reduced to a
>>> constant MOV by eliminate_find_live_chaso nnel()). In such cases we still
>>> want copy-propagation to kick in, but it wont because of the problem I
>>> was talking about with scalar writes. Even if the instructions emitted
>>> by emit_uniformize() cannot be optimized out, liveness analysis will
>>> overestimate the live ranges of the temporaries used by
>>> emit_uniformize() for the same reason, potentially causing the register
>>> allocator to spill or run out of registers.
>>
>> Yeah, fair point. I went and took a look at non-constant surface index
>> and sent out a new series that make that work better as well as ssbo
>> write optimizations. I took out the imm shortcut and put in the
>> generic optimizations. It makes non-const surface index a little
>> better, but as I wrote in the cover letter, it still doesn't work that
>> well as we don't propagate the surface index into the send
>> instructions.
>>
>>> Anyway don't take me wrong, I'm not NAK-ing your patch or anything, I
>>> just have the feeling that by fixing this more generally you could've
>>> saved us [or most likely me ;)] work in the near future.
>>
>> I may be saving you some time, but I've been working on this for
>> longer than I expected ;-)
>>
> Thanks, and sorry for the trouble. :)
And thanks for the reviews, pushed the series now. I'll leave scalar
copy propagation to you then.
Kristian
>> Kristian
>>
>>>> Kristian
>>>>
>>>>> Re-implementing a special case of the
>>>>> optimization already done by opt_algebraic() might have helped in this
>>>>> specific case, but it won't help when the argument is of some other kind
>>>>> of uniform value which are also frequently encountered in practice and
>>>>> could also be copy-propagated, and it won't help avoid the liveness
>>>>> analysis bug [1] in cases where the argument is not an immediate (the
>>>>> bug is reported to break some thousands SSBO dEQP tests, although I
>>>>> don't know what fraction of them actually use non-constant indexing).
>>>>> The alternative solution I provided patches for (and you seem to have
>>>>> implemented yourself independently) would address all these issues at
>>>>> once.
>>>>>
>>>>>> Kristian
>>>>>>
>>>>>>>> I made this patch a while ago but I never posted it anywhere because
>>>>>>>> it's a of a kludge and it would probably be better to fix the copy
>>>>>>>> propagation:
>>>>>>>>
>>>>>>>> https://github.com/bpeel/mesa/commit/e4c3286075f891f466fb8558106d2aaa
>>>>>>>>
>>>>>>> Heh, yeah, I'd rather fix copy propagation instead, which I believe will
>>>>>>> become much easier with the use-def-chain analysis pass I'm working on.
>>>>>>>
>>>>>>>> Either way though I don't think it would do any harm to have Kristian's
>>>>>>>> patch as well even if we did improve elimintate_find_live_channel so it
>>>>>>>> is:
>>>>>>>>
>>>>>>>> Reviewed-by: Neil Roberts <neil at linux.intel.com>
>>>>>>>>
>>>>>>>> - Neil
>>>>>>>>
>>>>>>>> Kristian Høgsberg Kristensen <krh at bitplanet.net> writes:
>>>>>>>>
>>>>>>>>> An immdiate is already uniform so just return it up front. Without this,
>>>>>>>>> brw_fs_surface_builder ends up passing immediate surface indices through
>>>>>>>>> SHADER_OPCODE_BROADCAST. This writes to a stride 0 dst, which we can't
>>>>>>>>> constant propagate out of, and further, we don't constant propagate into
>>>>>>>>> the typed/untype read/write opcodes at all. The end result is that all
>>>>>>>>> typed/untyped read/write/atomics end up as indirect sends.
>>>>>>>>>
>>>>>>>>> Code generation should always produce either an immediate or an actual
>>>>>>>>> indirect surface index, so we can fix this by just special casing
>>>>>>>>> immediates in emit_uniformize.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Kristian Høgsberg Kristensen <krh at bitplanet.net>
>>>>>>>>> ---
>>>>>>>>> src/mesa/drivers/dri/i965/brw_fs_builder.h | 3 +++
>>>>>>>>> 1 file changed, 3 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h b/src/mesa/drivers/dri/i965/brw_fs_builder.h
>>>>>>>>> index df10a9d..98ce71e 100644
>>>>>>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_builder.h
>>>>>>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h
>>>>>>>>> @@ -394,6 +394,9 @@ namespace brw {
>>>>>>>>> const dst_reg chan_index = component(vgrf(BRW_REGISTER_TYPE_UD), 0);
>>>>>>>>> const dst_reg dst = component(vgrf(src.type), 0);
>>>>>>>>>
>>>>>>>>> + if (src.file == IMM)
>>>>>>>>> + return src;
>>>>>>>>> +
>>>>>>>>> ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index);
>>>>>>>>> ubld.emit(SHADER_OPCODE_BROADCAST, dst, src, chan_index);
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> 2.6.2
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> mesa-dev mailing list
>>>>>>>>> mesa-dev at lists.freedesktop.org
>>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>>>>>>> _______________________________________________
>>>>>>>> mesa-dev mailing list
>>>>>>>> mesa-dev at lists.freedesktop.org
>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>>>>>>
>>>>>>>
>>>>>>> [1] http://lists.freedesktop.org/archives/mesa-dev/2015-October/097085.html
>>>>>>> [2] http://lists.freedesktop.org/archives/mesa-dev/attachments/20151014/25dd38dc/attachment.patch
>>>>>>>
More information about the mesa-dev
mailing list