[Mesa-dev] [PATCH 2/3] i965/fs: Make emit_uniformize a no-op for immediates

Kristian Høgsberg krh at bitplanet.net
Tue Oct 20 11:25:47 PDT 2015


On Tue, Oct 20, 2015 at 3:16 AM, Francisco Jerez <currojerez at riseup.net> wrote:
> Kristian Høgsberg <krh at bitplanet.net> writes:
>
>> On Mon, Oct 19, 2015 at 4:19 AM, Francisco Jerez <currojerez at riseup.net> wrote:
>>> Neil Roberts <neil at linux.intel.com> writes:
>>>
>>>> Just a thought, would it be better to move this check into the
>>>> eliminate_find_live_channel optimisation? That way it could catch
>>>> sources that become immediates through later optimisations. One problem
>>>> with this that I've seen before is that eliminating the
>>>> FIND_LIVE_CHANNEL doesn't cause the subsequent BROADCAST to be
>>>> eliminated because the copy propagation doesn't work.
>>>
>>> I believe in this particular case the BROADCAST instruction should
>>> already be eliminated (by opt_algebraic), because its first source is an
>>> immediate, so this optimization would in fact be redundant with
>>> opt_algebraic+constant propagation if it weren't because the latter is
>>> unable to handle scalar copies.
>>
>> Yes, is happens as you say: first the FIND_LIVE_CHANNEL gets
>> eliminated because it's outside control flow
>> (fs_visitor::eliminate_find_live_channel). Then the broadcast gets
>> reduced to a MOV in opt_algebraic because src0 is uniform (immediate
>> constant). The problem then is the dst of the MOV has stride 0 which
>> can_propagate_from() then bails on.
>>
>>> Another possibility that would likely
>>> be more general than this (because it would handle cases in which, as
>>> Neil said, the argument becomes an immediate only after optimization,
>>> and because it would also avoid the issue with liveness analysis
>>> extending the live ranges of scalar values incorrectly [1]) would be to
>>> extend the destination register of the BROADCAST instructions to a
>>> vector [2] and then add a case to the switch statement of
>>> try_constant_propagate() so that the immediate MOV resulting from
>>> opt_algebraic is propagated into surface instructions (see attachment).
>>
>> I wrote exactly this code to make constant propagate work, plus adding
>> the extra opcodes to the switch statement. It works and I could
>> certainly send that out after this, but
>>
>> 1) This doesn't mean we shouldn't do the if (src.file == IMM)
>> shortcut. It saves the compiler a bit of work in the very common case
>> of
>> non-indirect buffer access.
>>
>> 2) I'm not even sure it makes sense to extend copy-propagation to do
>> this (which is why I went back to just the IMM test). Anything that
>> would be an immediate at this point should be an immediate, if not
>> we're missing something in nir.
>>
> Still this doesn't address the root of the problem, which is that
> emit_uniformize() emits scalar code that the rest of the compiler is not
> able to handle properly.

This is not a case of papering over the root cause, it's about not
creating the root cause in the first place. The output of
emit_uniformize() always ends up as either a surface or a sampler
index, which we only look at later in the generator. There are no
other cases where the result of emit_uniformize() might be part of an
expression that we can copy propagate or otherwise optimize. If the
input to emit_uniformize() isn't an immediate where it could be, nir
optimization needs fixing. So if we add these two lines to
emit_uniformize() to pass immediates straight through, we avoid
generating code that we have to extend the copy prop pass to handle.

Kristian

> Re-implementing a special case of the
> optimization already done by opt_algebraic() might have helped in this
> specific case, but it won't help when the argument is of some other kind
> of uniform value which are also frequently encountered in practice and
> could also be copy-propagated, and it won't help avoid the liveness
> analysis bug [1] in cases where the argument is not an immediate (the
> bug is reported to break some thousands SSBO dEQP tests, although I
> don't know what fraction of them actually use non-constant indexing).
> The alternative solution I provided patches for (and you seem to have
> implemented yourself independently) would address all these issues at
> once.
>
>> Kristian
>>
>>>> I made this patch a while ago but I never posted it anywhere because
>>>> it's a of a kludge and it would probably be better to fix the copy
>>>> propagation:
>>>>
>>>> https://github.com/bpeel/mesa/commit/e4c3286075f891f466fb8558106d2aaa
>>>>
>>> Heh, yeah, I'd rather fix copy propagation instead, which I believe will
>>> become much easier with the use-def-chain analysis pass I'm working on.
>>>
>>>> Either way though I don't think it would do any harm to have Kristian's
>>>> patch as well even if we did improve elimintate_find_live_channel so it
>>>> is:
>>>>
>>>> Reviewed-by: Neil Roberts <neil at linux.intel.com>
>>>>
>>>> - Neil
>>>>
>>>> Kristian Høgsberg Kristensen <krh at bitplanet.net> writes:
>>>>
>>>>> An immdiate is already uniform so just return it up front. Without this,
>>>>> brw_fs_surface_builder ends up passing immediate surface indices through
>>>>> SHADER_OPCODE_BROADCAST. This writes to a stride 0 dst, which we can't
>>>>> constant propagate out of, and further, we don't constant propagate into
>>>>> the typed/untype read/write opcodes at all.  The end result is that all
>>>>> typed/untyped read/write/atomics end up as indirect sends.
>>>>>
>>>>> Code generation should always produce either an immediate or an actual
>>>>> indirect surface index, so we can fix this by just special casing
>>>>> immediates in emit_uniformize.
>>>>>
>>>>> Signed-off-by: Kristian Høgsberg Kristensen <krh at bitplanet.net>
>>>>> ---
>>>>>  src/mesa/drivers/dri/i965/brw_fs_builder.h | 3 +++
>>>>>  1 file changed, 3 insertions(+)
>>>>>
>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h b/src/mesa/drivers/dri/i965/brw_fs_builder.h
>>>>> index df10a9d..98ce71e 100644
>>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_builder.h
>>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h
>>>>> @@ -394,6 +394,9 @@ namespace brw {
>>>>>           const dst_reg chan_index = component(vgrf(BRW_REGISTER_TYPE_UD), 0);
>>>>>           const dst_reg dst = component(vgrf(src.type), 0);
>>>>>
>>>>> +         if (src.file == IMM)
>>>>> +            return src;
>>>>> +
>>>>>           ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index);
>>>>>           ubld.emit(SHADER_OPCODE_BROADCAST, dst, src, chan_index);
>>>>>
>>>>> --
>>>>> 2.6.2
>>>>>
>>>>> _______________________________________________
>>>>> mesa-dev mailing list
>>>>> mesa-dev at lists.freedesktop.org
>>>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>>> _______________________________________________
>>>> mesa-dev mailing list
>>>> mesa-dev at lists.freedesktop.org
>>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>>
>>>
>>> [1] http://lists.freedesktop.org/archives/mesa-dev/2015-October/097085.html
>>> [2] http://lists.freedesktop.org/archives/mesa-dev/attachments/20151014/25dd38dc/attachment.patch
>>>


More information about the mesa-dev mailing list