[Mesa-dev] [PATCH 09/12] i965/fs: make SIMD-splitting respect the original stride/offset

Tue Aug 18 11:16:40 PDT 2015

On Tue, Aug 18, 2015 at 10:53 AM, Francisco Jerez <currojerez at riseup.net> wrote:
> Connor Abbott <cwabbott0 at gmail.com> writes:
>> and so many of them are interconnected and won't even make sense until
>> we have SSA, that it's not always useful to think about the IR we want
>> to have in the distant future in order to decide what to do today.
>> We've talked about scrapping FS and completely re-writing it, and even
>> if we don't, there's going to be a *massive* amount of churn. I think
>> you've managed to convince me that having a separate lowering pass
>> that deals with the DF->F workaround and dropping this patch is
>> useful, since it avoids having to do copy propagation with arbitrary
>> strides/offsets, but unless there's some other practical benefit, I'm
>> not inclined to add more stuff that will only be useful for SSA. If
>> the SSA-based thing is going to be useful today, then sure, let's do
>> it, but otherwise it's just adding dead code.
>>
>
> I thought that making the FS back-end more SSA-friendly incrementally
> was the main point of e.g. the LOAD_PAYLOAD instruction.  Such a PACK
> instruction would be immediately useful for the image load/store
> implementation, for your implementation of the double-packing built-ins,
> for the FS implementation of some other GLSL packing built-ins, and for
> the VEC4 back-end (in fact the VEC4 back-end already has a similar
> instruction but it's somewhat less general than what I described).
> Regardless of the back-end IR being SSA-form or not it would likely make
> things easier to CSE than a sequence of partial writes.

Indeed. The purpose of LOAD_PAYLOAD was to remove one source of
partial writes that would be difficult for SSA, but it also allows us
to CSE the payload setup and the texture operations that use it.

In that same vein, I think one thing that might be missing from this
conversation is that our CSE pass *cannot* handle partial writes.
Having a higher-level opcode that's lowered into partial writes after
optimizations avoids this problem.
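
To make the point concrete, here is a rough toy model in Python. It is not the i965 IR or the real CSE pass: the Inst class, the cse() and lower_pack() helpers, and the register names are all made up for illustration, and PACK stands in for the kind of higher-level opcode discussed above. The toy also assumes that a fully-written register is only ever written once.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Inst:
    """Hypothetical toy instruction, not the real fs_inst."""
    dst: str
    op: str
    srcs: Tuple[str, ...]
    comp: Optional[int] = None  # set when only one component of dst is written

    @property
    def partial(self) -> bool:
        return self.comp is not None


def cse(insts):
    """Local CSE over one block; partial writes are never candidates."""
    available = {}  # (op, srcs) -> register that already holds the value
    rename = {}     # eliminated register -> surviving register
    out = []
    for inst in insts:
        srcs = tuple(rename.get(s, s) for s in inst.srcs)
        key = (inst.op, srcs)
        if not inst.partial and key in available:
            rename[inst.dst] = available[key]
            continue
        out.append(Inst(inst.dst, inst.op, srcs, inst.comp))
        # A write to dst invalidates expressions that read dst or live in it.
        available = {k: v for k, v in available.items()
                     if v != inst.dst and inst.dst not in k[1]}
        if not inst.partial and inst.dst not in srcs:
            available[key] = inst.dst
    return out


def lower_pack(insts):
    """Expand each PACK into per-component partial MOVs, after optimization."""
    out = []
    for inst in insts:
        if inst.op == "PACK":
            out.extend(Inst(inst.dst, "MOV", (src,), comp=i)
                       for i, src in enumerate(inst.srcs))
        else:
            out.append(inst)
    return out


def with_partial_writes(dst, srcs):
    return [Inst(dst, "MOV", (s,), comp=i) for i, s in enumerate(srcs)]


def with_pack(dst, srcs):
    return [Inst(dst, "PACK", tuple(srcs))]


def two_fetches(build_payload):
    """Two identical texture fetches, each with its own payload register."""
    prog = []
    for n in (0, 1):
        prog += build_payload(f"p{n}", ("u", "v"))
        prog.append(Inst(f"t{n}", "TEX", (f"p{n}",)))
    return prog


partial_result = cse(two_fetches(with_partial_writes))
pack_result = lower_pack(cse(two_fetches(with_pack)))
print(len(partial_result), len(pack_result))
```

With the payload built from partial writes, the toy CSE cannot tell that p0 and p1 hold the same value, so both fetches survive. With PACK, the second PACK and the second TEX are both eliminated, and the partial writes only appear once lower_pack() runs afterwards.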