[Mesa-dev] i965 implementation of the ARB_shader_image_load_store built-ins. (v3)

Tue May 19 08:57:08 PDT 2015

On Tue, May 19, 2015 at 5:42 AM, Francisco Jerez <currojerez at riseup.net> wrote:
> Jason Ekstrand <jason at jlekstrand.net> writes:
>
>> On Mon, May 18, 2015 at 10:34 AM, Francisco Jerez <currojerez at riseup.net> wrote:
>>>[...]
>>> I've given this idea a shot.  Can you have a look at the
>>> image-load-store-lower branch of my tree [1]?  It's just a quick and
>>> dirty proof of concept, so don't bother to review it carefully, just let
>>> me know if you agree with the general design before I spend more time on
>>> it.
>>>
>>> [1] http://cgit.freedesktop.org/~currojerez/mesa/log/?h=image-load-store-lower
>>
>> I took a look at it.  I think patch 3 "Add pass to lower opcodes with
>> unsupported SIMD width." is more-or-less exactly what I'm talking
>> about.  What I don't understand is the stuff about split payloads.
>> While I think we *might* be able to split a payload it seems dangerous
>> and like something we shouldn't be doing.
>
> Dangerous how?  Can you elaborate?

Leaving the header alone and splitting the other sources will not
always give you the payload you want for SIMD8.  More in a moment.

>> This is where the "logical" opcodes I mentioned come into play.  I
>> think there has been some miscommunication there; perhaps I didn't
>> explain myself very well.  Allow me to be more explicit; I'll use
>> image loads for my example.
>>
>>  1) We would add an opcode SHADER_IMAGE_LOAD_LOGICAL (or some other
>> name) that takes 4 arguments: image, address, format, and dims just
>> like the emit_image_load helper.
>>  2) Instead of calling the helper, the visitor would just emit
>> SHADER_IMAGE_LOAD_LOGICAL instruction with those arguments.
>>  3) We then run the splitting pass which can easily split the new load
>> instruction since no payloads are involved.
>>  4) We then have a lowering pass which knows how to turn
>> SHADER_IMAGE_LOAD_LOGICAL into an actual load including the payload,
>> pixel mask, and whatever other fiddly bits there are.
>>
>> Steps (1) and (2) may not be quite right (you'll have to help me out
>> here).  We may want to keep emit_image_load so that it can do format
>> conversion and emit an untyped logical instruction.  However, in any
>> case, the logical instruction does not have any payload sources if we
>> can at all help it.
>>
>> Does that make more sense?  Is there something I'm missing?
>
> I don't think that a high-level "image load" opcode would be of much use
> in the back-end IR, the hardware can only do a number of untyped and typed
> surface operations, and we probably want to represent them as such.
>
> My _SPLIT opcodes are roughly the same as the _LOGICAL opcodes you
> describe -- as far as the visitor and optimization passes are concerned,
> they both behave as a normal opcode taking an address, surface,
> dimensions and size as separate arguments, the main difference is that
> the lowering to a send-message-style opcode (your step 4) is fully
> deterministic, as the layout of the message payload is inferred from the
> source_is_payload(i) and regs_read(i) instruction queries.  This has two
> obvious advantages:
>
> 1/ The same lowering logic can be reused for *all* send-message opcodes
>    making use of this infrastructure, so there is no need to implement
>    ad-hoc lowering logic for each message, which seemed like the
>    greatest annoyance of your proposal.

The fact that you can do that for untyped reads/writes is great.  It
means we should only need one lowering function for them.
Unfortunately, other messages such as FB writes aren't going to be
quite so simple.  I'm not sure what texturing will look like, but I'll
hazard a guess that it won't be as trivial either.
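
To make the "one lowering function" idea concrete, here is a minimal
sketch of how a payload layout could be derived purely from per-source
queries.  The query names source_is_payload(i) and regs_read(i) come
from the discussion above; everything else (MsgSrc, MsgInst,
PayloadSlot, payload_layout and their signatures) is hypothetical and
is not Mesa's actual fs_inst API.

```cpp
// Toy model only: these types are illustrative, not Mesa's IR.
#include <cassert>
#include <vector>

struct MsgSrc {
    int reg;          // first register of this source
    int regs;         // number of registers the source occupies
    bool is_payload;  // true if the source belongs in the message payload
};

struct MsgInst {
    std::vector<MsgSrc> srcs;

    // Hypothetical stand-ins for the queries named in the thread.
    bool source_is_payload(unsigned i) const { return srcs[i].is_payload; }
    int regs_read(unsigned i) const { return srcs[i].regs; }
};

struct PayloadSlot {
    unsigned src;  // index of the source placed in this slot
    int offset;    // register offset of the slot within the payload
    int regs;      // size of the slot in registers
};

// Lay payload sources out back to back, in source order.  Nothing here
// depends on which message is being assembled, which is why a single
// pass could serve every opcode that fits this model.
std::vector<PayloadSlot> payload_layout(const MsgInst &inst, int &total_regs)
{
    std::vector<PayloadSlot> layout;
    total_regs = 0;
    for (unsigned i = 0; i < inst.srcs.size(); i++) {
        if (!inst.source_is_payload(i))
            continue;  // e.g. a surface index that is not sent as data
        layout.push_back({i, total_regs, inst.regs_read(i)});
        total_regs += inst.regs_read(i);
    }
    return layout;
}
```

Messages whose payload depends on more than source order and size (FB
writes, for instance) would not fit this model and would still need
their own lowering code.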

In other words, while it works nicely for those opcodes, I wouldn't
bother building a lot of infrastructure in the compiler for it unless
it really saves you something for the untyped surface reads/writes.
The texturing and FB-write code will probably have to be custom.  That
said, we already have that custom code written; we just need to change
it from emit_single_fb_write to lower_fb_write and make it use the
builder.
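
As a rough illustration of the split-then-lower ordering discussed in
this thread, the sketch below runs a SIMD16 logical load through a
width-splitting pass and then a payload-assembling pass.  It is a toy
model under stated assumptions: ToyInst, split_wide and lower_logical
are hypothetical names, the address has a single component, one
register holds 8 channels, and the lowered payload is taken to be one
header register plus the address registers.

```cpp
// Toy IR for illustration; none of these names exist in Mesa.
#include <cassert>
#include <vector>

enum ToyOpcode { IMAGE_LOAD_LOGICAL, IMAGE_LOAD_SEND };

struct ToyInst {
    ToyOpcode op;
    int exec_size;      // SIMD width of the instruction (8 or 16)
    int first_channel;  // first channel this instruction covers
    int image;          // surface index
    int addr_reg;       // first register holding per-channel addresses
    int payload_regs;   // payload size in registers (0 while logical)
};

// Split logical instructions wider than max_width.  This is easy
// because a logical instruction has no assembled payload yet: only
// the execution size, channel offset and address register change.
std::vector<ToyInst> split_wide(const std::vector<ToyInst> &in, int max_width)
{
    std::vector<ToyInst> out;
    for (const ToyInst &inst : in) {
        if (inst.op != IMAGE_LOAD_LOGICAL || inst.exec_size <= max_width) {
            out.push_back(inst);
            continue;
        }
        for (int ch = 0; ch < inst.exec_size; ch += max_width) {
            ToyInst part = inst;
            part.exec_size = max_width;
            part.first_channel = inst.first_channel + ch;
            part.addr_reg = inst.addr_reg + ch / 8;  // assumed: 8 channels/reg
            out.push_back(part);
        }
    }
    return out;
}

// Lower each logical load to a send-style instruction, building the
// payload only now that the width is final.
std::vector<ToyInst> lower_logical(const std::vector<ToyInst> &in)
{
    std::vector<ToyInst> out;
    for (ToyInst inst : in) {
        if (inst.op == IMAGE_LOAD_LOGICAL) {
            inst.op = IMAGE_LOAD_SEND;
            // assumed layout: 1 header register + address registers
            inst.payload_regs = 1 + inst.exec_size / 8;
        }
        out.push_back(inst);
    }
    return out;
}
```

Under these assumptions, calling lower_logical(split_wide(prog, 8)) on
a single SIMD16 load gives two SIMD8 sends, each with its own header
and its own half of the address data, without ever having to cut an
already assembled payload in two.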

> 2/ It could make the transition easier to Gen9 split send messages, as
>    we could just change the one lowering pass to emit instructions with
>    two partially assembled payload sources and let the hardware do the
>    rest, in a way transparent to the visitor code making use of this
>    infrastructure.
>
> By doing this I can also easily avoid defining the array_reg stuff
> others seemed to disagree with for some reason, although personally I
> consider this more an obfuscation than an advantage (sigh).

I understand.  You may not like it, but it's a path towards getting
things merged.
--Jason