[Mesa-dev] i965 implementation of the ARB_shader_image_load_store built-ins. (v3)

Tue May 19 09:25:19 PDT 2015

Jason Ekstrand <jason at jlekstrand.net> writes:

> On Tue, May 19, 2015 at 5:42 AM, Francisco Jerez <currojerez at riseup.net> wrote:
>> Jason Ekstrand <jason at jlekstrand.net> writes:
>>
>>> On Mon, May 18, 2015 at 10:34 AM, Francisco Jerez <currojerez at riseup.net> wrote:
>>>>[...]
>>>> I've given this idea a shot.  Can you have a look at the
>>>> image-load-store-lower branch of my tree [1]?  It's just a quick and
>>>> dirty proof of concept, so don't bother to review it carefully, just let
>>>> me know if you agree with the general design before I spend more time on
>>>> it.
>>>>
>>>> [1] http://cgit.freedesktop.org/~currojerez/mesa/log/?h=image-load-store-lower
>>>
>>> I took a look at it.  I think patch 3 "Add pass to lower opcodes with
>>> unsupported SIMD width." is more-or-less exactly what I'm talking
>>> about.  What I don't understand is the stuff about split payloads.
>>> While I think we *might* be able to split a payload it seems dangerous
>>> and like something we shouldn't be doing.
>>
>> Dangerous how?  Can you elaborate?
>
> If you just leave the header alone and split the other sources, you
> will not always get the payload you want for SIMD8.  More in a moment.
>
>>> This is where the "logical" opcodes I mentioned come into play.  I
>>> think there has been some miscommunication there; perhaps I didn't
>>> explain myself very well.  Allow me to be more explicit; I'll use
>>> image loads for my example.
>>>
>>>  1) We would add an opcode SHADER_IMAGE_LOAD_LOGICAL (or some other
>>> name) that takes 4 arguments: image, address, format, and dims just
>>> like the emit_image_load helper.
>>>  2) Instead of calling the helper, the visitor would just emit a
>>> SHADER_IMAGE_LOAD_LOGICAL instruction with those arguments.
>>>  3) We then run the splitting pass which can easily split the new load
>>> instruction since no payloads are involved.
>>>  4) We then have a lowering pass which knows how to turn
>>> SHADER_IMAGE_LOAD_LOGICAL into an actual load including the payload,
>>> pixel mask, and whatever other fiddly bits there are.
>>>
>>> Steps (1) and (2) may not be quite right (you'll have to help me out
>>> here).  We may want to keep emit_image_load so that it can do format
>>> conversion and emit an untyped logical instruction.  However, in any
>>> case, the logical instruction does not have any payload sources if we
>>> can at all help it.
>>>
>>> Does that make more sense?  Is there something I'm missing?
>>
>> I don't think that a high-level "image load" opcode would be of much use
>> in the back-end IR.  The hardware can only do a number of untyped and typed
>> surface operations, and we probably want to represent them as such.
>>
>> My _SPLIT opcodes are roughly the same as the _LOGICAL opcodes you
>> describe -- as far as the visitor and optimization passes are concerned,
>> they both behave as a normal opcode taking an address, surface,
>> dimensions and size as separate arguments, the main difference is that
>> the lowering to a send-message-style opcode (your step 4) is fully
>> deterministic, as the layout of the message payload is inferred from the
>> source_is_payload(i) and regs_read(i) instruction queries.  This has two
>> obvious advantages:
>>
>> 1/ The same lowering logic can be reused for *all* send-message opcodes
>>    making use of this infrastructure, so there is no need to implement
>>    ad-hoc lowering logic for each message, which seemed like the
>>    greatest annoyance of your proposal.
>
> The fact that you can do that for untyped reads/writes is great.  It
> means we should only need one lowering function for them.
> Unfortunately, other messages such as FB writes aren't going to be
> quite so simple.  I'm not sure what texturing will look like but I'll
> hazard a guess that they won't be as trivial either.
>
No, FB writes and texturing both fit under the same framework just fine.
I'll port them if people consider it useful.
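
As a rough illustration of why one lowering routine can serve several
messages, here is a minimal sketch (not the code from the branch) of a
payload layout inferred purely from per-source queries.  The Source and
PayloadSlot types and both functions are hypothetical; the is_payload and
regs fields merely stand in for what source_is_payload(i) and regs_read(i)
would report.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an instruction source; not Mesa's fs_reg.
struct Source {
    bool is_payload;  // what a source_is_payload(i)-style query would return
    unsigned regs;    // what a regs_read(i)-style query would return
};

struct PayloadSlot {
    std::size_t source_index;  // which instruction source fills this slot
    unsigned offset;           // register offset inside the message payload
    unsigned regs;             // number of registers the slot occupies
};

// Lay out all payload sources back to back, in source order.  Nothing here
// depends on which message is being built, only on the two queries.
std::vector<PayloadSlot> layout_payload(const std::vector<Source> &srcs)
{
    std::vector<PayloadSlot> slots;
    unsigned offset = 0;
    for (std::size_t i = 0; i < srcs.size(); ++i) {
        if (!srcs[i].is_payload)
            continue;  // e.g. a surface index, which is not part of the payload
        assert(srcs[i].regs > 0);
        slots.push_back({i, offset, srcs[i].regs});
        offset += srcs[i].regs;
    }
    return slots;
}

// Total message length in registers.
unsigned payload_length(const std::vector<PayloadSlot> &slots)
{
    return slots.empty() ? 0 : slots.back().offset + slots.back().regs;
}
```

Whether real FB write or texturing payloads are always this regular is
exactly the point under discussion, so treat the back-to-back layout as an
assumption of the sketch.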

> In other words, while it works nicely for those opcodes, I wouldn't
> bother building a lot of infrastructure in the compiler for it unless
> it really saves you something for the untyped surface read/writes.
> The texturing and fb-write code will probably have to be custom.  That
> said, we already have that custom code written; we just need to change
> it from emit_single_fb_write to lower_fb_write and make it use the
> builder.
>
>> 2/ It could make the transition to Gen9 split send messages easier, as
>>    we could just change the one lowering pass to emit instructions with
>>    two partially assembled payload sources and let the hardware do the
>>    rest, in a way transparent to the visitor code making use of this
>>    infrastructure.
>>
>> By doing this I can also easily avoid defining the array_reg stuff
>> others seemed to disagree with for some reason, although personally I
>> consider this more an obfuscation than an advantage (sigh).
>
> I understand.  You may not like it, but it's a path towards getting
> things merged.
> --Jason
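
The four-step flow quoted above (emit a logical opcode, split by SIMD
width, then lower to a message) can be sketched on a toy IR as follows.
This is only an illustration under stated assumptions: the Inst type, the
Opcode enum, both passes, and the "header" placeholder are hypothetical and
do not correspond to Mesa's actual data structures.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy IR; the names echo the discussion, not Mesa's enums.
enum class Opcode { IMAGE_LOAD_LOGICAL, SEND };

struct Inst {
    Opcode op;
    unsigned exec_size;             // SIMD width, e.g. 8 or 16
    unsigned channel_base;          // first channel this instruction covers
    std::vector<std::string> srcs;  // logical arguments, or payload pieces
};

// Step 3: split logical instructions wider than max_width.  No payload
// exists yet, so splitting only adjusts the width and channel range.
std::vector<Inst> split_simd_width(const std::vector<Inst> &prog,
                                   unsigned max_width)
{
    std::vector<Inst> out;
    for (const Inst &inst : prog) {
        if (inst.op != Opcode::IMAGE_LOAD_LOGICAL ||
            inst.exec_size <= max_width) {
            out.push_back(inst);
            continue;
        }
        assert(inst.exec_size % max_width == 0);
        for (unsigned base = 0; base < inst.exec_size; base += max_width) {
            Inst part = inst;
            part.exec_size = max_width;
            part.channel_base = inst.channel_base + base;
            out.push_back(part);
        }
    }
    return out;
}

// Step 4: lower each logical instruction to a send-style instruction.  The
// payload is assembled here, after splitting, so each part gets its own
// header.  A real lowering would be message-specific.
std::vector<Inst> lower_logical(const std::vector<Inst> &prog)
{
    std::vector<Inst> out;
    for (const Inst &inst : prog) {
        if (inst.op != Opcode::IMAGE_LOAD_LOGICAL) {
            out.push_back(inst);
            continue;
        }
        Inst send{Opcode::SEND, inst.exec_size, inst.channel_base, {}};
        send.srcs.push_back("header");  // placeholder for pixel mask etc.
        for (const std::string &s : inst.srcs)
            send.srcs.push_back(s);
        out.push_back(send);
    }
    return out;
}
```

Running split_simd_width(prog, 8) followed by lower_logical() on a single
SIMD16 load yields two SIMD8 sends covering channels 0-7 and 8-15, each
with its own header.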