[Mesa-dev] i965 implementation of the ARB_shader_image_load_store built-ins. (v2)

Thu May 14 08:43:54 PDT 2015

Kenneth Graunke <kenneth at whitecape.org> writes:

> On Wednesday, May 13, 2015 07:33:22 PM Francisco Jerez wrote:
>>  - Nothing prevents you from "evaluating" the builder framework
>>    independent from the tiling and format conversion code.
> [snip]
>
> I think you misunderstand - we all largely agree that the fs_builder
> infrastructure is a good idea.  I think we've mostly evaluated it and
> concluded that it's worth doing.
>
> But the array_utils stuff - a new subclass of backend_reg -
> emit_collect, emit_extract, emit_zip, emit_send, all of that is a bunch
> of new infrastructure that's only used in your new code, and is hard to
> evaluate.
>
Right, I misunderstood because I didn't consider those to be a "large
amount of infrastructure", but rather an implementation detail I wasn't
expecting to receive much attention, and because in his mail Matt
explicitly mentioned the builder and quoted text from my builder patch.

>> That's not nearly what I meant.  Of course whether and how we carry out
>> the transition will still be open for discussion, that statement you
>> quote was intended to show that I'm willing to do the hard work and not
>> going to abandon the new infrastructure while the transition is half-way
>> done (as Ken rudely insinuated on IRC).
>
> That's good news.  In the past, many instances of "we'll fix it later"
> have turned into "we never got around to it" - not from you, but from
> other developers.

I understand your concern, but dismissing a proposal that can
potentially improve the back-end architecture in the long run just
because you're uncertain about the author's commitment doesn't seem like
a very constructive reaction.  Most likely I misunderstood you and you
never meant to dismiss it, but a simple question about my long-term
plans would have been less discouraging.

> We've also been fixing bugs with subreg_offset for about a year
> now...which was added for ARB_image_load_store (IIRC)...

Although it's somewhat off-topic, I think it's good that you're bringing
this point up.  I admit that retrospectively adding the subreg_offset
field at that point was a mistake for several reasons:

 - It was a compromise from the start.  What I originally tried to do
   was to change reg_offset to keep track of the offset in bytes rather
   than in dispatch_width units (which is crazy but it's what it used to
   do back then), but the diff turned out to be massive and caused a
   considerable amount of breakage (this probably will sound familiar to
   Jason), so I settled on using a separate field under the pressure to
   get things working now, what retrospectively seems silly because my
   initial attempt at ARB_shader_image_load_store wasn't accepted.
   Ironically Jason made reg_offset much more sensible (kudos!) last
   year while I wasn't paying attention, but we missed the opportunity
   to unify the two offset fields.

 - It wasn't SSA-friendly.  subreg_offset was introduced before there
   was a consensus about switching the back-end IR to SSA-form.  One of
   its primary use cases was to implement packing efficiently using
   strided sub-dword copies, what fundamentally relies on doing multiple
   partial writes of a register.  Retrospectively it would have been
   better to define virtual opcodes similar to VEC4_OPCODE_PACK_BYTES
   but consistent across back-ends and supporting both packing and
   unpacking and different component widths.  This doesn't mean that
   subreg_offset doesn't make sense in an SSA-form IR, only this
   particular use case doesn't, we probably still want some mechanism to
   represent sub-GRF addressing even while in SSA-form in order to
   implement sub-dword arithmetic reasonably.

At some point my packing code based on strided copies stopped working
because of a bug in lower_load_payload().  I submitted a patch to fix it
but Jason was reluctant to include it for obvious reasons.  At that
point I decided to drop the strided copy path in favor of the more
general implementation which uses bit arithmetic to achieve the same
result but in the worst-case requires three instructions more.  I'm
planning to re-implement the strided copy path eventually as a virtual
opcode to get rid of the unnecessary instructions, if we consider it
worth doing.

> and not accounted for in a lot of places (i.e. dead code elimination
> thought writing g7.2 kills a previous write to any of g7).
>
Really?  I'm pretty sure I fixed fs_inst::is_partial_write() to return
true in that case, which DCE should have been using to decide whether to
kill a previous write.  Maybe you were doing a cross-register write or
something like that by accident?  (which is going to violate plenty of
other invariants of the compiler *and* of the hardware)

> So it's a real concern, no matter who's writing the code.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20150514/f3c13c83/attachment.sig>