[Mesa-dev] i965 implementation of the ARB_shader_image_load_store built-ins. (v2)
currojerez at riseup.net
Thu May 14 08:43:54 PDT 2015
Kenneth Graunke <kenneth at whitecape.org> writes:
> On Wednesday, May 13, 2015 07:33:22 PM Francisco Jerez wrote:
>> - Nothing prevents you from "evaluating" the builder framework
>> independent from the tiling and format conversion code.
> I think you misunderstand - we all largely agree that the fs_builder
> infrastructure is a good idea. I think we've mostly evaluated it and
> concluded that it's worth doing.
> But the array_utils stuff - a new subclass of backend_reg -
> emit_collect, emit_extract, emit_zip, emit_send, all of that is a bunch
> of new infrastructure that's only used in your new code, and is hard to
Right, I misunderstood because I didn't consider those to be a "large
amount of infrastructure", but rather an implementation detail I wasn't
expecting to receive much attention, and because in his mail Matt
explicitly mentioned the builder and quoted text from my builder patch.
>> That's not nearly what I meant. Of course whether and how we carry out
>> the transition will still be open for discussion, that statement you
>> quote was intended to show that I'm willing to do the hard work and not
>> going to abandon the new infrastructure while the transition is half-way
>> done (as Ken rudely insinuated on IRC).
> That's good news. In the past, many instances of "we'll fix it later"
> have turned into "we never got around to it" - not from you, but from
> other developers.
I understand your concern, but dismissing a proposal that can
potentially improve the back-end architecture in the long run just
because you're uncertain about the author's commitment doesn't seem like
a very constructive reaction. Most likely I misunderstood you and you
never meant to dismiss it, but a simple question about my long-term
plans would have been less discouraging.
> We've also been fixing bugs with subreg_offset for about a year
> now...which was added for ARB_image_load_store (IIRC)...
Although it's somewhat off-topic, I think it's good that you're bringing
this point up. I admit that retrospectively adding the subreg_offset
field at that point was a mistake for several reasons:
- It was a compromise from the start. What I originally tried to do
was to change reg_offset to keep track of the offset in bytes rather
than in dispatch_width units (which is crazy but it's what it used to
do back then), but the diff turned out to be massive and caused a
considerable amount of breakage (this probably will sound familiar to
Jason), so I settled on using a separate field under the pressure to
get things working now, what retrospectively seems silly because my
initial attempt at ARB_shader_image_load_store wasn't accepted.
Ironically Jason made reg_offset much more sensible (kudos!) last
year while I wasn't paying attention, but we missed the opportunity
to unify the two offset fields.
- It wasn't SSA-friendly. subreg_offset was introduced before there
was a consensus about switching the back-end IR to SSA-form. One of
its primary use cases was to implement packing efficiently using
strided sub-dword copies, what fundamentally relies on doing multiple
partial writes of a register. Retrospectively it would have been
better to define virtual opcodes similar to VEC4_OPCODE_PACK_BYTES
but consistent across back-ends and supporting both packing and
unpacking and different component widths. This doesn't mean that
subreg_offset doesn't make sense in an SSA-form IR, only this
particular use case doesn't, we probably still want some mechanism to
represent sub-GRF addressing even while in SSA-form in order to
implement sub-dword arithmetic reasonably.
At some point my packing code based on strided copies stopped working
because of a bug in lower_load_payload(). I submitted a patch to fix it
but Jason was reluctant to include it for obvious reasons. At that
point I decided to drop the strided copy path in favor of the more
general implementation which uses bit arithmetic to achieve the same
result but in the worst-case requires three instructions more. I'm
planning to re-implement the strided copy path eventually as a virtual
opcode to get rid of the unnecessary instructions, if we consider it
> and not accounted for in a lot of places (i.e. dead code elimination
> thought writing g7.2 kills a previous write to any of g7).
Really? I'm pretty sure I fixed fs_inst::is_partial_write() to return
true in that case, which DCE should have been using to decide whether to
kill a previous write. Maybe you were doing a cross-register write or
something like that by accident? (which is going to violate plenty of
other invariants of the compiler *and* of the hardware)
> So it's a real concern, no matter who's writing the code.
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 212 bytes
Desc: not available
More information about the mesa-dev