[Mesa-dev] [PATCH 10/32] i965/fs: Remove logic to keep track of MRF metadata in lower_load_payload().

Fri Feb 20 11:28:08 PST 2015

One more comment here...  This particularly regards your plan of separating
it into "things that match the destination" and "other things" and not copy
prop uniforms or immediates into the "other things".  There is another case
we need to handle.  On older gens (SNB maybe?) the SIMD16 FB write messages
are in the order (r0, g0, b0, a0, r1, g1, b1, a1) instead of the usual (r0,
r1, g0, g1, b0, b1, a0, a1).  This isn't going to work if we expect SIMD16
load_payload instructions to be lowerable to a bunch of SIMD16 MOVs.  On
those platforms, there is a special type of MOV-to-MRF that can move from
(gN, gN+1) to (mK, mK+4) and we handle that in lower_load_payload right
now.  However, it doesn't follow the standard rules.
--Jason

On Fri, Feb 20, 2015 at 2:10 PM, Jason Ekstrand <jason at jlekstrand.net>
wrote:

>
>
> On Fri, Feb 20, 2015 at 1:09 PM, Francisco Jerez <currojerez at riseup.net>
> wrote:
>
>> Jason Ekstrand <jason at jlekstrand.net> writes:
>>
>> > On Fri, Feb 20, 2015 at 4:11 AM, Francisco Jerez <currojerez at riseup.net
>> >
>> > wrote:
>> >
>> >> Jason Ekstrand <jason at jlekstrand.net> writes:
>> >>
>> >> > I'm still a little pensive.  But
>> >> >
>> >> > Reviewed-by: Jason Ekstrand <jason.ekstrand at intel.com>
>> >> >
>> >> Thanks.
>> >>
>> >> > Now for a little aside.  I have come to the conclusion that I made a
>> >> grave
>> >> > mistake when I did the LOAD_PAYLOAD stuff.  In retrospect, I should
>> have
>> >> > just subclassed fs_inst for load_payload.  The problem is that we
>> need to
>> >> > snag a bunch of information for the sources when we create the
>> >> > load_payload.  In particular, we need to know the width of the
>> source so
>> >> > that we know how much space it consumes in the payload and we need to
>> >> know
>> >> > the information required to properly re-create the mov such as
>> >> > force_sechalf and force_writemask_all.  Really, in order to do things
>> >> > properly, we need to gather this information *before* we do any
>> >> > optimizations.  The nasty pile of code that you're editing together
>> with
>> >> > the "effective_width" parameter is a lame attempt to
>> capture/reconstruct
>> >> > this information.  Really, we should just subclass, capture the
>> >> information
>> >> > up-front, and do it properly.
>> >> >
>> >> Yes, absolutely, this lowering pass is a real mess.  There are four
>> more
>> >> unreviewed patches in the mailing list fixing bugs of the metadata
>> >> guessing of lower_load_payload() [1][2][3][4], you may want to take a
>> >> look at them.  There are more bugs I'm aware of but it didn't seem
>> >> practical to fix them.
>> >>
>> >
>> > Yeah, Matt pointed me at those.  I'll give them a look later today.
>> >
>> >
>> >> That said, I don't think it would be worth subclassing fs_inst.
>> >>
>> >> My suggestion would have been to keep it simple and lower LOAD_PAYLOAD
>> >> into a series of MOVs with force_writemask_all enabled in all cases,
>> >> simply rely on the optimizer to eliminate those where possible.  Then
>> >> get rid of the metadata and effective_width guessing.  Instead agree on
>> >> immediates and uniforms being exec_size-wide by convention
>> >> (i.e. LOAD_PAYLOAD's exec_size rather than the original instruction's),
>> >> then prevent constant propagation from propagating an immediate load
>> >> into a LOAD_PAYLOAD if it would lead to a change in the semantics of
>> the
>> >> program, and maybe just run constant propagation after lowering so we
>> >> can be sure those cases are cleaned up properly where register coalesce
>> >> isn't enough.
>> >>
>> >
>> > First off, simply setting force_writemask_all isn't an option.  Consider
>> > the following scenario:
>> >
>> > a = foo;
>> > if (bar) {
>> >    b = 7;
>> > } else {
>> >    use(a);
>> > }
>> > use(b);
>> >
>> > If "use(a)" is the last use of the variable a, then the live ranges of a
>> > and b don't actually over-lap and we can assign a and b to the same
>> > register.  However, if force_writemask_all is set on b, it will destroy
>> the
>> > value in a before its last use.  Right now, we don't actually do this
>> > because our liveness analysis pass flattens everything so it will think
>> > that b and a over-lap even though they don't.  However, if we ever
>> decide
>> > to make the liveness information more accurate, having a pile of
>> > force_writemask_all instructions will be a major problem.  So, while it
>> is
>> > *technically* safe for now, it's a really bad idea in the long-term.
>> >
>> The thing is we *will* have to deal with that scenario.  Building a
>> message payload inherently involves crossing channel boundaries (because
>> of headers, unsupported SIMD modes by some shared functions, etc.).  I'd
>> say it's a bug in the liveness analysis pass if it wouldn't consider the
>> liveness interval of a and b to overlap in your example if the
>> assignment of b had force_writemask_all set.
>>
>
> Yes, I'm aware of that.  However, the part of that register that has to
> squash everything is usually only a couple of registers while the entire
> payload may be up to 13 (if I remmeber correctly).  We'd rather not have to
> squash everything if we can.  All that being said, our liveness/register
> allocation can't handle this and making register allocation handle parts of
> registers interfering but not other bits is going to be a real pain.  Maybe
> not even worth it.  If you'd rather force_writemask_all everything, I'll
> sign off on that for now.  I just wanted to point out that it may not be a
> good permanent solution.
>
>
>> A reasonable compromise might be to leave it up to the caller whether to
>> set the force_writemask_all and force_sechalf flags or not.  I.e. just
>> copy the same flags set on the LOAD_PAYLOAD instruction to the MOV
>> instructions.  That would still allow reducing the liveness intervals in
>> cases where we know that the payload respects channel boundaries.
>>
>> Tracking the flag information per-register in cases where the same
>> payload has well- and ill-behaved values seems rather premature and
>> rather useless to me because the optimizer is likely to be able to get
>> rid of redundant copies with force_writemask_all -- register coalesce
>> is doing this already AFAIK, maybe by accident.
>>
>
> Sure.  I'm not worried about our ability to optimize.  I'm primarily
> worried about register pressure.  Like I said, it's an OK solution in the
> temporary.  I think we'll want to give it more thought in the long-run but
> that's going to interact a lot with how we do register allocation etc.
> --Jason
>
>
>> > Regarding the other suggestion of just requiring width == exec_size for
>> > immediates and uniforms, that seems pretty reasonable to me.  I'd like
>> to
>> > know what it will do to optimizations, but it seems ok initially.  I'm
>> > still a bigger fan of just subclassing and stashing everything we need
>> to
>> > know up-front.  If we do it right, the only things that will actually
>> need
>> > to know about the subclass are the function for creating a LOAD_PAYLOAD
>> and
>> > lower_load_payloads.
>> >
>> > --Jason
>> >
>> >
>> >>
>> >> [1]
>> >>
>> http://lists.freedesktop.org/archives/mesa-dev/2015-January/074614.html
>> >> [2]
>> >>
>> http://lists.freedesktop.org/archives/mesa-dev/2015-February/076094.html
>> >> [3]
>> >>
>> http://lists.freedesktop.org/archives/mesa-dev/2015-February/076097.html
>> >> [4]
>> >>
>> http://lists.freedesktop.org/archives/mesa-dev/2015-February/076098.html
>> >>
>> >>
>> >> > --Jason
>> >> >
>> >> > On Thu, Feb 19, 2015 at 1:53 PM, Jason Ekstrand <
>> jason at jlekstrand.net>
>> >> > wrote:
>> >> >
>> >> >>
>> >> >>
>> >> >> On Thu, Feb 19, 2015 at 1:25 PM, Francisco Jerez <
>> currojerez at riseup.net
>> >> >
>> >> >> wrote:
>> >> >>
>> >> >>> Jason Ekstrand <jason at jlekstrand.net> writes:
>> >> >>>
>> >> >>> > On Thu, Feb 19, 2015 at 12:13 PM, Francisco Jerez <
>> >> >>> currojerez at riseup.net>
>> >> >>> > wrote:
>> >> >>> >
>> >> >>> >> Jason Ekstrand <jason at jlekstrand.net> writes:
>> >> >>> >>
>> >> >>> >> > On Fri, Feb 6, 2015 at 4:01 PM, Francisco Jerez <
>> >> >>> currojerez at riseup.net>
>> >> >>> >> > wrote:
>> >> >>> >> >
>> >> >>> >> >> Hey Matt,
>> >> >>> >> >>
>> >> >>> >> >> Matt Turner <mattst88 at gmail.com> writes:
>> >> >>> >> >>
>> >> >>> >> >> > On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez <
>> >> >>> >> currojerez at riseup.net>
>> >> >>> >> >> wrote:
>> >> >>> >> >> >> MRFs cannot be read from anyway so they cannot possibly
>> be a
>> >> >>> valid
>> >> >>> >> >> >> source of LOAD_PAYLOAD.
>> >> >>> >> >> >> ---
>> >> >>> >> >> >
>> >> >>> >> >> > The function only seems to test inst->dst.file == MRF. I
>> don't
>> >> >>> see any
>> >> >>> >> >> > code for handling MRF sources. What am I missing?
>> >> >>> >> >>
>> >> >>> >> >> That test is for "handling" MRF sources -- More precisely,
>> it's
>> >> >>> >> >> collecting the writemask and half flags for MRF writes,
>> which can
>> >> >>> only
>> >> >>> >> >> possibly be useful if we're going to use them later on to
>> read
>> >> >>> something
>> >> >>> >> >> out of an MRF into a payload, which we shouldn't be doing in
>> the
>> >> >>> first
>> >> >>> >> >> place.
>> >> >>> >> >>
>> >> >>> >> >> Aside from simplifying the function somewhat, that allows us
>> to
>> >> >>> drop the
>> >> >>> >> >> 16 register gap reserved for MRFs at register offset zero,
>> what
>> >> will
>> >> >>> >> >> allow us to drop the vgrf_to_reg[] offset calculation
>> completely
>> >> >>> (also
>> >> >>> >> >> in split_virtual_grfs()) in a future patch (not sent for
>> review
>> >> >>> yet).
>> >> >>> >> >>
>> >> >>> >> >
>> >> >>> >> > No, we do read from MRF's sort-of...  Send messages have an
>> >> implicit
>> >> >>> >> "read"
>> >> >>> >> > from an MRF.
>> >> >>> >>
>> >> >>> >> Heh, and that's pretty much the only way you "read" from it.
>> >> >>> >>
>> >> >>> >> > This was written precicely so that we could use LOAD_PAYLOAD
>> >> >>> >> > to build MRF payloads.  We do on pre-GEN6.
>> >> >>> >> >
>> >> >>> >> I'm aware, but you don't need any of this meta-data to
>> LOAD_PAYLOAD
>> >> >>> >> *into* an MRF, and LOAD_PAYLOAD with an MRF as source should be
>> >> illegal
>> >> >>> >> anyway.
>> >> >>> >>
>> >> >>> >
>> >> >>> > And no one is using it that way.  All of the metadata checks you
>> are
>> >> >>> > deleting are checks on the *destination*.
>> >> >>> >
>> >> >>>
>> >> >>> Didn't you write this code yourself?  The only use for the
>> collected
>> >> >>> metadata is initializing the instruction flags of the MOVs
>> subsequent
>> >> >>> LOAD_PAYLOAD instructions are lowered to, based on the metadata
>> already
>> >> >>> collected for its source registers, which can never be MRFs, so the
>> >> >>> metadata you collect from MRF writes is never actually used.
>> >> >>>
>> >> >>
>> >> >> Right... I misred something initially.  Yes, we should never be
>> tracking
>> >> >> MRF's as a source of a LOAD_PAYLOAD.  I'll give it a better look a
>> bit
>> >> >> later, but it looks better.
>> >> >>
>> >>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20150220/ed03efbe/attachment-0001.html>