[Mesa-dev] [PATCH 10/32] i965/fs: Remove logic to keep track of MRF metadata in lower_load_payload().

Fri Feb 20 11:46:03 PST 2015

On Fri, Feb 20, 2015 at 2:43 PM, Francisco Jerez <currojerez at riseup.net>
wrote:

> Jason Ekstrand <jason at jlekstrand.net> writes:
>
> > One more comment here...  This particularly regards your plan of
> separating
> > it into "things that match the destination" and "other things" and not
> copy
> > prop uniforms or immediates into the "other things".  There is another
> case
> > we need to handle.  On older gens (SNB maybe?) the SIMD16 FB write
> messages
> > are in the order (r0, g0, b0, a0, r1, g1, b1, a1) instead of the usual
> (r0,
> > r1, g0, g1, b0, b1, a0, a1).  This isn't going to work if we expect
> SIMD16
> > load_payload instructions to be lowerable to a bunch of SIMD16 MOVs.  On
> > those platforms, there is a special type of MOV-to-MRF that can move from
> > (gN, gN+1) to (mK, mK+4) and we handle that in lower_load_payload right
> > now.  However, it doesn't follow the standard rules.
> > --Jason
> >
> I don't see why it couldn't be handled in roughly the same way you do it
> now?  Recognize when src[i + 4] is the same 8-wide register as src[i]
> shifted by 8 and emit a COMPR4 copy in that case?
>

Sure.  I haven't thought about it hard enough to prove it can't be done.
Just wanted to make sure you were aware of that particular monkey-wrench.
--Jason

> > On Fri, Feb 20, 2015 at 2:10 PM, Jason Ekstrand <jason at jlekstrand.net>
> > wrote:
> >
> >>
> >>
> >> On Fri, Feb 20, 2015 at 1:09 PM, Francisco Jerez <currojerez at riseup.net
> >
> >> wrote:
> >>
> >>> Jason Ekstrand <jason at jlekstrand.net> writes:
> >>>
> >>> > On Fri, Feb 20, 2015 at 4:11 AM, Francisco Jerez <
> currojerez at riseup.net
> >>> >
> >>> > wrote:
> >>> >
> >>> >> Jason Ekstrand <jason at jlekstrand.net> writes:
> >>> >>
> >>> >> > I'm still a little pensive.  But
> >>> >> >
> >>> >> > Reviewed-by: Jason Ekstrand <jason.ekstrand at intel.com>
> >>> >> >
> >>> >> Thanks.
> >>> >>
> >>> >> > Now for a little aside.  I have come to the conclusion that I
> made a
> >>> >> grave
> >>> >> > mistake when I did the LOAD_PAYLOAD stuff.  In retrospect, I
> should
> >>> have
> >>> >> > just subclassed fs_inst for load_payload.  The problem is that we
> >>> need to
> >>> >> > snag a bunch of information for the sources when we create the
> >>> >> > load_payload.  In particular, we need to know the width of the
> >>> source so
> >>> >> > that we know how much space it consumes in the payload and we
> need to
> >>> >> know
> >>> >> > the information required to properly re-create the mov such as
> >>> >> > force_sechalf and force_writemask_all.  Really, in order to do
> things
> >>> >> > properly, we need to gather this information *before* we do any
> >>> >> > optimizations.  The nasty pile of code that you're editing
> together
> >>> with
> >>> >> > the "effective_width" parameter is a lame attempt to
> >>> capture/reconstruct
> >>> >> > this information.  Really, we should just subclass, capture the
> >>> >> information
> >>> >> > up-front, and do it properly.
> >>> >> >
> >>> >> Yes, absolutely, this lowering pass is a real mess.  There are four
> >>> more
> >>> >> unreviewed patches in the mailing list fixing bugs of the metadata
> >>> >> guessing of lower_load_payload() [1][2][3][4], you may want to take
> a
> >>> >> look at them.  There are more bugs I'm aware of but it didn't seem
> >>> >> practical to fix them.
> >>> >>
> >>> >
> >>> > Yeah, Matt pointed me at those.  I'll give them a look later today.
> >>> >
> >>> >
> >>> >> That said, I don't think it would be worth subclassing fs_inst.
> >>> >>
> >>> >> My suggestion would have been to keep it simple and lower
> LOAD_PAYLOAD
> >>> >> into a series of MOVs with force_writemask_all enabled in all cases,
> >>> >> simply rely on the optimizer to eliminate those where possible.
> Then
> >>> >> get rid of the metadata and effective_width guessing.  Instead
> agree on
> >>> >> immediates and uniforms being exec_size-wide by convention
> >>> >> (i.e. LOAD_PAYLOAD's exec_size rather than the original
> instruction's),
> >>> >> then prevent constant propagation from propagating an immediate load
> >>> >> into a LOAD_PAYLOAD if it would lead to a change in the semantics of
> >>> the
> >>> >> program, and maybe just run constant propagation after lowering so
> we
> >>> >> can be sure those cases are cleaned up properly where register
> coalesce
> >>> >> isn't enough.
> >>> >>
> >>> >
> >>> > First off, simply setting force_writemask_all isn't an option.
> Consider
> >>> > the following scenario:
> >>> >
> >>> > a = foo;
> >>> > if (bar) {
> >>> >    b = 7;
> >>> > } else {
> >>> >    use(a);
> >>> > }
> >>> > use(b);
> >>> >
> >>> > If "use(a)" is the last use of the variable a, then the live ranges
> of a
> >>> > and b don't actually over-lap and we can assign a and b to the same
> >>> > register.  However, if force_writemask_all is set on b, it will
> destroy
> >>> the
> >>> > value in a before its last use.  Right now, we don't actually do this
> >>> > because our liveness analysis pass flattens everything so it will
> think
> >>> > that b and a over-lap even though they don't.  However, if we ever
> >>> decide
> >>> > to make the liveness information more accurate, having a pile of
> >>> > force_writemask_all instructions will be a major problem.  So, while
> it
> >>> is
> >>> > *technically* safe for now, it's a really bad idea in the long-term.
> >>> >
> >>> The thing is we *will* have to deal with that scenario.  Building a
> >>> message payload inherently involves crossing channel boundaries
> (because
> >>> of headers, unsupported SIMD modes by some shared functions, etc.).
> I'd
> >>> say it's a bug in the liveness analysis pass if it wouldn't consider
> the
> >>> liveness interval of a and b to overlap in your example if the
> >>> assignment of b had force_writemask_all set.
> >>>
> >>
> >> Yes, I'm aware of that.  However, the part of that register that has to
> >> squash everything is usually only a couple of registers while the entire
> >> payload may be up to 13 (if I remmeber correctly).  We'd rather not
> have to
> >> squash everything if we can.  All that being said, our liveness/register
> >> allocation can't handle this and making register allocation handle
> parts of
> >> registers interfering but not other bits is going to be a real pain.
> Maybe
> >> not even worth it.  If you'd rather force_writemask_all everything, I'll
> >> sign off on that for now.  I just wanted to point out that it may not
> be a
> >> good permanent solution.
> >>
> >>
> >>> A reasonable compromise might be to leave it up to the caller whether
> to
> >>> set the force_writemask_all and force_sechalf flags or not.  I.e. just
> >>> copy the same flags set on the LOAD_PAYLOAD instruction to the MOV
> >>> instructions.  That would still allow reducing the liveness intervals
> in
> >>> cases where we know that the payload respects channel boundaries.
> >>>
> >>> Tracking the flag information per-register in cases where the same
> >>> payload has well- and ill-behaved values seems rather premature and
> >>> rather useless to me because the optimizer is likely to be able to get
> >>> rid of redundant copies with force_writemask_all -- register coalesce
> >>> is doing this already AFAIK, maybe by accident.
> >>>
> >>
> >> Sure.  I'm not worried about our ability to optimize.  I'm primarily
> >> worried about register pressure.  Like I said, it's an OK solution in
> the
> >> temporary.  I think we'll want to give it more thought in the long-run
> but
> >> that's going to interact a lot with how we do register allocation etc.
> >> --Jason
> >>
> >>
> >>> > Regarding the other suggestion of just requiring width == exec_size
> for
> >>> > immediates and uniforms, that seems pretty reasonable to me.  I'd
> like
> >>> to
> >>> > know what it will do to optimizations, but it seems ok initially.
> I'm
> >>> > still a bigger fan of just subclassing and stashing everything we
> need
> >>> to
> >>> > know up-front.  If we do it right, the only things that will actually
> >>> need
> >>> > to know about the subclass are the function for creating a
> LOAD_PAYLOAD
> >>> and
> >>> > lower_load_payloads.
> >>> >
> >>> > --Jason
> >>> >
> >>> >
> >>> >>
> >>> >> [1]
> >>> >>
> >>>
> http://lists.freedesktop.org/archives/mesa-dev/2015-January/074614.html
> >>> >> [2]
> >>> >>
> >>>
> http://lists.freedesktop.org/archives/mesa-dev/2015-February/076094.html
> >>> >> [3]
> >>> >>
> >>>
> http://lists.freedesktop.org/archives/mesa-dev/2015-February/076097.html
> >>> >> [4]
> >>> >>
> >>>
> http://lists.freedesktop.org/archives/mesa-dev/2015-February/076098.html
> >>> >>
> >>> >>
> >>> >> > --Jason
> >>> >> >
> >>> >> > On Thu, Feb 19, 2015 at 1:53 PM, Jason Ekstrand <
> >>> jason at jlekstrand.net>
> >>> >> > wrote:
> >>> >> >
> >>> >> >>
> >>> >> >>
> >>> >> >> On Thu, Feb 19, 2015 at 1:25 PM, Francisco Jerez <
> >>> currojerez at riseup.net
> >>> >> >
> >>> >> >> wrote:
> >>> >> >>
> >>> >> >>> Jason Ekstrand <jason at jlekstrand.net> writes:
> >>> >> >>>
> >>> >> >>> > On Thu, Feb 19, 2015 at 12:13 PM, Francisco Jerez <
> >>> >> >>> currojerez at riseup.net>
> >>> >> >>> > wrote:
> >>> >> >>> >
> >>> >> >>> >> Jason Ekstrand <jason at jlekstrand.net> writes:
> >>> >> >>> >>
> >>> >> >>> >> > On Fri, Feb 6, 2015 at 4:01 PM, Francisco Jerez <
> >>> >> >>> currojerez at riseup.net>
> >>> >> >>> >> > wrote:
> >>> >> >>> >> >
> >>> >> >>> >> >> Hey Matt,
> >>> >> >>> >> >>
> >>> >> >>> >> >> Matt Turner <mattst88 at gmail.com> writes:
> >>> >> >>> >> >>
> >>> >> >>> >> >> > On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez <
> >>> >> >>> >> currojerez at riseup.net>
> >>> >> >>> >> >> wrote:
> >>> >> >>> >> >> >> MRFs cannot be read from anyway so they cannot possibly
> >>> be a
> >>> >> >>> valid
> >>> >> >>> >> >> >> source of LOAD_PAYLOAD.
> >>> >> >>> >> >> >> ---
> >>> >> >>> >> >> >
> >>> >> >>> >> >> > The function only seems to test inst->dst.file == MRF. I
> >>> don't
> >>> >> >>> see any
> >>> >> >>> >> >> > code for handling MRF sources. What am I missing?
> >>> >> >>> >> >>
> >>> >> >>> >> >> That test is for "handling" MRF sources -- More precisely,
> >>> it's
> >>> >> >>> >> >> collecting the writemask and half flags for MRF writes,
> >>> which can
> >>> >> >>> only
> >>> >> >>> >> >> possibly be useful if we're going to use them later on to
> >>> read
> >>> >> >>> something
> >>> >> >>> >> >> out of an MRF into a payload, which we shouldn't be doing
> in
> >>> the
> >>> >> >>> first
> >>> >> >>> >> >> place.
> >>> >> >>> >> >>
> >>> >> >>> >> >> Aside from simplifying the function somewhat, that allows
> us
> >>> to
> >>> >> >>> drop the
> >>> >> >>> >> >> 16 register gap reserved for MRFs at register offset zero,
> >>> what
> >>> >> will
> >>> >> >>> >> >> allow us to drop the vgrf_to_reg[] offset calculation
> >>> completely
> >>> >> >>> (also
> >>> >> >>> >> >> in split_virtual_grfs()) in a future patch (not sent for
> >>> review
> >>> >> >>> yet).
> >>> >> >>> >> >>
> >>> >> >>> >> >
> >>> >> >>> >> > No, we do read from MRF's sort-of...  Send messages have an
> >>> >> implicit
> >>> >> >>> >> "read"
> >>> >> >>> >> > from an MRF.
> >>> >> >>> >>
> >>> >> >>> >> Heh, and that's pretty much the only way you "read" from it.
> >>> >> >>> >>
> >>> >> >>> >> > This was written precicely so that we could use
> LOAD_PAYLOAD
> >>> >> >>> >> > to build MRF payloads.  We do on pre-GEN6.
> >>> >> >>> >> >
> >>> >> >>> >> I'm aware, but you don't need any of this meta-data to
> >>> LOAD_PAYLOAD
> >>> >> >>> >> *into* an MRF, and LOAD_PAYLOAD with an MRF as source should
> be
> >>> >> illegal
> >>> >> >>> >> anyway.
> >>> >> >>> >>
> >>> >> >>> >
> >>> >> >>> > And no one is using it that way.  All of the metadata checks
> you
> >>> are
> >>> >> >>> > deleting are checks on the *destination*.
> >>> >> >>> >
> >>> >> >>>
> >>> >> >>> Didn't you write this code yourself?  The only use for the
> >>> collected
> >>> >> >>> metadata is initializing the instruction flags of the MOVs
> >>> subsequent
> >>> >> >>> LOAD_PAYLOAD instructions are lowered to, based on the metadata
> >>> already
> >>> >> >>> collected for its source registers, which can never be MRFs, so
> the
> >>> >> >>> metadata you collect from MRF writes is never actually used.
> >>> >> >>>
> >>> >> >>
> >>> >> >> Right... I misred something initially.  Yes, we should never be
> >>> tracking
> >>> >> >> MRF's as a source of a LOAD_PAYLOAD.  I'll give it a better look
> a
> >>> bit
> >>> >> >> later, but it looks better.
> >>> >> >>
> >>> >>
> >>>
> >>
> >>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20150220/c3f72d60/attachment-0001.html>