[Mesa-dev] [PATCH v2 00/15] i965: Rework uniform handling in the back-end

Fri Apr 8 06:08:02 UTC 2016

On Apr 7, 2016 10:01 PM, "Matt Turner" <mattst88 at gmail.com> wrote:
>
> On Tue, Mar 22, 2016 at 3:33 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
> > This is mostly a re-send of a patch series I've had floating around in one
> > form or another for quite some time.  It's basically the same except that
> > the original version was missing a work-around for Sandy Bridge.  For a
> > while, I wasn't really pushing to get it merged because I couldn't
> > demonstrate any actual performance benefit from pushing arrays.  However,
> > with the Vulkan API, the concept of push constants is directly exposed to
> > the user and we really need to be able to indirect on them.  This series
> > makes the FS backend 100% ready for indirect push constants; vec4 will
> > take a little more work.
> >
> > It's worth noting that we've been carrying these patches around in our
> > Vulkan driver for probably 3 or 4 months now and it's working great.
> >
> > For those that prefer to review on a branch:
> >
> > https://cgit.freedesktop.org/~jekstrand/mesa/log/?h=review/i965-uniforms
> >
> > I think Kristian has mostly reviewed these patches.  However, he never sent
> > any R-Bs to the list.  I'd also like Ken or Matt to look at it from a
> > design perspective.
>
> I don't know what I think. I'm sympathetic to Curro's argument, but in
> the absence of more data it's hard to judge anything really. I'm not
> at all sympathetic to
>
> """
> Do I have a proof-of-concept in code, no.  However, I've run through
> it in my head and I have a pretty good idea what it would look like.
> You are free to go off and do it if you don't believe me, but I don't
> really want to hold things up while you do.
> """
>
> That's what... An Appeal to Your Brain? :)

Sort of... It was more a remark of frustration at the (perceived)
implication that I hadn't thought about it or, at the very least, hadn't given
it a fair shake.  In a bit more detail, here are some of my thoughts on
reladdr and an ADDR file, in no particular order:

a) Not a single FS optimization pass handles it.  Yes, "if you see reladdr,
bail" is a valid (if suboptimal) strategy 90% of the time.  However,
anything that computes any sort of kill set now needs a recursive algorithm
to walk register sources.  We do handle this in NIR and it's not terrible
but it does come with nontrivial pain and retrofitting it isn't necessarily
going to be quick-and-easy.  Curro's response of "use-def chains will fix
this", though probably accurate, doesn't solve the immediate problem while
these patches have been on the list for 6 months.

b) The hardware doesn't do reladdr.  It has an address register with
substantial restrictions.  Eventually, we would need to lower to something
that writes the address register and have an indirect source type that
consumes it.  If you end up with two indirect sources, we have to emit a
move for one of them.  Where do we do that lowering?  Do we do it in the
generator or as a pass?

c) If we handle it all in the generator, we have no ability to schedule it
at all.  It also makes the generator far more complex.

d) If we handle it in a lowering pass, what does that pass produce? Do we
expose the ADDR file and try to do RA on it or do we treat it as a fixed
thing like flag?  In either case, we need to add extra logic to at least
the scheduler if not other places to add this whole new concept.

e) If we allow indirect sources of any sort, how do we carry range
information around post-RA?  Pre-RA we can theoretically just say if you
indirect you touch the whole thing.  Post-RA, you either have to carry that
information around per-instruction or you have to assume that any
instruction that uses an indirect source could be reading anything in the
entire GRF and it becomes almost a complete scheduling barrier.
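
To make point (a) a bit more concrete, here is a rough sketch of why a
reladdr-style source forces a recursive walk.  This is not i965 code: the
`Src`, `Inst`, `regs_read` and `live_before` names are made up for
illustration, registers are plain integers, and the base register stands in
for "the whole array" that an indirect read may touch.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Src:
    """Hypothetical register source; reladdr is itself a register source."""
    reg: int
    reladdr: Optional["Src"] = None


@dataclass
class Inst:
    """Hypothetical instruction: one destination register, several sources."""
    dst: int
    srcs: List[Src] = field(default_factory=list)


def regs_read(src: Optional[Src], out: Set[int]) -> None:
    """Add src.reg and everything reachable through its reladdr chain."""
    if src is None:
        return
    out.add(src.reg)
    # The address is read too, and it may itself be relatively addressed.
    regs_read(src.reladdr, out)


def live_before(inst: Inst, live_after: Set[int]) -> Set[int]:
    """One backward liveness step: kill the destination, then add the uses."""
    live = set(live_after)
    live.discard(inst.dst)
    for src in inst.srcs:
        regs_read(src, live)
    return live
```

Without reladdr, `regs_read` collapses to a single `out.add(src.reg)`; with
it, every pass that builds use or kill sets has to grow the recursive case.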

Those are the thoughts that pop to the top. I could come up with more if
you'd like.

So, yes, using reladdr or an ADDR file would be possible but it would
involve substantial IR surgery.  What's the benefit?  You can put the
relative source directly in the instruction that uses it and maybe do an
address calculation directly to the address register instead of having to
move it there.  The approach I've taken, on the other hand, neatly
side-steps all of the issues listed above.  This comes at the cost of a few
extra instructions (which you probably have to spend anyway on gen7).  I
think that trade-off is worthwhile.

--Jason

> I don't know how to proceed on that front if no one is willing or
> interested in trying to implement it using reladdr.
>
> I ran shader-db.
>
> total instructions in shared programs: 7113290 -> 7161760 (0.68%)
> instructions in affected programs: 866011 -> 914481 (5.60%)
> helped: 0
> HURT: 7180
>
> total cycles in shared programs: 64705926 -> 64776118 (0.11%)
> cycles in affected programs: 4951554 -> 5021746 (1.42%)
> helped: 1605
> HURT: 5204
>
> of which the overwhelming majority is vertex shaders (why? this series
> is i965/fs). FS changes are just
>
> instructions in affected programs: 13550 -> 14132 (4.30%)
> helped: 0
> HURT: 50
>
> but I'm having a hard time finding shaders that actually use the
> address register.
>
> What's going on with the shader-db regressions?

I think those are mostly D/UD mismatches.  I looked at it some on FS (hence
the only 50 affected shaders) but vec4 must not have gotten the same love.
There should be zero vec4 changes.  I'll look into it.
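
For reference, the percentages in the shader-db output quoted above are plain
relative changes in the totals.  A small sketch that reproduces them from the
numbers in this thread (`pct_change` is just an illustrative helper, not part
of shader-db):

```python
def pct_change(before: int, after: int) -> float:
    """Relative change from before to after, in percent."""
    return 100.0 * (after - before) / before


# (label, before, after) taken from the quoted shader-db report.
rows = [
    ("total instructions", 7113290, 7161760),
    ("affected instructions", 866011, 914481),
    ("total cycles", 64705926, 64776118),
    ("affected cycles", 4951554, 5021746),
    ("FS affected instructions", 13550, 14132),
]

for label, before, after in rows:
    print(f"{label}: {before} -> {after} ({pct_change(before, after):.2f}%)")
```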