[Mesa-dev] [PATCH v2 00/15] i965: Rework uniform handling in the back-end

Fri Apr 8 15:10:08 UTC 2016

On Apr 7, 2016 11:08 PM, "Jason Ekstrand" <jason at jlekstrand.net> wrote:
>
>
> On Apr 7, 2016 10:01 PM, "Matt Turner" <mattst88 at gmail.com> wrote:
> >
> > On Tue, Mar 22, 2016 at 3:33 PM, Jason Ekstrand <jason at jlekstrand.net>
wrote:
> > > This is mostly a re-send of a patch series I've had floating around
in one
> > > form or a while for quite some time.  It's basically the same except
that
> > > the original version was missing a work-around for Sandy Bridge.  For
a
> > > while, I wasn't really pushing to get it merged because I couldn't
> > > demonstrate any actual performance benifit from pushing arrays.
However,
> > > with the Vulkan API, the concept of push constants is directly
exposed to
> > > the user and we really need to be able to indirect on them.  This
series
> > > makes the FS backend 100% ready for indirect push constants;  vec4
will
> > > take a little more work.
> > >
> > > It's worth noting that we've been carying these patches around in our
> > > Vulkan driver for probably 3 or 4 months now and it's working great.
> > >
> > > For those that prefer to review on a branch:
> > >
> > >
https://cgit.freedesktop.org/~jekstrand/mesa/log/?h=review/i965-uniforms
> > >
> > > I think Kristian has mostly reviewed these patches.  However, he
never sent
> > > any R-Bs to the list.  I'd also like Ken or Matt to look at it from a
> > > design perspective.
> >
> > I don't know what I think. I'm sympathetic to Curro's argument, but in
> > the absence of more data it's hard to judge anything really. I'm not
> > at all sympathetic to
> >
> > """
> > Do I have a proof-of-concept in code, no.  However, I've run through
> > it in my head and I have a pretty good idea what it would look like.
> > You are free to go off and do it if you don't believe me, but I don't
> > really want to hold things up while you do.
> > """
> >
> > That's what... An Appeal to Your Brain? :)
>
> Sort-of... It was more a remark of frustration at the (percieved)
implication that I hadn't thought about it or at the very least hadnt given
it a fair shake.  In a bit more detail here are some of my thoughts on
reladdr and an ADDR file in no particular order
>
> a) Not a single FS optimization pass handles it.  Yes, "if you see
reladdr, bail” is a valid (if suboptimal) strategy 90% of the time.
However, anything that computes any sort of kill set now needs a recursive
algorithm to walk register sources.  We do handle this in NIR and it's not
terrible but it does come with nontrivial pain and retrofitting it isn't
necessarily going to be quick-and-easy.  Curro's response of "use-def
chains will fix this" while probably accurate doesn't solve the immediate
problem while these patches have been on the list for 6 months.
>
> b) The hardware doesn't do reladdr.  It has an address register with
substantial restrictions.  Eventually, we would need to lower to something
that writes the address register and have an indirect source type that
consumes it.  If you end up with two indirect sources, we have to emit a
move for one of them.  Where do we do that lowering?  Do we do it in the
generator or as a pass?
>
> c) If we handle it all in the generator, we have no ability to schedule
it at all.  It also makes the generator far more complex.
>
> d) If we handle it in a lowering pass, what does that pass produce? Do we
expose the ADDR file and try to do RA on it or do we treat it as a fixed
thing like flag?  In either case, we need to add extra logic to at least
the scheduler if not other places to add this whole new concept.
>
> e) If we allow indirect sources of any sort, how do we carry range
information around post-RA.  Pre-RA we can theoretically just say if you
indirect you touch the whole thing.  Post-RA, you either have to carry that
information around per-instruction or you have to assume that any
instruction that uses an indirect source could be reading anything in the
entire GRF and it becomes almost a complete scheduling barrier.
>
> Those are the thoughts that pop to the top. I could come up with more if
you'd like.
>
> So, yes, using reladdr or or an ADDR file would be possible but it would
involve substantial IR surgery.  What's the benefit?  You can put the
relative source directly in the instruction that uses it and maybe do an
address calculation directly to the address register instead of having to
move it there.  The approach I've taken on the other hand, neatly
side-steps all of the issues listed above.  This comes at the cost of a few
extra instructions (which you probably have to spend anyway on gen7).  I
think that trade-off is worthwhile.

There are some other things I found to be very nice about the approach. One
is that we got too stop carrying around these arrays of extra uniform size
information.  They're the only substantial per-backend bit of uniform setup
that we do and they're only used for knowing how big the indirected
uniforms are. Their existence has bugged me ever since we first got NIR
going.  They're not too annoying in GL but setting them up from Vulkan was
going to get painful.

It's also worth mentioning that the primary motivation isn't even pushing
arrays in GL.  When I first did this I was hoping that it would either have
some reasonable performance improvement or else that it would be step 1 on
the way to doing full indirect addressing for local arrays.  Once I
realized that indirecting locals wasn't really practical (destinations have
to be uniformly indirect) and that I could find no measurable performance
improvement from it, I sent it or as a "this is cool" RFC and left it at
that.

The real motivation here is Vulkan where the user provides you with a small
(< 128B) block of push constants that is similar (in almost every way) to a
UBO.  Because it is not individual variables like in GL, it gets a bit more
interesting.  One way is that we really should push everything in that
block because the feature exists exactly to expose push hardware like ours
and the user expects that handful of data to be crazy-fast.  Second is that
it needs to support full std140 packing and indirects.  One option here
(that I did consider) is to break it into individual variables that the
backend can pack however it wants and and handle the user's packing on the
CPU.  This could be made to work if we really wanted to.  The other is to
just make the backend handle it.

I hope that provides a bit more context.

--Jason

> > I don't know how to proceed on that front if no one is willing or
> > interested in trying to implement it using reladdr.
> >
> > I ran shader-db.
> >
> > total instructions in shared programs: 7113290 -> 7161760 (0.68%)
> > instructions in affected programs: 866011 -> 914481 (5.60%)
> > helped: 0
> > HURT: 7180
> >
> > total cycles in shared programs: 64705926 -> 64776118 (0.11%)
> > cycles in affected programs: 4951554 -> 5021746 (1.42%)
> > helped: 1605
> > HURT: 5204
> >
> > of which the overwhelming majority is vertex shaders (why? this series
> > is i965/fs). FS changes are just
> >
> > instructions in affected programs: 13550 -> 14132 (4.30%)
> > helped: 0
> > HURT: 50
> >
> > but I'm having a hard time finding shaders that actually use the
> > address register.
> >
> > What's going on with the shader-db regressions?
>
> I think those are mostly D/UD mismatches.  I looked at it some on FS
(hence the only 50 affected shaders) but vec4 must not have gotten the same
love.  There should be zero vec4 changes.  I'll look into it.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20160408/9cd42c1e/attachment.html>