[Mesa-dev] [RFC] i965: alternative to memctx for cleaning up nir variants
robdclark at gmail.com
Tue Dec 29 11:15:28 PST 2015
On Tue, Dec 29, 2015 at 12:36 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
> On Tue, Dec 29, 2015 at 7:32 AM, Rob Clark <robdclark at gmail.com> wrote:
>> On Mon, Dec 28, 2015 at 4:23 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>> > On Mon, Dec 28, 2015 at 3:25 PM, Rob Clark <robdclark at gmail.com> wrote:
>> >>>> It is a mix.. I do texcoord saturate, clip-plane, and 2-sided color
>> >>>> lowering in NIR. But flat-shading, binning-pass, and half vs full
>> >>>> precision color output in ir3.
>> >>>> I do as much lowering in NIR as I can, in an effort to do as much as
>> >>>> possible at compile time, vs draw time. I do the first round of
>> >>>> lowering/opt w/ null shader key, which is enough for the common
>> >>>> cases.
>> >>>> Pretty much independent, I suppose, of whether I came out of SSA or
>> >>>> not first. Although binning-pass variant and the instruction
>> >>>> scheduling I do are easier in SSA.
>> >>>> Somewhat unrelated, but I may end up converting array access to
>> >>>> registers, but leave everything else in SSA, so I can benefit from
>> >>>> converting multi-dimensional offsets into a single offset.. this is
>> >>>> still one open issue w/ gallium glsl_to_nir.. right now I have a
>> >>>> hacked up version of nir_lower_io that converts global/local
>> >>>> load/store_var's into load/store_var2 which take an offset as a src
>> >>>> (like load_input/store_output) instead of deref chain.. not sure yet
>> >>>> whether this will be the permanent solution, but at least it fixes a
>> >>>> huge heap of variable-indexing piglits and lets me continue w/
>> >>>> implementing lowering passes for everything else that was previously
>> >>>> done in glsl->tgsi or tgsi->tgsi passes.
>> >>> If you do this, you'll be back to always needing a mutable copy. Most
>> >>> lowering and optimization passes die the moment they see a register.
>> >>> You'll either have to go fix a bunch of stuff up to no-op properly or run
>> >>> vars_to_regs after doing your NIR lowering but before going into your
>> >>> backend IR. This means that your "gold copy" still has variables and you
>> >>> always need to lower them to registers before you go into the backend.
>> >> ugg.. but good point, thanks for pointing that out before I wasted
>> >> another afternoon on yet another dead-end for handling deref's..
>> >> Ok, I guess I need to think of a better name than load/store_var2 for
>> >> the new intrinsics ;-)
>> > I don't think that "you should throw away registers and use your own
>> > thing" is what Jason wanted you to get out of that.
> Correct. Registers are designed explicitly to do exactly what you want:
> Provide an easy-to-work-with linear view of complex variables. I still
> don't understand why you're trying so hard not to use them. The code is
> already written for you, you just have to turn it on. The only thing you
> might have to do is make it take a type_size function like nir_lower_io does
> so that you can configure offset units to your backend's liking.
> What I was trying to get across is that the situation in which you can avoid
> cloning by clever use of reference counting is specific to your driver and
> the exact way that you have your lowering passes set up. If you perturb it even
> a little, you have to do a copy all the time and reference counting isn't
> helping anymore. You're free to design your entire compiler stack around
> avoiding that one copy if you wish, but I wouldn't recommend it.
>> perhaps.. I was considering switching to registers for arrays.
>> Although it would end up forcing an extra clone in the common case
>> where there would otherwise not be one... a bit of a tough pill to
> I don't see why that is such a tough pill. Copying is cheap. When we were
> writing the cloning code, we both ran shader-db runs where we were cloning
> after *every* optimization or lowering pass and it still only hurt runtime
> by something like 10 or 20%. A single clone won't even get noticed.
well, that is encouraging.. although I probably tend a little bit
more to the cpu limited side of things..
> Also, you've already said that you pre-compile for the "common case" of a
> zero shader key so that extra clone gets eaten at compile time where you're
> already doing piles of optimization and lowering. The case you really care
> about is when that key is non-zero and you have to stop everything and
> recompile in the middle of a draw. In that case, you have a non-zero key so
> you have to do a clone anyway.
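
The "gold copy plus clone" flow described above can be sketched roughly as
follows. This is a minimal illustration only, not the actual NIR or ir3 code:
sketch_shader, sketch_key, sketch_clone, lower_for_key and build_variant are
all hypothetical names, and a shader is reduced to a flat array of opcodes.
The point is that key-specific lowering always runs on a clone, so the gold
copy is never mutated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical opcodes; stand-ins for real IR instructions. */
enum sketch_op { OP_OTHER, OP_LOAD_COLOR, OP_TWO_SIDED_COLOR };

/* Hypothetical shader key with a single piece of draw-time state. */
struct sketch_key {
    unsigned two_sided_color;
};

/* Hypothetical shader: a flat list of opcodes. */
struct sketch_shader {
    size_t num_instrs;
    int *instrs;
};

/* Deep-copy a shader so the original can stay immutable. */
static struct sketch_shader *sketch_clone(const struct sketch_shader *src)
{
    struct sketch_shader *dst = malloc(sizeof(*dst));
    if (!dst)
        return NULL;
    dst->num_instrs = src->num_instrs;
    dst->instrs = NULL;
    if (src->num_instrs > 0) {
        dst->instrs = malloc(src->num_instrs * sizeof(*dst->instrs));
        if (!dst->instrs) {
            free(dst);
            return NULL;
        }
        memcpy(dst->instrs, src->instrs,
               src->num_instrs * sizeof(*dst->instrs));
    }
    return dst;
}

static void sketch_free(struct sketch_shader *s)
{
    if (s) {
        free(s->instrs);
        free(s);
    }
}

/* Key-dependent lowering; mutates the shader it is given. */
static void lower_for_key(struct sketch_shader *s, const struct sketch_key *key)
{
    assert(s && key);
    if (!key->two_sided_color)
        return;
    for (size_t i = 0; i < s->num_instrs; i++) {
        if (s->instrs[i] == OP_LOAD_COLOR)
            s->instrs[i] = OP_TWO_SIDED_COLOR;
    }
}

/* Build a variant: clone the gold copy, then lower only the clone. */
static struct sketch_shader *build_variant(const struct sketch_shader *gold,
                                           const struct sketch_key *key)
{
    struct sketch_shader *variant = sketch_clone(gold);
    if (variant)
        lower_for_key(variant, key);
    return variant;
}
```

With a zero key the clone is pure overhead, but it is paid at compile time;
with a non-zero key the clone is needed regardless, since the gold copy has
to survive for the next variant.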
I guess if I did a clone before to_regs pass.. seems a bit
sub-optimal, and w/ a lower_deref pass (which took type_size fxn
ptr) I could get basically the one part of registers that I want..
Note that part of my gallium glsl_to_nir branch has started to
de-duplicate all the common type_size implementations. Mesa st uses
one that is basically the same as what is in i965 (vec4) with the
addition of double support..
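
To make the "multi-dimensional offsets into a single offset" idea concrete,
here is a rough sketch of what a deref-lowering helper taking a type_size
function pointer could compute. It is only an illustration under simplifying
assumptions: sketch_type is a made-up stand-in for glsl_type, the array
indices are plain constants (real deref chains can have indirect indices),
and flatten_offset, type_size_vec4 and type_size_scalar are hypothetical
names, not existing NIR functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a GLSL type: either a vector leaf or an
 * array of some element type.  Not the real glsl_type. */
struct sketch_type {
    unsigned length;                 /* array length, 0 for a leaf */
    unsigned components;             /* vector width of a leaf */
    const struct sketch_type *elem;  /* element type, NULL for a leaf */
};

/* Callback that defines the offset units the backend wants. */
typedef unsigned (*type_size_fn)(const struct sketch_type *);

/* Size in vec4 slots: every leaf occupies one slot. */
static unsigned type_size_vec4(const struct sketch_type *t)
{
    if (t->length == 0)
        return 1;
    return t->length * type_size_vec4(t->elem);
}

/* Size in scalar components. */
static unsigned type_size_scalar(const struct sketch_type *t)
{
    if (t->length == 0)
        return t->components;
    return t->length * type_size_scalar(t->elem);
}

/* Walk a chain of constant array indices and fold it into one linear
 * offset, measured in whatever units type_size reports. */
static unsigned flatten_offset(const struct sketch_type *type,
                               const unsigned *indices, size_t num_indices,
                               type_size_fn type_size)
{
    unsigned offset = 0;
    for (size_t i = 0; i < num_indices; i++) {
        assert(type->length > 0 && indices[i] < type->length);
        type = type->elem;                       /* step into the element */
        offset += indices[i] * type_size(type);  /* stride = element size */
    }
    return offset;
}
```

For a vec4 arr[3][4], the access arr[2][1] flattens to 2*4 + 1 = 9 in vec4
units and to 2*16 + 1*4 = 36 in scalar units, which is why the units need to
be configurable per backend.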
> At the end of the day, I think we're getting nowhere here. We have two
> different memory management models that are in conflict. The ralloc model
> saves us typing and provides some nice safety, and refcounting saves you some
> typing and provides you some nice safety. It's becoming fairly obvious that
> neither side is going to convince the other that their model is better any
> time soon. I'm open to suggestions on how to proceed. One option would be
> to have Anholt come in and break the tie. I'd be ok with that. In any
> case, we need to solve this one way or another and either commit the patch
> or not.
Well, as it stands, on the refcnt'ing side of things, so far I am
outnumbered. I was planning to re-work my ir3 changes without
refcnt'ing, and then the rest of the gallium glsl_to_nir support,
hopefully sometime in the next few days..
>> > Most of the
>> > existing optimization passes barf on registers for a reason: registers
>> > imply that you've gone from "consumer-agnostic NIR," i.e. what's
>> > produced by gtn and operated on by generic optimizations, to your own
>> > driver-specific thing, and any optimizations you're going to run are
>> > only to clean up the result of the lowering passes, so you won't need
>> > to run most of them. In the few cases where we do need an optimization
>> > after lowering to registers, we've gone and fixed it up to no-op
>> > things properly, but in general it's a lot easier and less confusing
>> > to say "new optimization passes don't have to deal with registers"
>> > than to make everyone go and add support for registers to their
>> > passes. I'm not saying that adding a "here's my driver-specific
>> > offset" thing to load/store_var would necessarily be a bad idea, but
>> > don't just dismiss registers out-of-hand.
>> Yeah, I'm not a big fan of making lowering/etc passes deal w/
>> registers unnecessarily. Seems like coming up w/ some way to lower
>> load/store_var deref chains would be easier.