<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Dec 21, 2015 at 2:24 PM, Marek Olšák <span dir="ltr"><<a href="mailto:maraeo@gmail.com" target="_blank">maraeo@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Mon, Dec 21, 2015 at 11:20 PM, Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>> wrote:<br>
><br>
> On Dec 21, 2015 1:37 PM, "Marek Olšák" <<a href="mailto:maraeo@gmail.com">maraeo@gmail.com</a>> wrote:<br>
>><br>
>> On Mon, Dec 21, 2015 at 7:48 PM, Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>><br>
>> wrote:<br>
>> ><br>
>> > On Dec 21, 2015 9:09 AM, "Connor Abbott" <<a href="mailto:cwabbott0@gmail.com">cwabbott0@gmail.com</a>> wrote:<br>
>> >><br>
>> >> On Mon, Dec 21, 2015 at 11:45 AM, Marek Olšák <<a href="mailto:maraeo@gmail.com">maraeo@gmail.com</a>> wrote:<br>
>> >> > On Mon, Dec 21, 2015 at 4:38 PM, Connor Abbott <<a href="mailto:cwabbott0@gmail.com">cwabbott0@gmail.com</a>><br>
>> >> > wrote:<br>
>> >> >> On Mon, Dec 21, 2015 at 6:39 AM, Marek Olšák <<a href="mailto:maraeo@gmail.com">maraeo@gmail.com</a>><br>
>> >> >> wrote:<br>
>> >> >>> On Mon, Dec 21, 2015 at 6:48 AM, Jason Ekstrand<br>
>> >> >>> <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>><br>
>> >> >>> wrote:<br>
>> >> >>>><br>
>> >> >>>> On Dec 20, 2015 7:43 PM, "Rob Clark" <<a href="mailto:robdclark@gmail.com">robdclark@gmail.com</a>> wrote:<br>
>> >> >>>>><br>
>> >> >>>>> On Sun, Dec 20, 2015 at 10:29 PM, Connor Abbott<br>
>> >> >>>>> <<a href="mailto:cwabbott0@gmail.com">cwabbott0@gmail.com</a>><br>
>> >> >>>>> wrote:<br>
>> >> >>>>> > On Sun, Dec 20, 2015 at 10:04 PM, Rob Clark<br>
>> >> >>>>> > <<a href="mailto:robdclark@gmail.com">robdclark@gmail.com</a>><br>
>> >> >>>>> > wrote:<br>
>> >> >>>>> >> On Sun, Dec 20, 2015 at 9:12 PM, Jason Ekstrand<br>
>> >> >>>>> >> <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>><br>
>> >> >>>>> >> wrote:<br>
>> >> >>>>> >>><br>
>> >> >>>>> >>> On Dec 19, 2015 5:55 PM, "Rob Clark" <<a href="mailto:robdclark@gmail.com">robdclark@gmail.com</a>><br>
>> >> >>>>> >>> wrote:<br>
>> >> >>>>> >>>><br>
>> >> >>>>> >>>> From: Rob Clark <<a href="mailto:robclark@freedesktop.org">robclark@freedesktop.org</a>><br>
>> >> >>>>> >>>><br>
>> >> >>>>> >>>> Jason,<br>
>> >> >>>>> >>>><br>
>> >> >>>>> >>>> How much do you hate this idea? Seems like an easy<br>
>> >> >>>>> >>>> alternative<br>
>> >> >>>>> >>>> to<br>
>> >> >>>>> >>>> using ralloc ctx's to clean up nir variants/clones, which<br>
>> >> >>>>> >>>> would<br>
>> >> >>>>> >>>> let<br>
>> >> >>>>> >>>> us drop the parent memctx for nir_shader_create()/clone(),<br>
>> >> >>>>> >>>> making<br>
>> >> >>>>> >>>> it easier to introduce reference counting.<br>
>> >> >>>>> >>><br>
>> >> >>>>> >>> I think "hate" is a but strong. I don't like it but it works.<br>
>> >> >>>>> >>> If we<br>
>> >> >>>>> >>> really<br>
>> >> >>>>> >>> want nir_shader refcounted, we'll have to do something.<br>
>> >> >>>>> >><br>
>> >> >>>>> >> I suppose the alternate idea of moving the nir_shader_clone()<br>
>> >> >>>>> >> out<br>
>> >> >>>>> >> of<br>
>> >> >>>>> >> brw_compile_xyz(), and always passing in the clone would be a<br>
>> >> >>>>> >> cleaner<br>
>> >> >>>>> >> way. It looks like each of the brw_compile_xyz() has exactly<br>
>> >> >>>>> >> one<br>
>> >> >>>>> >> call-site, so doing the nir_shader_clone() inside doesn't<br>
>> >> >>>>> >> really<br>
>> >> >>>>> >> buy<br>
>> >> >>>>> >> anything.<br>
>> >> >>>><br>
>> >> >>>> Your forgetting that there may be *cough* other users of this<br>
>> >> >>>> API...<br>
>> >> >>>> We can<br>
>> >> >>>> change those too but I would like the needs of the compiler users<br>
>> >> >>>> to<br>
>> >> >>>> drive<br>
>> >> >>>> the API, not the cloning. I still have some details to work out<br>
>> >> >>>> there. In<br>
>> >> >>>> any case, it doesn't really matter; we can figure something out.<br>
>> >> >>>><br>
>> >> >>>>> >>> About refcounting... The more I think about it the more I'm<br>
>> >> >>>>> >>> not<br>
>> >> >>>>> >>> convinced<br>
>> >> >>>>> >>> it's useful. As it stands, we have no use for it an I'm not<br>
>> >> >>>>> >>> convinced<br>
>> >> >>>>> >>> you<br>
>> >> >>>>> >>> do either. We'll see if I can convince you. :-)<br>
>> >> >>>>> >>><br>
>> >> >>>>> >>> In the history of i965 using NIR, we've had about three<br>
>> >> >>>>> >>> different ways<br>
>> >> >>>>> >>> of<br>
>> >> >>>>> >>> doing things:<br>
>> >> >>>>> >>><br>
>> >> >>>>> >>> 1) GLSL is the gold copy and we run glsl_to_nir for every<br>
>> >> >>>>> >>> shader/variant<br>
>> >> >>>>> >>> compile. This is what we did when we first stated using NIR<br>
>> >> >>>>> >>> because<br>
>> >> >>>>> >>> it was<br>
>> >> >>>>> >>> easy and didn't involve reworking any plumbing.<br>
>> >> >>>>> >>><br>
>> >> >>>>> >>> 2) Lowered NIR is the gold copy; variants are done entirely<br>
>> >> >>>>> >>> in<br>
>> >> >>>>> >>> the<br>
>> >> >>>>> >>> back-end<br>
>> >> >>>>> >>> IR. This is what we did up until about a month ago. Because<br>
>> >> >>>>> >>> variants<br>
>> >> >>>>> >>> are<br>
>> >> >>>>> >>> done in the back-end, we can run gksl_to_nir and do all of<br>
>> >> >>>>> >>> our<br>
>> >> >>>>> >>> optimizing<br>
>> >> >>>>> >>> and lowering at link time. Going from NIR to the final<br>
>> >> >>>>> >>> shader<br>
>> >> >>>>> >>> binary<br>
>> >> >>>>> >>> is<br>
>> >> >>>>> >>> then a read-only operation as far as NIR is concerned.<br>
>> >> >>>>> >>><br>
>> >> >>>>> >>> 3) Optimized but not lowered NIR is the gold copy; variants<br>
>> >> >>>>> >>> are<br>
>> >> >>>>> >>> sometimes<br>
>> >> >>>>> >>> done in NIR. This is the scheme we use now. We call<br>
>> >> >>>>> >>> glsl_to_nir and<br>
>> >> >>>>> >>> do<br>
>> >> >>>>> >>> some of the optimization and lowering at link time but leave<br>
>> >> >>>>> >>> it<br>
>> >> >>>>> >>> in SSA<br>
>> >> >>>>> >>> form.<br>
>> >> >>>>> >>> When we go to compile the final shader, we make a copy, apply<br>
>> >> >>>>> >>> variants, do<br>
>> >> >>>>> >>> the final lowering and then go into the back-end IR.<br>
>> >> >>>>> >>><br>
>> >> >>>>> >>> In each of these cases, we know exactly where we need to make<br>
>> >> >>>>> >>> a<br>
>> >> >>>>> >>> copy<br>
>> >> >>>>> >>> without<br>
>> >> >>>>> >>> the help of reference counting. In the first, we get a fresh<br>
>> >> >>>>> >>> copy<br>
>> >> >>>>> >>> each time<br>
>> >> >>>>> >>> so we are free to destroy the copy. In the second, we never<br>
>> >> >>>>> >>> have to<br>
>> >> >>>>> >>> modify<br>
>> >> >>>>> >>> the NIR so no copy. In the third scheme, we always have to<br>
>> >> >>>>> >>> make<br>
>> >> >>>>> >>> a<br>
>> >> >>>>> >>> copy<br>
>> >> >>>>> >>> because, even if variants are a no-op, we still have to go<br>
>> >> >>>>> >>> out<br>
>> >> >>>>> >>> of SSA<br>
>> >> >>>>> >>> form<br>
>> >> >>>>> >>> and do final lowering. You could say that we could avoid<br>
>> >> >>>>> >>> making<br>
>> >> >>>>> >>> that<br>
>> >> >>>>> >>> copy.<br>
>> >> >>>>> >>> However, the work to determine when we don't need variants<br>
>> >> >>>>> >>> and<br>
>> >> >>>>> >>> can do<br>
>> >> >>>>> >>> all<br>
>> >> >>>>> >>> our lowering up-front is far more than the work saved by<br>
>> >> >>>>> >>> reference<br>
>> >> >>>>> >>> counting.<br>
>> >> >>>>> >>><br>
>> >> >>>>> >>> How about gallium? Here's how I imagine it would work<br>
>> >> >>>>> >>> (please<br>
>> >> >>>>> >>> correct<br>
>> >> >>>>> >>> me of<br>
>> >> >>>>> >>> I'm wrong):<br>
>> >> >>>>> >>><br>
>> >> >>>>> >>> 1) In the TGSI case, tgsi_to_nir gets called for each compile<br>
>> >> >>>>> >>> so<br>
>> >> >>>>> >>> you<br>
>> >> >>>>> >>> get a<br>
>> >> >>>>> >>> fresh mutable shader each time. In this case, you are free<br>
>> >> >>>>> >>> to<br>
>> >> >>>>> >>> do<br>
>> >> >>>>> >>> whatever<br>
>> >> >>>>> >>> you want with the shader without making a copy.<br>
>> >> >>>>> >>><br>
>> >> >>>>> >>> 2) In the GLSL case, you run glsl_to_nir and do some basic<br>
>> >> >>>>> >>> optimizations at<br>
>> >> >>>>> >>> link time and hold onto the NIR shader. (Hold a reference of<br>
>> >> >>>>> >>> you'd<br>
>> >> >>>>> >>> like.)<br>
>> >> >>>>> >>> When you go to compile it in the back-end, it needs to do<br>
>> >> >>>>> >>> it's<br>
>> >> >>>>> >>> own<br>
>> >> >>>>> >>> lowering<br>
>> >> >>>>> >>> so it takes a reference and ends up making a copy.<br>
>> >> >>>>> >>><br>
>> >> >>>>> >>> If this description is anywhere close to correct, then I<br>
>> >> >>>>> >>> don't<br>
>> >> >>>>> >>> think<br>
>> >> >>>>> >>> you<br>
>> >> >>>>> >>> really need it either. Determining whether or not you need<br>
>> >> >>>>> >>> to<br>
>> >> >>>>> >>> copy is<br>
>> >> >>>>> >>> simply "if (comes_from_tgsi)”. Maybe there's something<br>
>> >> >>>>> >>> subtle<br>
>> >> >>>>> >>> about<br>
>> >> >>>>> >>> the<br>
>> >> >>>>> >>> gallium layer that I don't know that makes refcounting the<br>
>> >> >>>>> >>> best<br>
>> >> >>>>> >>> solution.<br>
>> >> >>>>> >>> Please enlighten me of there is.<br>
>> >> >>>>> >><br>
>> >> >>>>> >> This issue is that we *potentially* have both the state<br>
>> >> >>>>> >> tracker<br>
>> >> >>>>> >> and<br>
>> >> >>>>> >> the driver both doing some of there own variant management.<br>
>> >> >>>>> >> (Which<br>
>> >> >>>>> >> tbh, isn't awesome, it would have been nice if someone<br>
>> >> >>>>> >> realized<br>
>> >> >>>>> >> earlier on that nearly every driver is going to have to do<br>
>> >> >>>>> >> some<br>
>> >> >>>>> >> sort<br>
>> >> >>>>> >> of variant mgmt and figured out a way just to push it all down<br>
>> >> >>>>> >> to<br>
>> >> >>>>> >> the<br>
>> >> >>>>> >> driver.. but I can't see a good way to get there from here.)<br>
>> >> >>>>> >><br>
>> >> >>>>> >> With TGSI as the IR, driver just unconditionally does<br>
>> >> >>>>> >> tgsi_dup_tokens().. because of the whole thing where st does<br>
>> >> >>>>> >> variants<br>
>> >> >>>>> >> in some cases, things are defined that driver doesn't own the<br>
>> >> >>>>> >> copy of<br>
>> >> >>>>> >> the TGSI IR passed in after the fxn call to driver returns.<br>
>> >> >>>>> >><br>
>> >> >>>>> >> With NIR I was hoping to fix this, esp. since<br>
>> >> >>>>> >> nir_shader_clone()<br>
>> >> >>>>> >> is<br>
>> >> >>>>> >> more heavyweight than tgsi_dup_tokens() (memcpy()).<br>
>> >> >>>>> >><br>
>> >> >>>>> >> Refcnt'ing is a nice solution so that we can pass the driver a<br>
>> >> >>>>> >> reference that it owns. In cases where state tracker isn't<br>
>> >> >>>>> >> doing<br>
>> >> >>>>> >> variant mgmt, we pass it the one-and-only ref (avoiding<br>
>> >> >>>>> >> clone).<br>
>> >> >>>>> >><br>
>> >> >>>>> >> I'd suggested that in cases where st does variant mgmt, that<br>
>> >> >>>>> >> st<br>
>> >> >>>>> >> should<br>
>> >> >>>>> >> do the clone/dup. But that was shot down:<br>
>> >> >>>>> >><br>
>> >> >>>>> >><br>
>> >> >>>>> >><br>
>> >> >>>>> >> <a href="http://lists.freedesktop.org/archives/mesa-dev/2015-October/097748.html" rel="noreferrer" target="_blank">http://lists.freedesktop.org/archives/mesa-dev/2015-October/097748.html</a><br>
>> >> >>>><br>
>> >> >>>> It sounds like Marek's argument there is more about lifetime<br>
>> >> >>>> management than<br>
>> >> >>>> anything. He wants gallium modules to be able to create IR, call<br>
>> >> >>>> into the<br>
>> >> >>>> driver, and then throw it away. In particular, he doesn't want<br>
>> >> >>>> them<br>
>> >> >>>> to have<br>
>> >> >>>> to think about cloning. In a lot of ways it sounds a lot like what<br>
>> >> >>>> i965<br>
>> >> >>>> wants too. I really like having brw_compile_foo take a const<br>
>> >> >>>> nir_shader.<br>
>> >> >>>> The difference is that i965 basically always wants to clone<br>
>> >> >>>> whereas a<br>
>> >> >>>> gallium driver may not have to if gallium doesn't care what<br>
>> >> >>>> happens<br>
>> >> >>>> to the<br>
>> >> >>>> shader when it's done. How common is this case? How important is<br>
>> >> >>>> it<br>
>> >> >>>> to<br>
>> >> >>>> optimize for? I don't know.<br>
>> >> >>>><br>
>> >> >>>> One other thing that bothers me a bit: From Marek's comment, it<br>
>> >> >>>> sounds like<br>
>> >> >>>> the components want to just pass in IR and be agnostic about<br>
>> >> >>>> whether<br>
>> >> >>>> the<br>
>> >> >>>> driver wants its own copy or wants to change it or whatever. This<br>
>> >> >>>> seems<br>
>> >> >>>> like an argument for always cloning to me. From the perspective<br>
>> >> >>>> of a<br>
>> >> >>>> gallium module, "I want to hang in to this, I'll keep a reference"<br>
>> >> >>>> seems<br>
>> >> >>>> exactly the same as "I want to hang onto this, I'll give the<br>
>> >> >>>> driver a<br>
>> >> >>>> copy".<br>
>> >> >>>> How are they actually different given that the driver basically<br>
>> >> >>>> has<br>
>> >> >>>> to<br>
>> >> >>>> modify what you give it in order to do lowering?<br>
>> >> >>>><br>
>> >> >>>>> > Ugh... I didn't read this at the time, but I don't like Marek's<br>
>> >> >>>>> > response. My understanding of the situation, based on this<br>
>> >> >>>>> > thread,<br>
>> >> >>>>> > is<br>
>> >> >>>>> > that there are some cases where the st knows that there's only<br>
>> >> >>>>> > going<br>
>> >> >>>>> > to be one variant and can throw away the (NIR or TGSI) shader<br>
>> >> >>>>> > after it<br>
>> >> >>>>> > hands it to the driver, while at other times it has to hold<br>
>> >> >>>>> > onto<br>
>> >> >>>>> > all<br>
>> >> >>>>> > the variants and only give the driver a read-only copy (or<br>
>> >> >>>>> > duplicate<br>
>> >> >>>><br>
>> >> >>>> As per above, my interpretation of Marek's comment is that he<br>
>> >> >>>> doesn't<br>
>> >> >>>> want<br>
>> >> >>>> the st to have to think about cloning ever. He wants it to assume<br>
>> >> >>>> that<br>
>> >> >>>> compilation never modifies the IR so the driver should always<br>
>> >> >>>> clone.<br>
>> >> >>>> You<br>
>> >> >>>> have to keep in mind that Marek is most likely thinking about<br>
>> >> >>>> caching<br>
>> >> >>>> the<br>
>> >> >>>> TGSI rather than doing in-place lowering in it.<br>
>> >> >>>><br>
>> >> >>>> If I'm understanding Marek correctly, then it sounds like shader<br>
>> >> >>>> compilation<br>
>> >> >>>> should never touch the IR that's passed in. If this is the case,<br>
>> >> >>>> it<br>
>> >> >>>> sounds<br>
>> >> >>>> like always cloning is the way to go. At least its not *that*<br>
>> >> >>>> expensive.<br>
>> >> >>><br>
>> >> >>> Note that st/mesa needs to keep the FS IR because of glDrawPixels<br>
>> >> >>> and<br>
>> >> >>> glBitmap, and the VS IR because of edge flags, glRasterPos<br>
>> >> >>> evaluation,<br>
>> >> >>> selection and feedback modes. The last three are done with<br>
>> >> >>> Draw/LLVM<br>
>> >> >>> and only support TGSI.<br>
>> >> >>><br>
>> >> >>> Therefore, st/mesa always hangs onto the IR and drivers can't<br>
>> >> >>> modify<br>
>> >> >>> it. It also needs VS in TGSI to be able to do everything correctly.<br>
>> >> >>><br>
>> >> >>> What other Gallium modules want or not want is not that important,<br>
>> >> >>> but<br>
>> >> >>> changing the current semantics will require fixing a lot of places.<br>
>> >> >>> (state trackers - mesa, nine, xa; modules - blitter, draw, hud,<br>
>> >> >>> postprocess, tests, vl)<br>
>> >> >>><br>
>> >> >>> You really better think about whether changing all those and the<br>
>> >> >>> risk<br>
>> >> >>> of breaking them is worth it.<br>
>> >> >>><br>
>> >> >>> Marek<br>
>> >> >><br>
>> >> >> Well, we're talking about passing NIR here, not TGSI, so none of<br>
>> >> >> those<br>
>> >> >> places will need to be updated. NIR is a much more heavyweight IR,<br>
>> >> >> and<br>
>> >> >> copying is much more expensive, so it makes more sense there for the<br>
>> >> >> st to duplicate it and let the driver own the IR it passes in, to<br>
>> >> >> reduce copying when there's no variant management necessary.<br>
>> >> ><br>
>> >> > My main concern was about TGSI, not so much about NIR.<br>
>> >><br>
>> >> Yes, exactly -- we don't plan on changing the semantics of passing in<br>
>> >> TGSI, so this only matters when the user passes in a NIR shader.<br>
>> ><br>
>> > I think two different concepts of ownership are getting conflated here:<br>
>> > Right/responsibility to delete and right to modify.<br>
>> ><br>
>> > The way I understand it, gallium, as it stands, gives neither to the<br>
>> > driver.<br>
>> > A back-end using NIR requires the right to modify but who deletes it<br>
>> > doesn't<br>
>> > ultimately matter. I think it's dangerous to pass one of these rights<br>
>> > to<br>
>> > the driver and not the other but we need to think about both.<br>
>> ><br>
>> > What I'm trying to say is that we have two options here:<br>
>> ><br>
>> > 1) gallium passes IR to the back-end that it is free to modify and is<br>
>> > required to delete when it's done.<br>
>> ><br>
>> > 2) gallium passes read-only IR to the back-end and it always makes a<br>
>> > copy.<br>
>><br>
>> Not always. The copy is optional. Drivers are encouraged not to make a<br>
>> copy if they don't need it. Or they can keep a copy in a different IR,<br>
>> which is the same scenario for the original IR.<br>
><br>
> Let's suppose that the driver always has to modify the IR before it can be<br>
> used. Does it have to make a copy then? I know that may not be the case for<br>
> TGSI but it is currently the case for NIR.<br>
<br>
</div></div>If the driver wants to modify TGSI, it does have to make a copy<br>
(tgsi_transform_shader automatically makes a copy). Regarding NIR,<br>
define and use whatever makes sense for it.<span class="HOEnZb"><font color="#888888"><br></font></span></blockquote><div><br></div><div>Ok, that's what I thought. I'm just trying to make sure I get it all clear. I'm not 100% sure what the right thing to do for NIR is, but knowing what's required for TGSI helps. Thanks!<br></div><div>--Jason <br></div></div><br></div></div>