<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Sep 23, 2014 at 11:43 AM, Matt Turner <span dir="ltr"><<a href="mailto:mattst88@gmail.com" target="_blank">mattst88@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Sat, Sep 20, 2014 at 10:22 AM, Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>> wrote:<br> </span><div><div class="h5">> This series does a bunch of refactoring of the i965 fs backend IR to add<br> > concepts of register width and instruction execution size. There's more to<br> > be done yet, but this gets us most of the way there. It also removes the<br> > assumption that scalar values are always 1 register in SIMD8 and 2<br> > registers in SIMD16. In particular, we get the following:<br> ><br> > 1) No more assumption about everything being 1 register. This allows us<br> > to allocate odd numbers of registers in SIMD16 which is needed for some<br> > payloads. Also, it should make implementing fp64 much easier because<br> > we can now sanely registers of size 2 in SIMD8 and size 4 in SIMD16.<br> > There's a little more work to be don there, but this should take care<br> > of a lot of it.<br> ><br> > 2) We can now do other instruction widths with relative ease. The<br> > compiler now detects, based on register widths, the execution size of<br> > the instruction and passes it down to the generator. One example of<br> > this is the patches in this series for UNTYPED_ATOMIC and<br> > UNTYPED_SURFACE_READ where part of setting up the payload is to do an<br> > 8-wide move to fill a register with 0 and then a 1-wide move to set one<br> > particular component. We can now simply do this at the fs level and it<br> > will be get translated down to the correct assembly and properly<br> > handled by the compiler optimizations. There is more work to be done<br> > here at the generator level, but this series is already long enough<br> ><br> > 3) Thanks to the above mentioned things, we can easily do send from GRF<br> > for FB writes. One of the major blockers here before was that the<br> > beginning of the FB write message was anywhere between 0 and 4<br> > registers regardless of whether you are in SIMD8 or SIMD16. Due to the<br> > implicit register doubling in SIMD16, it would have been a real pain to<br> > implement this properly. Now, it's trivial.<br> ><br> > I could go on about other changes, but those are the major ones.<br> <br> </div></div>This all sounds great, Jason. I'm really happy with how you split the<br> patches out. It made reviewing this amazingly easy.<br> <br> I'm made a relatively quick review, and it looks good to me. (I'm<br> operating under the assumption that what you said below about piglit<br> passing is indicative of there not being bugs :) </blockquote><div><br></div><div>Heh... If only that were true... I seem to have killed ILK and I'm not sure why. I'm going to look into that today and see if I can get it patched up.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I've sent a bunch of<br> smallish comments. Nothing major. I guess patch 41 is in flux though<br> based on Connor's review.<br></blockquote><div><br></div><div>I think Ken's comment demonstrates that it's not an issue. I'm convinced at any rate. Not sure If I can count on Connor getting back to me right away either. :-)<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> I suppose the next steps are to sort out the preliminary 12 patch<br> series (which I think you said you were going to revise?). I think the<br> ball's in your court there.<br></blockquote><div><br></div><div>Yeah, I need to come up with something better for compact_virtual_grfs and I think there was another thing or two to clean up. Also, the split_virtual_grfs patch still needs a good review. You looked at it and gave me a non-comment.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <br> So, you can apply my R-b to the first 40 patches with comments<br> addressed. When this series goes in, I think we should preserve the<br> commit info of the squash patches (i.e., actually squash instead of<br> fixup), like in commit 79d77b38.<br> <span class=""><br> > The requisite Shader DB results:<br> ><br> > total instructions in shared programs: 4999994 -> 4971746 (-0.56%)<br> > instructions in affected programs: 959392 -> 931144 (-2.94%)<br> > GAINED: 138<br> > LOST: 71<br> <br> </span>Any ideas about the lost programs?<br> </blockquote></div><br></div><div class="gmail_extra">I think some (may be 12 or so) are because if an optimization pass that's failing (there were a few programs that grew by 1 or 2 instructions). The other 60 or so seem to be because freedom in register allocation isn't always a good thing. Prior to doing send-from-GRF for FB writes and this last rebase, I had only 4 programs in all of shader-db that had any difference in the number of instructions, so I think I was generating basically identical programs except for register allocation. However, I gained some programs and lost about 65. I think this is a case of those programs being right on the edge of what our allocator could do and tweaking things pushed them over the edge. :-(<br><br></div><div class="gmail_extra">--Jason<br></div></div>