<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Sep 23, 2014 at 11:43 AM, Matt Turner <span dir="ltr"><<a href="mailto:mattst88@gmail.com" target="_blank">mattst88@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Sat, Sep 20, 2014 at 10:22 AM, Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>> wrote:<br>
</span><div><div class="h5">> This series does a bunch of refactoring of the i965 fs backend IR to add<br>
> concepts of register width and instruction execution size. There's more to<br>
> be done yet, but this gets us most of the way there. It also removes the<br>
> assumption that scalar values are always 1 register in SIMD8 and 2<br>
> registers in SIMD16. In particular, we get the following:<br>
><br>
> 1) No more assumption about everything being 1 register. This allows us<br>
> to allocate odd numbers of registers in SIMD16 which is needed for some<br>
> payloads. Also, it should make implementing fp64 much easier because<br>
> we can now sanely registers of size 2 in SIMD8 and size 4 in SIMD16.<br>
> There's a little more work to be don there, but this should take care<br>
> of a lot of it.<br>
><br>
> 2) We can now do other instruction widths with relative ease. The<br>
> compiler now detects, based on register widths, the execution size of<br>
> the instruction and passes it down to the generator. One example of<br>
> this is the patches in this series for UNTYPED_ATOMIC and<br>
> UNTYPED_SURFACE_READ where part of setting up the payload is to do an<br>
> 8-wide move to fill a register with 0 and then a 1-wide move to set one<br>
> particular component. We can now simply do this at the fs level and it<br>
> will be get translated down to the correct assembly and properly<br>
> handled by the compiler optimizations. There is more work to be done<br>
> here at the generator level, but this series is already long enough<br>
><br>
> 3) Thanks to the above mentioned things, we can easily do send from GRF<br>
> for FB writes. One of the major blockers here before was that the<br>
> beginning of the FB write message was anywhere between 0 and 4<br>
> registers regardless of whether you are in SIMD8 or SIMD16. Due to the<br>
> implicit register doubling in SIMD16, it would have been a real pain to<br>
> implement this properly. Now, it's trivial.<br>
><br>
> I could go on about other changes, but those are the major ones.<br>
<br>
</div></div>This all sounds great, Jason. I'm really happy with how you split the<br>
patches out. It made reviewing this amazingly easy.<br>
<br>
I'm made a relatively quick review, and it looks good to me. (I'm<br>
operating under the assumption that what you said below about piglit<br>
passing is indicative of there not being bugs :) </blockquote><div><br></div><div>Heh... If only that were true... I seem to have killed ILK and I'm not sure why. I'm going to look into that today and see if I can get it patched up.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I've sent a bunch of<br>
smallish comments. Nothing major. I guess patch 41 is in flux though<br>
based on Connor's review.<br></blockquote><div><br></div><div>I think Ken's comment demonstrates that it's not an issue. I'm convinced at any rate. Not sure If I can count on Connor getting back to me right away either. :-)<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
I suppose the next steps are to sort out the preliminary 12 patch<br>
series (which I think you said you were going to revise?). I think the<br>
ball's in your court there.<br></blockquote><div><br></div><div>Yeah, I need to come up with something better for compact_virtual_grfs and I think there was another thing or two to clean up. Also, the split_virtual_grfs patch still needs a good review. You looked at it and gave me a non-comment.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
So, you can apply my R-b to the first 40 patches with comments<br>
addressed. When this series goes in, I think we should preserve the<br>
commit info of the squash patches (i.e., actually squash instead of<br>
fixup), like in commit 79d77b38.<br>
<span class=""><br>
> The requisite Shader DB results:<br>
><br>
> total instructions in shared programs: 4999994 -> 4971746 (-0.56%)<br>
> instructions in affected programs: 959392 -> 931144 (-2.94%)<br>
> GAINED: 138<br>
> LOST: 71<br>
<br>
</span>Any ideas about the lost programs?<br>
</blockquote></div><br></div><div class="gmail_extra">I think some (may be 12 or so) are because if an optimization pass that's failing (there were a few programs that grew by 1 or 2 instructions). The other 60 or so seem to be because freedom in register allocation isn't always a good thing. Prior to doing send-from-GRF for FB writes and this last rebase, I had only 4 programs in all of shader-db that had any difference in the number of instructions, so I think I was generating basically identical programs except for register allocation. However, I gained some programs and lost about 65. I think this is a case of those programs being right on the edge of what our allocator could do and tweaking things pushed them over the edge. :-(<br><br></div><div class="gmail_extra">--Jason<br></div></div>