<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Sep 20, 2014 at 12:44 PM, Connor Abbott <span dir="ltr"><<a href="mailto:cwabbott0@gmail.com" target="_blank">cwabbott0@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Sat, Sep 20, 2014 at 1:23 PM, Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>> wrote:<br> > ---<br> > src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 24 ++++++++++++++++++++++-<br> > src/mesa/drivers/dri/i965/intel_screen.h | 5 +++++<br> > 2 files changed, 28 insertions(+), 1 deletion(-)<br> ><br> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp<br> > index b42f1e5..41e8855 100644<br> > --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp<br> > +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp<br> > @@ -116,8 +116,10 @@ brw_alloc_reg_set(struct intel_screen *screen, int reg_width)<br> > /* Compute the total number of registers across all classes. */<br> > int ra_reg_count = 0;<br> > for (int i = 0; i < class_count; i++) {<br> > + screen->wm_reg_sets[index].class_first_reg[i] = ra_reg_count;<br> > ra_reg_count += base_reg_count - (class_sizes[i] - 1);<br> > }<br> > + screen->wm_reg_sets[index].class_first_reg[class_count] = ra_reg_count;<br> ><br> > uint8_t *ra_reg_to_grf = ralloc_array(screen, uint8_t, ra_reg_count);<br> > struct ra_regs *regs = ra_alloc_reg_set(screen, ra_reg_count);<br> > @@ -469,9 +471,29 @@ fs_visitor::assign_regs(bool allow_spilling)<br> > }<br> ><br> > setup_payload_interference(g, payload_node_count, first_payload_node);<br> > - if (brw->gen >= 7)<br> > + if (brw->gen >= 7) {<br> > setup_mrf_hack_interference(g, first_mrf_hack_node);<br> ><br> > + foreach_in_list(fs_inst, inst, &instructions) {<br> > + /* When we do send-from-GRF for FB writes, we need to ensure that<br> > + * the last write instruction sends from a high register. This is<br> > + * because the vertex fetcher wants to start filling the low<br> > + * payload registers while the pixel data port is still working on<br> > + * writing out the memory. If we don't do this, we get rendering<br> > + * artifacts.<br> > + *<br> > + * We could just do "something high". Instead, we just pick the<br> > + * highest register that works.<br> > + */<br> > + if (inst->opcode == FS_OPCODE_FB_WRITE && inst->eot) {<br> > + int size = virtual_grf_sizes[inst->src[0].reg];<br> > + int reg = screen->wm_reg_sets[rsi].class_first_reg[size] - 1;<br> > + ra_set_node_reg(g, inst->src[0].reg, reg);<br> > + break;<br> > + }<br> > + }<br> > + }<br> > +<br> > if (dispatch_width > 8) {<br> > /* In 16-wide dispatch we have an issue where a compressed<br> > * instruction is actually two instructions executed simultaneiously.<br> > diff --git a/src/mesa/drivers/dri/i965/intel_screen.h b/src/mesa/drivers/dri/i965/intel_screen.h<br> > index 945f6f5..8bc6abd 100644<br> > --- a/src/mesa/drivers/dri/i965/intel_screen.h<br> > +++ b/src/mesa/drivers/dri/i965/intel_screen.h<br> > @@ -90,6 +90,11 @@ struct intel_screen<br> > int classes[16];<br> ><br> > /**<br> > + * Array of the first ra reg in eahch ra class.<br> <br> </div></div>Typo.<br></blockquote><div><br></div><div>Thanks<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <span class=""><br> > + */<br> > + int class_first_reg[17];<br> > +<br> > + /**<br> > * Mapping for register-allocated objects in *regs to the first<br> > * GRF for that object.<br> > */<br> > --<br> > 2.1.0<br> ><br> </span>> _______________________________________________<br> > mesa-dev mailing list<br> > <a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a><br> > <a href="http://lists.freedesktop.org/mailman/listinfo/mesa-dev" target="_blank">http://lists.freedesktop.org/mailman/listinfo/mesa-dev</a><br> <br> I think it would be less confusing if instead of introducing a<br> "class_first_reg" array you added a class_last_reg array and then did<br> class_last_reg[size - 1] - iiuc, you're relying on the fact that the<br> first reg of one class minus one is the last reg of the previous<br> class, which isn't obvious and it takes a while to realize what's<br> going on.<br></blockquote><div><br></div><div>That's fair. I can change that easily enough. It doesn't really matter that much. I just kind of liked the fact that last_reg[n] - last_reg[n-1] == class_size. But it doesn't matter and what you suggested would be far more obvious.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> BTW, just so you know, I don't think it's generally a good idea to set<br> more than one node to be pre-allocated to the same register, since<br> things will blow up if those nodes interfere. It's probably OK here<br> since I can't think of a way that an input register preallocated to<br> the highest register would interfere with the input to the fb write,<br></blockquote><div><br></div><div>Yes, I'm aware of the problem. However, the only things we preallocate are payload registers which only ever go at the beginning and the final FB write which goes at the end. As long as we don't start preallocating more things, I think we'll be fine. Also, isn't there some cap on how many payload registers we can have? If so then that already solves our problem.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> but in the future we'll probably want to think about how to handle<br> this in a more sophisticated way. This is sort of on a tangent, but<br> with SSA register allocation we should probably allocate load_payload<br> destinations before anything else and then treat load_payload as a<br> series of writes to preallocated registers, since spilling the result<br> of a load_payload seems like a silly thing to do and then we don't<br> have to deal with allocating a contiguous set of registers which<br></blockquote><div><br></div><div>If only that were true... The problem is that we will *always* have the problem of contiguous registers. Once we add fp64 to the mix, we will have registers of size 1 and 2 in SIMD8 and registers of size 1 (Unsigned single words), 2 (dwords, floats, etc) and 4 (doubles) in SIMD16. Sure, we can probably preallocate payload registers, but we'll never get away from handling multiple sizes.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> breaks some nice properties of SSA register allocation. Then, we could<br> have a simple heuristic to allocate the source of an fb write as high<br> as possible, or just allocate all load_payload instructions as high as<br> possible.<br> </blockquote></div><br></div><div class="gmail_extra">I've actually been thinking about the register allocation problem quite a bit lately. One of the things I've got on my todo list is to do some research and figure out if we can be doing it better. There may be a better way to handle the multiple sizes problem than we're doing currently. Unfortunately, I can't really say much now because I haven't dug into it very far yet.<br></div></div>