On Sep 20, 2014 2:52 PM, "Connor Abbott" <<a href="mailto:cwabbott0@gmail.com">cwabbott0@gmail.com</a>> wrote: > > On Sat, Sep 20, 2014 at 4:56 PM, Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>> wrote: > > > > > > On Sat, Sep 20, 2014 at 12:44 PM, Connor Abbott <<a href="mailto:cwabbott0@gmail.com">cwabbott0@gmail.com</a>> wrote: > >> > >> On Sat, Sep 20, 2014 at 1:23 PM, Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>> > >> wrote: > >> > --- > >> > src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 24 > >> > ++++++++++++++++++++++- > >> > src/mesa/drivers/dri/i965/intel_screen.h | 5 +++++ > >> > 2 files changed, 28 insertions(+), 1 deletion(-) > >> > > >> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp > >> > b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp > >> > index b42f1e5..41e8855 100644 > >> > --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp > >> > +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp > >> > @@ -116,8 +116,10 @@ brw_alloc_reg_set(struct intel_screen *screen, int > >> > reg_width) > >> > /* Compute the total number of registers across all classes. */ > >> > int ra_reg_count = 0; > >> > for (int i = 0; i < class_count; i++) { > >> > + screen->wm_reg_sets[index].class_first_reg[i] = ra_reg_count; > >> > ra_reg_count += base_reg_count - (class_sizes[i] - 1); > >> > } > >> > + screen->wm_reg_sets[index].class_first_reg[class_count] = > >> > ra_reg_count; > >> > > >> > uint8_t *ra_reg_to_grf = ralloc_array(screen, uint8_t, > >> > ra_reg_count); > >> > struct ra_regs *regs = ra_alloc_reg_set(screen, ra_reg_count); > >> > @@ -469,9 +471,29 @@ fs_visitor::assign_regs(bool allow_spilling) > >> > } > >> > > >> > setup_payload_interference(g, payload_node_count, > >> > first_payload_node); > >> > - if (brw->gen >= 7) > >> > + if (brw->gen >= 7) { > >> > setup_mrf_hack_interference(g, first_mrf_hack_node); > >> > > >> > + foreach_in_list(fs_inst, inst, &instructions) { > >> > + /* When we do send-from-GRF for FB writes, we need to ensure > >> > that > >> > + * the last write instruction sends from a high register. > >> > This is > >> > + * because the vertex fetcher wants to start filling the low > >> > + * payload registers while the pixel data port is still > >> > working on > >> > + * writing out the memory. If we don't do this, we get > >> > rendering > >> > + * artifacts. > >> > + * > >> > + * We could just do "something high". Instead, we just pick > >> > the > >> > + * highest register that works. > >> > + */ > >> > + if (inst->opcode == FS_OPCODE_FB_WRITE && inst->eot) { > >> > + int size = virtual_grf_sizes[inst->src[0].reg]; > >> > + int reg = screen->wm_reg_sets[rsi].class_first_reg[size] - > >> > 1; > >> > + ra_set_node_reg(g, inst->src[0].reg, reg); > >> > + break; > >> > + } > >> > + } > >> > + } > >> > + > >> > if (dispatch_width > 8) { > >> > /* In 16-wide dispatch we have an issue where a compressed > >> > * instruction is actually two instructions executed > >> > simultaneiously. > >> > diff --git a/src/mesa/drivers/dri/i965/intel_screen.h > >> > b/src/mesa/drivers/dri/i965/intel_screen.h > >> > index 945f6f5..8bc6abd 100644 > >> > --- a/src/mesa/drivers/dri/i965/intel_screen.h > >> > +++ b/src/mesa/drivers/dri/i965/intel_screen.h > >> > @@ -90,6 +90,11 @@ struct intel_screen > >> > int classes[16]; > >> > > >> > /** > >> > + * Array of the first ra reg in eahch ra class. > >> > >> Typo. > > > > > > Thanks > > > >> > >> > >> > + */ > >> > + int class_first_reg[17]; > >> > + > >> > + /** > >> > * Mapping for register-allocated objects in *regs to the first > >> > * GRF for that object. > >> > */ > >> > -- > >> > 2.1.0 > >> > > >> > _______________________________________________ > >> > mesa-dev mailing list > >> > <a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a> > >> > <a href="http://lists.freedesktop.org/mailman/listinfo/mesa-dev">http://lists.freedesktop.org/mailman/listinfo/mesa-dev</a> > >> > >> I think it would be less confusing if instead of introducing a > >> "class_first_reg" array you added a class_last_reg array and then did > >> class_last_reg[size - 1] - iiuc, you're relying on the fact that the > >> first reg of one class minus one is the last reg of the previous > >> class, which isn't obvious and it takes a while to realize what's > >> going on. > > > > > > That's fair. I can change that easily enough. It doesn't really matter > > that much. I just kind of liked the fact that last_reg[n] - last_reg[n-1] > > == class_size. But it doesn't matter and what you suggested would be far > > more obvious. > > > >> > >> BTW, just so you know, I don't think it's generally a good idea to set > >> more than one node to be pre-allocated to the same register, since > >> things will blow up if those nodes interfere. It's probably OK here > >> since I can't think of a way that an input register preallocated to > >> the highest register would interfere with the input to the fb write, > > > > > > Yes, I'm aware of the problem. However, the only things we preallocate are > > payload registers which only ever go at the beginning and the final FB write > > which goes at the end. As long as we don't start preallocating more things, > > I think we'll be fine. Also, isn't there some cap on how many payload > > registers we can have? If so then that already solves our problem. > > Yeah, that's what I meant. I'm not sure if the varying payload > registers can fill up the entire register space (and hence one gets > allocated to the same register as the fb write load_payload > destination), but I guess it doesn't matter too much. > > > > >> > >> but in the future we'll probably want to think about how to handle > >> this in a more sophisticated way. This is sort of on a tangent, but > >> with SSA register allocation we should probably allocate load_payload > >> destinations before anything else and then treat load_payload as a > >> series of writes to preallocated registers, since spilling the result > >> of a load_payload seems like a silly thing to do and then we don't > >> have to deal with allocating a contiguous set of registers which > > > > > > If only that were true... The problem is that we will *always* have the > > problem of contiguous registers. Once we add fp64 to the mix, we will have > > registers of size 1 and 2 in SIMD8 and registers of size 1 (Unsigned single > > words), 2 (dwords, floats, etc) and 4 (doubles) in SIMD16. Sure, we can > > probably preallocate payload registers, but we'll never get away from > > handling multiple sizes. > > True, there's still that problem. And don't forget that we want to be > able to allocate scalar/uniform registers, so now you have registers > of size 1, 8, and 16... Yeah. I think right now we always do those with pulls so its not a problem yet. We may want to do that in the future though. > I think Periera's puzzle-solving allocator > (<a href="http://compilers.cs.ucla.edu/fernando/publications/papers/PhdDiss.pdf">http://compilers.cs.ucla.edu/fernando/publications/papers/PhdDiss.pdf</a>) > handles the case where the register classes are aligned, so if we make > SIMD16 registers aligned like we currently do it might work. It's > definitely a difficult problem though, and it'll take a lot of work to > solve. Thanks for the link. I'll give it a look. --Jason > > > > >> > >> breaks some nice properties of SSA register allocation. Then, we could > >> have a simple heuristic to allocate the source of an fb write as high > >> as possible, or just allocate all load_payload instructions as high as > >> possible. > > > > > > I've actually been thinking about the register allocation problem quite a > > bit lately. One of the things I've got on my todo list is to do some > > research and figure out if we can be doing it better. There may be a better > > way to handle the multiple sizes problem than we're doing currently. > > Unfortunately, I can't really say much now because I haven't dug into it > > very far yet.