[Mesa-dev] [PATCH 41/41] SQUASH: i965/fs: Force a high register for the final FB write
Connor Abbott
cwabbott0 at gmail.com
Sat Sep 20 14:52:32 PDT 2014
On Sat, Sep 20, 2014 at 4:56 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>
>
> On Sat, Sep 20, 2014 at 12:44 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>>
>> On Sat, Sep 20, 2014 at 1:23 PM, Jason Ekstrand <jason at jlekstrand.net>
>> wrote:
>> > ---
>> > src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 24
>> > ++++++++++++++++++++++-
>> > src/mesa/drivers/dri/i965/intel_screen.h | 5 +++++
>> > 2 files changed, 28 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
>> > b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
>> > index b42f1e5..41e8855 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
>> > @@ -116,8 +116,10 @@ brw_alloc_reg_set(struct intel_screen *screen, int
>> > reg_width)
>> > /* Compute the total number of registers across all classes. */
>> > int ra_reg_count = 0;
>> > for (int i = 0; i < class_count; i++) {
>> > + screen->wm_reg_sets[index].class_first_reg[i] = ra_reg_count;
>> > ra_reg_count += base_reg_count - (class_sizes[i] - 1);
>> > }
>> > + screen->wm_reg_sets[index].class_first_reg[class_count] =
>> > ra_reg_count;
>> >
>> > uint8_t *ra_reg_to_grf = ralloc_array(screen, uint8_t,
>> > ra_reg_count);
>> > struct ra_regs *regs = ra_alloc_reg_set(screen, ra_reg_count);
>> > @@ -469,9 +471,29 @@ fs_visitor::assign_regs(bool allow_spilling)
>> > }
>> >
>> > setup_payload_interference(g, payload_node_count,
>> > first_payload_node);
>> > - if (brw->gen >= 7)
>> > + if (brw->gen >= 7) {
>> > setup_mrf_hack_interference(g, first_mrf_hack_node);
>> >
>> > + foreach_in_list(fs_inst, inst, &instructions) {
>> > + /* When we do send-from-GRF for FB writes, we need to ensure
>> > that
>> > + * the last write instruction sends from a high register.
>> > This is
>> > + * because the vertex fetcher wants to start filling the low
>> > + * payload registers while the pixel data port is still
>> > working on
>> > + * writing out the memory. If we don't do this, we get
>> > rendering
>> > + * artifacts.
>> > + *
>> > + * We could just do "something high". Instead, we just pick
>> > the
>> > + * highest register that works.
>> > + */
>> > + if (inst->opcode == FS_OPCODE_FB_WRITE && inst->eot) {
>> > + int size = virtual_grf_sizes[inst->src[0].reg];
>> > + int reg = screen->wm_reg_sets[rsi].class_first_reg[size] -
>> > 1;
>> > + ra_set_node_reg(g, inst->src[0].reg, reg);
>> > + break;
>> > + }
>> > + }
>> > + }
>> > +
>> > if (dispatch_width > 8) {
>> > /* In 16-wide dispatch we have an issue where a compressed
>> > * instruction is actually two instructions executed
>> > simultaneiously.
>> > diff --git a/src/mesa/drivers/dri/i965/intel_screen.h
>> > b/src/mesa/drivers/dri/i965/intel_screen.h
>> > index 945f6f5..8bc6abd 100644
>> > --- a/src/mesa/drivers/dri/i965/intel_screen.h
>> > +++ b/src/mesa/drivers/dri/i965/intel_screen.h
>> > @@ -90,6 +90,11 @@ struct intel_screen
>> > int classes[16];
>> >
>> > /**
>> > + * Array of the first ra reg in eahch ra class.
>>
>> Typo.
>
>
> Thanks
>
>>
>>
>> > + */
>> > + int class_first_reg[17];
>> > +
>> > + /**
>> > * Mapping for register-allocated objects in *regs to the first
>> > * GRF for that object.
>> > */
>> > --
>> > 2.1.0
>> >
>> > _______________________________________________
>> > mesa-dev mailing list
>> > mesa-dev at lists.freedesktop.org
>> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
>> I think it would be less confusing if instead of introducing a
>> "class_first_reg" array you added a class_last_reg array and then did
>> class_last_reg[size - 1] - iiuc, you're relying on the fact that the
>> first reg of one class minus one is the last reg of the previous
>> class, which isn't obvious and it takes a while to realize what's
>> going on.
>
>
> That's fair. I can change that easily enough. It doesn't really matter
> that much. I just kind of liked the fact that last_reg[n] - last_reg[n-1]
> == class_size. But it doesn't matter and what you suggested would be far
> more obvious.
>
>>
>> BTW, just so you know, I don't think it's generally a good idea to set
>> more than one node to be pre-allocated to the same register, since
>> things will blow up if those nodes interfere. It's probably OK here
>> since I can't think of a way that an input register preallocated to
>> the highest register would interfere with the input to the fb write,
>
>
> Yes, I'm aware of the problem. However, the only things we preallocate are
> payload registers which only ever go at the beginning and the final FB write
> which goes at the end. As long as we don't start preallocating more things,
> I think we'll be fine. Also, isn't there some cap on how many payload
> registers we can have? If so then that already solves our problem.
Yeah, that's what I meant. I'm not sure if the varying payload
registers can fill up the entire register space (and hence one gets
allocated to the same register as the fb write load_payload
destination), but I guess it doesn't matter too much.
>
>>
>> but in the future we'll probably want to think about how to handle
>> this in a more sophisticated way. This is sort of on a tangent, but
>> with SSA register allocation we should probably allocate load_payload
>> destinations before anything else and then treat load_payload as a
>> series of writes to preallocated registers, since spilling the result
>> of a load_payload seems like a silly thing to do and then we don't
>> have to deal with allocating a contiguous set of registers which
>
>
> If only that were true... The problem is that we will *always* have the
> problem of contiguous registers. Once we add fp64 to the mix, we will have
> registers of size 1 and 2 in SIMD8 and registers of size 1 (Unsigned single
> words), 2 (dwords, floats, etc) and 4 (doubles) in SIMD16. Sure, we can
> probably preallocate payload registers, but we'll never get away from
> handling multiple sizes.
True, there's still that problem. And don't forget that we want to be
able to allocate scalar/uniform registers, so now you have registers
of size 1, 8, and 16... I think Periera's puzzle-solving allocator
(http://compilers.cs.ucla.edu/fernando/publications/papers/PhdDiss.pdf)
handles the case where the register classes are aligned, so if we make
SIMD16 registers aligned like we currently do it might work. It's
definitely a difficult problem though, and it'll take a lot of work to
solve.
>
>>
>> breaks some nice properties of SSA register allocation. Then, we could
>> have a simple heuristic to allocate the source of an fb write as high
>> as possible, or just allocate all load_payload instructions as high as
>> possible.
>
>
> I've actually been thinking about the register allocation problem quite a
> bit lately. One of the things I've got on my todo list is to do some
> research and figure out if we can be doing it better. There may be a better
> way to handle the multiple sizes problem than we're doing currently.
> Unfortunately, I can't really say much now because I haven't dug into it
> very far yet.
More information about the mesa-dev
mailing list