[Mesa-dev] [PATCH] i965/fs: In the pre-regalloc schedule, try harder at reducing reg pressure.

Mon Oct 21 21:05:32 CEST 2013

Chia-I Wu <olvaffe at gmail.com> writes:

> On Thu, Oct 17, 2013 at 3:24 AM, Matt Turner <mattst88 at gmail.com> wrote:
>> On Mon, Oct 14, 2013 at 4:14 PM, Eric Anholt <eric at anholt.net> wrote:
>>> Previously, the best thing we had was to schedule the things unblocked by
>>> the current instruction, in the hope that it would be consuming two values
>>> at the end of their live intervals while only producing one new value.
>>> Sometimes that wasn't the case.
>>>
>>> Now, when an instruction is the first user of a GRF we schedule (i.e. it
>>> will probably be the virtual_grf_def[] instruction after computing live
>>> intervals again), penalize it by how many regs it would take up.  When an
>>> instruction is the last user of a GRF we have to schedule (when it will
>>> probably be the virtual_grf_end[] instruction), give it a boost by how
>>> many regs it would free.
>>>
>>> The new functions are made virtual (only 1 of 2 really needs to be
>>> virtual) because I expect we'll soon lift the pre-regalloc scheduling
>>> heuristic over to the vec4 backend.
>>>
>>> shader-db:
>>> total instructions in shared programs: 1512756 -> 1511604 (-0.08%)
>>> instructions in affected programs:     10292 -> 9140 (-11.19%)
>>> GAINED:                                121
>>> LOST:                                  38
>>>
>>> Improves tropics performance at my current settings by 4.50602% +/-
>>> 2.60694% (n=5).  No difference on Lightsmark (n=5).  No difference on
>>> GLB2.7 (n=11).
>>>
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70445
>>> ---
>>
>> I think we're on the right track by considering register pressure when
>> scheduling, but one aspect we're not considering is simply how many
>> registers we think we're using.
>>
>> If I understand correctly, the pre-register-allocation scheduler wants to
>> shorten live intervals as much as possible which reduces register
>> pressure but at the cost of larger stalls and less instruction level
>> parallelism. We end up scheduling things like
>>
>> produce result 4
>> produce result 3
>> produce result 2
>> produce result 1
>> use result 1
>> use result 2
>> use result 3
>> use result 4
>>
>> (this is why the MRF writes for the FB write are always done in the
>> reverse order)
> In this example, it will actually be
>
>  produce result 4
>  use result 4
>  produce result 3
>  use result 3
>  produce result 2
>  use result 2
>  produce result 1
>  use result 1
>
> and post-regalloc will schedule again to something like
>
>  produce result 4
>  produce result 3
>  produce result 2
>  produce result 1
>  use result 4
>  use result 3
>  use result 2
>  use result 1
>
> The pre-regalloc scheduling attempts to consume the results as soon as
> they are available.
>
> FB write is done in reverse order because, when a result is available,
> its consumers are scheduled in reverse order.  The epilog of fragment
> shaders is usually like this:
>
>  placeholder_halt
>  mov m1, g1
>  mov m2, g2
>  mov m3, g3
>  mov m4, g4
>  send
>
> MOVs depend on placeholder_halt, and send depends on MOVs.  The
> scheduler will schedule it as follows:
>
>  placeholder_halt
>  mov m4, g4
>  mov m3, g3
>  mov m2, g2
>  mov m1, g1
>  send
>
> The order can be corrected with the change proposed here
>
>   http://lists.freedesktop.org/archives/mesa-dev/2013-October/046570.html
>
> But there is no point in making the change if the current heuristic for
> pre-regalloc is to be reworked.
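
As a rough illustration of the epilog reordering described in the quoted
text, here is a toy list scheduler (not the i965 code; ToyInst,
toy_schedule and toy_epilog are made-up names for this sketch) that breaks
ties between ready instructions by picking either the last or the first
one in program order. It assumes the dependency graph is acyclic.

```cpp
#include <string>
#include <vector>

// Toy instruction: a name plus the indices of the instructions it depends on.
// Hypothetical type for illustration only, unrelated to schedule_node.
struct ToyInst {
    std::string name;
    std::vector<int> deps;
};

// List-schedule `insts`. When several instructions are ready at once, pick
// the last one in program order (prefer_last == true) or the first one.
// Assumes the dependencies form a DAG, so something is always ready.
std::vector<std::string> toy_schedule(const std::vector<ToyInst> &insts,
                                      bool prefer_last)
{
    std::vector<bool> done(insts.size(), false);
    std::vector<std::string> order;

    while (order.size() < insts.size()) {
        int chosen = -1;
        for (int i = 0; i < (int)insts.size(); i++) {
            if (done[i])
                continue;
            bool ready = true;
            for (int d : insts[i].deps)
                ready = ready && done[d];
            if (!ready)
                continue;
            // Keep the first ready candidate unless we prefer later ones.
            if (chosen == -1 || prefer_last)
                chosen = i;
        }
        done[chosen] = true;
        order.push_back(insts[chosen].name);
    }
    return order;
}

// The fragment shader epilog from the quoted example: the MOVs depend on
// placeholder_halt, and send depends on all of the MOVs.
std::vector<ToyInst> toy_epilog()
{
    return {
        {"placeholder_halt", {}},
        {"mov m1, g1", {0}},
        {"mov m2, g2", {0}},
        {"mov m3, g3", {0}},
        {"mov m4, g4", {0}},
        {"send", {1, 2, 3, 4}},
    };
}
```

With prefer_last set, the MOVs come out as m4, m3, m2, m1; with it
cleared, they stay in program order.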

Flipping the order in which we prefer ties (on betterthanlifo-2):

commit 11a511576e465f02875f39c452561775a97416a1
Author: Eric Anholt <eric at anholt.net>
Date:   Mon Oct 21 11:45:53 2013 -0700

    otherway

diff --git a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp b/src/mesa/
index 9a480b4..b123015 100644
--- a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
+++ b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
@@ -1049,9 +1049,9 @@ fs_instruction_scheduler::choose_instruction_to_schedule()
        * it's the first use of a GRF, reduce its score since it means it
        * should be increasing register pressure.
        */
-      for (schedule_node *node = (schedule_node *)instructions.get_tail();
-           node != instructions.get_head()->prev;
-           node = (schedule_node *)node->prev) {
+      for (schedule_node *node = (schedule_node *)instructions.get_head();
+           node != instructions.get_head()->next;
+           node = (schedule_node *)node->next) {
          schedule_node *n = (schedule_node *)node;
          fs_inst *inst = (fs_inst *)n->inst;

gives:

total instructions in shared programs: 1544638 -> 1546794 (0.14%)
instructions in affected programs:     7163 -> 9319 (30.10%)
GAINED:                                16
LOST:                                  289

with massive spilling on tropics, and a bit on lightsmark and csgo.
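
For readers following along, the scoring idea from the original patch
description (penalize the first user of a GRF by its size, boost the last
user by the size it frees) can be sketched roughly as below. This is a
simplified stand-in, not the code in brw_schedule_instructions.cpp;
ToyState and toy_pressure_benefit are hypothetical names, and the sketch
assumes each candidate lists every GRF it touches at most once.

```cpp
#include <vector>

// Hypothetical bookkeeping for the sketch, indexed by virtual GRF number.
struct ToyState {
    std::vector<int> size;         // registers occupied by each virtual GRF
    std::vector<int> unscheduled;  // unscheduled instructions touching it
    std::vector<bool> seen;        // has a scheduled instruction touched it?
};

// Score a candidate by the change in register pressure we expect if it is
// scheduled next. Higher is better. `grfs_touched` is assumed to hold each
// GRF read or written by the candidate exactly once.
int toy_pressure_benefit(const ToyState &st,
                         const std::vector<int> &grfs_touched)
{
    int benefit = 0;
    for (int g : grfs_touched) {
        // First user we schedule: probably starts the live interval.
        if (!st.seen[g])
            benefit -= st.size[g];
        // Last user left to schedule: probably ends the live interval.
        if (st.unscheduled[g] == 1)
            benefit += st.size[g];
    }
    return benefit;
}
```

A candidate that is both the first and the last user of a GRF nets out to
zero for that GRF, which matches the intuition that it neither raises nor
lowers pressure once it has executed.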