[Mesa-dev] [PATCH] i965/fs: In the pre-regalloc schedule, try harder at reducing reg pressure.
Eric Anholt
eric at anholt.net
Mon Oct 21 21:05:32 CEST 2013
Chia-I Wu <olvaffe at gmail.com> writes:
> On Thu, Oct 17, 2013 at 3:24 AM, Matt Turner <mattst88 at gmail.com> wrote:
>> On Mon, Oct 14, 2013 at 4:14 PM, Eric Anholt <eric at anholt.net> wrote:
>>> Previously, the best thing we had was to schedule the things unblocked by
>>> the current instruction, on the hope that it would be consuming two values
>>> at the end of their live intervals while only producing one new value.
>>> Sometimes that wasn't the case.
>>>
>>> Now, when an instruction is the first user of a GRF we schedule (i.e. it
>>> will probably be the virtual_grf_def[] instruction after computing live
>>> intervals again), penalize it by how many regs it would take up. When an
>>> instruction is the last user of a GRF we have to schedule (when it will
>>> probably be the virtual_grf_end[] instruction), give it a boost by how
>>> many regs it would free.
>>>
>>> The new functions are made virtual (only 1 of 2 really needs to be
>>> virtual) because I expect we'll soon lift the pre-regalloc scheduling
>>> heuristic over to the vec4 backend.
>>>
>>> shader-db:
>>> total instructions in shared programs: 1512756 -> 1511604 (-0.08%)
>>> instructions in affected programs: 10292 -> 9140 (-11.19%)
>>> GAINED: 121
>>> LOST: 38
>>>
>>> Improves tropics performance at my current settings by 4.50602% +/-
>>> 2.60694% (n=5). No difference on Lightsmark (n=5). No difference on
>>> GLB2.7 (n=11).
>>>
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70445
>>> ---
>>
>> I think we're on the right track by considering register pressure when
>> scheduling, but one aspect we're not considering is simply how many
>> registers we think we're using.
>>
>> If I understand correctly, the pre-register-allocation scheduler wants
>> to shorten live intervals as much as possible, which reduces register
>> pressure but at the cost of larger stalls and less instruction-level
>> parallelism. We end up scheduling things like
>>
>> produce result 4
>> produce result 3
>> produce result 2
>> produce result 1
>> use result 1
>> use result 2
>> use result 3
>> use result 4
>>
>> (this is why the MRF writes for the FB write are always done in the
>> reverse order)
> In this example, it will actually be
>
> produce result 4
> use result 4
> produce result 3
> use result 3
> produce result 2
> use result 2
> produce result 1
> use result 1
>
> and post-regalloc will schedule again to something like
>
> produce result 4
> produce result 3
> produce result 2
> produce result 1
> use result 4
> use result 3
> use result 2
> use result 1
>
> The pre-regalloc scheduling attempts to consume the results as soon as
> they are available.
>
> FB write is done in reverse order because, when a result is available,
> its consumers are scheduled in reverse order. The epilog of fragment
> shaders is usually like this:
>
> placeholder_halt
> mov m1, g1
> mov m2, g2
> mov m3, g3
> mov m4, g4
> send
>
> MOVs depend on placeholder_halt, and send depends on MOVs. The
> scheduler will schedule it as follows:
>
> placeholder_halt
> mov m4, g4
> mov m3, g3
> mov m2, g2
> mov m1, g1
> send
>
> The order can be corrected with the change proposed here
>
> http://lists.freedesktop.org/archives/mesa-dev/2013-October/046570.html
>
> But there is no point in making the change if the current heuristic
> for pre-regalloc is to be reworked.
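
To make the reversed MOV order described above concrete, here is a minimal
sketch. It assumes the scheduler behaves like a pure LIFO ready list: newly
unblocked instructions are pushed in program order and the most recently
pushed one is picked next. The Node struct and the function names are
hypothetical, not Mesa's, and the real scheduler weighs other factors that
this model ignores.

```cpp
#include <string>
#include <vector>

// Hypothetical dependency-graph node; not Mesa's schedule_node.
struct Node {
    std::string name;
    std::vector<int> children; // indices of instructions that depend on this one
    int unscheduled_parents;   // dependencies that have not been scheduled yet
};

// Schedule with a LIFO ready list: always pick the most recently unblocked node.
std::vector<std::string> lifo_schedule(std::vector<Node> nodes) {
    std::vector<int> ready;
    for (int i = 0; i < (int)nodes.size(); i++)
        if (nodes[i].unscheduled_parents == 0)
            ready.push_back(i);

    std::vector<std::string> order;
    while (!ready.empty()) {
        int chosen = ready.back(); // last pushed is picked first
        ready.pop_back();
        order.push_back(nodes[chosen].name);
        // Children whose last dependency was just scheduled become ready.
        for (int child : nodes[chosen].children)
            if (--nodes[child].unscheduled_parents == 0)
                ready.push_back(child);
    }
    return order;
}

// The fragment shader epilog from the message: four MOVs that depend on
// placeholder_halt, and a send that depends on all four MOVs.
std::vector<std::string> fb_write_epilog_order() {
    std::vector<Node> nodes = {
        {"placeholder_halt", {1, 2, 3, 4}, 0},
        {"mov m1, g1", {5}, 1},
        {"mov m2, g2", {5}, 1},
        {"mov m3, g3", {5}, 1},
        {"mov m4, g4", {5}, 1},
        {"send", {}, 4},
    };
    return lifo_schedule(nodes);
}
```

Under that assumption, fb_write_epilog_order() yields placeholder_halt, then
the MOVs from m4 down to m1, then send, matching the order quoted above.
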
Flipping the order in which we prefer ties (on betterthanlifo-2):

commit 11a511576e465f02875f39c452561775a97416a1
Author: Eric Anholt <eric at anholt.net>
Date:   Mon Oct 21 11:45:53 2013 -0700

    otherway

diff --git a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp b/src/mesa/
index 9a480b4..b123015 100644
--- a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
+++ b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
@@ -1049,9 +1049,9 @@ fs_instruction_scheduler::choose_instruction_to_schedule()
* it's the first use of a GRF, reduce its score since it means it
* should be increasing register pressure.
*/
- for (schedule_node *node = (schedule_node *)instructions.get_tail();
- node != instructions.get_head()->prev;
- node = (schedule_node *)node->prev) {
+ for (schedule_node *node = (schedule_node *)instructions.get_head();
+ node != instructions.get_tail()->next;
+ node = (schedule_node *)node->next) {
schedule_node *n = (schedule_node *)node;
fs_inst *inst = (fs_inst *)n->inst;

gives:

total instructions in shared programs: 1544638 -> 1546794 (0.14%)
instructions in affected programs: 7163 -> 9319 (30.10%)
GAINED: 16
LOST: 289

with massive spilling on tropics, and a bit on lightsmark and csgo.
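
As a rough illustration of why the walk direction matters, here is a minimal
sketch of a register-pressure score with tie-breaking. Everything in it is an
assumption made for illustration: the Candidate struct, pressure_score(), and
choose() are hypothetical names, and the strict '>' comparison is a guess at
how ties could fall out, not a copy of choose_instruction_to_schedule().

```cpp
#include <string>
#include <vector>

// Hypothetical ready-list entry; not Mesa's schedule_node.
struct Candidate {
    std::string name;
    int regs_freed;     // registers released if this is the last user of a GRF
    int regs_allocated; // registers newly occupied if this is the first user of a GRF
};

// Boost instructions that end live ranges, penalize ones that start them.
int pressure_score(const Candidate &c) {
    return c.regs_freed - c.regs_allocated;
}

// Return the index of the best-scoring candidate, or -1 if the list is empty.
// Because the comparison is strict, the first candidate visited wins any tie,
// so the walk direction alone decides which of two equal candidates is chosen.
int choose(const std::vector<Candidate> &ready, bool from_tail) {
    int best = -1;
    int n = (int)ready.size();
    for (int step = 0; step < n; step++) {
        int i = from_tail ? n - 1 - step : step;
        if (best < 0 || pressure_score(ready[i]) > pressure_score(ready[best]))
            best = i;
    }
    return best;
}
```

With two equally scored candidates in the list, choose(ready, true) picks the
one nearer the tail and choose(ready, false) picks the one nearer the head,
which is the only behavioral difference the flipped loop introduces in this
model.
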