[Mesa-dev] [PATCH] i965/fs: In the pre-regalloc schedule, try harder at reducing reg pressure.
Chia-I Wu
olvaffe at gmail.com
Tue Oct 22 02:31:47 CEST 2013
On Tue, Oct 22, 2013 at 3:05 AM, Eric Anholt <eric at anholt.net> wrote:
> Chia-I Wu <olvaffe at gmail.com> writes:
>
>> On Thu, Oct 17, 2013 at 3:24 AM, Matt Turner <mattst88 at gmail.com> wrote:
>>> On Mon, Oct 14, 2013 at 4:14 PM, Eric Anholt <eric at anholt.net> wrote:
>>>> Previously, the best thing we had was to schedule the things unblocked by
>>>> the current instruction, on the hope that it would be consuming two values
>>>> at the end of their live intervals while only producing one new value.
>>>> Sometimes that wasn't the case.
>>>>
>>>> Now, when an instruction is the first user of a GRF we schedule (i.e. it
>>>> will probably be the virtual_grf_def[] instruction after computing live
>>>> intervals again), penalize it by how many regs it would take up. When an
>>>> instruction is the last user of a GRF we have to schedule (when it will
>>>> probably be the virtual_grf_end[] instruction), give it a boost by how
>>>> many regs it would free.
>>>>
>>>> The new functions are made virtual (only 1 of 2 really needs to be
>>>> virtual) because I expect we'll soon lift the pre-regalloc scheduling
>>>> heuristic over to the vec4 backend.
>>>>
>>>> shader-db:
>>>> total instructions in shared programs: 1512756 -> 1511604 (-0.08%)
>>>> instructions in affected programs: 10292 -> 9140 (-11.19%)
>>>> GAINED: 121
>>>> LOST: 38
>>>>
>>>> Improves tropics performance at my current settings by 4.50602% +/-
>>>> 2.60694% (n=5). No difference on Lightsmark (n=5). No difference on
>>>> GLB2.7 (n=11).
>>>>
>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70445
>>>> ---
>>>
>>> I think we're on the right track by considering register pressure when
>>> scheduling, but one aspect we're not considering is simply how many
>>> registers we think we're using.
>>>
>>> If I understand correctly, the pre-register allocation scheduling wants
>>> to shorten live intervals as much as possible which reduces register
>>> pressure but at the cost of larger stalls and less instruction level
>>> parallelism. We end up scheduling things like
>>>
>>> produce result 4
>>> produce result 3
>>> produce result 2
>>> produce result 1
>>> use result 1
>>> use result 2
>>> use result 3
>>> use result 4
>>>
>>> (this is why the MRF writes for the FB write are always done in the
>>> reverse order)
>> In this example, it will actually be
>>
>> produce result 4
>> use result 4
>> produce result 3
>> use result 3
>> produce result 2
>> use result 2
>> produce result 1
>> use result 1
>>
>> and post-regalloc will schedule again to something like
>>
>> produce result 4
>> produce result 3
>> produce result 2
>> produce result 1
>> use result 4
>> use result 3
>> use result 2
>> use result 1
>>
>> The pre-regalloc scheduling attempts to consume the results as soon as
>> they are available.
>>
>> FB write is done in reverse order because, when a result is available,
>> its consumers are scheduled in reverse order. The epilog of fragment
>> shaders is usually like this:
>>
>> placeholder_halt
>> mov m1, g1
>> mov m2, g2
>> mov m3, g3
>> mov m4, g4
>> send
>>
>> MOVs depend on placeholder_halt, and send depends on MOVs. The
>> scheduler will schedule it as follows:
>>
>> placeholder_halt
>> mov m4, g4
>> mov m3, g3
>> mov m2, g2
>> mov m1, g1
>> send
>>
>> The order can be corrected with the change proposed here
>>
>> http://lists.freedesktop.org/archives/mesa-dev/2013-October/046570.html
>>
>> But there is no point in making the change if the current heuristic for
>> pre-regalloc is to be reworked.
>
> Flipping the order in which we prefer ties (on betterthanlifo-2):
>
> commit 11a511576e465f02875f39c452561775a97416a1
> Author: Eric Anholt <eric at anholt.net>
> Date: Mon Oct 21 11:45:53 2013 -0700
>
> otherway
>
> diff --git a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp b/src/mesa/
> index 9a480b4..b123015 100644
> --- a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
> @@ -1049,9 +1049,9 @@ fs_instruction_scheduler::choose_instruction_to_schedule()
> * it's the first use of a GRF, reduce its score since it means it
> * should be increasing register pressure.
> */
> - for (schedule_node *node = (schedule_node *)instructions.get_tail();
> - node != instructions.get_head()->prev;
> - node = (schedule_node *)node->prev) {
> + for (schedule_node *node = (schedule_node *)instructions.get_head();
> + node != instructions.get_head()->next;
> + node = (schedule_node *)node->next) {
> schedule_node *n = (schedule_node *)node;
> fs_inst *inst = (fs_inst *)n->inst;
>
> gives:
>
> total instructions in shared programs: 1544638 -> 1546794 (0.14%)
> instructions in affected programs: 7163 -> 9319 (30.10%)
> GAINED: 16
> LOST: 289
>
> with massive spilling on tropics, and a bit on lightsmark and csgo.

Children of a schedule_node also need to be pushed to the head in reverse
order

  for (int i = chosen->child_count - 1; i >= 0; i--) {
     ...;
     if (child->parent_count == 0)
        instructions.push_head(child);
  }

so that when you loop from head, you still get LIFO.
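
To make the ordering argument concrete, here is a minimal standalone sketch, not Mesa code. It assumes every candidate ties, so the first node the scan meets is chosen, and it assumes the existing code appends newly unblocked children at the tail in child order. A std::deque stands in for the scheduler's instruction list, and the names drain_order, Push, Scan, "old" and "c0".."c2" are made up for illustration.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical knobs for this sketch only.
enum class Push { TailForward, HeadReverse };
enum class Scan { FromTail, FromHead };

// Returns the order in which nodes leave the ready list when all
// candidates tie, i.e. the first node the scan meets is the one chosen.
std::vector<std::string> drain_order(Push push, Scan scan)
{
    // "old" was already ready before the chosen node unblocked its children.
    std::deque<std::string> ready = {"old"};
    const std::vector<std::string> children = {"c0", "c1", "c2"};

    if (push == Push::TailForward) {
        // Assumed existing behaviour: append children in child order.
        for (const std::string &child : children)
            ready.push_back(child);
    } else {
        // Suggested behaviour: push to the head, walking children backwards,
        // so that child 0 ends up at the very front.
        for (int i = static_cast<int>(children.size()) - 1; i >= 0; i--)
            ready.push_front(children[i]);
    }

    std::vector<std::string> order;
    while (!ready.empty()) {
        if (scan == Scan::FromTail) {
            order.push_back(ready.back());
            ready.pop_back();
        } else {
            order.push_back(ready.front());
            ready.pop_front();
        }
    }
    return order;
}

// Usage: the three combinations discussed in this thread.
void check_drain_orders()
{
    using V = std::vector<std::string>;
    // Scan from tail: newest batch first, children in reverse order.
    assert((drain_order(Push::TailForward, Scan::FromTail) == V{"c2", "c1", "c0", "old"}));
    // Only flip the scan: the oldest ready node wins, which is FIFO.
    assert((drain_order(Push::TailForward, Scan::FromHead) == V{"old", "c0", "c1", "c2"}));
    // Flip the scan and push to head in reverse: newest batch still first.
    assert((drain_order(Push::HeadReverse, Scan::FromHead) == V{"c0", "c1", "c2", "old"}));
}
```

In this toy model, flipping only the scan direction picks the oldest ready node first, while pushing to the head in reverse keeps the most recently unblocked batch in front and changes only the order within that batch.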
--
olv at LunarG.com