[Mesa-dev] [PATCH] radv: drop wrong initialization of COMPUTE_RESOURCE_LIMITS

Andres Rodriguez andresx7 at gmail.com
Tue Aug 21 18:26:45 UTC 2018



On 2018-08-21 01:54 PM, Marek Olšák wrote:
> Note that WAVES_PER_SH should be 0x3ff on the compute ring for the
> ring priorities to be applied.

Correct, we would need to set WAVES_PER_SH during pipeline creation.

> I don't know if you need to do the same
> thing for the gfx ring. You can ask Andres for more info.

The gfx side seems to be okay; it looks like I only borked the compute side.

This was probably working during tests since we have two mechanisms to 
control scheduling priority at the hardware level. The first is based on 
CP_HQD_*_PRIORITY and SPI_ARB_PRIORITY.PIPE_ORDER*.

This approach has the advantage of not requiring driver opt-in, and it 
dynamically adjusts the resource split between different queues. 
However, it does not offer low latency when high priority compute 
initially pre-empts a gfx workload.

The second approach uses SPI_SHADER_PGM_RSRC3_* and WAVES_PER_SH to 
indicate driver opt-in. Then when the kernel scheduler detects that we 
have some high priority work, it will create a static resource split 
between all clients using SPI_WCL_PIPE_PERCENT_*. Drivers that did not 
opt in will not be forced by the hardware to abide by the resource split.

This approach has the disadvantage of possibly leaving some resources 
idle since we have a static split. However, it does provide good latency 
for a high priority compute job pre-empting a heavy graphics workload.

So even though we didn't opt-in for compute resource splits, we still 
had the first approach to fall back on.

This sort of raises the question: should compute drivers really opt in 
to static resource splits? Since the first approach does a better job at 
managing compute/compute scheduling priorities, do we want to bring in 
the disadvantages of approach #2?

I haven't run into any good real-world examples where this has come into 
play, since game usage of multiple compute queues has been rare. But it 
may be something worth keeping in mind and revisiting in the future.

Regards,
Andres

> 
> Marek
> On Tue, Aug 14, 2018 at 12:10 PM Samuel Pitoiset
> <samuel.pitoiset at gmail.com> wrote:
>>
>> The last parameter of radeon_set_sh_reg_seq() is the number of
>> dwords to emit. We were lucky because WAVES_PER_SH(0x3) is 3 but
>> it was initialized to 0.
>>
>> COMPUTE_RESOURCE_LIMITS is correctly set when generating
>> compute pipelines, so we don't need to initialize it.
>>
>> Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
>> ---
>>   src/amd/vulkan/si_cmd_buffer.c | 4 +---
>>   1 file changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/src/amd/vulkan/si_cmd_buffer.c b/src/amd/vulkan/si_cmd_buffer.c
>> index 2337036c67..2cfa7f4c2c 100644
>> --- a/src/amd/vulkan/si_cmd_buffer.c
>> +++ b/src/amd/vulkan/si_cmd_buffer.c
>> @@ -88,9 +88,7 @@ si_emit_compute(struct radv_physical_device *physical_device,
>>          radeon_emit(cs, 0);
>>          radeon_emit(cs, 0);
>>
>> -       radeon_set_sh_reg_seq(cs, R_00B854_COMPUTE_RESOURCE_LIMITS,
>> -                             S_00B854_WAVES_PER_SH(0x3));
>> -       radeon_emit(cs, 0);
>> +       radeon_set_sh_reg_seq(cs, R_00B858_COMPUTE_STATIC_THREAD_MGMT_SE0, 2);
>>          /* R_00B858_COMPUTE_STATIC_THREAD_MGMT_SE0 / SE1 */
>>          radeon_emit(cs, S_00B858_SH0_CU_EN(0xffff) | S_00B858_SH1_CU_EN(0xffff));
>>          radeon_emit(cs, S_00B85C_SH0_CU_EN(0xffff) | S_00B85C_SH1_CU_EN(0xffff));
>> --
>> 2.18.0
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
