[Mesa-dev] [PATCH 2/2] radeonsi: rewrite late alloc VS limit computation
Marek Olšák
maraeo at gmail.com
Mon Aug 28 19:43:38 UTC 2017
On Mon, Aug 28, 2017 at 12:50 PM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
> On 25.08.2017 02:57, Marek Olšák wrote:
>>
>> From: Marek Olšák <marek.olsak at amd.com>
>>
>> This is still very simple, but it's better than before.
>>
>> Loosely ported from Vulkan.
>> ---
>> src/gallium/drivers/radeonsi/si_state.c | 38
>> ++++++++++++++++++++++-----------
>> 1 file changed, 26 insertions(+), 12 deletions(-)
>>
>> diff --git a/src/gallium/drivers/radeonsi/si_state.c
>> b/src/gallium/drivers/radeonsi/si_state.c
>> index 24e509c..393f960 100644
>> --- a/src/gallium/drivers/radeonsi/si_state.c
>> +++ b/src/gallium/drivers/radeonsi/si_state.c
>> @@ -4649,40 +4649,54 @@ static void si_init_config(struct si_context
>> *sctx)
>> /* If this is 0, Bonaire can hang even if GS isn't
>> being used.
>> * Other chips are unaffected. These are
>> suboptimal values,
>> * but we don't use on-chip GS.
>> */
>> si_pm4_set_reg(pm4, R_028A44_VGT_GS_ONCHIP_CNTL,
>> S_028A44_ES_VERTS_PER_SUBGRP(64) |
>> S_028A44_GS_PRIMS_PER_SUBGRP(4));
>> }
>> si_pm4_set_reg(pm4, R_00B21C_SPI_SHADER_PGM_RSRC3_GS,
>> S_00B21C_CU_EN(0xffff));
>> - if (sscreen->b.info.num_good_compute_units /
>> - (sscreen->b.info.max_se *
>> sscreen->b.info.max_sh_per_se) <= 4) {
>> + /* Compute LATE_ALLOC_VS.LIMIT. */
>> + unsigned num_cu_per_sh =
>> sscreen->b.info.num_good_compute_units /
>> + (sscreen->b.info.max_se *
>> + sscreen->b.info.max_sh_per_se);
>> + unsigned late_alloc_limit; /* The limit is per SH. */
>> +
>> + if (sctx->b.family == CHIP_KABINI) {
>> + late_alloc_limit = 0; /* Potential hang on Kabini.
>> */
>> + } else if (num_cu_per_sh <= 4) {
>> /* Too few available compute units per SH.
>> Disallowing
>> - * VS to run on CU0 could hurt us more than late
>> VS
>> + * VS to run on one CU could hurt us more than
>> late VS
>> * allocation would help.
>> *
>> - * LATE_ALLOC_VS = 2 is the highest safe number.
>> + * 2 is the highest safe number that allows us to
>> keep
>> + * all CUs enabled.
>
>
> Just to clarify: is the logic here that LATE_ALLOC_VS = 2 means up to 3 late
> alloc waves, which means at least one SIMD is always free, even when there's
> only a single CU?
Yes according to the Vulkan driver.
>
>
>> */
>> - si_pm4_set_reg(pm4,
>> R_00B118_SPI_SHADER_PGM_RSRC3_VS, S_00B118_CU_EN(0xffff));
>> - si_pm4_set_reg(pm4,
>> R_00B11C_SPI_SHADER_LATE_ALLOC_VS, S_00B11C_LIMIT(2));
>> + late_alloc_limit = 2;
>> } else {
>> - /* Set LATE_ALLOC_VS == 31. It should be less than
>> - * the number of scratch waves. Limitations:
>> - * - VS can't execute on CU0.
>> - * - If HS writes outputs to LDS, LS can't execute
>> on CU0.
>> + /* This is a good initial value.
>> + * We shouldn't run into a VS-PS deadlock, because
>> it
>> + * only allows 1 late_alloc wave per SIMD on
>> num_cu - 2.
>
>
> Disabling VS on CU0 should already guarantee that there's no VS-PS deadlock.
> So maybe with this change, we can allow VS to run on all CUs? At least the
> comment should reflect that.
>
> On the other hand, Vulkan actually looks at the "always-on CUs", which I
> guess is related to power-gating turning off CUs. So as long as we don't
> look at that -- and I kind of think we shouldn't, because we care about the
> high performance case where all CUs are turned on anyway -- we do need to
> keep CU0 disabled for VS to avoid deadlocks when CUs happen to be
> power-gated.
Well, I can change the comment, but I don't see anything else here
that should be changed.
>
> BTW, I recently realized that "late alloc" simply means that the VS waves
> get their export allocation as soon as possible, i.e. as soon as there's
> free space in the PC / POS export. So even late alloc waves might end up
> never being stalled on exports, depending on the overall latency of the VS
> relative to PS. Not important for the driver, but I found it interesting to
> know.
OK makes sense.
Marek
More information about the mesa-dev
mailing list