[Mesa-dev] [PATCH 2/2] radeonsi: rewrite late alloc VS limit computation
Nicolai Hähnle
nhaehnle at gmail.com
Mon Aug 28 20:25:42 UTC 2017
On 28.08.2017 21:43, Marek Olšák wrote:
> On Mon, Aug 28, 2017 at 12:50 PM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>> On 25.08.2017 02:57, Marek Olšák wrote:
>>>
>>> From: Marek Olšák <marek.olsak at amd.com>
>>>
>>> This is still very simple, but it's better than before.
>>>
>>> Loosely ported from Vulkan.
>>> ---
>>> src/gallium/drivers/radeonsi/si_state.c | 38
>>> ++++++++++++++++++++++-----------
>>> 1 file changed, 26 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/src/gallium/drivers/radeonsi/si_state.c
>>> b/src/gallium/drivers/radeonsi/si_state.c
>>> index 24e509c..393f960 100644
>>> --- a/src/gallium/drivers/radeonsi/si_state.c
>>> +++ b/src/gallium/drivers/radeonsi/si_state.c
>>> @@ -4649,40 +4649,54 @@ static void si_init_config(struct si_context
>>> *sctx)
>>> /* If this is 0, Bonaire can hang even if GS isn't
>>> being used.
>>> * Other chips are unaffected. These are
>>> suboptimal values,
>>> * but we don't use on-chip GS.
>>> */
>>> si_pm4_set_reg(pm4, R_028A44_VGT_GS_ONCHIP_CNTL,
>>> S_028A44_ES_VERTS_PER_SUBGRP(64) |
>>> S_028A44_GS_PRIMS_PER_SUBGRP(4));
>>> }
>>> si_pm4_set_reg(pm4, R_00B21C_SPI_SHADER_PGM_RSRC3_GS,
>>> S_00B21C_CU_EN(0xffff));
>>> - if (sscreen->b.info.num_good_compute_units /
>>> - (sscreen->b.info.max_se *
>>> sscreen->b.info.max_sh_per_se) <= 4) {
>>> + /* Compute LATE_ALLOC_VS.LIMIT. */
>>> + unsigned num_cu_per_sh =
>>> sscreen->b.info.num_good_compute_units /
>>> + (sscreen->b.info.max_se *
>>> + sscreen->b.info.max_sh_per_se);
>>> + unsigned late_alloc_limit; /* The limit is per SH. */
>>> +
>>> + if (sctx->b.family == CHIP_KABINI) {
>>> + late_alloc_limit = 0; /* Potential hang on Kabini.
>>> */
>>> + } else if (num_cu_per_sh <= 4) {
>>> /* Too few available compute units per SH.
>>> Disallowing
>>> - * VS to run on CU0 could hurt us more than late
>>> VS
>>> + * VS to run on one CU could hurt us more than
>>> late VS
>>> * allocation would help.
>>> *
>>> - * LATE_ALLOC_VS = 2 is the highest safe number.
>>> + * 2 is the highest safe number that allows us to
>>> keep
>>> + * all CUs enabled.
>>
>>
>> Just to clarify: is the logic here that LATE_ALLOC_VS = 2 means up to 3 late
>> alloc waves, which means at least one SIMD is always free, even when there's
>> only a single CU?
>
> Yes according to the Vulkan driver.
>
>>
>>
>>> */
>>> - si_pm4_set_reg(pm4,
>>> R_00B118_SPI_SHADER_PGM_RSRC3_VS, S_00B118_CU_EN(0xffff));
>>> - si_pm4_set_reg(pm4,
>>> R_00B11C_SPI_SHADER_LATE_ALLOC_VS, S_00B11C_LIMIT(2));
>>> + late_alloc_limit = 2;
>>> } else {
>>> - /* Set LATE_ALLOC_VS == 31. It should be less than
>>> - * the number of scratch waves. Limitations:
>>> - * - VS can't execute on CU0.
>>> - * - If HS writes outputs to LDS, LS can't execute
>>> on CU0.
>>> + /* This is a good initial value.
>>> + * We shouldn't run into a VS-PS deadlock, because
>>> it
>>> + * only allows 1 late_alloc wave per SIMD on
>>> num_cu - 2.
>>
>>
>> Disabling VS on CU0 should already guarantee that there's no VS-PS deadlock.
>> So maybe with this change, we can allow VS to run on all CUs? At least the
>> comment should reflect that.
>>
>> On the other hand, Vulkan actually looks at the "always-on CUs", which I
>> guess is related to power-gating turning off CUs. So as long as we don't
>> look at that -- and I kind of think we shouldn't, because we care about the
>> high performance case where all CUs are turned on anyway -- we do need to
>> keep CU0 disabled for VS to avoid deadlocks when CUs happen to be
>> power-gated.
>
> Well, I can change the comment, but I don't see anything else here
> that should be changed.
Sure, with some comment change you can add my R-b.
Cheers,
Nicolai
>
>>
>> BTW, I recently realized that "late alloc" simply means that the VS waves
>> get their export allocation as soon as possible, i.e. as soon as there's
>> free space in the PC / POS export. So even late alloc waves might end up
>> never being stalled on exports, depending on the overall latency of the VS
>> relative to PS. Not important for the driver, but I found it interesting to
>> know.
>
> OK makes sense.
>
> Marek
>
--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
More information about the mesa-dev
mailing list