[Mesa-dev] [PATCH] radeonsi: enable late VS export memory allocation
Nicolai Hähnle
nhaehnle at gmail.com
Tue Jan 12 16:00:49 PST 2016
On 12.01.2016 18:55, Marek Olšák wrote:
> Hi Nicolai,
>
> I only tested piglit on Bonaire.
>
> The scratch buffer can deadlock too, though I guess we have enough
> scratch space that we should never run out of it.
>
> a) The lockup is prevented by the change to the value of
> SHADER_PGM_RSRC3_VS. It means that 1 compute unit is reserved for PS
> only. This is similar to tessellation, which isn't allowed to run on 2
> compute units.
>
> b) Yes, it can hurt performance if there are too many PS waves.
> However, hw guys told me that the rule of thumb is to set the number
> of waves to about a half, which is 32. The number can also be
> calculated dynamically based on VS and PS resource usage, but I don't
> know the equations.
Thanks for the explanation, that makes sense.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle at amd.com>
>
> Axel Davy benchmarked this briefly. We may need more benchmarks though.
>
> Marek
>
>
> On Tue, Jan 12, 2016 at 11:38 PM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>> How well-tested is this? I'm a bit confused because I thought the
>> scratch_waves is about the scratch buffer ring (which should always be
>> allocated immediately when the shader starts), while the late_alloc is about
>> allocating the space where exports go.
>>
>> So late_alloc is basically bounded by
>>
>> (a) the need to avoid lockups (when all compute units are running VS that
>> don't have their export space allocated yet, so that there is nothing left
>> for the next shader to run on and start processing the queue) and
>>
>> (b) even when there's no lockup, it may still hurt performance to have lots
>> of VS waiting for export space to become available.
>>
>> I'm not sure if (a) is already covered by some of the registers that do
>> reservations (i.e., reserve one CU to be used only for pixel shaders etc.),
>> or if it's not an issue because every chip can have more than 31 waves in
>> flight (which I think is true).
>>
>> As for (b), it's less critical but has anybody run benchmarks with this
>> patch?
>>
>> Cheers,
>> Nicolai
>>
>>
>> On 12.01.2016 13:02, Marek Olšák wrote:
>>>
>>> From: Marek Olšák <marek.olsak at amd.com>
>>>
>>> This allows more VS waves to run on CIK and VI.
>>> ---
>>> src/gallium/drivers/radeonsi/si_pipe.c | 4 +++-
>>> src/gallium/drivers/radeonsi/si_pipe.h | 4 ++++
>>> src/gallium/drivers/radeonsi/si_state.c | 5 +++--
>>> 3 files changed, 10 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/src/gallium/drivers/radeonsi/si_pipe.c
>>> b/src/gallium/drivers/radeonsi/si_pipe.c
>>> index 4e23cb1..cb0b8b2 100644
>>> --- a/src/gallium/drivers/radeonsi/si_pipe.c
>>> +++ b/src/gallium/drivers/radeonsi/si_pipe.c
>>> @@ -207,8 +207,10 @@ static struct pipe_context *si_create_context(struct
>>> pipe_screen *screen,
>>> /* XXX: This is the maximum value allowed. I'm not sure how to
>>> compute
>>> * this for non-cs shaders. Using the wrong value here can result
>>> in
>>> * GPU lockups, but the maximum value seems to always work.
>>> + *
>>> + * If this is changed, SI_LATE_ALLOC_VS_WAVES should be updated
>>> accordingly.
>>> */
>>> - sctx->scratch_waves = 32 * sscreen->b.info.max_compute_units;
>>> + sctx->scratch_waves = SI_NUM_SCRATCH_WAVES *
>>> sscreen->b.info.max_compute_units;
>>>
>>> #if HAVE_LLVM >= 0x0306
>>> /* Initialize LLVM TargetMachine */
>>> diff --git a/src/gallium/drivers/radeonsi/si_pipe.h
>>> b/src/gallium/drivers/radeonsi/si_pipe.h
>>> index f83cb02..6ea3017 100644
>>> --- a/src/gallium/drivers/radeonsi/si_pipe.h
>>> +++ b/src/gallium/drivers/radeonsi/si_pipe.h
>>> @@ -36,6 +36,10 @@
>>> #define SI_BIG_ENDIAN 0
>>> #endif
>>>
>>> +#define SI_NUM_SCRATCH_WAVES 32
>>> +#define SI_LATE_ALLOC_VS_WAVES (SI_NUM_SCRATCH_WAVES - 1)
>>> +
>>> +
>>> /* The base vertex and primitive restart can be any number, but we must
>>> pick
>>> * one which will mean "unknown" for the purpose of state tracking and
>>> * the number shouldn't be a commonly-used one. */
>>> diff --git a/src/gallium/drivers/radeonsi/si_state.c
>>> b/src/gallium/drivers/radeonsi/si_state.c
>>> index 2a6d2c6..a8b292f 100644
>>> --- a/src/gallium/drivers/radeonsi/si_state.c
>>> +++ b/src/gallium/drivers/radeonsi/si_state.c
>>> @@ -3600,8 +3600,9 @@ static void si_init_config(struct si_context *sctx)
>>> si_pm4_set_reg(pm4, R_00B41C_SPI_SHADER_PGM_RSRC3_HS, 0);
>>> si_pm4_set_reg(pm4, R_00B31C_SPI_SHADER_PGM_RSRC3_ES,
>>> S_00B31C_CU_EN(0xfffe));
>>> si_pm4_set_reg(pm4, R_00B21C_SPI_SHADER_PGM_RSRC3_GS,
>>> S_00B21C_CU_EN(0xffff));
>>> - si_pm4_set_reg(pm4, R_00B118_SPI_SHADER_PGM_RSRC3_VS,
>>> S_00B118_CU_EN(0xffff));
>>> - si_pm4_set_reg(pm4, R_00B11C_SPI_SHADER_LATE_ALLOC_VS,
>>> S_00B11C_LIMIT(0));
>>> + si_pm4_set_reg(pm4, R_00B118_SPI_SHADER_PGM_RSRC3_VS,
>>> S_00B118_CU_EN(0xfffe));
>>> + si_pm4_set_reg(pm4, R_00B11C_SPI_SHADER_LATE_ALLOC_VS,
>>> + S_00B11C_LIMIT(SI_LATE_ALLOC_VS_WAVES));
>>> si_pm4_set_reg(pm4, R_00B01C_SPI_SHADER_PGM_RSRC3_PS,
>>> S_00B01C_CU_EN(0xffff));
>>> }
>>>
>>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
More information about the mesa-dev
mailing list