[Mesa-dev] [PATCH] radeonsi: enable late VS export memory allocation

Nicolai Hähnle nhaehnle at gmail.com
Tue Jan 12 16:00:49 PST 2016


On 12.01.2016 18:55, Marek Olšák wrote:
> Hi Nicolai,
>
> I only tested piglit on Bonaire.
>
> The scratch buffer can deadlock too, though I guess we have enough
> scratch space that we should never run out of it.
>
> a) The lockup is prevented by the change to the value of
> SHADER_PGM_RSRC3_VS. It means that 1 compute unit is reserved for PS
> only. This is similar to tessellation, which isn't allowed to run on 2
> compute units.
>
> b) Yes, it can hurt performance if there are too many PS waves.
> However, hw guys told me that the rule of thumb is to set the number
> of waves to about a half, which is 32. The number can also be
> calculated dynamically based on VS and PS resource usage, but I don't
> know the equations.

Thanks for the explanation, that makes sense.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle at amd.com>

>
> Axel Davy benchmarked this briefly. We may need more benchmarks though.
>
> Marek
>
>
> On Tue, Jan 12, 2016 at 11:38 PM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>> How well-tested is this? I'm a bit confused because I thought the
>> scratch_waves is about the scratch buffer ring (which should always be
>> allocated immediately when the shader starts), while the late_alloc is about
>> allocating the space where exports go.
>>
>> So late_alloc is basically bounded by
>>
>> (a) the need to avoid lockups (when all compute units are running VS that
>> don't have their export space allocated yet, so that there is nothing left
>> for the next shader to run on and start processing the queue) and
>>
>> (b) even when there's no lockup, it may still hurt performance to have lots
>> of VS waiting for export space to become available.
>>
>> I'm not sure if (a) is already covered by some of the registers that do
>> reservations (i.e., reserve one CU to be used only for pixel shaders etc.),
>> or if it's not an issue because every chip can have more than 31 waves in
>> flight (which I think is true).
>>
>> As for (b), it's less critical but has anybody run benchmarks with this
>> patch?
>>
>> Cheers,
>> Nicolai
>>
>>
>> On 12.01.2016 13:02, Marek Olšák wrote:
>>>
>>> From: Marek Olšák <marek.olsak at amd.com>
>>>
>>> This allows more VS waves to run on CIK and VI.
>>> ---
>>>    src/gallium/drivers/radeonsi/si_pipe.c  | 4 +++-
>>>    src/gallium/drivers/radeonsi/si_pipe.h  | 4 ++++
>>>    src/gallium/drivers/radeonsi/si_state.c | 5 +++--
>>>    3 files changed, 10 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/src/gallium/drivers/radeonsi/si_pipe.c
>>> b/src/gallium/drivers/radeonsi/si_pipe.c
>>> index 4e23cb1..cb0b8b2 100644
>>> --- a/src/gallium/drivers/radeonsi/si_pipe.c
>>> +++ b/src/gallium/drivers/radeonsi/si_pipe.c
>>> @@ -207,8 +207,10 @@ static struct pipe_context *si_create_context(struct
>>> pipe_screen *screen,
>>>          /* XXX: This is the maximum value allowed.  I'm not sure how to
>>> compute
>>>           * this for non-cs shaders.  Using the wrong value here can result
>>> in
>>>           * GPU lockups, but the maximum value seems to always work.
>>> +        *
>>> +        * If this is changed, SI_LATE_ALLOC_VS_WAVES should be updated
>>> accordingly.
>>>           */
>>> -       sctx->scratch_waves = 32 * sscreen->b.info.max_compute_units;
>>> +       sctx->scratch_waves = SI_NUM_SCRATCH_WAVES *
>>> sscreen->b.info.max_compute_units;
>>>
>>>    #if HAVE_LLVM >= 0x0306
>>>          /* Initialize LLVM TargetMachine */
>>> diff --git a/src/gallium/drivers/radeonsi/si_pipe.h
>>> b/src/gallium/drivers/radeonsi/si_pipe.h
>>> index f83cb02..6ea3017 100644
>>> --- a/src/gallium/drivers/radeonsi/si_pipe.h
>>> +++ b/src/gallium/drivers/radeonsi/si_pipe.h
>>> @@ -36,6 +36,10 @@
>>>    #define SI_BIG_ENDIAN 0
>>>    #endif
>>>
>>> +#define SI_NUM_SCRATCH_WAVES 32
>>> +#define SI_LATE_ALLOC_VS_WAVES (SI_NUM_SCRATCH_WAVES - 1)
>>> +
>>> +
>>>    /* The base vertex and primitive restart can be any number, but we must
>>> pick
>>>     * one which will mean "unknown" for the purpose of state tracking and
>>>     * the number shouldn't be a commonly-used one. */
>>> diff --git a/src/gallium/drivers/radeonsi/si_state.c
>>> b/src/gallium/drivers/radeonsi/si_state.c
>>> index 2a6d2c6..a8b292f 100644
>>> --- a/src/gallium/drivers/radeonsi/si_state.c
>>> +++ b/src/gallium/drivers/radeonsi/si_state.c
>>> @@ -3600,8 +3600,9 @@ static void si_init_config(struct si_context *sctx)
>>>                  si_pm4_set_reg(pm4, R_00B41C_SPI_SHADER_PGM_RSRC3_HS, 0);
>>>                  si_pm4_set_reg(pm4, R_00B31C_SPI_SHADER_PGM_RSRC3_ES,
>>> S_00B31C_CU_EN(0xfffe));
>>>                  si_pm4_set_reg(pm4, R_00B21C_SPI_SHADER_PGM_RSRC3_GS,
>>> S_00B21C_CU_EN(0xffff));
>>> -               si_pm4_set_reg(pm4, R_00B118_SPI_SHADER_PGM_RSRC3_VS,
>>> S_00B118_CU_EN(0xffff));
>>> -               si_pm4_set_reg(pm4, R_00B11C_SPI_SHADER_LATE_ALLOC_VS,
>>> S_00B11C_LIMIT(0));
>>> +               si_pm4_set_reg(pm4, R_00B118_SPI_SHADER_PGM_RSRC3_VS,
>>> S_00B118_CU_EN(0xfffe));
>>> +               si_pm4_set_reg(pm4, R_00B11C_SPI_SHADER_LATE_ALLOC_VS,
>>> +                              S_00B11C_LIMIT(SI_LATE_ALLOC_VS_WAVES));
>>>                  si_pm4_set_reg(pm4, R_00B01C_SPI_SHADER_PGM_RSRC3_PS,
>>> S_00B01C_CU_EN(0xffff));
>>>          }
>>>
>>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev


More information about the mesa-dev mailing list