[Mesa-dev] [PATCH 2/2] radeonsi: move instance divisors into a constant buffer

Marek Olšák maraeo at gmail.com
Tue Jun 27 16:59:50 UTC 2017


On Tue, Jun 27, 2017 at 6:50 PM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
> On 27.06.2017 17:07, Marek Olšák wrote:
>>
>> On Tue, Jun 27, 2017 at 9:22 AM, Nicolai Hähnle <nhaehnle at gmail.com>
>> wrote:
>>>
>>> On 27.06.2017 02:14, Marek Olšák wrote:
>>>>
>>>>
>>>> From: Marek Olšák <marek.olsak at amd.com>
>>>>
>>>> Shader key size: 107 -> 47
>>>
>>>
>>>
>>> Nice improvement.
>>>
>>>
>>>> Divisors of 0 and 1 are encoded in the shader key. Greater instance
>>>> divisors
>>>> are loaded from a constant buffer.
>>>>
>>>> The shader code doing the division is huge. Is it something we need to
>>>> worry about? Does any app use instance divisors >= 2?
>>>
>>>
>>>
>>> This reminds me of a certain LLVM improvement that I still need to clear.
>>>
>>> I doubt instance divisors >= 2 are used. As a data point, Vulkan doesn't
>>> support it as a feature at all, IIRC.
>>>
>>> Can we get an optimized monotholic shader variant built for shaders that
>>> have to fetch? This should help if anybody ever triggers this, because
>>
>>
>> We can't get optimized variants if we want to keep the shader key
>> small. If I put all instance divisors into key.opt, it would defeat
>> the effect of this patch.
>
>
> What I meant is an optimized variant that still loads the instance divisors
> from the constant buffer, but compiles the prolog and main parts together.
> That way, LLVM can potentially schedule some of the division code before
> waiting for the loads of other attributes that are per-vertex. That should
> only require a single bit in the key.opt part.

Like this?

diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c
b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 63cc746..af3f2a9 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -1192,6 +1192,11 @@ static void si_shader_selector_key_vs(struct
si_context *sctx,
        prolog_key->instance_divisor_is_fetched =
                sctx->vertex_elements->instance_divisor_is_fetched;

+       /* Prefer a monolithic shader to allow scheduling divisions around
+        * VBO loads. */
+       if (prolog_key->instance_divisor_is_fetched)
+               key->opt.prefer_mono = 1;
+
        unsigned count = MIN2(vs->info.num_inputs,
                              sctx->vertex_elements->count);
        memcpy(key->mono.vs_fix_fetch, sctx->vertex_elements->fix_fetch, count);

Marek


More information about the mesa-dev mailing list