[Mesa-dev] [PATCH 4/4] radeonsi: add instance divisor support

Marek Olšák maraeo at gmail.com
Wed Mar 27 06:10:25 PDT 2013


On Wed, Mar 27, 2013 at 1:43 PM, Alex Deucher <alexdeucher at gmail.com> wrote:
> On Wed, Mar 27, 2013 at 7:25 AM, Michel Dänzer <michel at daenzer.net> wrote:
>> On Mit, 2013-03-27 at 12:02 +0100, Christian König wrote:
>>> Am 26.03.2013 18:03, schrieb Michel Dänzer:
>>> > On Die, 2013-03-26 at 17:37 +0100, Christian König wrote:
>>> >> Am 26.03.2013 15:56, schrieb Michel Dänzer:
>>> >>> On Die, 2013-03-26 at 14:51 +0100, Christian König wrote:
>>> >>>> From: Christian König <christian.koenig at amd.com>
>>> >>>>
>>> >>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>> >>>> [...]
>>> >>>> diff --git a/src/gallium/drivers/radeonsi/radeonsi_shader.h b/src/gallium/drivers/radeonsi/radeonsi_shader.h
>>> >>>> index 9dae742..e09f297 100644
>>> >>>> --- a/src/gallium/drivers/radeonsi/radeonsi_shader.h
>>> >>>> +++ b/src/gallium/drivers/radeonsi/radeonsi_shader.h
>>> >>>> @@ -111,13 +111,18 @@ struct si_shader {
>>> >>>>          unsigned                nr_cbufs;
>>> >>>>    };
>>> >>>>
>>> >>>> -struct si_shader_key {
>>> >>>> -        unsigned                export_16bpc:8;
>>> >>>> -        unsigned                nr_cbufs:4;
>>> >>>> -        unsigned                color_two_side:1;
>>> >>>> -        unsigned                alpha_func:3;
>>> >>>> -        unsigned                flatshade:1;
>>> >>>> -        float                   alpha_ref;
>>> >>>> +union si_shader_key {
>>> >>>> +        struct {
>>> >>>> +                unsigned        export_16bpc:8;
>>> >>>> +                unsigned        nr_cbufs:4;
>>> >>>> +                unsigned        color_two_side:1;
>>> >>>> +                unsigned        alpha_func:3;
>>> >>>> +                unsigned        flatshade:1;
>>> >>>> +                float           alpha_ref;
>>> >>>> +        } ps;
>>> >>>> +        struct {
>>> >>>> +                unsigned        instance_divisors[PIPE_MAX_ATTRIBS];
>>> >>>> +        } vs;
>>> >>>>    };
>>> >>> This grows the shader key from 8 to 128 bytes. I don't suppose the
>>> >>> instance divisors could be encoded in a more compact way? E.g. loading
>>> >>> the divisor values from constants and only tracking which elements use a
>>> >>> divisor in a bitmask in the key.
>>> >> Considered that also, and I have two problems with that approach:
>>> >> 1. While immediates are converted to shifts & muls, dividing even by a
>>> >> constant in the shader isn't cheap.
>>> > Is that really significant? How much work would it be to come up with a
>>> > worst case test and measure the difference?
>>>
>>> Well no idea how to measure that on SI,
>>
>> I'd guess something like: With otherwise trivial vertex and pixel
>> shaders, draw huge numbers of triangles generating one pixel each, and
>> measure how long it takes.
>>
>>
>>> but when I implemented the same feature on R600 the difference between
>>> using reciprocal and mul compared to mulhi where quite significant.
>>
>> I don't see anything about this in the r600g shader key though. What's
>> the difference?
>>
>>
>>> >> How about storing only a byte for the instance_divisor? That limit's the
>>> >> divisor to a modulo of 256, but I don't think that would be so extremly bad.
>>> > I have no idea what the impact of that would be. What happens if an app
>>> > tries to use a divisor >= 256?
>>>
>>> It probably would select the wrong shader :(
>>>
>>> >> That would reduce the key to 32 bytes instead.
>>> > Still seems kind of big.
>>>
>>> Ok how about the following compromise: First we use a short for the
>>> instance divisor, that makes the key 32 bytes in size and should leave
>>> enough room for larger instance divisors, and second we don't copy the
>>> key around so much any more.
>>
>> This still won't work correctly for some legal divisor values, right? So
>> I'm afraid this is a flawed compromise, as it doesn't really address the
>> key size issues (potentially huge number of shader variants, cycles
>> spent in memcmp, memory usage) either. If those can't be addressed, it
>> should at least handle all legal values correctly.
>
> We use the large keys now and look into runtime shader patching for
> certain values as an improvement later on.

Yeah something like that, or we could just put the whole key in a
constant buffer and have the shader figure out what to do => no shader
recompilations. Or we could use indirect subroutines calls (as in
GL_ARB_shader_subroutine) and decide at draw time which subroutine
should be called depending on the key.

Marek


More information about the mesa-dev mailing list