[Mesa-dev] [PATCH] r600g: fix color exports when we have no CBs

Vadim Girlin vadimgirlin at gmail.com
Tue Aug 27 16:17:01 PDT 2013


On 08/28/2013 02:28 AM, Roland Scheidegger wrote:
> Am 27.08.2013 23:52, schrieb Vadim Girlin:
>> On 08/28/2013 12:43 AM, Marek Olšák wrote:
>>> Shader variants are BAD, BAD, BAD. Have you ever played an AAA game
>>> with a Mesa driver that likes to compile shader variants on first use?
>>> It's HORRIBLE.
>>
>> I don't think that shader variants are bad, but it's definitely bad when
>> we are compiling variants that are never used. Currently glxgears
>> compiles 18 ps/vs shaders. In my branch with initial GS support [1] I
>> switched handling of the shaders to deferred compilation, that is,
>> shaders are compiled only before the actual draw. I found later that
>> it's not really required for GS, but IIRC this change results in only 5
>> shaders being compiled for glxgears instead of 18. It seems most of the
>> useless variants are results of state changes between creation of the
>> shader state (initial compilation) and actual draw call.
>>
>> I had some concerns about increased overhead with those changes, and
>> it's actually noticeable with drawoverhead demo, but I didn't see any
>> regressions with a few real apps that I tested, e.g. glxgears even
>> showed slightly better performance with these changes. Probably I also
>> implemented it in a not very optimal way (I was mostly concentrated on
>> GS support) and the overhead can be reduced.
>>
>> One more thing is duplicate shaders, I've analyzed shader dumps from
>> Unigine Heaven 3.0 some time ago and found that from about 320 compiled
>> shaders, only about 180 (50%) were unique, others were duplicates
>> (detected by comparing the bytecode dumps for them in an automated way),
>> maybe they had different shader keys (which still resulted in the same
>> bytecode), but I suspect duplicate pipe shaders were also involved.
>> Unfortunately I didn't have a time to investigate it more thoroughly
>> since then.
>>
>> So my point is that we don't really need to eliminate shader variants,
>> first we need to eliminate compilation of unused variants and duplicate
>> shaders. Also we might want to consider offloading of the compilation to
>> separate thread(s) and caching of shader binaries between runs.
>
> Hmm ok that seems a way more complicated problem than I thought :-).
> Compile early and you might compile variants you will never use, compile
> late and the delay might be noticeable.

Compilation of unused variants is not bad if they are compiled at the 
game/level loading time, I think many apps are trying to compile shaders 
early to avoid freezes during gameplay. But trying to compile early in 
the driver doesn't make sense currently because it's already too late 
anyway, if I'm not missing something, it's deferred in mesa or state 
tracker.

Otherwise probably it would be preferable for the driver to precompile 
variants that are likely to be used (but only if we really can do it 
early, at the loading time when shaders are created by the app).

> I just thought it might be unlikely you'd actually need two variants -
> e.g. some depth exporting shader is probably unlikely to use alpha test.
> But ok I guess it shouldn't write color in this case, so even then it
> might never be worth bothering. Was just a random idea ;-).

I think it's a good idea that just needs some benchmarking to make sure 
that it can provide any real benefits. I only wanted to say that it can 
be done separately from this fix.

Vadim


>
> Roland
>
>
>>
>> Vadim
>>
>>   [1] http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-geom-shaders
>>
>>>
>>> What the patch does is probably the right solution. At least
>>> alpha-test state changes don't cause shader recompilation and
>>> re-binding, which also negatively affects performance. Ideally we
>>> shouldn't depend on the framebuffer state at all, but we need to
>>> emulate the TGSI property FS_COLOR0_WRITES_ALL_CBUFS. I think we
>>> should always be fine with key.nr_cbufs forced to 8 for any shader
>>> without that property. I expect app developers to do the right thing
>>> and not write outputs they don't need.
>>>
>>> Marek
>>>
>>> On Tue, Aug 27, 2013 at 9:00 PM, Roland Scheidegger
>>> <sroland at vmware.com> wrote:
>>>> Not that I'm qualified to review r600 code, but couldn't you create
>>>> different shader variants depending on whether you need alpha test? At
>>>> least I would assume shader exports aren't free.
>>>>
>>>> Roland
>>>>
>>>> Am 27.08.2013 19:56, schrieb Vadim Girlin:
>>>>> We need to export at least one color if the shader writes it,
>>>>> even when nr_cbufs==0.
>>>>>
>>>>> Signed-off-by: Vadim Girlin <vadimgirlin at gmail.com>
>>>>> ---
>>>>>
>>>>> Tested on evergreen with multiple combinations of backends - no
>>>>> regressions,
>>>>> fixes some tests:
>>>>>
>>>>>     default    - fixes fb-alphatest-nocolor and fb_alphatest-nocolor-ff
>>>>>     default+sb - fixes fb-alphatest-nocolor and fb_alphatest-nocolor-ff
>>>>>     llvm       - fixes about 25 tests related to depth/stencil
>>>>>     llvm+sb    - fixes about 300 tests (llvm's depth/stencil issues and
>>>>>                  regressions cased by reordering of exports in sb)
>>>>>
>>>>> With this patch, there are no regressions with default+sb vs default.
>>>>> There is one regression with llvm+sb vs llvm -
>>>>> fs-texturegrad-miplevels,
>>>>> AFAICS it's a problem with llvm backend uncovered by sb -
>>>>> SET_GRADIENTS_V/H
>>>>> instructions are not placed in the same TEX clause with
>>>>> corresponding SAMPLE_G.
>>>>>
>>>>>    src/gallium/drivers/r600/r600_shader.c | 7 ++++---
>>>>>    1 file changed, 4 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/src/gallium/drivers/r600/r600_shader.c
>>>>> b/src/gallium/drivers/r600/r600_shader.c
>>>>> index 300b5c4..f7eab76 100644
>>>>> --- a/src/gallium/drivers/r600/r600_shader.c
>>>>> +++ b/src/gallium/drivers/r600/r600_shader.c
>>>>> @@ -918,6 +918,7 @@ static int r600_shader_from_tgsi(struct
>>>>> r600_screen *rscreen,
>>>>>         unsigned opcode;
>>>>>         int i, j, k, r = 0;
>>>>>         int next_pos_base = 60, next_param_base = 0;
>>>>> +     int max_color_exports = MAX2(key.nr_cbufs, 1);
>>>>>         /* Declarations used by llvm code */
>>>>>         bool use_llvm = false;
>>>>>         bool indirect_gprs;
>>>>> @@ -1130,7 +1131,7 @@ static int r600_shader_from_tgsi(struct
>>>>> r600_screen *rscreen,
>>>>>                 radeon_llvm_ctx.face_gpr = ctx.face_gpr;
>>>>>                 radeon_llvm_ctx.r600_inputs = ctx.shader->input;
>>>>>                 radeon_llvm_ctx.r600_outputs = ctx.shader->output;
>>>>> -             radeon_llvm_ctx.color_buffer_count = MAX2(key.nr_cbufs
>>>>> , 1);
>>>>> +             radeon_llvm_ctx.color_buffer_count = max_color_exports;
>>>>>                 radeon_llvm_ctx.chip_class = ctx.bc->chip_class;
>>>>>                 radeon_llvm_ctx.fs_color_all = shader->fs_write_all
>>>>> && (rscreen->chip_class >= EVERGREEN);
>>>>>                 radeon_llvm_ctx.stream_outputs = &so;
>>>>> @@ -1440,7 +1441,7 @@ static int r600_shader_from_tgsi(struct
>>>>> r600_screen *rscreen,
>>>>>                 case TGSI_PROCESSOR_FRAGMENT:
>>>>>                         if (shader->output[i].name ==
>>>>> TGSI_SEMANTIC_COLOR) {
>>>>>                                 /* never export more colors than the
>>>>> number of CBs */
>>>>> -                             if (shader->output[i].sid >=
>>>>> key.nr_cbufs) {
>>>>> +                             if (shader->output[i].sid >=
>>>>> max_color_exports) {
>>>>>                                         /* skip export */
>>>>>                                         j--;
>>>>>                                         continue;
>>>>> @@ -1450,7 +1451,7 @@ static int r600_shader_from_tgsi(struct
>>>>> r600_screen *rscreen,
>>>>>                                 output[j].type =
>>>>> V_SQ_CF_ALLOC_EXPORT_WORD0_SQ_EXPORT_PIXEL;
>>>>>                                 shader->nr_ps_color_exports++;
>>>>>                                 if (shader->fs_write_all &&
>>>>> (rscreen->chip_class >= EVERGREEN)) {
>>>>> -                                     for (k = 1; k < key.nr_cbufs;
>>>>> k++) {
>>>>> +                                     for (k = 1; k <
>>>>> max_color_exports; k++) {
>>>>>                                                 j++;
>>>>>                                                 memset(&output[j], 0,
>>>>> sizeof(struct r600_bytecode_output));
>>>>>                                                 output[j].gpr =
>>>>> shader->output[i].gpr;
>>>>>
>>>> _______________________________________________
>>>> mesa-dev mailing list
>>>> mesa-dev at lists.freedesktop.org
>>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev at lists.freedesktop.org
>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>>
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>



More information about the mesa-dev mailing list