[Mesa-dev] [PATCH] radeonsi: implement legacy RCP behavior to fix Unreal engine demos

Thu Dec 4 12:10:01 PST 2014

I certainly don't want to deter you from pushing a quick fix now.

If possible I'd just like any further information you might have 
collected while debugging this issue in the hope we can figure out a 
more comprehensive solution should be.

Because I'm sure other drivers (including VMware's svga and llvmpipe) 
will face the exact same issue.

Jose

On 04/12/14 19:49, Marek Olšák wrote:
> I agree that this should be fixed properly, but my time is limited at
> the moment. We know there's a problem with our shader backend, because
> there are issues with other games too. We can fix it later and we
> certainly won't forget about it.
>
> Note that r600g is using a lot of these non-standard legacy
> instructions (like dx9 MUL where x*0=0, clamped RCP, clamped RSQ),
> while radeonsi is mostly using the IEEE versions. I'd say radeonsi is
> not so horrible yet, but yeah the situation is very messy.
>
> If we want a driver to use IEEE behavior for all instructions, the
> upper layers that also transform and optimize shaders must be ready
> for it and must handle NaNs and Infs correctly. For starters, !(x > y)
> is not equivalent to (x <= y), so the few comparison instructions that
> we have currently aren't enough.
>
> Marek
>
> On Thu, Dec 4, 2014 at 8:03 PM, Roland Scheidegger <sroland at vmware.com> wrote:
>> The problem is, while glsl was quite lenient in older versions wrt
>> precision or things like div by zero, starting with glsl 4.10 you are
>> _required_ to return +-/Inf for divs by zero. Hence if you use hacks
>> here to make some app run, you _will_ have to fix that properly in the
>> future in any case.
>> (That said, somewhat related coming up from time to time, we don't have
>> a way to express "legacy" vs. "modern" behavior in tgsi - glsl may have
>> been indifferent at least in theory, but d3d9 was not and legacy
>> arb_fp/vp also might have expected the no-nans, no-infs, rcp(0) = 0
>> (though the spec doesn't tell that) and so on - it is unclear if this
>> should be done in tgsi (different instructions? shader property?) or
>> just translated away in state trackers if legacy behavior is expected,
>> though the latter is no good for older drivers. But just expecting
>> legacy behavior in backends is NOT feasible. RCP might be something of a
>> special case because modern apis (be it glsl, opencl, d3d10) don't have
>> it so legacy behavior may be expcted, but since it's used to lower other
>> instructions this isn't really true neither.)
>>
>> RCP_CLAMP I guess looks somewhat better (because it's "closer" to the
>> real result) but does not solve that issue neither.
>>
>> Like I said though, I have no idea if this issue is really coming from
>> the app, or somehow the div lowering or other optimization somewhere (it
>> looks like since both returning zero or +MAX_FLT work I guess it's
>> really the NaN turning up at some point when probably doing a mul of it
>> with zero or something like that either in the shader or even blend is
>> the problem why things are screwed in the end).
>>
>> I don't really object this patch though, you can do in the driver
>> whatever you want, but this is something which I feel should be figured
>> out for real.
>>
>> Roland
>>
>> Am 04.12.2014 um 18:52 schrieb Marek Olšák:
>>> Returning FLT_MAX instead of 0 also works, which is similar to another
>>> hw instruction: V_RCP_CLAMP_F32.
>>>
>>> The Unreal engine is a pretty big target with a lot of apps out there.
>>> I'm afraid a driconf option isn't feasible.
>>>
>>> Marek
>>>
>>> On Thu, Dec 4, 2014 at 5:39 PM, Roland Scheidegger <sroland at vmware.com> wrote:
>>>> Hmm I have to say I'm not really convinced of that solution. Because all
>>>> divs are lowered, this will screw the results of all divs (if the rcp
>>>> would come from a legacy arb_fp rcp that would be different and quite
>>>> possible some apps depending on it, problems like that are very common
>>>> for d3d9 apps too). But really there's some expectations stuff conforms
>>>> to ieee754 rules these days, and making divs by zero return 0 ain't so hot.
>>>> Maybe it's the div lowering itself which causes this, in which case it
>>>> should probably be disabled? Might be a good idea anyway (if the driver
>>>> supports native div) since rcp isn't accurate usually.
>>>> Difficult to tell though without seeing the glsl and tgsi shader. But if
>>>> it was really the app expecting zero out of a div by zero (but I have
>>>> doubts about that), I'd certainly classify that as an app bug, and any
>>>> workarounds only be enabled by some dri conf option.
>>>>
>>>> Roland
>>>>
>>>>
>>>>
>>>> Am 04.12.2014 um 13:34 schrieb Marek Olšák:
>>>>> From: Marek Olšák <marek.olsak at amd.com>
>>>>>
>>>>> Discussion: https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.freedesktop.org_show-5Fbug.cgi-3Fid-3D83510-23c8&d=AAIFaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_I&m=HWErIxbQoflHYrNCrKP9DUUVY69hrn9DXmH6mjo7TRE&s=djKOxL0S3qmyGMXZQcCFe2SkcTCdbpIrvB_EaeA2Xbc&e=
>>>>> ---
>>>>>   src/gallium/drivers/radeonsi/si_shader.c | 27 +++++++++++++++++++++++++++
>>>>>   1 file changed, 27 insertions(+)
>>>>>
>>>>> diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
>>>>> index 973bac2..e0799c9 100644
>>>>> --- a/src/gallium/drivers/radeonsi/si_shader.c
>>>>> +++ b/src/gallium/drivers/radeonsi/si_shader.c
>>>>> @@ -2744,6 +2744,32 @@ static int si_generate_gs_copy_shader(struct si_screen *sscreen,
>>>>>        return r;
>>>>>   }
>>>>>
>>>>> +/**
>>>>> + * This emulates V_RCP_LEGACY_F32, which has the following rule for division
>>>>> + * by zero: 1 / 0 = 0
>>>>> + *
>>>>> + * V_RCP_F32(x) = 1 / x
>>>>> + * V_RCP_LEGACY_F32(x) = (x != +-0) ? V_RCP_F32(x) : 0.
>>>>> + */
>>>>> +static void si_llvm_emit_rcp_legacy(const struct lp_build_tgsi_action * action,
>>>>> +                                 struct lp_build_tgsi_context * bld_base,
>>>>> +                                 struct lp_build_emit_data * emit_data)
>>>>> +{
>>>>> +     LLVMValueRef cmp =
>>>>> +             lp_build_cmp(&bld_base->base,
>>>>> +                          PIPE_FUNC_NOTEQUAL,
>>>>> +                          emit_data->args[0],
>>>>> +                          bld_base->base.zero);
>>>>> +
>>>>> +     LLVMValueRef div =
>>>>> +             lp_build_emit_llvm_binary(bld_base, TGSI_OPCODE_DIV,
>>>>> +                                       bld_base->base.one,
>>>>> +                                       emit_data->args[0]);
>>>>> +
>>>>> +     emit_data->output[emit_data->chan] =
>>>>> +             lp_build_select(&bld_base->base, cmp, div, bld_base->base.zero);
>>>>> +}
>>>>> +
>>>>>   int si_shader_create(struct si_screen *sscreen, struct si_shader *shader)
>>>>>   {
>>>>>        struct si_shader_selector *sel = shader->selector;
>>>>> @@ -2798,6 +2824,7 @@ int si_shader_create(struct si_screen *sscreen, struct si_shader *shader)
>>>>>                bld_base->op_actions[TGSI_OPCODE_MIN].emit = build_tgsi_intrinsic_nomem;
>>>>>                bld_base->op_actions[TGSI_OPCODE_MIN].intr_name = "llvm.minnum.f32";
>>>>>        }
>>>>> +     bld_base->op_actions[TGSI_OPCODE_RCP].emit = si_llvm_emit_rcp_legacy;
>>>>>
>>>>>        si_shader_ctx.radeon_bld.load_system_value = declare_system_value;
>>>>>        si_shader_ctx.tokens = sel->tokens;
>>>>>
>>>>
>>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddev&d=AAIGaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=zfmBZnnVGHeYde45pMKNnVyzeaZbdIqVLprmZCM2zzE&m=FiE6oNGxIBhAhWYTizfUU0U-CMLJWgwTBa8HHlEwmYk&s=ix4zRbhUVrkjpEX2HelK0L1X-NFxAVdGALq9vyUv7eo&e=
>