[Mesa-dev] [PATCH] radeonsi: implement legacy RCP behavior to fix Unreal engine demos

Marek Olšák maraeo at gmail.com
Thu Dec 4 11:49:56 PST 2014


I agree that this should be fixed properly, but my time is limited at
the moment. We know there's a problem with our shader backend, because
there are issues with other games too. We can fix it later and we
certainly won't forget about it.

Note that r600g is using a lot of these non-standard legacy
instructions (like dx9 MUL where x*0=0, clamped RCP, clamped RSQ),
while radeonsi is mostly using the IEEE versions. I'd say radeonsi is
not so horrible yet, but yeah the situation is very messy.

If we want a driver to use IEEE behavior for all instructions, the
upper layers that also transform and optimize shaders must be ready
for it and must handle NaNs and Infs correctly. For starters, !(x > y)
is not equivalent to (x <= y), so the few comparison instructions that
we have currently aren't enough.

Marek

On Thu, Dec 4, 2014 at 8:03 PM, Roland Scheidegger <sroland at vmware.com> wrote:
> The problem is, while glsl was quite lenient in older versions wrt
> precision or things like div by zero, starting with glsl 4.10 you are
> _required_ to return +-/Inf for divs by zero. Hence if you use hacks
> here to make some app run, you _will_ have to fix that properly in the
> future in any case.
> (That said, somewhat related coming up from time to time, we don't have
> a way to express "legacy" vs. "modern" behavior in tgsi - glsl may have
> been indifferent at least in theory, but d3d9 was not and legacy
> arb_fp/vp also might have expected the no-nans, no-infs, rcp(0) = 0
> (though the spec doesn't tell that) and so on - it is unclear if this
> should be done in tgsi (different instructions? shader property?) or
> just translated away in state trackers if legacy behavior is expected,
> though the latter is no good for older drivers. But just expecting
> legacy behavior in backends is NOT feasible. RCP might be something of a
> special case because modern apis (be it glsl, opencl, d3d10) don't have
> it so legacy behavior may be expcted, but since it's used to lower other
> instructions this isn't really true neither.)
>
> RCP_CLAMP I guess looks somewhat better (because it's "closer" to the
> real result) but does not solve that issue neither.
>
> Like I said though, I have no idea if this issue is really coming from
> the app, or somehow the div lowering or other optimization somewhere (it
> looks like since both returning zero or +MAX_FLT work I guess it's
> really the NaN turning up at some point when probably doing a mul of it
> with zero or something like that either in the shader or even blend is
> the problem why things are screwed in the end).
>
> I don't really object this patch though, you can do in the driver
> whatever you want, but this is something which I feel should be figured
> out for real.
>
> Roland
>
> Am 04.12.2014 um 18:52 schrieb Marek Olšák:
>> Returning FLT_MAX instead of 0 also works, which is similar to another
>> hw instruction: V_RCP_CLAMP_F32.
>>
>> The Unreal engine is a pretty big target with a lot of apps out there.
>> I'm afraid a driconf option isn't feasible.
>>
>> Marek
>>
>> On Thu, Dec 4, 2014 at 5:39 PM, Roland Scheidegger <sroland at vmware.com> wrote:
>>> Hmm I have to say I'm not really convinced of that solution. Because all
>>> divs are lowered, this will screw the results of all divs (if the rcp
>>> would come from a legacy arb_fp rcp that would be different and quite
>>> possible some apps depending on it, problems like that are very common
>>> for d3d9 apps too). But really there's some expectations stuff conforms
>>> to ieee754 rules these days, and making divs by zero return 0 ain't so hot.
>>> Maybe it's the div lowering itself which causes this, in which case it
>>> should probably be disabled? Might be a good idea anyway (if the driver
>>> supports native div) since rcp isn't accurate usually.
>>> Difficult to tell though without seeing the glsl and tgsi shader. But if
>>> it was really the app expecting zero out of a div by zero (but I have
>>> doubts about that), I'd certainly classify that as an app bug, and any
>>> workarounds only be enabled by some dri conf option.
>>>
>>> Roland
>>>
>>>
>>>
>>> Am 04.12.2014 um 13:34 schrieb Marek Olšák:
>>>> From: Marek Olšák <marek.olsak at amd.com>
>>>>
>>>> Discussion: https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.freedesktop.org_show-5Fbug.cgi-3Fid-3D83510-23c8&d=AAIFaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_I&m=HWErIxbQoflHYrNCrKP9DUUVY69hrn9DXmH6mjo7TRE&s=djKOxL0S3qmyGMXZQcCFe2SkcTCdbpIrvB_EaeA2Xbc&e=
>>>> ---
>>>>  src/gallium/drivers/radeonsi/si_shader.c | 27 +++++++++++++++++++++++++++
>>>>  1 file changed, 27 insertions(+)
>>>>
>>>> diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
>>>> index 973bac2..e0799c9 100644
>>>> --- a/src/gallium/drivers/radeonsi/si_shader.c
>>>> +++ b/src/gallium/drivers/radeonsi/si_shader.c
>>>> @@ -2744,6 +2744,32 @@ static int si_generate_gs_copy_shader(struct si_screen *sscreen,
>>>>       return r;
>>>>  }
>>>>
>>>> +/**
>>>> + * This emulates V_RCP_LEGACY_F32, which has the following rule for division
>>>> + * by zero: 1 / 0 = 0
>>>> + *
>>>> + * V_RCP_F32(x) = 1 / x
>>>> + * V_RCP_LEGACY_F32(x) = (x != +-0) ? V_RCP_F32(x) : 0.
>>>> + */
>>>> +static void si_llvm_emit_rcp_legacy(const struct lp_build_tgsi_action * action,
>>>> +                                 struct lp_build_tgsi_context * bld_base,
>>>> +                                 struct lp_build_emit_data * emit_data)
>>>> +{
>>>> +     LLVMValueRef cmp =
>>>> +             lp_build_cmp(&bld_base->base,
>>>> +                          PIPE_FUNC_NOTEQUAL,
>>>> +                          emit_data->args[0],
>>>> +                          bld_base->base.zero);
>>>> +
>>>> +     LLVMValueRef div =
>>>> +             lp_build_emit_llvm_binary(bld_base, TGSI_OPCODE_DIV,
>>>> +                                       bld_base->base.one,
>>>> +                                       emit_data->args[0]);
>>>> +
>>>> +     emit_data->output[emit_data->chan] =
>>>> +             lp_build_select(&bld_base->base, cmp, div, bld_base->base.zero);
>>>> +}
>>>> +
>>>>  int si_shader_create(struct si_screen *sscreen, struct si_shader *shader)
>>>>  {
>>>>       struct si_shader_selector *sel = shader->selector;
>>>> @@ -2798,6 +2824,7 @@ int si_shader_create(struct si_screen *sscreen, struct si_shader *shader)
>>>>               bld_base->op_actions[TGSI_OPCODE_MIN].emit = build_tgsi_intrinsic_nomem;
>>>>               bld_base->op_actions[TGSI_OPCODE_MIN].intr_name = "llvm.minnum.f32";
>>>>       }
>>>> +     bld_base->op_actions[TGSI_OPCODE_RCP].emit = si_llvm_emit_rcp_legacy;
>>>>
>>>>       si_shader_ctx.radeon_bld.load_system_value = declare_system_value;
>>>>       si_shader_ctx.tokens = sel->tokens;
>>>>
>>>
>


More information about the mesa-dev mailing list