[Mesa-dev] [PATCH 4/4] RFC: nir: add lowering for idiv/udiv/umod

Wed Apr 1 08:39:23 PDT 2015

Am 01.04.2015 um 15:50 schrieb Ilia Mirkin:
> On Wed, Apr 1, 2015 at 7:09 AM, Roland Scheidegger <sroland at vmware.com> wrote:
>> Am 01.04.2015 um 03:44 schrieb Rob Clark:
>>> On Tue, Mar 31, 2015 at 9:03 PM, Roland Scheidegger <sroland at vmware.com> wrote:
>>>> Am 01.04.2015 um 00:57 schrieb Rob Clark:
>>>>> From: Rob Clark <robclark at freedesktop.org>
>>>>>
>>>>> Based on the algo from NV50LegalizeSSA::handleDIV() and handleMOD().
>>>>> See also trans_idiv() in freedreno/ir3/ir3_compiler.c (which was an
>>>>> adaptation of the nv50 code from Ilia).
>>>>>
>>>>> Just sending as an rfc right now, since I'm not quite at the point to be
>>>>> able to test it on actual hw.
>>>>>
>>>>> Signed-off-by: Rob Clark <robclark at freedesktop.org>
>>>>> ---
>>>>>  src/glsl/Makefile.sources     |   1 +
>>>>>  src/glsl/nir/nir.h            |   1 +
>>>>>  src/glsl/nir/nir_lower_idiv.c | 212 ++++++++++++++++++++++++++++++++++++++++++
>>>>>  3 files changed, 214 insertions(+)
>>>>>  create mode 100644 src/glsl/nir/nir_lower_idiv.c
>>>>>
>>>>> diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
>>>>> index 18fff38..e426970 100644
>>>>> --- a/src/glsl/Makefile.sources
>>>>> +++ b/src/glsl/Makefile.sources
>>>>> @@ -32,6 +32,7 @@ NIR_FILES = \
>>>>>       nir/nir_lower_atomics.c \
>>>>>       nir/nir_lower_global_vars_to_local.c \
>>>>>       nir/nir_lower_locals_to_regs.c \
>>>>> +     nir/nir_lower_idiv.c \
>>>>>       nir/nir_lower_io.c \
>>>>>       nir/nir_lower_phis_to_scalar.c \
>>>>>       nir/nir_lower_samplers.cpp \
>>>>> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
>>>>> index cd03d6b..e002d6f 100644
>>>>> --- a/src/glsl/nir/nir.h
>>>>> +++ b/src/glsl/nir/nir.h
>>>>> @@ -1605,6 +1605,7 @@ void nir_lower_samplers(nir_shader *shader,
>>>>>
>>>>>  void nir_lower_system_values(nir_shader *shader);
>>>>>  void nir_lower_tex_projector(nir_shader *shader);
>>>>> +void nir_lower_idiv(nir_shader *shader);
>>>>>
>>>>>  void nir_lower_atomics(nir_shader *shader);
>>>>>  void nir_lower_to_source_mods(nir_shader *shader);
>>>>> diff --git a/src/glsl/nir/nir_lower_idiv.c b/src/glsl/nir/nir_lower_idiv.c
>>>>> new file mode 100644
>>>>> index 0000000..e95c57e
>>>>> --- /dev/null
>>>>> +++ b/src/glsl/nir/nir_lower_idiv.c
>>>>> @@ -0,0 +1,212 @@
>>>>> +/*
>>>>> + * Copyright © 2015 Red Hat
>>>>> + *
>>>>> + * Permission is hereby granted, free of charge, to any person obtaining a
>>>>> + * copy of this software and associated documentation files (the "Software"),
>>>>> + * to deal in the Software without restriction, including without limitation
>>>>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>>>>> + * and/or sell copies of the Software, and to permit persons to whom the
>>>>> + * Software is furnished to do so, subject to the following conditions:
>>>>> + *
>>>>> + * The above copyright notice and this permission notice (including the next
>>>>> + * paragraph) shall be included in all copies or substantial portions of the
>>>>> + * Software.
>>>>> + *
>>>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>>>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>>>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>>>>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>>>>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>>>>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
>>>>> + * IN THE SOFTWARE.
>>>>> + *
>>>>> + * Authors:
>>>>> + *    Rob Clark <robclark at freedesktop.org>
>>>>> + */
>>>>> +
>>>>> +#include "nir.h"
>>>>> +#include "nir_builder.h"
>>>>> +
>>>>> +/* Lowers idiv/udiv/umod
>>>>> + * Based on NV50LegalizeSSA::handleDIV()
>>>>> + *
>>>>> + * Note that this is probably not enough precision for compute shaders.
>>>>> + * Perhaps we want a second higher precision (looping) version of this?
>>>>> + * Or perhaps we assume if you can do compute shaders you can also
>>>>> + * branch out to a pre-optimized shader library routine..
>>>>
>>>> So if this is not enough precision, maybe should state how large the
>>>> error can be?
>>>>
>>>
>>> tbh, if I knew what the error for this approach was, I would have
>>> included it.  I'm not the original author, but this is based on
>>> nouveau codegen code (as mentioned in the comment).  I guess it is
>>> better than converting to float and dividing and converting back, but
>>> worse than an iterative (ie. looping, ie. divergent flow control)
>>> approach.  It is apparently enough to keep piglit happy.
>>>
>>> The original algo in nv50 lowering code is from
>>> 322bc7ed68ed92233c97168c036d0aa50c11a20e (ie. 'nv50/ir: import nv50
>>> target') which doesn't really give more clue about the origin..
>>>
>>> if anyone knows, I'm all ears and will add relevant links/info to comment..
>>
>> Ah ok. Well it isn't even obvious to me if the results are not actually
>> always exact.
> 
> Should be easy enough to take the algo, express it in terms of e.g.
> numpy (or even, *gasp*, a C program), and then do a randomized search
> over the 32bit x 32bit input space to see if there are any errors, and
> what they are. (Since the full input space would take too long...)
> 
> Looks like I did just that when debugging the freedreno impl...
> available at https://urldefense.proofpoint.com/v2/url?u=http-3A__hastebin.com_ewimuvobin.py&d=AwIFaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_I&m=Pn5cwt36o82bdZbB71-Fk0rkg7gYy4gsfb7UdU_yJOw&s=lmUwJATGP5llzxaPiaPMTL49paZCc6o-aTCWyaulGMw&e= 
>
Though actually, it uses frcp, which is usually not exact on gpus
(unless you translate that to a fdiv if your gpu has one). So,
potentially this could rely on the exact implementation of frcp...

Roland