[Mesa-dev] [PATCH 4/4] RFC: nir: add lowering for idiv/udiv/umod

Fri Apr 3 06:00:36 PDT 2015

On Wed, Apr 1, 2015 at 9:50 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> On Wed, Apr 1, 2015 at 7:09 AM, Roland Scheidegger <sroland at vmware.com> wrote:
>> Am 01.04.2015 um 03:44 schrieb Rob Clark:
>>> On Tue, Mar 31, 2015 at 9:03 PM, Roland Scheidegger <sroland at vmware.com> wrote:
>>>> Am 01.04.2015 um 00:57 schrieb Rob Clark:
>>>>>
>>>>> +/* Lowers idiv/udiv/umod
>>>>> + * Based on NV50LegalizeSSA::handleDIV()
>>>>> + *
>>>>> + * Note that this is probably not enough precision for compute shaders.
>>>>> + * Perhaps we want a second higher precision (looping) version of this?
>>>>> + * Or perhaps we assume if you can do compute shaders you can also
>>>>> + * branch out to a pre-optimized shader library routine..
>>>>
>>>> So if this is not enough precision, maybe should state how large the
>>>> error can be?
>>>>
>>>
>>> tbh, if I knew what the error for this approach was, I would have
>>> included it.  I'm not the original author, but this is based on
>>> nouveau codegen code (as mentioned in the comment).  I guess it is
>>> better than converting to float and dividing and converting back, but
>>> worse than an iterative (ie. looping, ie. divergent flow control)
>>> approach.  It is apparently enough to keep piglit happy.
>>>
>>> The original algo in nv50 lowering code is from
>>> 322bc7ed68ed92233c97168c036d0aa50c11a20e (ie. 'nv50/ir: import nv50
>>> target') which doesn't really give more clue about the origin..
>>>
>>> if anyone knows, I'm all ears and will add relevant links/info to comment..
>>
>> Ah ok. Well it isn't even obvious to me if the results are not actually
>> always exact.
>
> Should be easy enough to take the algo, express it in terms of e.g.
> numpy (or even, *gasp*, a C program), and then do a randomized search
> over the 32bit x 32bit input space to see if there are any errors, and
> what they are. (Since the full input space would take too long...)
>
> Looks like I did just that when debugging the freedreno impl...
> available at http://hastebin.com/ewimuvobin.py
>

fwiw, looks like you still had some broken hacks in that script,
probably leftovers from your earlier experiments..  I fixed it up (or
at least it seems to be giving the same results piglit expects for the
same inputs) and also added udiv vs idiv support.. guess I should add
umod support too and commit it alongside the idiv lowering (when that
actually works too)

would appreciate a second set of eyes on this since I'm pretty much a
python and numpy newbie:

   http://hastebin.com/orogikadey.vhdl

now to figure out what my idiv lowering is doing differently :-P
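
For reference, the kind of randomized check discussed above can be
sketched in plain Python.  This is not the hastebin script and not the
nv50 algorithm: the names to_f32, naive_float_udiv and check_div are
made up for illustration, and the candidate under test is just the
"convert to float, divide, convert back" approach mentioned earlier in
the thread.  The idea is only to show the harness shape: draw random
32-bit operands, run the candidate, and compare against exact integer
division.

```python
import random
import struct

U32_MAX = 0xFFFFFFFF


def to_f32(x):
    """Round a Python number to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]


def naive_float_udiv(a, b):
    """Hypothetical candidate: u2f both operands, divide, f2u the result."""
    q = to_f32(to_f32(a) / to_f32(b))
    return int(q)  # truncates toward zero; no clamping to 32 bits


def check_div(candidate, samples=100000, seed=0):
    """Compare candidate(a, b) with exact a // b on random 32-bit inputs."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(samples):
        a = rng.randint(0, U32_MAX)
        b = rng.randint(1, U32_MAX)  # skip division by zero
        got = candidate(a, b)
        want = a // b
        if got != want:
            mismatches.append((a, b, got, want))
    return mismatches


if __name__ == "__main__":
    bad = check_div(naive_float_udiv)
    print("mismatches:", len(bad))
    for a, b, got, want in bad[:5]:
        print("%#010x / %#010x = %#x, expected %#x" % (a, b, got, want))
```

Swapping in a different candidate (e.g. a Python transcription of the
lowering) only needs a function taking (a, b) and returning the
quotient.  The float candidate here is expected to fail for some
inputs: for instance a=0xffffffff, b=1 already goes wrong, because
0xffffffff rounds up to 2^32 when converted to float32.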

BR,
-R

>   -ilia