<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Mon, Sep 24, 2018 at 7:15 PM Marek Olšák <<a href="mailto:maraeo@gmail.com">maraeo@gmail.com</a>> wrote: </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">This patch also handles all types, just differently. If you change the typedefs in the header, you'll get a different type and the code is exactly the same for all types, but that's not important (to me anyway). It also supports signed division. (not important to me either, but may be important to you) </blockquote><div> </div><div>Mine supported signed division as well though what's here might be a bit more clever. I'll have to give it some thought. </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Did you figure out the algorithm by yourself or did you copy it from somewhere? The reason I'm asking is that yours only seems to implement the Round-Up algorithm and you said: "In particular, we want to have m < 2^N so that we don't have any overflow problems. Unfortunately, this isn't always achievable." </blockquote><div> </div><div>Yes, mine is based on the round up algorithm. However, it's not the blind round-up algorithm; it's a bit smarter than that. </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Let me tell you what. This patch achieves it ALWAYS. </blockquote><div> </div><div>I don't think that's true. You still have an N+1 bit multiplier, you just call it the increment bit. The saturated add, however, is a neat trick that probably lets you avoid the weirness around adding in the increment factor. I'll need to look at the web-site you linked and think about this stuff again before I can verify it. </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> This patch implements 2 algorithms for unsigned division: Round-Up and Round-Down. The paper I linked shows that the Round Down algorithm generates better code for some divisors than the Round Up algorithm, because the multiplier always fits into 32 bits. The most operations you'll ever need are: 2 shifts, 32-bit saturated ADD and UMUL_HI. Marek On Mon, Sep 24, 2018 at 7:41 PM, Marek Olšák <<a href="mailto:maraeo@gmail.com" target="_blank">maraeo@gmail.com</a>> wrote: > Did you copy the code from the same author? > > Does your version also have an interface for dividing by a uniform > instead of a compile time constant? > > Note that this algorithm was originally only written for > non-power-of-two divisors and I extended it to support 1 and > power-of-two divisors in order to support dividing by a uniform in a > generic way. The other two generic variants that I added are also > important. One of them assumes no unsigned wraparounds and the other > one assumes operands have 31 bits and the divisor is >= 2. > > Marek > > On Mon, Sep 24, 2018 at 10:00 AM, Jason Ekstrand <<a href="mailto:jason@jlekstrand.net" target="_blank">jason@jlekstrand.net</a>> wrote: >> Very similar.... And mine handles 8, 16, and 64-bit types. :-D >> >> --Jason >> >> On Mon, Sep 24, 2018 at 8:53 AM Ian Romanick <<a href="mailto:idr@freedesktop.org" target="_blank">idr@freedesktop.org</a>> wrote: >>> >>> I didn't look really closely at either set, but this seems really >>> similar to something Jason sent out a week or two. Perhaps you guys >>> could unify these? >>> >>> On 09/23/2018 09:57 AM, Marek Olšák wrote: >>> > From: Marek Olšák <<a href="mailto:marek.olsak@amd.com" target="_blank">marek.olsak@amd.com</a>> >>> > >>> > Compilers can use this to generate optimal code for integer division >>> > by a constant. >>> > >>> > Additionally, an unsigned division by a uniform that is constant but not >>> > known at compile time can still be optimized by passing 2-4 division >>> > factors to the shader as uniforms and executing one of the fast_udiv* >>> > variants. The signed division algorithm doesn't have this capability. >>> > --- >>> > src/util/Makefile.sources | 2 + >>> > src/util/fast_idiv_by_const.c | 245 >>> > ++++++++++++++++++++++++++++++++++++++++++ >>> > src/util/fast_idiv_by_const.h | 173 +++++++++++++++++++++++++++++ >>> > src/util/meson.build | 2 + >>> > 4 files changed, 422 insertions(+) >>> > create mode 100644 src/util/fast_idiv_by_const.c >>> > create mode 100644 src/util/fast_idiv_by_const.h >>> > >>> > diff --git a/src/util/Makefile.sources b/src/util/Makefile.sources >>> > index b562d6c..f741b2a 100644 >>> > --- a/src/util/Makefile.sources >>> > +++ b/src/util/Makefile.sources >>> > @@ -3,20 +3,22 @@ MESA_UTIL_FILES := \ >>> > bitscan.h \ >>> > bitset.h \ >>> > build_id.c \ >>> > build_id.h \ >>> > crc32.c \ >>> > crc32.h \ >>> > debug.c \ >>> > debug.h \ >>> > disk_cache.c \ >>> > disk_cache.h \ >>> > + fast_idiv_by_const.c \ >>> > + fast_idiv_by_const.h \ >>> > format_r11g11b10f.h \ >>> > format_rgb9e5.h \ >>> > format_srgb.h \ >>> > futex.h \ >>> > half_float.c \ >>> > half_float.h \ >>> > hash_table.c \ >>> > hash_table.h \ >>> > list.h \ >>> > macros.h \ >>> > diff --git a/src/util/fast_idiv_by_const.c >>> > b/src/util/fast_idiv_by_const.c >>> > new file mode 100644 >>> > index 0000000..f247b66 >>> > --- /dev/null >>> > +++ b/src/util/fast_idiv_by_const.c >>> > @@ -0,0 +1,245 @@ >>> > +/* >>> > + * Copyright © 2018 Advanced Micro Devices, Inc. >>> > + * >>> > + * Permission is hereby granted, free of charge, to any person >>> > obtaining a >>> > + * copy of this software and associated documentation files (the >>> > "Software"), >>> > + * to deal in the Software without restriction, including without >>> > limitation >>> > + * the rights to use, copy, modify, merge, publish, distribute, >>> > sublicense, >>> > + * and/or sell copies of the Software, and to permit persons to whom >>> > the >>> > + * Software is furnished to do so, subject to the following conditions: >>> > + * >>> > + * The above copyright notice and this permission notice (including the >>> > next >>> > + * paragraph) shall be included in all copies or substantial portions >>> > of the >>> > + * Software. >>> > + * >>> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, >>> > EXPRESS OR >>> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF >>> > MERCHANTABILITY, >>> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT >>> > SHALL >>> > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR >>> > OTHER >>> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, >>> > ARISING >>> > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER >>> > DEALINGS >>> > + * IN THE SOFTWARE. >>> > + */ >>> > + >>> > +/* Imported from: >>> > + * >>> > <a href="https://raw.githubusercontent.com/ridiculousfish/libdivide/master/divide_by_constants_codegen_reference.c" rel="noreferrer" target="_blank">https://raw.githubusercontent.com/ridiculousfish/libdivide/master/divide_by_constants_codegen_reference.c</a> >>> > + * Paper: >>> > + * >>> > <a href="http://ridiculousfish.com/files/faster_unsigned_division_by_constants.pdf" rel="noreferrer" target="_blank">http://ridiculousfish.com/files/faster_unsigned_division_by_constants.pdf</a> >>> > + * >>> > + * The author, ridiculous_fish, wrote: >>> > + * >>> > + * ''Reference implementations of computing and using the "magic >>> > number" >>> > + * approach to dividing by constants, including codegen >>> > instructions. >>> > + * The unsigned division incorporates the "round down" optimization >>> > per >>> > + * ridiculous_fish. >>> > + * >>> > + * This is free and unencumbered software. Any copyright is >>> > dedicated >>> > + * to the Public Domain.'' >>> > + */ >>> > + >>> > +#include "fast_idiv_by_const.h" >>> > +#include "u_math.h" >>> > +#include <limits.h> >>> > +#include <assert.h> >>> > + >>> > +/* uint_t and sint_t can be replaced by different integer types and the >>> > code >>> > + * will work as-is. The only requirement is that sizeof(uintN) == >>> > sizeof(intN). >>> > + */ >>> > + >>> > +struct util_fast_udiv_info >>> > +util_compute_fast_udiv_info(uint_t D, unsigned num_bits) >>> > +{ >>> > + /* The numerator must fit in a uint_t */ >>> > + assert(num_bits > 0 && num_bits <= sizeof(uint_t) * CHAR_BIT); >>> > + assert(D != 0); >>> > + >>> > + /* The eventual result */ >>> > + struct util_fast_udiv_info result; >>> > + >>> > + if (util_is_power_of_two_nonzero(D)) { >>> > + unsigned div_shift = util_logbase2(D); >>> > + >>> > + if (div_shift) { >>> > + /* Dividing by a power of two. */ >>> > + result.multiplier = 1 << 31; </blockquote><div> </div><div>This is wrong for non-32-bit </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> >>> > + result.pre_shift = 0; >>> > + result.post_shift = div_shift - 1; >>> > + result.increment = 0; >>> > + return result; >>> > + } else { >>> > + /* Dividing by 1. */ >>> > + /* Assuming: floor((num + 1) * (2^32 - 1) / 2^32) = num */ >>> > + result.multiplier = UINT_MAX; </blockquote><div> </div><div>So is this.</div><div> </div><div>Can we at the very least pull in the unit tests from my series?</div><div> </div><div>--Jason </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> >>> > + result.pre_shift = 0; >>> > + result.post_shift = 0; >>> > + result.increment = 1; >>> > + return result; >>> > + } >>> > + } >>> > + >>> > + /* Bits in a uint_t */ >>> > + const unsigned UINT_BITS = sizeof(uint_t) * CHAR_BIT; >>> > + >>> > + /* The extra shift implicit in the difference between UINT_BITS and >>> > num_bits >>> > + */ >>> > + const unsigned extra_shift = UINT_BITS - num_bits; >>> > + >>> > + /* The initial power of 2 is one less than the first one that can >>> > possibly >>> > + * work. >>> > + */ >>> > + const uint_t initial_power_of_2 = (uint_t)1 << (UINT_BITS-1); >>> > + >>> > + /* The remainder and quotient of our power of 2 divided by d */ >>> > + uint_t quotient = initial_power_of_2 / D; >>> > + uint_t remainder = initial_power_of_2 % D; >>> > + >>> > + /* ceil(log_2 D) */ >>> > + unsigned ceil_log_2_D; >>> > + >>> > + /* The magic info for the variant "round down" algorithm */ >>> > + uint_t down_multiplier = 0; >>> > + unsigned down_exponent = 0; >>> > + int has_magic_down = 0; >>> > + >>> > + /* Compute ceil(log_2 D) */ >>> > + ceil_log_2_D = 0; >>> > + uint_t tmp; >>> > + for (tmp = D; tmp > 0; tmp >>= 1) >>> > + ceil_log_2_D += 1; >>> > + >>> > + >>> > + /* Begin a loop that increments the exponent, until we find a power >>> > of 2 >>> > + * that works. >>> > + */ >>> > + unsigned exponent; >>> > + for (exponent = 0; ; exponent++) { >>> > + /* Quotient and remainder is from previous exponent; compute it >>> > for this >>> > + * exponent. >>> > + */ >>> > + if (remainder >= D - remainder) { >>> > + /* Doubling remainder will wrap around D */ >>> > + quotient = quotient * 2 + 1; >>> > + remainder = remainder * 2 - D; >>> > + } else { >>> > + /* Remainder will not wrap */ >>> > + quotient = quotient * 2; >>> > + remainder = remainder * 2; >>> > + } >>> > + >>> > + /* We're done if this exponent works for the round_up algorithm. >>> > + * Note that exponent may be larger than the maximum shift >>> > supported, >>> > + * so the check for >= ceil_log_2_D is critical. >>> > + */ >>> > + if ((exponent + extra_shift >= ceil_log_2_D) || >>> > + (D - remainder) <= ((uint_t)1 << (exponent + extra_shift))) >>> > + break; >>> > + >>> > + /* Set magic_down if we have not set it yet and this exponent >>> > works for >>> > + * the round_down algorithm >>> > + */ >>> > + if (!has_magic_down && >>> > + remainder <= ((uint_t)1 << (exponent + extra_shift))) { >>> > + has_magic_down = 1; >>> > + down_multiplier = quotient; >>> > + down_exponent = exponent; >>> > + } >>> > + } >>> > + >>> > + if (exponent < ceil_log_2_D) { >>> > + /* magic_up is efficient */ >>> > + result.multiplier = quotient + 1; >>> > + result.pre_shift = 0; >>> > + result.post_shift = exponent; >>> > + result.increment = 0; >>> > + } else if (D & 1) { >>> > + /* Odd divisor, so use magic_down, which must have been set */ >>> > + assert(has_magic_down); >>> > + result.multiplier = down_multiplier; >>> > + result.pre_shift = 0; >>> > + result.post_shift = down_exponent; >>> > + result.increment = 1; >>> > + } else { >>> > + /* Even divisor, so use a prefix-shifted dividend */ >>> > + unsigned pre_shift = 0; >>> > + uint_t shifted_D = D; >>> > + while ((shifted_D & 1) == 0) { >>> > + shifted_D >>= 1; >>> > + pre_shift += 1; >>> > + } >>> > + result = util_compute_fast_udiv_info(shifted_D, num_bits - >>> > pre_shift); >>> > + /* expect no increment or pre_shift in this path */ >>> > + assert(result.increment == 0 && result.pre_shift == 0); >>> > + result.pre_shift = pre_shift; >>> > + } >>> > + return result; >>> > +} >>> > + >>> > +struct util_fast_sdiv_info >>> > +util_compute_fast_sdiv_info(sint_t D) >>> > +{ >>> > + /* D must not be zero. */ >>> > + assert(D != 0); >>> > + /* The result is not correct for these divisors. */ >>> > + assert(D != 1 && D != -1); >>> > + >>> > + /* Our result */ >>> > + struct util_fast_sdiv_info result; >>> > + >>> > + /* Bits in an sint_t */ >>> > + const unsigned SINT_BITS = sizeof(sint_t) * CHAR_BIT; >>> > + >>> > + /* Absolute value of D (we know D is not the most negative value >>> > since >>> > + * that's a power of 2) >>> > + */ >>> > + const uint_t abs_d = (D < 0 ? -D : D); >>> > + >>> > + /* The initial power of 2 is one less than the first one that can >>> > possibly >>> > + * work */ >>> > + /* "two31" in Warren */ >>> > + unsigned exponent = SINT_BITS - 1; >>> > + const uint_t initial_power_of_2 = (uint_t)1 << exponent; >>> > + >>> > + /* Compute the absolute value of our "test numerator," >>> > + * which is the largest dividend whose remainder with d is d-1. >>> > + * This is called anc in Warren. >>> > + */ >>> > + const uint_t tmp = initial_power_of_2 + (D < 0); >>> > + const uint_t abs_test_numer = tmp - 1 - tmp % abs_d; >>> > + >>> > + /* Initialize our quotients and remainders (q1, r1, q2, r2 in >>> > Warren) */ >>> > + uint_t quotient1 = initial_power_of_2 / abs_test_numer; >>> > + uint_t remainder1 = initial_power_of_2 % abs_test_numer; >>> > + uint_t quotient2 = initial_power_of_2 / abs_d; >>> > + uint_t remainder2 = initial_power_of_2 % abs_d; >>> > + uint_t delta; >>> > + >>> > + /* Begin our loop */ >>> > + do { >>> > + /* Update the exponent */ >>> > + exponent++; >>> > + >>> > + /* Update quotient1 and remainder1 */ >>> > + quotient1 *= 2; >>> > + remainder1 *= 2; >>> > + if (remainder1 >= abs_test_numer) { >>> > + quotient1 += 1; >>> > + remainder1 -= abs_test_numer; >>> > + } >>> > + >>> > + /* Update quotient2 and remainder2 */ >>> > + quotient2 *= 2; >>> > + remainder2 *= 2; >>> > + if (remainder2 >= abs_d) { >>> > + quotient2 += 1; >>> > + remainder2 -= abs_d; >>> > + } >>> > + >>> > + /* Keep going as long as (2**exponent) / abs_d <= delta */ >>> > + delta = abs_d - remainder2; >>> > + } while (quotient1 < delta || (quotient1 == delta && remainder1 == >>> > 0)); >>> > + >>> > + result.multiplier = quotient2 + 1; >>> > + if (D < 0) result.multiplier = -result.multiplier; >>> > + result.shift = exponent - SINT_BITS; >>> > + return result; >>> > +} >>> > diff --git a/src/util/fast_idiv_by_const.h >>> > b/src/util/fast_idiv_by_const.h >>> > new file mode 100644 >>> > index 0000000..e8debbf >>> > --- /dev/null >>> > +++ b/src/util/fast_idiv_by_const.h >>> > @@ -0,0 +1,173 @@ >>> > +/* >>> > + * Copyright © 2018 Advanced Micro Devices, Inc. >>> > + * >>> > + * Permission is hereby granted, free of charge, to any person >>> > obtaining a >>> > + * copy of this software and associated documentation files (the >>> > "Software"), >>> > + * to deal in the Software without restriction, including without >>> > limitation >>> > + * the rights to use, copy, modify, merge, publish, distribute, >>> > sublicense, >>> > + * and/or sell copies of the Software, and to permit persons to whom >>> > the >>> > + * Software is furnished to do so, subject to the following conditions: >>> > + * >>> > + * The above copyright notice and this permission notice (including the >>> > next >>> > + * paragraph) shall be included in all copies or substantial portions >>> > of the >>> > + * Software. >>> > + * >>> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, >>> > EXPRESS OR >>> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF >>> > MERCHANTABILITY, >>> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT >>> > SHALL >>> > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR >>> > OTHER >>> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, >>> > ARISING >>> > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER >>> > DEALINGS >>> > + * IN THE SOFTWARE. >>> > + */ >>> > + >>> > +#ifndef FAST_IDIV_BY_CONST_H >>> > +#define FAST_IDIV_BY_CONST_H >>> > + >>> > +/* Imported from: >>> > + * >>> > <a href="https://raw.githubusercontent.com/ridiculousfish/libdivide/master/divide_by_constants_codegen_reference.c" rel="noreferrer" target="_blank">https://raw.githubusercontent.com/ridiculousfish/libdivide/master/divide_by_constants_codegen_reference.c</a> >>> > + */ >>> > + >>> > +#include <inttypes.h> >>> > +#include <limits.h> >>> > +#include <assert.h> >>> > + >>> > +/* You can set these to different types to get different precision. */ >>> > +typedef int32_t sint_t; >>> > +typedef uint32_t uint_t; >>> > + >>> > +/* Computes "magic info" for performing signed division by a fixed >>> > integer D. >>> > + * The type 'sint_t' is assumed to be defined as a signed integer type >>> > large >>> > + * enough to hold both the dividend and the divisor. >>> > + * Here >> is arithmetic (signed) shift, and >>> is logical shift. >>> > + * >>> > + * To emit code for n/d, rounding towards zero, use the following >>> > sequence: >>> > + * >>> > + * m = compute_signed_magic_info(D) >>> > + * emit("result = (m.multiplier * n) >> SINT_BITS"); >>> > + * if d > 0 and m.multiplier < 0: emit("result += n") >>> > + * if d < 0 and m.multiplier > 0: emit("result -= n") >>> > + * if m.post_shift > 0: emit("result >>= m.shift") >>> > + * emit("result += (result < 0)") >>> > + * >>> > + * The shifts by SINT_BITS may be "free" if the high half of the full >>> > multiply >>> > + * is put in a separate register. >>> > + * >>> > + * The final add can of course be implemented via the sign bit, e.g. >>> > + * result += (result >>> (SINT_BITS - 1)) >>> > + * or >>> > + * result -= (result >> (SINT_BITS - 1)) >>> > + * >>> > + * This code is heavily indebted to Hacker's Delight by Henry Warren. >>> > + * See <a href="http://www.hackersdelight.org/HDcode/magic.c.txt" rel="noreferrer" target="_blank">http://www.hackersdelight.org/HDcode/magic.c.txt</a> >>> > + * Used with permission from >>> > <a href="http://www.hackersdelight.org/permissions.htm" rel="noreferrer" target="_blank">http://www.hackersdelight.org/permissions.htm</a> >>> > + */ >>> > + >>> > +struct util_fast_sdiv_info { >>> > + sint_t multiplier; /* the "magic number" multiplier */ >>> > + unsigned shift; /* shift for the dividend after multiplying */ >>> > +}; >>> > + >>> > +struct util_fast_sdiv_info >>> > +util_compute_fast_sdiv_info(sint_t D); >>> > + >>> > +/* Computes "magic info" for performing unsigned division by a fixed >>> > positive >>> > + * integer D. The type 'uint_t' is assumed to be defined as an unsigned >>> > + * integer type large enough to hold both the dividend and the divisor. >>> > + * num_bits can be set appropriately if n is known to be smaller than >>> > + * the largest uint_t; if this is not known then pass >>> > + * "(sizeof(uint_t) * CHAR_BIT)" for num_bits. >>> > + * >>> > + * Assume we have a hardware register of width UINT_BITS, a known >>> > constant D >>> > + * which is not zero and not a power of 2, and a variable n of width >>> > num_bits >>> > + * (which may be up to UINT_BITS). To emit code for n/d, use one of the >>> > two >>> > + * following sequences (here >>> refers to a logical bitshift): >>> > + * >>> > + * m = compute_unsigned_magic_info(D, num_bits) >>> > + * if m.pre_shift > 0: emit("n >>>= m.pre_shift") >>> > + * if m.increment: emit("n = saturated_increment(n)") >>> > + * emit("result = (m.multiplier * n) >>> UINT_BITS") >>> > + * if m.post_shift > 0: emit("result >>>= m.post_shift") >>> > + * >>> > + * or >>> > + * >>> > + * m = compute_unsigned_magic_info(D, num_bits) >>> > + * if m.pre_shift > 0: emit("n >>>= m.pre_shift") >>> > + * emit("result = m.multiplier * n") >>> > + * if m.increment: emit("result = result + m.multiplier") >>> > + * emit("result >>>= UINT_BITS") >>> > + * if m.post_shift > 0: emit("result >>>= m.post_shift") >>> > + * >>> > + * The shifts by UINT_BITS may be "free" if the high half of the full >>> > multiply >>> > + * is put in a separate register. >>> > + * >>> > + * saturated_increment(n) means "increment n unless it would wrap to >>> > 0," i.e. >>> > + * if n == (1 << UINT_BITS)-1: result = n >>> > + * else: result = n+1 >>> > + * A common way to implement this is with the carry bit. For example, >>> > on x86: >>> > + * add 1 >>> > + * sbb 0 >>> > + * >>> > + * Some invariants: >>> > + * 1: At least one of pre_shift and increment is zero >>> > + * 2: multiplier is never zero >>> > + * >>> > + * This code incorporates the "round down" optimization per >>> > ridiculous_fish. >>> > + */ >>> > + >>> > +struct util_fast_udiv_info { >>> > + uint_t multiplier; /* the "magic number" multiplier */ >>> > + unsigned pre_shift; /* shift for the dividend before multiplying */ >>> > + unsigned post_shift; /* shift for the dividend after multiplying */ >>> > + int increment; /* 0 or 1; if set then increment the numerator, using >>> > one of >>> > + the two strategies */ >>> > +}; >>> > + >>> > +struct util_fast_udiv_info >>> > +util_compute_fast_udiv_info(uint_t D, unsigned num_bits); >>> > + >>> > +/* Below are possible options for dividing by a uniform in a shader >>> > where >>> > + * the divisor is constant but not known at compile time. >>> > + */ >>> > + >>> > +/* Full version. */ >>> > +static inline unsigned >>> > +fast_udiv(unsigned n, struct util_fast_udiv_info info) >>> > +{ >>> > + n = n >> info.pre_shift; >>> > + /* For non-power-of-two divisors, use a 32-bit ADD that clamps to >>> > UINT_MAX. */ >>> > + n = (((uint64_t)n + info.increment) * info.multiplier) >> 32; >>> > + n = n >> info.post_shift; >>> > + return n; >>> > +} >>> > + >>> > +/* A little more efficient version if n != UINT_MAX, i.e. no unsigned >>> > + * wraparound in the computation. >>> > + */ >>> > +static inline unsigned >>> > +fast_udiv_nuw(unsigned n, struct util_fast_udiv_info info) >>> > +{ >>> > + assert(n != UINT_MAX); >>> > + n = n >> info.pre_shift; >>> > + n = n + info.increment; >>> > + n = ((uint64_t)n * info.multiplier) >> 32; >>> > + n = n >> info.post_shift; >>> > + return n; >>> > +} >>> > + >>> > +/* Even faster version but both operands must be 31-bit unsigned >>> > integers >>> > + * and the divisor must be greater than 1. >>> > + * >>> > + * info must be computed with num_bits == 31. >>> > + */ >>> > +static inline unsigned >>> > +fast_udiv_u31_d_not_one(unsigned n, struct util_fast_udiv_info info) >>> > +{ >>> > + assert(info.pre_shift == 0); >>> > + assert(info.increment == 0); >>> > + n = ((uint64_t)n * info.multiplier) >> 32; >>> > + n = n >> info.post_shift; >>> > + return n; >>> > +} >>> > + >>> > +#endif >>> > diff --git a/src/util/meson.build b/src/util/meson.build >>> > index 027bc5b..ebaeb47 100644 >>> > --- a/src/util/meson.build >>> > +++ b/src/util/meson.build >>> > @@ -27,20 +27,22 @@ files_mesa_util = files( >>> > 'bitscan.h', >>> > 'bitset.h', >>> > 'build_id.c', >>> > 'build_id.h', >>> > 'crc32.c', >>> > 'crc32.h', >>> > 'debug.c', >>> > 'debug.h', >>> > 'disk_cache.c', >>> > 'disk_cache.h', >>> > + 'fast_idiv_by_const.c', >>> > + 'fast_idiv_by_const.h', >>> > 'format_r11g11b10f.h', >>> > 'format_rgb9e5.h', >>> > 'format_srgb.h', >>> > 'futex.h', >>> > 'half_float.c', >>> > 'half_float.h', >>> > 'hash_table.c', >>> > 'hash_table.h', >>> > 'list.h', >>> > 'macros.h', >>> > >>> >>> _______________________________________________ >>> mesa-dev mailing list >>> <a href="mailto:mesa-dev@lists.freedesktop.org" target="_blank">mesa-dev@lists.freedesktop.org</a> >>> <a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/mailman/listinfo/mesa-dev</a> </blockquote></div></div>