[PATCH v2] Try and get overlapping cases fixed.

Mon May 16 10:01:44 PDT 2011

On 5/16/11 12:26 PM, Jeremy Huddleston wrote:
> Is the one div needed for:
>
> bpp / 8
> bpp % 8
>
> really universally faster than the two bitwise ops needed for
>
> bpp >> 3
> bpp & 0x7
>
> ?  I'm sure most modern compilers will know how to optimize that
> based on the target CPU, but I've always tried to avoid doing mults
> and divs in fast paths where possible.

Even if it's ten cycles slower, I'm going to wager it pales next to the 
hundreds-to-millions of cycles of memcpy.

- ajax
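
For reference, a minimal sketch of the two spellings being compared. The
helper names are hypothetical and do not come from the patch, and the
sketch assumes bpp is held in an unsigned type. Under that assumption the
division form and the bitwise form produce identical results, and an
optimizing compiler will typically lower a division by a constant power
of two to the same shift and mask. With a signed bpp the two are not
interchangeable for negative values, so the compiler has to emit a small
fixup around the shift.

```c
#include <assert.h>

/* Hypothetical helpers (not from the patch): split a bits-per-pixel
 * value into whole bytes and leftover bits, spelled both ways. */
unsigned bytes_div(unsigned bpp)   { return bpp / 8; }
unsigned bits_mod(unsigned bpp)    { return bpp % 8; }
unsigned bytes_shift(unsigned bpp) { return bpp >> 3; }
unsigned bits_mask(unsigned bpp)   { return bpp & 0x7; }

/* Assert that both spellings agree for every bpp in [0, max_bpp].
 * max_bpp is assumed to be well below UINT_MAX so the loop ends. */
void check_spellings_agree(unsigned max_bpp)
{
    unsigned bpp;

    for (bpp = 0; bpp <= max_bpp; bpp++) {
        assert(bytes_div(bpp) == bytes_shift(bpp));
        assert(bits_mod(bpp) == bits_mask(bpp));
    }
}
```

Calling check_spellings_agree(256) covers every depth likely to matter
here. Whether either form is measurably faster next to the copy itself is
a separate question that only profiling on the target would answer.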