[Pixman] [PATCH] Possible overrun in bilinear_interpolate_line_sse

Sun May 29 05:35:04 PDT 2011

On Fri, May 27, 2011 at 3:53 PM, Siarhei Siamashka
<siarhei.siamashka at gmail.com> wrote:
> On Fri, May 27, 2011 at 1:28 PM, Makoto Kato <m_kato at ga2.so-net.ne.jp> wrote:
>> Hi, all.
>>
>> When width isn't multiples of 4 , it may cause buffer overrun in
>> bilinear_interpolate_line_sse().  Because ((uint32)-1) & 2 is true.
>
> Could you provide a more specific example of problematic width which
> leads to buffer overrun in your opinion?

If you are getting -1 value reaching "if (width & 2)" and "if (width &
1)" code parts, then the original width value was 3 just before it got
decremented by 4 on the last loop iteration. But if we need to process
3 remaining pixels, then both "if (width & 2)" and "if (width & 1)"
need to be true (the former processes 2 pixels, the latter processes 1
pixel, and 2 + 1 = 3). So this code should work correctly for any
non-negative widths. Except that this admittedly is only valid for
two's complement systems. However, the systems using different
representation for negative numbers seem to be already extinct and
even gcc supports two's complement integer types only:
    http://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html

It is possible to aim perfect portability, but without having access
to a real one's complement system for testing, I don't think that it
is even worth the efforts. Also one more theoretical portability issue
is the handling of right arithmetic shift for negative numbers.
Anyway, I believe that most of these issues should be easily
detectable when running 'make check' if anybody is ever going to port
pixman to some non two's complement system.

The reason for handling this loop in this way is that "while ((width
-= 4) >= 0) { ... }" code can be faster than "while (width >= 4) {
...; width -= 4; }" by using one instruction less, just because it can
"subtract 4/jump if non-negative" instead of "subtract 4/compare with
4/jump if larger or equal". I did some tests with various available
compilers for ARM and x86 a few years ago, and this way of handling
loops was just a bit faster overall. Surely, having one instruction
less is not a big deal for bilinear scaling, but nearest scaling
benefits from every saved CPU cycle in order to be sure that it is
fast enough to utilize all the available memory bandwidth:
    http://cgit.freedesktop.org/pixman/tree/pixman/pixman-fast-path.c?id=pixman-0.22.0#n1411

-- 
Best regards,
Siarhei Siamashka