[Liboil] Orc-0.4.2

Fri Dec 11 04:54:40 PST 2009

Hi David,

2009/7/14 David Schleef <ds at entropywave.com>:
> ORC - The Oil Runtime Compiler

I've made a branch for libvips to experiment with ORC. vips is a
scientific image processing library, http://www.vips.ecs.soton.ac.uk

I tried just adding an ORC path for im_add() (add two images). It
seems to work and wasn't too hard (if I've done it right). The code I
ended up with is here:

http://vips.svn.sourceforge.net/viewvc/vips/vips7/branches/orc/libvips/arithmetic/im_add.c?view=markup

It's very fast for 8 + 8 -> 16 operations. wtc.v is a 10,000 x 10,000
pixel unsigned 8-bit RGB image. Here's plain VIPS:

$ time vips im_add wtc.v wtc.v wtc2.v
real	0m10.360s
user	0m1.880s
sys	0m1.400s

That's using something like this for the inner loop:

  for( i = 0; i < n; i++ )
    d[i] = s1[i] + s2[i];
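
For reference, a self-contained version of that loop for the 8 + 8 -> 16 case might look like the sketch below. The function name add_u8_u16 and the flat-buffer signature are made up for illustration; this is not the actual im_add() code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the real vips code: add two unsigned 8-bit
 * buffers element by element into an unsigned 16-bit buffer. The
 * largest possible sum is 255 + 255 = 510, which fits in 16 bits, so
 * the result can never overflow.
 */
static void
add_u8_u16( const uint8_t *s1, const uint8_t *s2, uint16_t *d, size_t n )
{
  size_t i;

  for( i = 0; i < n; i++ )
    d[i] = (uint16_t) (s1[i] + s2[i]);
}
```

For an RGB image, n would be three times the number of pixels in the region being processed, since each band element is added independently.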

If I turn on ORC:

$ time vips --vips-orc im_add wtc.v wtc.v wtc2.v
real	0m9.668s
user	0m0.530s
sys	0m1.410s

That's doing this for the inner loop:

        orc_program_append_ds_str( p, "convubw", "t1", "s1" );
        orc_program_append_ds_str( p, "convubw", "t2", "s2" );
        orc_program_append_str( p, "addusw", "d1", "t1", "t2" );
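
My reading of those three instructions, written out as scalar C, is roughly the sketch below. It assumes that "convubw" widens an unsigned byte to a 16-bit word and that "addusw" is a 16-bit add that clamps at 65535 rather than wrapping. The C function names just mirror the opcode names and are not part of the ORC API.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed meaning of "convubw": widen unsigned 8-bit to 16-bit. */
static uint16_t
convubw( uint8_t s )
{
  return (uint16_t) s;
}

/* Assumed meaning of "addusw": 16-bit add, clamped to 65535. */
static uint16_t
addusw( uint16_t a, uint16_t b )
{
  uint32_t sum = (uint32_t) a + (uint32_t) b;

  return sum > UINT16_MAX ? UINT16_MAX : (uint16_t) sum;
}

/* Scalar equivalent of the three-instruction program above, using the
 * same names (s1, s2, t1, t2, d1) as the ORC variables.
 */
static void
add_u8_u16_emulated( const uint8_t *s1, const uint8_t *s2,
  uint16_t *d1, size_t n )
{
  size_t i;

  for( i = 0; i < n; i++ ) {
    uint16_t t1 = convubw( s1[i] );
    uint16_t t2 = convubw( s2[i] );

    d1[i] = addusw( t1, t2 );
  }
}
```

With 8-bit inputs the sum never exceeds 510, so the clamp is never hit and a wrapping add would give the same result here.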

So about a 3.5x speedup in user time (1.88s down to 0.53s), very nice!
Real time barely changes, though.

It's not so great for u16 + u16 -> u32:

$ time vips im_add wtc2.v wtc2.v wtc4.v
real	0m19.518s
user	0m1.650s
sys	0m2.760s

$ time vips --vips-orc im_add wtc2.v wtc2.v wtc4.v
real	0m40.785s
user	0m34.900s
sys	0m3.180s

User time goes from 1.65s to 34.9s, so I guess it's falling back to
emulation for some reason.

Comments and questions:

* I added liboil support a few years ago, but it never managed to
produce much of a speedup: the supplied operators just weren't a good
match for my needs. ORC looks much more promising. Thank you very much
for making this thing.

* I guess the higher bit-depth operators have not yet been
implemented. Is this right? Or have I messed up?

* I'm still a bit confused by the opcode names. "addusw" is "add
unsigned short word" I guess, "addssw" is the signed version, so what
is "addw"? I read opcodes.h and orcopcodes.c but it didn't help me
much :(

* I'd like to try addf/addg, are they working yet? I get "unknown opcode".

* I'd also like to experiment with ORC for our 2D convolution, but I'd
need addressing and looping (I think). Is this planned soon? Or did I
miss it?

John