[Liboil] Orc-0.4.2
jcupitt at gmail.com
Fri Dec 11 04:54:40 PST 2009
Hi David,
2009/7/14 David Schleef <ds at entropywave.com>:
> ORC - The Oil Runtime Compiler
I've made a branch for libvips to experiment with ORC. vips is a
scientific image processing library, http://www.vips.ecs.soton.ac.uk
I tried just adding an ORC path for im_add() (add two images). It
seems to work and wasn't too hard (if I've done it right). The code I
ended up with is here:
http://vips.svn.sourceforge.net/viewvc/vips/vips7/branches/orc/libvips/arithmetic/im_add.c?view=markup
It's very fast for 8 + 8 -> 16 operations. wtc.v is a 10,000 x 10,000
pixel unsigned 8-bit RGB image. Here's plain VIPS:
$ time vips im_add wtc.v wtc.v wtc2.v
real 0m10.360s
user 0m1.880s
sys 0m1.400s
That's using something like this for the inner loop:
for( i = 0; i < n; i++ )
  d[i] = s1[i] + s2[i];
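For reference, a self-contained sketch of that loop for the 8 + 8 -> 16
case might look like the following. The function name add_u8_u16 and its
signature are assumptions made for illustration, not the actual vips
code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add two unsigned 8-bit buffers into an unsigned 16-bit buffer.
 * add_u8_u16 is a hypothetical name for this sketch, not a vips function.
 */
void
add_u8_u16( uint16_t *d, const uint8_t *s1, const uint8_t *s2, size_t n )
{
	size_t i;

	assert( d && s1 && s2 );

	for( i = 0; i < n; i++ )
		/* Both operands are promoted to int before the add, so
		 * the largest sum (255 + 255 = 510) fits in 16 bits.
		 */
		d[i] = (uint16_t) (s1[i] + s2[i]);
}
```

Widening the output to 16 bits is what lets the sum of two 8-bit
pixels be stored without overflow.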
If I turn on ORC:
$ time vips --vips-orc im_add wtc.v wtc.v wtc2.v
real 0m9.668s
user 0m0.530s
sys 0m1.410s
That's doing this for the inner loop:
orc_program_append_ds_str( p, "convubw", "t1", "s1" );
orc_program_append_ds_str( p, "convubw", "t2", "s2" );
orc_program_append_str( p, "addusw", "d1", "t1", "t2" );
So about a 3.5x speedup in user time (1.88s down to 0.53s), very nice!
Wall-clock time only drops from about 10.4s to 9.7s, though.
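A scalar sketch of what those three instructions appear to compute per
element is below. It assumes that "convubw" zero-extends an unsigned
byte to a 16-bit word and that "addusw" is a 16-bit add that clamps at
65535. Both are assumptions about the opcode semantics, and the helper
names (conv_ubw, add_usw, add_u8_emulated) are hypothetical and not part
of the ORC API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar stand-in for "convubw": widen an unsigned byte to a
 * 16-bit word by zero extension (assumed semantics).
 */
uint16_t
conv_ubw( uint8_t s )
{
	return( (uint16_t) s );
}

/* Scalar stand-in for "addusw": add two 16-bit words and clamp the
 * result at 65535 instead of wrapping (assumed semantics).
 */
uint16_t
add_usw( uint16_t a, uint16_t b )
{
	uint32_t sum = (uint32_t) a + (uint32_t) b;

	return( sum > UINT16_MAX ? UINT16_MAX : (uint16_t) sum );
}

/* One pass over the buffers, mirroring the three-instruction program:
 * t1 = convubw s1, t2 = convubw s2, d1 = addusw t1, t2.
 */
void
add_u8_emulated( uint16_t *d1, const uint8_t *s1, const uint8_t *s2,
	size_t n )
{
	size_t i;

	assert( d1 && s1 && s2 );

	for( i = 0; i < n; i++ ) {
		uint16_t t1 = conv_ubw( s1[i] );
		uint16_t t2 = conv_ubw( s2[i] );

		d1[i] = add_usw( t1, t2 );
	}
}
```

Under those assumptions the clamp never triggers for 8-bit inputs, since
the largest possible sum is 510, so the result matches the plain C loop.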
It's not so great for u16 + u16 -> u32:
$ time vips im_add wtc2.v wtc2.v wtc4.v
real 0m19.518s
user 0m1.650s
sys 0m2.760s
$ time vips --vips-orc im_add wtc2.v wtc2.v wtc4.v
real 0m40.785s
user 0m34.900s
sys 0m3.180s
I guess it's falling back to emulation for some reason.
Comments and questions:
* I added liboil support a few years ago, but it never managed to
produce much of a speedup: the supplied operators just weren't a good
match for my needs. ORC looks much more promising. Thank you very much
for making this thing.
* I guess the operators for higher bit depths have not yet been
implemented, is this right? Or have I messed up?
* I'm still a bit confused by the opcode names. "addusw" is "add
unsigned short word" I guess, "addssw" is the signed version, so what
is "addw"? I read opcodes.h and orcopcodes.c but it didn't help me
much :(
* I'd like to try addf/addg, are they working yet? I get "unknown opcode".
* I'd also like to experiment with ORC for our 2D convolution, but I'd
need addressing and looping (I think). Is this planned soon? Or did I
miss it?
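To illustrate why the convolution case needs more than a flat
element-wise loop, here is a minimal sketch of a single-band 2D
convolution in plain C. The name conv2d_u8, the integer mask and the
"valid region only" output are assumptions made for this example, not
the vips implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convolve a single-band 8-bit image with an integer mask.
 * conv2d_u8 is a hypothetical name for this sketch. The mask is
 * applied unflipped and only where it fits entirely inside the image,
 * so out holds (width - mw + 1) x (height - mh + 1) values.
 */
void
conv2d_u8( int32_t *out, const uint8_t *in, size_t width, size_t height,
	const int32_t *mask, size_t mw, size_t mh )
{
	size_t x, y, i, j;

	assert( out && in && mask );
	assert( mw > 0 && mh > 0 && mw <= width && mh <= height );

	for( y = 0; y + mh <= height; y++ )
		for( x = 0; x + mw <= width; x++ ) {
			int32_t sum = 0;

			/* Each output pixel reads a small window of the
			 * input at computed offsets: this is the
			 * addressing and looping part.
			 */
			for( j = 0; j < mh; j++ )
				for( i = 0; i < mw; i++ )
					sum += mask[j * mw + i] *
						in[(y + j) * width + (x + i)];

			out[y * (width - mw + 1) + x] = sum;
		}
}
```

Unlike im_add, each output element here depends on several input
elements at different offsets, which is why a straight
one-element-in, one-element-out program is not enough.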
John