[Pixman] [PATCH 2/2] ARM: Add 'neon_composite_over_n_8888_0565' fast path
sandmann at cs.au.dk
Wed Apr 6 21:15:08 PDT 2011
Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:
> Of course, people reading the source code need to know about this
> "convention". And it has its own disadvantages too. If anyone can
> propose something more maintainable and easier to read, I'm all ears.
> Maybe changing to the use of native codegenerator to compile fast path
> code at runtime could make it easier. If we do a good job teaching it
> to know instruction scheduling rules well enough.
As a tangent, I took a look at LuaJIT's dynamic assembler:
It's MIT licensed and looks quite interesting. It has a clever idea,
where it does almost all the assembling at compile time, so that there
is no need to have a runtime assembler with "emit_movzx()" type
functions.
Here is how it works: A mix of C and assembly is preprocessed. The C
code is emitted directly; the assembly is converted to machine code with
dummy labels. Then, at runtime this bytecode is interpreted, which emits
and links the machine code.
The advantage of this scheme is that the runtime component doesn't need
to know anything about instruction encodings or addressing modes, so it
can be really tiny - a few kilobytes or so. It also means you can write
real assembly instead of calling emit_*() functions.
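To make the idea concrete, here is a toy sketch of such a runtime
component. It is not DynASM's actual encoding or API: the ACT_* codes,
toy_emit() and the action-list layout are all made up for illustration.
The build-time step is assumed to have produced a byte array that mixes
pre-assembled machine code with a few placeholder actions, and the
runtime only copies bytes, fills in immediates and resolves labels.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical action codes; real DynASM uses its own encoding. */
enum { ACT_END, ACT_RAW, ACT_IMM32, ACT_LABEL, ACT_REL8 };

/* Interpret 'actions' and write machine code into 'out'.
 * Returns the number of bytes written, or 0 on overflow or bad input.
 * Only backward label references are handled in this sketch. */
static size_t
toy_emit (const uint8_t *actions, const uint32_t *args,
          uint8_t *out, size_t out_size)
{
    size_t n = 0;
    size_t label_pos[4] = { 0 };

    for (;;)
    {
        switch (*actions++)
        {
        case ACT_END:
            return n;

        case ACT_RAW: {             /* length byte, then literal bytes */
            uint8_t len = *actions++;
            if (n + len > out_size)
                return 0;
            memcpy (out + n, actions, len);
            actions += len;
            n += len;
            break;
        }

        case ACT_IMM32: {           /* little-endian 32-bit runtime value */
            uint32_t v = args[*actions++];
            if (n + 4 > out_size)
                return 0;
            out[n++] = (uint8_t) (v & 0xff);
            out[n++] = (uint8_t) ((v >> 8) & 0xff);
            out[n++] = (uint8_t) ((v >> 16) & 0xff);
            out[n++] = (uint8_t) ((v >> 24) & 0xff);
            break;
        }

        case ACT_LABEL:             /* remember the current position */
            label_pos[*actions++ & 3] = n;
            break;

        case ACT_REL8: {            /* offset relative to the next byte */
            size_t target = label_pos[*actions++ & 3];
            if (n + 1 > out_size)
                return 0;
            out[n] = (uint8_t) (target - (n + 1));
            n++;
            break;
        }

        default:
            return 0;
        }
    }
}
```

Note that toy_emit() itself knows nothing about instruction encodings.
An action list such as { ACT_LABEL, 0, ACT_RAW, 1, 0xb8, ACT_IMM32, 0,
ACT_RAW, 1, 0xeb, ACT_REL8, 0, ACT_END } would expand to an x86
"mov eax, imm32" followed by a short jump back to the label, but all of
the opcode bytes come from the build-time step.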
However, a downside is that it could be difficult to do good code
scheduling since it seems it would work best if it can stitch together
pre-written blocks of assembly, much like the code generator macros do
for the NEON fast paths.
Other potential issues are that Lua would become a build-time dependency
for pixman, since the preprocessor is written in Lua, and that it
currently doesn't support NEON, though presumably the author would take
patches.
Anyway, it seems to me to be worth taking a closer look at it to see if
it could be suitable as the basis of a pixman JIT compiler.
> I think it might be interesting for you. I also have the following
> experimental branch:
> It collects statistics about what operations do not have optimized
> fast paths, along with the number of uses of these operations, total
> number of pixels processed, average number of pixels per operation and
> average scanline length. The code is currently linux specific and
> writes results to syslog. These results can be converted into a more
> human readable form by a script. I'm using it quite successfully and
> it revealed some of the missing optimizations which would be hard to
> identify in some other way.
I think one issue that prevented that from going into pixman proper was
that there was no good way to get the computed flags down to the general
code path. If so, it might be interesting to combine it with this
in which the composite arguments are passed in a stack allocated struct
instead of as function arguments. The computed flags could then be
stored in that struct too with only minimal overhead.
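As a rough sketch of what I mean (the toy_* names and fields below are
invented for illustration and are not existing pixman code): the entry
point fills in a struct on its stack, including the flags it has already
computed, and passes a single pointer down. The general path can then
read the flags for statistics without any extra function arguments.

```c
#include <stdint.h>

/* Hypothetical argument block; field names are assumptions. */
typedef struct
{
    int      op;
    int32_t  width, height;
    uint32_t src_flags;     /* computed once by the caller */
    uint32_t dest_flags;
} toy_composite_info_t;

/* Hypothetical counters for operations that reach the general path. */
typedef struct
{
    uint32_t calls;
    uint64_t pixels;
    uint32_t seen_src_flags;
} toy_stats_t;

/* General (slow) path: the flags arrive with the other arguments. */
static void
toy_general_composite (const toy_composite_info_t *info, toy_stats_t *stats)
{
    stats->calls++;
    stats->pixels += (uint64_t) info->width * (uint64_t) info->height;
    stats->seen_src_flags |= info->src_flags;
}

/* Entry point: builds the struct on the stack and passes one pointer. */
static void
toy_composite (int op, int32_t width, int32_t height,
               uint32_t src_flags, uint32_t dest_flags,
               toy_stats_t *stats)
{
    toy_composite_info_t info;

    info.op = op;
    info.width = width;
    info.height = height;
    info.src_flags = src_flags;
    info.dest_flags = dest_flags;

    toy_general_composite (&info, stats);
}
```

The extra cost is two stores per call for the flags, and adding further
fields later does not change any function signatures.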