pixman 0.16.2 and 0.17.x

Wed Sep 2 14:27:43 PDT 2009

Hi,

Now that pixman 0.16.0 is out, it's time to look at what comes next.

0.16.2

We'll need a 0.16.2 release to fix ARM, Solaris and Windows x64 build
issues. People should feel free to push these fixes to master and
cherry-pick them to the new 0.16 branch. Once we have all the fixes
in, I'll make a 0.16.2 release.

0.17.x

There are a couple of patches (Chris' AM_SILENT_RULES patch and Joonas'
fix for a1 trap sampling) that, as far as I know, are ready to go
in. Feel free to push those to master.

I don't expect to have a lot of time for pixman development myself for
0.17.x. The things I do plan to do are:

    - Look at Jeff's image scaling patch

    - Merge a couple of straight-forward performance fixes:
         - Andre's bilinear optimization
         - Unroll branch
              http://cgit.freedesktop.org/~sandmann/pixman/log/?h=unroll
           (unless this is superseded by macro based fast paths (see below))
         - Enable disabled SSE2 fast path

    - Code review and releases as necessary.

Other than that, for 0.17.x I hope people will look into getting some
of the more straight-forward performance improvements in. I have
included a non-exhaustive list of ideas below.

There is also the newly updated list of pixman projects here:

       http://people.freedesktop.org/~sandmann/pixman-projects.html

with some more blue-sky projects such as multithreading, polygon
images and JIT compilers.

Ideas for 0.17.x:

- More architecture specific fast paths.

  - Siarhei has new NEON fast paths

  - Fast paths for other architectures are welcome too

- Faster fast paths

  On some benchmarks _pixman_run_fast_path() shows up quite high.
  This branch:

       http://cgit.freedesktop.org/~sandmann/pixman/log/?h=faster-fast-path

  contains two straight-forward ideas:

    - It puts a cache in front of _pixman_run_fast_path()

    - It computes the properties to match against up front instead of
      on each iteration.

  To make this mergeable, we'll need configure-time checks for thread
  local storage.
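
  The two ideas above can be sketched roughly as follows. This is not
  the code from the branch: fast_path_t, lookup_fast_path(), the TLS
  macro and the table_scans counter are all hypothetical names, and
  the sketch assumes a single fast path table with static lifetime.
  The caller computes op and the two formats once, up front, and the
  small move-to-front cache sits in thread local storage.

```c
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical stand-ins, not pixman's real ones. */

/* A real build would select the TLS keyword with a configure check. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#  define TLS _Thread_local
#else
#  define TLS __thread
#endif

typedef struct
{
    int      op;
    uint32_t src_format;
    uint32_t dest_format;
} fast_path_t;

#define CACHE_SIZE 8

static TLS const fast_path_t *cache[CACHE_SIZE];  /* most recent first */
static TLS unsigned int       table_scans;        /* counts slow lookups */

static int
matches (const fast_path_t *e, int op, uint32_t src, uint32_t dest)
{
    return e->op == op && e->src_format == src && e->dest_format == dest;
}

/* op, src and dest are computed once by the caller, not per entry. */
static const fast_path_t *
lookup_fast_path (const fast_path_t *table, size_t n_entries,
                  int op, uint32_t src, uint32_t dest)
{
    const fast_path_t *found = NULL;
    size_t i;
    int slot;

    /* 1. Try the cache; used slots are contiguous from the front. */
    for (slot = 0; slot < CACHE_SIZE && cache[slot]; ++slot)
    {
        if (matches (cache[slot], op, src, dest))
        {
            found = cache[slot];
            break;
        }
    }

    /* 2. On a miss, fall back to scanning the table. */
    if (!found)
    {
        table_scans++;
        for (i = 0; i < n_entries && !found; ++i)
        {
            if (matches (&table[i], op, src, dest))
                found = &table[i];
        }
        if (!found)
            return NULL;
        slot = CACHE_SIZE - 1;  /* the oldest cached entry is dropped */
    }

    /* 3. Move the entry to the front of the cache. */
    for (; slot > 0; --slot)
        cache[slot] = cache[slot - 1];
    cache[0] = found;

    return found;
}
```

  Because the cache is per thread, no locking is needed, which is why
  the thread local storage check matters for merging.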

- Even faster fast paths

  - Add negative caching (i.e., cache the fact that we *didn't* find a
    fast path). Then make the cache cover all of the fast path tables
    so that only one cache look-up will be necessary per composite.

  - Treat general_composite() as a fast path that will always get
    hit. Then get rid of the implementation->composite() method and
    just have the implementations export a fast path table.
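
  A minimal sketch of negative caching over several tables, under
  stated assumptions: every name below (find_composite(), the OP_ and
  FMT_ constants, the placeholder tables and routines) is made up for
  illustration, and the cache is a plain global rather than thread
  local. A cached NULL means "no fast path exists", so a repeated miss
  costs one cache look-up instead of a walk over every table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not pixman's real API. */
typedef void (*composite_func_t) (void);

enum { OP_NONE, OP_SRC, OP_OVER, OP_ADD };      /* OP_NONE ends a table */
enum { FMT_A8R8G8B8 = 1, FMT_R5G6B5, FMT_A8 };

typedef struct
{
    int              op;
    uint32_t         src, dest;
    composite_func_t func;
} fast_path_t;

typedef struct
{
    int              valid;
    int              op;
    uint32_t         src, dest;
    composite_func_t func;      /* NULL records "no fast path exists" */
} cache_slot_t;

#define CACHE_SIZE 16
static cache_slot_t cache[CACHE_SIZE];  /* would need to be thread local */
static unsigned int full_scans;         /* counts walks over the tables */

/* Placeholder composite routines and tables, for illustration only. */
static void simd_over_8888_0565 (void) { }
static void c_src_8888_8888 (void) { }
static void general_composite (void) { }

static const fast_path_t simd_table[] = {
    { OP_OVER, FMT_A8R8G8B8, FMT_R5G6B5, simd_over_8888_0565 },
    { OP_NONE, 0, 0, NULL }
};
static const fast_path_t c_table[] = {
    { OP_SRC, FMT_A8R8G8B8, FMT_A8R8G8B8, c_src_8888_8888 },
    { OP_NONE, 0, 0, NULL }
};
static const fast_path_t *const all_tables[] = { simd_table, c_table };

static composite_func_t
find_composite (const fast_path_t *const *tables, size_t n_tables,
                int op, uint32_t src, uint32_t dest)
{
    /* Direct-mapped cache: one slot per hash value. */
    size_t slot = ((uint32_t) op * 31u + src * 17u + dest) % CACHE_SIZE;
    cache_slot_t *c = &cache[slot];

    if (!c->valid || c->op != op || c->src != src || c->dest != dest)
    {
        size_t t;
        const fast_path_t *e;

        /* Miss: walk all tables once and remember the outcome,
         * including the outcome "nothing matched". */
        full_scans++;
        c->valid = 1;
        c->op = op;
        c->src = src;
        c->dest = dest;
        c->func = NULL;

        for (t = 0; t < n_tables && !c->func; ++t)
        {
            for (e = tables[t]; e->op != OP_NONE; ++e)
            {
                if (e->op == op && e->src == src && e->dest == dest)
                {
                    c->func = e->func;
                    break;
                }
            }
        }
    }

    /* A negative entry falls through to the general path. */
    return c->func ? c->func : general_composite;
}
```

  The second idea would go one step further: if the general path were
  itself a wildcard entry at the end of the last table, the look-up
  would always hit and the NULL case above would disappear.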

- Use the preprocessor to make fast paths

  The idea is demonstrated here:

       http://cgit.freedesktop.org/~sandmann/pixman/commit/?h=macro-fast-path&id=c6442a16a64ea833c444a424d56fa9ae9c0f9e6a

  In that commit the macro is being used for fetchers, but I
  think the same idea may be applicable to full fast paths,
  including SIMD fast paths.

  Siarhei's scaling fast paths 

       http://lists.cairographics.org/archives/cairo/2009-May/017211.html   

  could probably be done this way too.
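
  To illustrate what a macro-generated fast path could look like, here
  is a small sketch. It is not taken from the commit above:
  MAKE_FAST_PATH, the COMBINE_ macros and the generated function names
  are all hypothetical, and strides are counted in pixels rather than
  bytes to keep it short. One macro stamps out the scanline loops; the
  per-pixel operation and the pixel types are its parameters.

```c
#include <stdint.h>

/* Hypothetical generator: defines a function `name` that applies
 * COMBINE (source pixel, destination pixel) over a rectangle. */
#define MAKE_FAST_PATH(name, src_t, dst_t, COMBINE)                     \
    static void                                                         \
    name (const src_t *src, int src_stride,                             \
          dst_t *dst, int dst_stride,                                   \
          int width, int height)                                        \
    {                                                                   \
        int x, y;                                                       \
                                                                        \
        for (y = 0; y < height; ++y)                                    \
        {                                                               \
            const src_t *s = src + y * src_stride;                      \
            dst_t *d = dst + y * dst_stride;                            \
                                                                        \
            for (x = 0; x < width; ++x)                                 \
                d[x] = COMBINE (s[x], d[x]);                            \
        }                                                               \
    }

/* SRC: the destination is replaced by the source. */
#define COMBINE_SRC(s, d) (s)

/* ADD on 8-bit channels: saturating addition. */
#define COMBINE_ADD_8(s, d) \
    ((uint8_t) ((s) + (d) > 255 ? 255 : (s) + (d)))

MAKE_FAST_PATH (fast_src_8888_8888, uint32_t, uint32_t, COMBINE_SRC)
MAKE_FAST_PATH (fast_add_8_8, uint8_t, uint8_t, COMBINE_ADD_8)
```

  A SIMD variant would presumably take a second combiner that handles
  several pixels at a time, with the scalar one covering the leftover
  pixels at the end of each scanline.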

- Text performance improvements.

  Software glyph rendering with X is rather slow.

  The most important fix here is to keep pixman_images around as
  private data attached to a Picture instead of creating three of them
  on each request.

  Even with that, it may make sense to add more explicit support for
  glyph rendering to pixman. A new

       pixman_glyphs_t

  data type along with a pixman_composite_glyphs_t() entry point to
  match the corresponding Render API might be useful.

  This will also allow us to make the cairo image backend
  significantly faster for glyphs.
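
  As a rough illustration of why a batched entry point helps, the
  sketch below takes a whole list of glyphs in one call instead of one
  composite request per glyph. Nothing here is a proposed or existing
  pixman interface: sketch_glyph_t and sketch_composite_glyphs() are
  invented names, and the sketch only accumulates 8-bit coverage into
  an a8 destination with a saturating add.

```c
#include <stdint.h>

/* Hypothetical glyph description: position plus an a8 coverage mask. */
typedef struct
{
    int            x, y;            /* top-left corner in the destination */
    int            width, height;
    const uint8_t *alpha;           /* width * height values, row-major */
} sketch_glyph_t;

/* Add every glyph's coverage into dest, clipping to the destination. */
static void
sketch_composite_glyphs (uint8_t *dest, int dest_width, int dest_height,
                         const sketch_glyph_t *glyphs, int n_glyphs)
{
    int i, gx, gy;

    for (i = 0; i < n_glyphs; ++i)
    {
        const sketch_glyph_t *g = &glyphs[i];

        for (gy = 0; gy < g->height; ++gy)
        {
            int dy = g->y + gy;

            if (dy < 0 || dy >= dest_height)
                continue;

            for (gx = 0; gx < g->width; ++gx)
            {
                int dx = g->x + gx;
                int sum;

                if (dx < 0 || dx >= dest_width)
                    continue;

                sum = dest[dy * dest_width + dx] +
                      g->alpha[gy * g->width + gx];
                dest[dy * dest_width + dx] =
                    (uint8_t) (sum > 255 ? 255 : sum);
            }
        }
    }
}
```

  With all glyphs visible in a single call, setup work such as
  choosing a fast path only has to happen once per request.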

- Allowing implementations to plug in fetchers

  The idea is to virtualize images so that implementations can plug in
  their own scanline fetchers. This is needed to fix these bugs:

        https://bugs.freedesktop.org/show_bug.cgi?id=20709
        https://bugs.freedesktop.org/show_bug.cgi?id=21173

  that Steve Snyder filed.

  Gradients are another obvious application.
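
  One way to picture the virtualization is a per-image function
  pointer that an implementation may override. This is only a sketch
  under assumptions: image_t, implementation_t, image_init() and both
  fetchers are hypothetical, the stride is in pixels, and the
  "accelerated" fetcher is just a memcpy() standing in for a SIMD
  routine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical types; not pixman's real image structures. */
typedef struct image image_t;

typedef void (*fetch_scanline_t) (const image_t *image,
                                  int x, int y, int width,
                                  uint32_t *buffer);

struct image
{
    const uint32_t  *bits;
    int              stride;           /* in pixels */
    fetch_scanline_t fetch_scanline;   /* chosen when the image is set up */
};

typedef struct
{
    fetch_scanline_t fetch_a8r8g8b8;   /* NULL: no override provided */
} implementation_t;

/* Plain C fetcher, always available. */
static void
generic_fetch (const image_t *image, int x, int y, int width,
               uint32_t *buffer)
{
    const uint32_t *row = image->bits + y * image->stride + x;
    int i;

    for (i = 0; i < width; ++i)
        buffer[i] = row[i];
}

/* Stand-in for an implementation-specific fetcher. */
static void
accelerated_fetch (const image_t *image, int x, int y, int width,
                   uint32_t *buffer)
{
    memcpy (buffer, image->bits + y * image->stride + x,
            (size_t) width * sizeof (uint32_t));
}

/* Use the implementation's fetcher if it has one, else the generic one. */
static void
image_init (image_t *image, const uint32_t *bits, int stride,
            const implementation_t *imp)
{
    image->bits = bits;
    image->stride = stride;
    image->fetch_scanline = (imp && imp->fetch_a8r8g8b8)
        ? imp->fetch_a8r8g8b8
        : generic_fetch;
}
```

  Callers only ever go through image->fetch_scanline(), so a gradient
  image could install a fetcher that computes its pixels instead of
  reading them from memory.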

Thanks,
Soren