[Pixman] [PATCH 2/4] Added fast path for "pad" type repeats

Wed Mar 6 15:54:09 PST 2013

Hi,

> > For this one, did you try comparing Chris' patch to your on ARMv6?
> 
> Not previously, but I just checked. These are the times for
> t-firefox-chalkboard on ARMv6:
> 
> Current HEAD:  35.5 s
> Chris's patch:  9.7 s
> My patch:       6.6 s
> 
> They're not necessarily mutually exclusive, as mine won't accelerate
> operations where the source and mask need to be padded differently.
> 
> Since there haven't been any comments on the way in which I
> implemented
> fallback to general_composite_rect(), there's just the minor issue of
> updating .gitignore before you do push it - I'll reissue this patch
> with
> that in.

First, sorry to be slow to respond to this. Part of the reason for
that is that I don't really like this patch for a couple of reasons,
but don't have a really convincing alternative proposal.

At a high level, it is pretty clear that non-transformed images are
sampled outside their bounds often enough that special-cased support
is probably warranted. There is already support for NORMAL repeat, and
your patch adds PAD, but images with NONE extension could also be
optimized in various cases, either by clipping the border away or
implementing it with a solid fill depending on the operator.

But on the other hand, both the existing fast_path_composite_tiled()
and especially your patch have much more complex behavior than other
composite routines. I have some specific complaints below, but even if
those were fixed, fast_composite_pad_repeat() would still be a rather
complex function.

It is tempting to do the support for these operations at the
pixman_image_composite32() level by breaking them into smaller
composites before entering the fast path system, but that has the
significant drawback that it prevents the architecture specific fast
paths from doing the whole operation by themselves.

Here are some specific issues with this patch:

- The flags flags of images should be computed with
  _pixman_image_validate(), not just overwritten.

  Also, I'm skeptical about simply reusing the solid function that was
  looked up. There is nothing in principle that prevents the flags
  from depending on the color in the image, and as such every time the
  pixel in the solid fill image changes, a new validate/lookup cycle
  should really be done.

- The fallback to the general code that you have is not how the fast
  path system is intended to work. When a fast path has been chosen,
  it is supposed to carry out the operation by itself as fast as
  possible and not fall back to slower code. There could in principle
  (and at some point there may) be other fast paths in between this
  one and the general code.

  It may be better in this case to add a new flag that indicates that
  the properties required for the mask are present, or simply require
  the mask to be NULL. Either of these would sidestep the issue of
  falling back to general_composite_rectangle().

- You added a simple test program (thanks for that, that already puts
  you ahead of most people), but it only checks one single composite
  operation. This fast path is complex enough that I think the test
  should verify many more alignments of source and destination (and
  mask if support for that is kept), and more formats and operators.

  Also, the test should be added in a separate commit so that it is
  easy to verify that CRC values don't change.

- Some minor issues:
  - Variable declarations should be at the top of blocks (as in C89,
    not C99).

  - This:

+    src_flags = (info->src_flags & ~FAST_PATH_NORMAL_REPEAT) |
+                    FAST_PATH_SAMPLES_COVER_CLIP_NEAREST;

    seems like it should be ~FAST_PATH_PAD_REPEAT instead?

But the biggest objection I have is that there is just way too much
code here for an operation that is ultimately not that common. There
are nine separate blocks of code that are almost exactly alike.
Unfortunately, this is the objection that I can't don't have a very
good alternate proposal for. The best I can come up with is something
like the pseudo-code below, but I'm not sure how well it will actually
work.

The basic idea is to break the operation down into nine smaller
composites, where the nine source images are subimages of the original
source with repeat mode NORMAL. This takes advantage of two existing
optimizations: the fast_path_tiled() for 1xn images and the solid fast
paths for repeating 1x1 images.

      struct info_t
      {
          int pos;
          int composite_size;
          int src_pos;
          int src_size;
      };

      info_t columns[] = {
          { dest_x, - src_x, 0, 1 },
          { dest_x - src_x, src_width, 0, src_width },
          { dest_x - src_x + src_width, width - src_width + src_x,
          src_width - 1, 1 },
      };

      info_t columns[] = {
          { dest_y, - src_y, 0, 1 },
          { dest_y - src_y, src_height, 0, src_height },
          { dest_y - src_y + src_height, height - src_height + src_y,
          src_height - 1, 1 },
      };

      for (i = 0; i < 3; ++i)
      {
          for (j = 0; j < 3; ++j)
          {
                info.x = columns[j].pos;
                info.y = rows[i].pos;
                info.width = columns[j].composite_size;
                info.height = rows[i].composite_size;

                create image from source bits at rectangle
                      [ columns[j].src_pos, rows[i].src_pos,
                        columns[j].src_size, rows[i].src_size,
                      ]
                set NORMAL repeat on this image
                call _pixman_image_validate();

                lookup composite function for image

                intersect info.{x,y,width,height} with composite rect;

                call composite function
          }
      }

An issue here is that if the single pixels in the corners are opaque and
the operator is OVER, we'd ideally want to do the 'strength reduction'
from optimize_operator() in pixman.c to convert the OVER to SRC instead
so that the operation becomes a solid fill (or even a noop) rather than
an over_n_8888(). It is tempting here to just call pixman_image_composite32()
recursively here instead of doing the composite lookup manually, but it
wouldn't surprise me if that causes issues. Possibly optimize_operator()
should just be exported instead.

> > Also, did we ever find out whether it was
> > a bug in Firefox. I'm still somewhat skeptical that it's intended
> > for a
> > PAD image to be accessed so far out of bounds.
> 
> No idea, though I find it hard to imagine that the single-pixel-wide
> images weren't deliberately being extended a long way, as they're not
> very useful otherwise. Obviously they could use tile repeat for such
> images and it would have the same result, though it would boil down to
> src_8888_8888 calls rather than src_n_8888 and so I expect that would
> be slower.

But the image here was a 128 x 128 image with a big border. That seems
much less likely to be deliberately PAD extended. In fact, looking at
that image, it looks like a noise texture designed for NORMAL repeating.

Soren