# [Pixman] [cairo] Planar YUV support

Bill Spitzak spitzak at gmail.com
Fri Mar 4 13:47:58 PST 2011

```
Soeren Sandmann wrote:
> Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:

>>> The pipeline as it is now:
>>>
>>>    1 convert image sample to a8r8g8b8
>>>    2 extend sample grid in all directions according to repeat
>>>    3 interpolate between sample according to filter
>>>    4 transform
>>>    5 resample
>>>    6 combine
>>>    7 store
>
>> What is the difference between "3 interpolate between sample according
>> to filter" and "5 resample"?
>
> The output of stage 3 is an image that is defined on all of the real
> plane. There are no pixels any more, so there is no question about what
> stage 4, "transform", means. Stage 5 converts back to pixels by point
> sampling.

This is a very poor description of what should be happening. You cannot
do Stage 5 as a point sample. This is what the bilinear interpolation is
doing, and everybody should have realized by now that the output image
is no good for scales less than .5.

It is MUCH better to combine steps 3,4,5 together. The goal is to
produce a pixel in the output coordinate system. This is done by making
up a filter that will vary depending on the transform and the output
pixel, applying it to the source image, and the result is the output
pixel. It is absolutely impossible to do a "sample" step last that does
not take into account the transform.

For affine transforms an output pixel maps to a parallelogram on the
input image. This parallelogram can be much bigger or much smaller than
a single pixel. The parallelogram has 6 degrees of freedom. It is obvious
that two numbers cannot describe the parallelogram, and therefore you
cannot use point sampling (no matter how fancy your bilinear
interpolation is) to produce the output image.

>>> have to be inserted before the first:
>>>
>>>   -2 interpolate subsampled components of YUV to get the same
>>>      resolution as the Y plane
>>>
>>>   -1 if the format is planar, stitch together components to form YUV
>>>      pixels
>>>
>>>    0 convert to sRGB
>>>
>>> Stage -2 is important because the filter used in that interpolation
>>> should probably be user-specifiable eventually, which has the
>>> implication that whatever simple support is added first, it needs to be
>>> clear what filter precisely is being used.

No, I do not think the filter for UV should be "user specified". You are
adding meaningless complexity to the API and actually *preventing* the
interpolation from being improved.

It is quite possible to merge step -2 with the transform. The
parallelogram I described above would be 1/2 as big for the UV planes.
Also it may be shifted 1/2 pixel between U and V (because most producers
subsample the UV by averaging different pairs of pixels).

It does mean you cannot do "extend" of "black" as an earlier step.
However I very strongly believe that the current cairo behavior is not
wanted by anybody and is inefficient on modern hardware. See below about
this.

>>> Stage 0 is a color space conversion and needs to eventually be
>>> configurable too, which means it has to be specified which matrix is
>>> being used.

I believe some assumptions can be made about the color samples so that
stage 0 can be moved to a later point.

All color spaces of interest have orthogonal channels which can be
filtered independently. Thus the filtering can be done before conversion.

If a channel is non-linear, it technically will affect the filtering.
See below for comments on why I think this may not be necessary. Even if
it is necessary, all interesting non-linear channels are so close to a
gamma of 2 that a single alternate filter that squares the input image,
applies the same filter, and does the square root, will produce an
answer that is accurate to 5 bits for the worst case of a white pixel
next to black, and well over 12 bits for most photographic images.

> Note that if some day we add compositing in linear RGB, the alternative
> process breaks down because the initial interpolation will be taking
> place in non-linear color space, whereas with intermediates in linear
> RGB, you'd want to do the second interpolation (but not the first) in
> linear light.

I do not think there is a requirement that the transform filtering be
done in linear space. It could be useful, but it will not completely
break doing the rest of the composite in linear RGB.

The reason is that for low-contrast images the gamma curve between two
adjacent pixels is extremely close to a straight line and thus the
result is almost identical.

There are problems with doing transforms in linear space:

For very large scales of high contrast images users are unhappy with
true linear filtering and prefer the gamma filtering. The reason is that
once the pixels become visible it becomes a perceptual rather than
physical appearance and the image just looks "wrong". This will mostly
affect "magnifier" applications for enlarging already-rendered text.

Linear filtering can also have very nasty side effects if the images are
premultiplied. The premultiplied pixels have been stored at much lower
resolution, in effect, and linearization can produce very bright colors
that will produce artifacts when blended with neighboring pixels.

If you do linear filtering, you may want to do it only for scales less
than 1. Also there is no need to do it on color spaces where high
contrast is already poorly supported, so there is no need to do it to
the UV channels.

> There is also a question of what to do with YUV images with a
> non-premultiplied alpha channel. Interpolating the samples of such an
> image directly is definitely wrong, but it may be that simply
> premultiplying first will work.

Filtering non-premultiplied data is a problem with all data formats, not
just YUV!

The problem is that where the alpha is zero the color is often black. A
filter that covers this area will bleed black into the object, making
the resulting image as though the object turns slightly darker at the
edges. The only way to get "correct" results is to ignore the
contribution of alpha zero pixels to the filter for the color channels.
Depending on the source you may have to ignore tiny alphas as well (some
programs produce this with black due to internal filtering).

You do have to watch out for "premultiplied" YUV where the UV channels
go towards what is really the maximum negative value, rather than the
neutral value, as the alpha goes to zero. You can easily correct these
by adding (255-alpha)/2 to the UV channels.

> The two-interpolation pipeline has the practical benefit that chroma
> reconstruction can be done in the fetchers, at least as long as the
> chroma filter is fixed, whereas the one-step process means the general
> code for bilinear filtering would have to sample each component
> individually, then filter, and then do a color conversion. It would no
> longer be able to simply ask the underlying system to fetch an RGB
> pixel.

Here are the steps as I see them, with the parts that I believe CANNOT be
separated made into a single step:

1. Widen to 8 bit components
2. Extend sample grid but use "repeat" for "black outside"
3. Transform/filter to 1 sample per output pixel
4. Convert to interleaved
5. Convert to sRGB
6. Do "black outside" by multiplying by an antialiased quad
7. Composite into output buffer

```
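Three of the per-sample operations described in the message above can be sketched in Python. This is only an illustration under stated assumptions, not Pixman or cairo code: the function names, the `min_alpha` parameter, and the use of 8-bit values in the range 0..255 are all invented for the example. The first function squares the samples, filters, and takes the square root, treating the transfer curve as roughly gamma 2. The second leaves alpha-zero (or near-zero) samples out of the filter for a non-premultiplied color channel. The third adds (255-alpha)/2 to a chroma sample that was premultiplied toward 0 instead of toward the neutral value.

```python
import math


def filter_gamma2(samples, weights):
    """Filter 8-bit samples in approximately linear light.

    Assumes the channel's transfer curve is close to gamma 2: square each
    sample, apply the normalized weights, then take the square root.
    """
    total = sum(weights)
    acc = sum(w * (s / 255.0) ** 2 for s, w in zip(samples, weights))
    return 255.0 * math.sqrt(acc / total)


def filter_ignoring_transparent(colors, alphas, weights, min_alpha=0):
    """Filter one non-premultiplied color channel.

    Samples whose alpha is <= min_alpha (a hypothetical threshold) are left
    out, and the remaining weights are renormalized, so black stored under
    transparent pixels does not bleed into the result.
    """
    kept = [(c, w) for c, a, w in zip(colors, alphas, weights) if a > min_alpha]
    total = sum(w for _, w in kept)
    if total == 0:
        return 0.0  # every sample was transparent; the color is arbitrary
    return sum(c * w for c, w in kept) / total


def recenter_premultiplied_chroma(uv, alpha):
    """Shift a premultiplied U or V sample so it tends to neutral at alpha 0.

    Assumes the producer scaled the raw 0..255 value by alpha/255, so the
    sample goes to 0 rather than to mid-range as alpha goes to 0.
    """
    return uv + (255 - alpha) / 2


if __name__ == "__main__":
    print(filter_gamma2([0, 255], [1, 1]))
    print(filter_ignoring_transparent([0, 200, 200], [0, 255, 255], [1, 2, 1]))
    print(recenter_premultiplied_chroma(0, 0))
```

With equal weights on a black and a white sample, `filter_gamma2` returns about 180 where a plain average would give 127.5, which is the high-contrast case where the two approaches differ most.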