Video standards

Pekka Paalanen pekka.paalanen at haloniitty.fi
Mon Apr 8 11:42:28 UTC 2024


On Fri, 5 Apr 2024 19:16:55 -0300
salsaman <salsaman at gmail.com> wrote:

> On Fri, 5 Apr 2024 at 12:57, Pekka Paalanen <pekka.paalanen at haloniitty.fi>
> wrote:
> 
> > On Fri, 5 Apr 2024 08:28:27 -0300
> > salsaman <salsaman at gmail.com> wrote:
> >  
> > > I don't think you are paying enough attention to the main points. It is
> > > not simply a case of extending the fourcc values to include more. If I
> > > didn't make it clear enough, the whole fourcc system is obscure,
> > > inadequate, ambiguous. The only reason ever to use it would be when you
> > > don't have metadata and you are forced to encode the format in the first
> > > 4 bytes.
> >
> > Right. You must be talking about some other fourcc system. There are
> > many of them, and some combine multiple orthogonal things into a single
> > enumeration, which then becomes very difficult to extend and work with.
> >
> > drm_fourcc.h is not one of those.
> >  
> 
> I am talking about any system which tries to enumerate palettes (pixel
> formats) in four bytes in a non-sequential way.
> In my own system (Weed), for example, all RGB palettes are in the range 1 -
> 511, yuv palettes are 512 - 1023, and alpha palettes are 1024 +.

Interesting use of the term "palette". I've never heard of it being
used synonymously with pixel format. The only usage of "palette" I've
seen so far in the context of digital imagery is a mapping from
color-index values to color channel value tuples, a form of look-up
table.
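
To be clear, by "palette" in that sense I mean something like this
rough sketch (hypothetical types, not any particular API):

	#include <stdint.h>

	/* Indexed color: the image stores small integers, and the
	 * palette is the look-up table mapping each index to a color
	 * channel value tuple.
	 */
	struct rgb8 {
		uint8_t r, g, b;
	};

	struct palette {
		struct rgb8 entries[256];
	};

	static struct rgb8
	lookup(const struct palette *pal, uint8_t index)
	{
		return pal->entries[index];
	}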

With such a profound disconnect on terminology, it is no wonder we
cannot seem to communicate efficiently. Forgive me for the lengthy
writing below; I'm just trying to avoid misunderstandings. You
and I obviously have very different underlying assumptions of what we
are even doing.

> In fact this header is enough to define every possible palette; there are
> standard enumerations for the most commonly used palettes, and
> advanced palettes allow for the composition of new ones. In there I also
> have symbolic names for gamma types and yuv details.
> 
> Interlacing and flags for pre/post alpha are kept in another header.
> 
> 
> 
> 
> >
> > Metadata is always necessary anyway, either implied or explicit.
> >  
> 
> Exactly, so I don't know why you keep mentioning fourcc as if it were some
> kind of complete solution.

It's not complete. It's a building block.

drm_fourcc.h is a very widely used standard, so it would be better to
build on it than to replace it. The drm_fourcc.h pixel format system
does not conflict with the addition of metadata.

It is very common to allocate image buffers using a specific pixel format
and format modifier (and width, height, stride), because those are
necessary for computing the amount of memory needed for an image. Other
metadata does not affect the amount or layout of the memory, so it is
natural to keep the format and the other metadata independent of each
other.
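
As a rough sketch of what I mean, assuming a simple single-plane
linear format such as DRM_FORMAT_XRGB8888 (real code would look up
the per-format block size and handle modifiers and multiple planes):

	#include <stddef.h>
	#include <stdint.h>

	/* Only format-derived bytes-per-pixel, width, height and stride
	 * enter into the allocation size; colorimetry and other metadata
	 * do not.
	 */
	static size_t
	xrgb8888_image_size(uint32_t width, uint32_t height,
			    uint32_t *stride_out)
	{
		const uint32_t cpp = 4;            /* bytes per pixel */
		uint32_t stride = width * cpp;     /* allocators often align this */

		*stride_out = stride;
		return (size_t)stride * height;
	}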

It has also been very common to have all the other metadata implicit,
especially colorimetry. For computer graphics, that has been the sRGB
assumption. As long as the assumption was good enough, no other
metadata was needed, and as a result the ecosystem is well developed to
communicate and use pixel formats, and more recently also format
modifiers, which are crucial for hardware accelerated graphics and video.

Therefore, we have a huge amount of infrastructure that can handle
pixel formats, either drm_fourcc.h or equivalent. If we were to switch
to a fundamentally different pixel format system, all that
infrastructure would need to be replaced. That's unlikely to happen.
What would happen is that if something uses a different pixel format
system, it will just end up being converted to and from drm_fourcc.h.
That adds overhead and room for mistakes, and it is possibly a
fundamentally imperfect translation, while I have not yet understood
the benefits.

This is why I am so keenly interested in what problem you have set out
to solve by introducing a new pixel format system. The benefits need to
be considerable to exceed the disadvantages.

I see the ability to combine independent building blocks to build a
complete image description as an advantage, because there will always
be something new in the future to add, that has previously been either
ignored, assumed, or not known of.

> >  
> > > Colorimetry is only relevant when displaying on a monitor. In the video
> > > world we just have red, green and blue (plus alpha, y, u and v). These
> > > are just labels for the colour channels, mapping them to bit formats.
> >
> > That is a very surprising opinion. Have you worked on HDR imagery?
> > Or wide color gamut?
> >  
> 
> As I have mentioned several times, these are display output parameters.
> The only details which are relevant are the yuv/rgb conversion constants
> and the gamma transfer values. With those I can convert between any two
> formats, which is all that is necessary for the steps between decoding and
> encoding / display.

I am puzzled. Let's say we have BT.601 525-line video, and it needs to
be re-coded for a BT.2020 container. The yuv/rgb conversion matrices
are different, sure. But the primaries are different as well. If you
ignore the difference in primaries, does that not result in an
unnaturally color-saturated image when the video is eventually displayed?
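
To make the distinction concrete, here is a sketch of the two separate
things a full conversion needs (full-range values, transfer functions
omitted for brevity; M_601_TO_2020 is a placeholder for the 3x3
primaries conversion matrix derived via CIE XYZ from the two sets of
chromaticities, not real values):

	/* Placeholder: real values come from the BT.601-525 (SMPTE C)
	 * and BT.2020 primaries. */
	static const double M_601_TO_2020[3][3] = {{ 0 }};

	static void
	recode_601_to_2020(double y, double cb, double cr,
			   double *y2, double *cb2, double *cr2)
	{
		/* Step 1: YCbCr -> R'G'B' with BT.601 (Kr=0.299, Kb=0.114). */
		double kr = 0.299, kb = 0.114, kg = 1.0 - kr - kb;
		double rgb[3], out[3] = { 0.0, 0.0, 0.0 };

		rgb[0] = y + 2.0 * (1.0 - kr) * cr;             /* R */
		rgb[2] = y + 2.0 * (1.0 - kb) * cb;             /* B */
		rgb[1] = (y - kr * rgb[0] - kb * rgb[2]) / kg;  /* G */

		/* Step 2: convert primaries (properly done on linear light,
		 * i.e. after decoding the transfer function). Skipping this
		 * step is what leads to over-saturation on a BT.2020 display.
		 */
		for (int i = 0; i < 3; i++)
			for (int j = 0; j < 3; j++)
				out[i] += M_601_TO_2020[i][j] * rgb[j];

		/* Step 3: R'G'B' -> YCbCr with BT.2020 (Kr=0.2627, Kb=0.0593). */
		kr = 0.2627; kb = 0.0593; kg = 1.0 - kr - kb;
		*y2 = kr * out[0] + kg * out[1] + kb * out[2];
		*cb2 = (out[2] - *y2) / (2.0 * (1.0 - kb));
		*cr2 = (out[0] - *y2) / (2.0 * (1.0 - kr));
	}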

> > > The values I mentioned are all necessary if you want to convert from one
> > > colourspace to another. For example if I decode a video frame and the pix
> > > format is YUV420P then to convert it to RGBA to display via openGL, I
> > > need to know the YUV subspace (bt709 or itu601) and whether the values
> > > are clamped or full range. Then I apply the standard conversion factors
> > > (Kr = 0.2126, Kb = 0.0722 for bt709). This cannot be derived from the
> > > fourcc (generally). No doubt there is a standard definition of the
> > > R,G,B primaries, but that isn't a concern. I just feed the values into an
> > > openGL texture buffer, an SDL buffer, a gdkpixbuf, QImage or whatever
> > > and ask for it to be displayed. Now in an application I may optionally
> > > offer the user filters to adjust the white balance, contrast, display
> > > gamma etc., but that is outside of the scope of what I am proposing.
> >
> > Yes, those are all important properties, and not enough.
> >
> Let's just say that the final display output is out of scope; what else is
> missing?

Ok, this is our difference. Final display is always in scope for me,
because digital images are meant to be displayed on displays. To my
understanding, a formed digital image is always intended for a specific
display viewed in a specific environment (e.g. a room), because that is
where it was color graded or intended to be viewed.

When a destination standard or actual viewing differs from the
original, I would need to know what I am converting from and what I am
converting to, in order to convert well.

> Pre / post alpha is required for conversion between formats; I hadn't
> mentioned that because I was trying to avoid going into every little detail.
> 
> 
> 
> 
> > > And no, it is not a case of "adding another standard" and confusing
> > > things, there is no standard.
> >
> > There are standards. ITU-T H.273, coding-independent code points, for
> > example. That combines well with drm_fourcc.h. Also ICC combines well
> > with drm_fourcc.h. This works, because drm_fourcc.h does not attempt to
> > define anything outside of the memory layout and abstract channels.
> >
> Sorry, what I meant is there are standards on paper, but there is no
> standard set of enumerations (implementation vs specification).
> Instead we have multiple implementations, each with their own definitions.
> In fact somewhere above I actually linked to the ITU709 standard.
> 
> 
> 
> 
> > > I just had a look at pipewire, there is nothing bad about it per se, they
> > > mention their palette values are based on gstreamer. So fine, we have yet
> > > another library specific set of definitions.
> > >
> > > It's like I am trying to invent Esperanto, and all you can say is... "oh
> > > you don't like English, well have you considered speaking German
> > > instead?"
> >
> > That does seem like an apt analogue.
> >  
> > >
> > > Well that is it, I am done. I was asked how XDG video could be useful. I
> > > explained the shortcomings of what exists currently, and outlined various
> > > ways in which having a standard could be useful.  
> >
> > Sorry, but I haven't understood what gap there is that would need to be
> > filled with yet another pixel format enumeration. Or is it perhaps the
> > same gap that we are filling in Wayland?
> >
> Yes, yet another standard (implementation), but a common standard that
> everyone agrees on. That is the entire basis for the existence of XDG, is
> it not?

Yes, that is what XDG is for, when there is no single implementation
that attracts everyone. XDG is especially for defining things that
should work the same across many desktop environments, for example
where to store application user data.

FWIW, Wayland is not an XDG standard, but a protocol definition of its
own. Wayland does not provide an implementation, aside from a little
library that only converts between a byte stream and a callback API.
drm_fourcc.h is a standard defined by the Linux kernel, foremost to be
used in the kernel-userspace API, but also a standard for
inter-application communication of image buffers using the "dmabuf"
framework. Dmabuf is intended for hardware accelerated video and
graphics processing. None of these are XDG standards, but the Linux
desktop ecosystem is very much converging to these.

> I have no idea what you are doing in Wayland, but from what you have said
> the focus is naturally on display devices. This is a logical place to
> include colour temperature, white balance and so on.

Yes, indeed. We have our statement of design goals here:
https://gitlab.freedesktop.org/pq/color-and-hdr/-/blob/main/doc/design_goals.md

> All fine and good, but these are monitor standards; video processing itself
> (converting between formats and applying effects) is independent of the
> display device. If I want to make an RGB delay effect I will split the
> image into R, G, and B components, add a delay and then recombine. I don't
> care about the colour primaries because that is irrelevant. What I do care
> about is, if my input image is in yuv format, what choice of values should
> I use to convert it to RGB.

Do you have multiple inputs in different colorimetries that you
need to compose into a single output video stream?

Do you consider things like converting an SDR signal into an HDR
container format, or vice versa?
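
For the SDR-to-HDR direction, one piece of the puzzle is re-encoding
absolute luminance into a PQ signal with the SMPTE ST 2084 inverse
EOTF; a minimal sketch (mapping SDR reference white to 203 cd/m2
follows the ITU-R BT.2408 guidance, and gamut mapping is not shown):

	#include <math.h>

	/* SMPTE ST 2084 (PQ) inverse EOTF: absolute luminance in cd/m2
	 * to a non-linear signal value in [0, 1]. */
	static double
	pq_inverse_eotf(double cd_per_m2)
	{
		const double m1 = 2610.0 / 16384.0;
		const double m2 = 2523.0 / 4096.0 * 128.0;
		const double c1 = 3424.0 / 4096.0;
		const double c2 = 2413.0 / 4096.0 * 32.0;
		const double c3 = 2392.0 / 4096.0 * 32.0;
		double y = cd_per_m2 / 10000.0;
		double yp = pow(y, m1);

		return pow((c1 + c2 * yp) / (1.0 + c3 * yp), m2);
	}

	/* E.g. a linear SDR value of 1.0 (display-referred white) would
	 * be encoded as pq_inverse_eotf(1.0 * 203.0) in the PQ container. */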

> If I am sending to openGL for display, I only care that the image I send is
> in RGBA format, post multiplied alpha, non interlaced, and sRGB gamma
> because that is what the openGL standard calls for,

Right.

>  For Wayland, libGL or whatever will get the monitor details and adjust the
> primaries etc. The same as if I connect a projector to the device, the

libGL does not do that. Currently, it is the application itself that
somehow needs to find out what kind of display there is, and what kind
of content it has, and then do whatever it can to do an appropriate
conversion. The application could e.g. have custom OpenGL shaders for
the conversion.

There are extensions that would allow an OpenGL or Vulkan application
to tell GL/Vulkan that its rendering is not sRGB but, for instance,
BT.2100/PQ. Any necessary conversion is (hopefully) taken care of in
a display server. On Xorg/X11, that conversion just doesn't happen. On
Wayland, that has not happened before, but our Wayland work is intended
to make it happen.

(I guess usually it's more like the opposite: a monitor is driven in
BT.2100/PQ, all content is assumed to be sRGB SDR unless otherwise
told, so an application explicitly choosing BT.2100/PQ would avoid the
display server conversion from sRGB SDR to BT.2100/PQ.)
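
For example, with Vulkan and the VK_EXT_swapchain_colorspace
extension, the application expresses that choice at swapchain creation
time; a sketch, with the surface/format negotiation and error handling
omitted, and "surface" and "extent" assumed to exist already:

	#include <vulkan/vulkan.h>

	/* Request an HDR10 (BT.2100/PQ) swapchain. The combination must
	 * actually be advertised by vkGetPhysicalDeviceSurfaceFormatsKHR
	 * for the given surface. */
	VkSwapchainCreateInfoKHR info = {
		.sType = VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR,
		.surface = surface,
		.minImageCount = 3,
		.imageFormat = VK_FORMAT_A2B10G10R10_UNORM_PACK32,
		.imageColorSpace = VK_COLOR_SPACE_HDR10_ST2084_EXT,
		.imageExtent = extent,
		.imageArrayLayers = 1,
		.imageUsage = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT,
		.presentMode = VK_PRESENT_MODE_FIFO_KHR,
		/* remaining fields as usual */
	};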

> display output can adjust the colours, but in my RGB delay effect it makes
> no difference.
> 
> 
> > We need to communicate a lot more about images than what pixel formats
> > ever could. We are building that definition based on existing standards
> > defining their own aspects: drm_fourcc.h, H.273 / ICC, and then adding
> > what's missing, like how, or if, the alpha channel is baked into RGB (the
> > two ways to pre-multiply). Since these are all well-defined and
> > orthogonal, there is no problem combining them.
> >
> I totally agree! I don't see what the argument is about. I just don't
> think that fourcc, alone or even supplemented with metadata, is
> a good idea. I prefer just to use plain integer enumerations.

Right, that is our disagreement. I think drm_fourcc.h specifically is a
good building block. We would not extend drm_fourcc.h, but use it
together with other specifications.
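
As a rough illustration of what "building blocks" means here, an image
description could be assembled from orthogonal pieces roughly like this
(a hypothetical struct for illustration, not any existing API):

	#include <stdint.h>
	#include <drm_fourcc.h>   /* from libdrm */

	struct image_description {
		/* memory layout: drm_fourcc.h */
		uint32_t drm_format;      /* e.g. DRM_FORMAT_NV12 */
		uint64_t drm_modifier;    /* e.g. DRM_FORMAT_MOD_LINEAR */
		uint32_t width, height;

		/* colorimetry: H.273 code points */
		uint8_t colour_primaries;          /* e.g. 9 = BT.2020 */
		uint8_t transfer_characteristics;  /* e.g. 16 = PQ */
		uint8_t matrix_coefficients;       /* e.g. 9 = BT.2020 NCL */
		uint8_t full_range;

		/* alpha convention, defined separately */
		uint8_t premultiplied;    /* and, if so, in which space */
	};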

> Sorry, do you mean literally there are two ways to pre-multiply, or are you
> referring to pre-multiplied vs. post-multiplied? The only way I know is to
> multiply each colour (RGB) value by the alpha value. With pre-multiplied
> (such as what Cairo uses), the values have already been multiplied; with
> post alpha they have not. Most applications prefer post as it can avoid
> rounding errors, but pre can be faster when you have multiple layers.

I mean that if the alpha has been multiplied into RGB channels already,
it may be done in either electrical space or optical space. Here's some
discussion:
https://ssp.impulsetrain.com/gamma-premult.html

The popular convention is suited for non-linear blending, which can
cause color artifacts like those shown here:
https://hg2dc.com/2019/07/21/question-9/

I guess it can be up to taste what looks right.

The popular convention prevents decoding the transfer characteristic
to get optical stimulus values, which hurts color management
performance.
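
To spell out the two variants (a sketch; sRGB stands in for whatever
transfer characteristic is in use):

	#include <math.h>

	static double srgb_eotf(double v)       /* electrical -> optical */
	{
		return v <= 0.04045 ? v / 12.92
				    : pow((v + 0.055) / 1.055, 2.4);
	}

	static double srgb_eotf_inv(double o)   /* optical -> electrical */
	{
		return o <= 0.0031308 ? o * 12.92
				      : 1.055 * pow(o, 1.0 / 2.4) - 0.055;
	}

	/* The popular convention: multiply the non-linear (electrical)
	 * channel value directly. */
	static double premult_electrical(double v, double alpha)
	{
		return v * alpha;
	}

	/* The alternative: multiply in optical (linear-light) space and
	 * re-encode. */
	static double premult_optical(double v, double alpha)
	{
		return srgb_eotf_inv(srgb_eotf(v) * alpha);
	}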

> ah here is my header
> 
> https://github.com/salsaman/LiVES/blob/LiVES-4.0/libweed/weed-palettes.h
> 

The struct looks quite similar to the drm_fourcc.h information table in
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/drm_fourcc.c?h=v6.9-rc3#n143

The table entries are defined here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/drm/drm_fourcc.h?h=v6.9-rc3#n59

The table code is only for kernel-internal use, but the information
contained within it is inherent to the publicly defined pixel formats
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/drm/drm_fourcc.h

One thing in your header caught my eye, this line:

#define WEED_PALETTE_HDYC WEED_PALETTE_UYVY /// UYVY with  bt,709 subspace

It seems WEED_PALETTE_HDYC implies something from bt.709, perhaps the
YCbCr-to-RGB conversion matrix? But none of the other YUV codes seem to
imply any matrix. Any YUV format could appear with any matrix in
the wild. But then, the bt.709 comment is useless, because
WEED_PALETTE_HDYC and WEED_PALETTE_Y422 and several others are the same
integer and so cannot be distinguished.


> the values are all described in the Weed standard (which I will split up
> into libweed and weed fx, as they have grown into independent entities).
> https://github.com/salsaman/LiVES/blob/d9cbe56c04569980a9c86b6df422a6bba1f451a8/libweed/docs/weedspec.txt#L1392
> 
> you will see nowhere a mention of the output display device, except, as
> I mentioned, aspect ratio and PAR, which are useful for fixed sizing.

Right. That is a severe limitation for Wayland, Pipewire, and any
inter-process or even intra-process image/video transmission.

Usually images are consumed by people, which means they need to be
displayed. Displaying needs display output parameters. If any link in
the video chain drops the display output parameters, they are lost, and
correct display can only happen by convention or accident. The
likelihood of using the same convention everywhere was improved by the
invention of sRGB, reduced by Wide Color Gamut displays, and destroyed
by the coming of HDR (which includes WCG).

> > Wayland also already provides ways to handle some things, like pixel
> > aspect ratio, so we don't want to define another conflicting way to
> > define the same thing. That means the solution for Wayland is probably
> > not applicable somewhere else, and vice versa.
> >  
> 
> 
> Well I thought this was the XDG list, not the Wayland list. The most
> community-friendly way would be to develop these things as
> application-neutral xdg standards and then have Wayland be compatible with
> that.

Perhaps one will emerge eventually once the goals are defined.

> I will point out again, Wayland standards are geared towards display
> hardware, and whilst there is some overlap between that and video
> processing, the two things are not precisely the same. The former is
> device dependent, whilst the latter is a device-independent abstraction
> dealing purely with manipulating bits and bytes.
> 

Maybe you are working on unformed images, then? Like raw image sensor
data, or in a pipeline that is being adjusted during color grading?

Otherwise I have to disagree. All the image standards I've seen,
including an old draft of sRGB, BT.601, BT.709, BT.2020 and BT.2100
define the image with respect to a reference display in a reference
viewing environment. They may not all have initially had a reference
display and environment, but they all do now.

BT.601 says for example:

	In typical production practice the encoding function of image
	sources is adjusted so that the final picture has the desired
	look, as viewed on a reference monitor having the reference
	decoding function of Rec. ITU-R BT.1886, in the reference
	viewing environment defined in Rec. ITU-R BT.2035.


Thanks,
pq