[VDPAU] [PATCH 0/1] Add new frame and field mode chroma types. Add VdpDecoderQueryProfileCapability API
Stephen Warren
swarren at nvidia.com
Fri Nov 2 19:59:10 UTC 2018
Philip Langdale wrote at Friday, November 2, 2018 11:37 AM:
> On Fri, 2 Nov 2018 16:23:40 +0000 Stephen Warren <swarren at nvidia.com> wrote:
> > This would be technically feasible, but would be inefficient.
> >
> > The problem is that when the actual surface format differs from the
> > format that interop mandates, the interop map/unmap APIs must perform
> > a format-converting copy. This both wastes time and requires that
> > additional memory be allocated to store the converted data.
> >
> > Instead, if we allocate the video surfaces in a format that matches
> > the format we wish to expose with interop, then we don't need any
> > format-converting copy, and the system is as efficient as possible;
> > interop map and unmap operations are simply barriers and don't
> > require copies to be performed.
> >
> > This is somewhat complicated by the fact that different hardware
> > generations support different subsets of formats, and this even differs
> > between compressed media formats; in some cases we support decoding
> > only to field-based surfaces (which drove the initial definition of
> > the interop API), in some cases to only frame-based surfaces, and in
> > some cases to either. Hence the need to explicitly expose this
> > information to applications.
>
> This is the part I don't understand. When I look at a client actually
> using the interop API (mpv in my case), it has to assume that the frame
> is returned as split fields, with two textures per plane. It then does
> a reinterleave as a second stage.
I would not expect any client to do an explicit separate re-interleave
step. That would certainly introduce the inefficiency we're trying to
avoid. Rather, I assume that any application performing interop has some
specific processing it wants to perform on the surface, and hence uses a
shader to read the input surface, process the data, and then write the
results to a new surface. The process of re-interleaving the input data
can be completely hidden in the existing surface fetching part of the
shader, without introducing any separate temporary surface or any extra
copy operation.
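For concreteness, here's a minimal sketch of the kind of fetch I have in
mind, assuming the two half-height luma field textures from
GL_NV_vdpau_interop are bound as tex_top/tex_bottom and we're rendering
1:1 at full-frame resolution; the names here are purely illustrative and
not taken from any header:

  /* Sketch only -- the re-interleave is just part of computing the
   * texture coordinate inside the application's own fragment shader. */
  static const char *frag_src =
      "#version 130\n"
      "uniform sampler2D tex_top;    /* luma, top field, w x h/2    */\n"
      "uniform sampler2D tex_bottom; /* luma, bottom field, w x h/2 */\n"
      "out vec4 color;\n"
      "void main() {\n"
      "    float row = floor(gl_FragCoord.y);         /* frame row  */\n"
      "    vec2 uv = vec2(gl_FragCoord.x, floor(row / 2.0) + 0.5)\n"
      "              / vec2(textureSize(tex_top, 0)); /* field UV   */\n"
      "    float luma = (mod(row, 2.0) < 1.0)\n"
      "        ? texture(tex_top, uv).r               /* even rows  */\n"
      "        : texture(tex_bottom, uv).r;           /* odd rows   */\n"
      "    /* ...application-specific processing continues here... */\n"
      "    color = vec4(vec3(luma), 1.0);\n"
      "}\n";
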
If the VDPAU implementation is forced to provide interop surfaces in a
particular format that the video decoder HW itself doesn't support, then
the VDPAU implementation must perform this re-interleaving itself as a
separate step, rather than combining it with the desired application
processing. This is what we're trying to avoid.
> In other words, no matter what, we've
> already established that the existing interop API returns separate
> fields and a client must expect that. This reality caused grief for
> other vendors trying to implement interop when their hardware returned
> whole frames.
At least that extra work is in the easy direction; providing interop with
two separate sub-surfaces that have a doubled "stride" is much easier than
trying to transparently combine two separate field surfaces into a single
frame surface for interop without being forced to simply copy the data,
which would be inefficient :-)
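To illustrate what I mean by that, a purely hypothetical sketch (the
struct and helper below are illustrative, not driver code): the two field
"sub-surfaces" are just views of the single frame-layout allocation, with
an offset base address and a doubled pitch, so no data is ever copied:

  #include <stddef.h>

  /* Hypothetical view descriptor, not a VDPAU type. */
  struct plane_view {
      unsigned char *base;   /* first byte visible through the view */
      size_t         pitch;  /* bytes from one view row to the next */
      unsigned       height; /* rows visible through the view       */
  };

  /* Expose one frame-layout luma plane as two field views: same
   * memory, offset base, doubled pitch -- no copy is performed. */
  static void frame_plane_as_fields(unsigned char *frame_base,
                                    size_t frame_pitch,
                                    unsigned frame_height,
                                    struct plane_view *top,
                                    struct plane_view *bottom)
  {
      top->base      = frame_base;               /* rows 0, 2, 4, ... */
      top->pitch     = frame_pitch * 2;
      top->height    = (frame_height + 1) / 2;

      bottom->base   = frame_base + frame_pitch; /* rows 1, 3, 5, ... */
      bottom->pitch  = frame_pitch * 2;
      bottom->height = frame_height / 2;
  }
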
> So, that leads me to conclude that the ship has sailed on the
> 'interop1' API. It must return fields until the end of time to avoid
> breaking consumers. If that means a wasteful de-interleave operation in
> the driver/library for HEVC, so be it - it's the only way to hold to
> the contract.
This isn't just about HEVC, although the issue did arise first with the
HW modules for HEVC. In general, it's about different HW generations.
Some HW may only decode to frame-based surfaces even for compressed
video formats other than HEVC.
> From that, I say that interop2 should simply be the
> one-frame-one-texture API. And if that means the driver/library has to
> do a reinterleave to satisfy the contract, it's also fine, because, in
> practice, the consumer was going to reinterleave anyway(!) No consumer
> is going to do anything with split fields except combine them, so that
> is a constant cost, whichever part does it.
>
> This approach seems really appealing to me as it doesn't require
> touching any part of the actual vdpau API, which is currently
> field/frame ignorant, and we've already established that the vdpau
> readback API must return whole frames.
Making the field-vs-frame decision in the interop API is too late; at
that time, surfaces have already been allocated, and potentially used as
decode targets, so their format is already fixed. To allow
format-conversion-free interop implementations, we need to decide the
surface format when it's allocated, so that the video decoder can write
directly to the surface in the format that interop will expose it as.
Anything else requires interop to perform format-translating copy
operations, which we want to avoid.
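As a rough sketch of the flow I have in mind (the decision input would
come from the proposed VdpDecoderQueryProfileCapability, whose exact
signature I'm not restating here; frame_chroma/field_chroma stand in for
whatever frame/field chroma types this series ends up defining; the
function pointers are assumed to have been fetched via VdpGetProcAddress):

  #include <vdpau/vdpau.h>

  /* Assumed to have been obtained with VdpGetProcAddress elsewhere. */
  extern VdpVideoSurfaceCreate *vdp_video_surface_create;
  extern VdpDecoderCreate      *vdp_decoder_create;

  static VdpStatus setup_decode_path(VdpDevice dev,
                                     VdpDecoderProfile profile,
                                     uint32_t width, uint32_t height,
                                     VdpBool decoder_writes_frames,
                                     VdpChromaType frame_chroma,
                                     VdpChromaType field_chroma,
                                     VdpVideoSurface *surface,
                                     VdpDecoder *decoder)
  {
      /* Decide the layout *before* any allocation, so the decoder
       * writes directly in the layout that interop will expose. */
      VdpChromaType chroma =
          decoder_writes_frames ? frame_chroma : field_chroma;

      VdpStatus st = vdp_video_surface_create(dev, chroma, width,
                                              height, surface);
      if (st != VDP_STATUS_OK)
          return st;

      /* max_references of 2 is purely illustrative here. */
      return vdp_decoder_create(dev, profile, width, height, 2, decoder);
  }
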
--
nvpublic