[gst-devel] [RFC] Encoding and Profiles

Michael Smith msmith at xiph.org
Mon Oct 19 19:54:29 CEST 2009


On Mon, Oct 19, 2009 at 9:53 AM, Edward Hervey
<edward.hervey at collabora.co.uk> wrote:
> Hi all,
>
>  I have been working lately on researching ways to make the whole
> encoding experience better and more streamlined for applications using
> GStreamer, and have come up with a proposal.

Great. Comments inline below:

>
>  Only two introspectable properties (i.e. usable without extra API):
>  * A GstEncodingProfile*
>  * The name of the profile to use
>
>  When a profile is selected, encodebin will:
>  * Add REQUEST sinkpads for all the GstStreamProfile
>  * Create the muxer and expose the source pad
>
>  Whenever a request pad is created, encodebin will:
>  * Create the chain of elements for that pad
>  * Ghost the sink pad
>  * Return that ghost pad
>
>  This allows reducing the code to the minimum for applications
>  wishing to encode a source for a given profile:
>
>  ...
>
>  encbin = gst_element_factory_make ("encodebin", NULL);
>  g_object_set (encbin, "profile", "N900/H264 HQ", NULL);

Perhaps "profile-name" ("profile" being reserved for the profile
object itself) would be a better name.

>
> 1.2.1 Incoming streams
>
>  The streams fed to EncodeBin can be of various types:
>
>  * Video
>   * Uncompressed (but maybe subsampled)
>   * Compressed
>  * Audio
>   * Uncompressed (audio/x-raw-{int|float})
>   * Compressed
>  * Timed text
>  * Private streams

Any ideas on how this allows re-muxing (without re-encoding) of
certain streams? This wouldn't be an essential feature for the initial
_implementation_, but I think keeping it in mind when designing the
APIs is pretty important. It looks like you've thought about this, but
it's not clear from this writeup what conclusions you came to :-)

Maybe some API to query what caps are available for re-muxing given
the current profile - then the app can check that, then either
continue decoding if the input stream is incompatible, or pass-through
if possible. Or should the application directly be querying the
profile, rather than going through APIs on the bin, for this stuff?
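
To make the pass-through question concrete, here is a rough sketch in plain C of the kind of check I have in mind. It is not GStreamer API: PassthroughStreamSketch and can_passthrough() are hypothetical names, and the comparison only looks at the media-type name, where a real implementation would need a proper caps intersection.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a stream profile: only the
 * media type of the target encoding format is modelled here. */
typedef struct {
  const char *type;    /* "audio", "video", ... */
  const char *format;  /* e.g. "video/x-h264" */
} PassthroughStreamSketch;

/* Returns true if an input stream with the given media type could be
 * handed to the muxer without re-encoding. Only the media-type name
 * (everything before the first comma) is compared; fields and ranges
 * are ignored in this sketch. */
bool
can_passthrough (const PassthroughStreamSketch *sp,
                 const char *input_media_type)
{
  size_t want, have;

  assert (sp != NULL && sp->format != NULL && input_media_type != NULL);

  want = strcspn (sp->format, ",");
  have = strcspn (input_media_type, ",");

  return want == have && strncmp (sp->format, input_media_type, want) == 0;
}
```

An application could run such a check per input stream, then either keep decoding (incompatible) or link the stream straight through (compatible).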


>
>
> 1.2.2 Steps involved for raw video encoding
>
> (0) Incoming Stream
>
> (1) Transform raw video feed (optional)
>
>  Here we modify the various fundamental properties of a raw video
>  stream to be compatible with the intersection of:
>  * The encoder GstCaps and
>  * The specified "Stream Restriction" of the profile/target
>
>  The fundamental properties that can be modified are:
>  * width/height
>    This is done with a video scaler.
>    The DAR (Display Aspect Ratio) MUST be respected.
>    If needed, black borders can be added to comply with the target DAR.
>  * framerate
>  * format/colorspace/depth
>    All of this is done with a colorspace converter

With respect to framerate, any thought on VFR streams? If the target
format supports VFR, then it'd be nice to be able to just encode the
input as-is, without having to force it to a specified framerate.

It'd probably also be good to have some way to select, and then set
properties on, the elements used here. e.g. the application probably
wants to be able to control what sort of scaling to do (to enable
high-quality scaling, for example, or low-quality/fast for preview
encodes). Obviously, the default would just work, so this would be
more optional API for more advanced applications.
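
As an aside on the DAR requirement in the quoted step (1), the black-border case can be sketched as below. This is plain C arithmetic, not an existing element or API: LetterboxSketch and letterbox_fit() are made-up names, and square pixels are assumed so that the DAR is simply width/height.

```c
#include <assert.h>

/* Hypothetical result type: scaled picture size plus the black border
 * to add on each side so the output matches the target size. */
typedef struct {
  int width, height;   /* size of the scaled picture */
  int pad_x, pad_y;    /* border on each side, in pixels */
} LetterboxSketch;

/* Fit a src_w x src_h picture into dst_w x dst_h while keeping its
 * aspect ratio (square pixels assumed). Integer division rounds down. */
LetterboxSketch
letterbox_fit (int src_w, int src_h, int dst_w, int dst_h)
{
  LetterboxSketch out;

  assert (src_w > 0 && src_h > 0 && dst_w > 0 && dst_h > 0);

  if ((long long) src_w * dst_h >= (long long) dst_w * src_h) {
    /* Source is at least as wide as the target: fill the width. */
    out.width = dst_w;
    out.height = (int) ((long long) dst_w * src_h / src_w);
  } else {
    /* Source is narrower than the target: fill the height. */
    out.height = dst_h;
    out.width = (int) ((long long) dst_h * src_w / src_h);
  }

  out.pad_x = (dst_w - out.width) / 2;
  out.pad_y = (dst_h - out.height) / 2;
  return out;
}
```

How the scaling itself is done (fast vs. high quality) is exactly the sort of thing that would be left to properties on the scaler element.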

>
> (2) Actual encoding (optional for raw streams)
>
>  An encoder (with some optional settings) is used.

Are you planning anything for specifying how the settings should work,
such that a profile could contain settings that apply to several
different encoders (probably selected by rank, or optionally forced by
the application), or will the settings be tied to a specific element?

>
> (3) Muxing
>
>  A muxer (with some optional settings) is used.
>
> (4) Outgoing encoded and muxed stream
>
>
> 1.2.3 Steps involved for raw audio encoding
>
>  This is roughly the same as for raw video, except for (1)
>
> (1) Transform raw audio feed (optional)
>
>  We modify the various fundamental properties of a raw audio stream to
>  be compatible with the intersection of:
>  * The encoder GstCaps and
>  * The specified "Stream Restriction" of the profile/target
>
>  The fundamental properties that can be modified are:
>  * Number of channels
>  * Type of raw audio (integer or floating point)
>  * Depth (number of bits required to encode one sample)
>
>
> 1.2.4 Steps involved for encoded audio/video streams
>
>  Steps (1) and (2) are replaced by a parser if a parser is available
>  for the given format.
>
>
> 1.2.5 Steps involved for other streams
>
>  Other streams will just be forwarded as-is to the muxer, provided the
>  muxer accepts the stream type.
>
>
>
>
> 2. Encoding Profile System
> --------------------------
>
>  This work is based on:
>  * The existing GstPreset system for elements [0]
>  * The gnome-media GConf audio profile system [1]
>  * The investigation done into device profiles by Arista and
>  Transmageddon [2 and 3]
>
> 2.2 Terminology
> ---------------
>
> * Encoding Target Category
>  A Target Category is a classification of devices/systems/use-cases
>  for encoding.
>
>  Such a classification is required in order for:
>  * Applications with a very-specific use-case to limit the number of
>    profiles they can offer the user. A screencasting application has
>    no use for the online services targets, for example.
>  * Offering the user some initial classification in the case of a
>    more generic encoding application (like a video editor or a
>    transcoder).
>
>  Ex:
>   Consumer devices
>   Online service
>   Intermediate Editing Format
>   Screencast
>   Capture
>   Computer
>
> * Encoding Profile Target
>  A Profile Target describes a specific entity for which we wish to
>  encode.
>  A Profile Target must belong to at least one Target Category.
>  It will define at least one Encoding Profile.
>
>  Ex (with category):
>   Nokia N900 (Consumer device)
>   Sony PlayStation 3 (Consumer device)
>   Youtube (Online service)
>   DNxHD (Intermediate editing format)
>   HuffYUV (Screencast)
>   Theora (Computer)
>
> * Encoding Profile
>  A specific combination of muxer, encoders, presets and limitations.
>
>  Ex:
>   Nokia N900/H264 HQ
>   Ipod/High Quality
>   DVD/Pal
>   Youtube/High Quality
>   HTML5/Low Bandwidth
>   DNxHD
>
> 2.3 Encoding Profile
> --------------------
>
> An encoding profile requires the following information:
>
>  * Name
>   This string is not translatable and must be unique.
>   A recommendation to guarantee uniqueness of the naming could be:
>      <target>/<name>
>  * Description
>   This is a translatable string describing the profile
>  * Muxing format
>   This is a string containing the GStreamer media-type of the
>   container format.
>  * Muxing preset
>   This is an optional string describing the preset(s) to use on the
>   muxer.
>  * Multipass setting
>   This is a boolean describing whether the profile requires several
>   passes.
>  * List of Stream Profiles
>
> 2.3.1 Stream Profiles
>
> A Stream Profile consists of:
>
>  * Type
>   The type of stream profile (audio, video, text, private-data)
>  * Encoding Format
>   This is a string containing the GStreamer media-type of the encoding
>   format to be used. If encoding is not to be applied, the raw audio
>   media type will be used.
>  * Encoding preset
>   This is an optional string describing the preset(s) to use on the
>   encoder.
>  * Restriction
>   This is an optional GstCaps containing the restriction of the
>   stream that can be fed to the encoder.
>   This will generally contain restrictions on video
>   width/height/framerate or audio depth.
>  * presence
>   This is an integer specifying how many streams can be used in the
>   containing profile. 0 means that any number of streams can be
>   used.
>  * pass
>   This is an integer which is only meaningful if the multipass flag
>   has been set in the profile. If it has been set it indicates which
>   pass this Stream Profile corresponds to.
>
> 2.4 Example profile
> -------------------
>
> The representation used here is XML only as an example. No decision is
> made as to which formatting to use for storing targets and profiles.

Whatever decision is made as to the 'default' format for storing
these, I'd really like to see a sufficiently complete API that an
application that (for whatever reason) doesn't want to use that format
could build the GstEncodingProfile object itself, from its own data
store.
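
Roughly what I mean by building the profile in code, sketched with plain C structs that mirror the fields listed in 2.3 and 2.3.1. None of this is proposed API: EncodingProfileSketch, StreamProfileSketch, profile_add_stream() and build_n900_profile() are placeholder names, and the values are taken from the example profile in 2.4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_STREAMS 4

/* Placeholder for a Stream Profile (section 2.3.1). */
typedef struct {
  const char *type;         /* audio, video, text, private-data */
  const char *format;       /* media type of the encoding format */
  const char *preset;       /* optional, may be NULL */
  const char *restriction;  /* optional caps string, may be NULL */
  unsigned presence;        /* 0 means any number of streams */
} StreamProfileSketch;

/* Placeholder for an Encoding Profile (section 2.3). */
typedef struct {
  const char *name;
  const char *description;
  const char *format;       /* media type of the container */
  const char *preset;       /* optional muxer preset */
  bool multipass;
  StreamProfileSketch streams[MAX_STREAMS];
  size_t n_streams;
} EncodingProfileSketch;

/* Append a stream profile; returns false if the fixed array is full. */
bool
profile_add_stream (EncodingProfileSketch *p, StreamProfileSketch sp)
{
  assert (p != NULL);
  if (p->n_streams >= MAX_STREAMS)
    return false;
  p->streams[p->n_streams++] = sp;
  return true;
}

/* Build the "Nokia N900/H264 HQ" example without any file format. */
EncodingProfileSketch
build_n900_profile (void)
{
  EncodingProfileSketch p = {
    .name = "Nokia N900/H264 HQ",
    .description = "High Quality H264/AAC for the Nokia N900",
    .format = "video/quicktime,variant=iso",
  };

  profile_add_stream (&p, (StreamProfileSketch) {
      "audio", "audio/mpeg,mpegversion=4", "Quality High/Main",
      "audio/x-raw-int,channels=[1,2]", 1 });
  profile_add_stream (&p, (StreamProfileSketch) {
      "video", "video/x-h264", "Profile Baseline/Quality High",
      "video/x-raw-yuv,width=[16,800],height=[16,480],"
      "framerate=[1/1,30000/1001]", 1 });
  return p;
}
```

An application with its own data store would fill the same fields from whatever it keeps them in, and hand the resulting object to encodebin.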



>
> <gst-encoding-target>
>  <name>Nokia N900</name>
>  <category>Consumer Device</category>
>  <profiles>
>    <profile>Nokia N900/H264 HQ</profile>
>    <profile>Nokia N900/MP3</profile>
>    <profile>Nokia N900/AAC</profile>
>  </profiles>
> </gst-encoding-target>
>
> <gst-encoding-profile>
>  <name>Nokia N900/H264 HQ</name>
>  <description>
>    High Quality H264/AAC for the Nokia N900
>  </description>
>  <format>video/quicktime,variant=iso</format>
>  <streams>
>    <stream-profile>
>      <type>audio</type>
>      <format>audio/mpeg,mpegversion=4</format>
>      <preset>Quality High/Main</preset>
>      <restriction>audio/x-raw-int,channels=[1,2]</restriction>
>      <presence>1</presence>
>    </stream-profile>
>    <stream-profile>
>      <type>video</type>
>      <format>video/x-h264</format>
>      <preset>Profile Baseline/Quality High</preset>
>      <restriction>
>        video/x-raw-yuv,width=[16, 800],\
>        height=[16, 480],framerate=[1/1, 30000/1001]
>      </restriction>
>      <presence>1</presence>
>    </stream-profile>
>  </streams>
>
> </gst-encoding-profile>

This describes the constraints on the device (or whatever). Have you
thought at all about splitting out "constraints on what the target can
accept" from "what we actually want to encode"?

e.g. this profile says that I can do any size (within that range)
video, but my application wants to encode at a particular size -
should I be replacing the caps in the profile at runtime, or should
there be another object to represent these (somewhat different)
concepts?

What about constraints that are not (currently, at least) expressible
through caps? e.g. bitrate, profiles, etc?


Anyway, I don't have time right now to continue through this in enough
depth - and I'm sure some of my remarks miss something you've already
thought about - but this was just to throw some more ideas into the
mix.

I'm very happy to see you looking into this more deeply!

Mike



