[gst-devel] typefind cleanups

Tim Müller t.i.m at zen.co.uk
Wed Oct 24 13:25:12 CEST 2007


On Wed, 2007-10-24 at 10:54 +0200, Stefan Kost wrote:

> I believe we need some stricter rules for the gsttypefindfunctions.c.

Why do you believe that?

> Currently typefindplugin works down the registered typefind functions
> by rank and alphabetical sort order.
> A quick analysis shows:
> 38 PRIMARY
> 46 SECONDARY
> 8 MARGINAL
> 
> Alphabetical sorting is done on feature name (e.g. "audio/x-au"). Thus  
> we always first probe "application/", then "audio/", then "image/" and  
> finally "video/".
> Not really clever, but I have no better idea right now.

FWIW, "historically", the sorting by name (after rank) was just done in
order to make typefinding more predictable/deterministic, nothing else.
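
To illustrate the ordering being discussed (rank first, then feature
name), here is a minimal standalone sketch. It is not the actual
GStreamer code: TypeFindEntry, entry_cmp and sort_entries are made-up
names for illustration, and only the numeric rank values are meant to
mirror GstRank.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a registered typefind factory. */
typedef struct {
  const char *name;   /* feature name, e.g. "audio/x-au" */
  int rank;           /* higher rank is tried first */
} TypeFindEntry;

/* Values mirroring GstRank (names here are local to this sketch). */
enum { RANK_MARGINAL = 64, RANK_SECONDARY = 128, RANK_PRIMARY = 256 };

/* qsort() comparator: rank descending, then name ascending. */
static int
entry_cmp (const void *a, const void *b)
{
  const TypeFindEntry *ea = a;
  const TypeFindEntry *eb = b;

  if (ea->rank != eb->rank)
    return (ea->rank < eb->rank) ? 1 : -1;

  /* Same rank: plain byte-wise name comparison, which is why
   * "application/..." ends up before "audio/..." and so on. */
  return strcmp (ea->name, eb->name);
}

/* Sort entries in place into the order they would be probed. */
static void
sort_entries (TypeFindEntry *entries, size_t n)
{
  assert (entries != NULL || n == 0);
  qsort (entries, n, sizeof (TypeFindEntry), entry_cmp);
}
```

The name comparison only breaks ties within a rank, so the probing
order stays the same from run to run regardless of the order in which
the typefinders were registered.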

The order in which we typefind things mostly matters for typefinders
that return MAXIMUM probability (since that's when typefinding stops
immediately and other typefinders are not tried).  For all others, it
doesn't matter that much, or only where two typefind functions return
the same probability (I don't think I've ever seen a case like that
though).
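
A rough sketch of why the order mostly matters for MAXIMUM, assuming a
simplified loop rather than the real typefind element: TypeFinder,
run_typefinders and the two probe functions are hypothetical, and the
tie-breaking rule (the earlier typefinder wins) is an assumption of
this sketch. Only the numeric probability values are meant to mirror
GstTypeFindProbability.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Values mirroring GstTypeFindProbability (names local to this sketch). */
enum { PROB_NONE = 0, PROB_POSSIBLE = 50, PROB_LIKELY = 80, PROB_MAXIMUM = 100 };

/* Hypothetical typefinder: returns a probability, PROB_NONE for no match. */
typedef int (*ProbeFunc) (const unsigned char *data, size_t size);

typedef struct {
  const char *name;
  ProbeFunc probe;
} TypeFinder;

/* Example probes, far simpler than real typefind functions. */
static int
probe_ogg (const unsigned char *d, size_t n)
{
  return (n >= 4 && memcmp (d, "OggS", 4) == 0) ? PROB_MAXIMUM : PROB_NONE;
}

static int
probe_ascii (const unsigned char *d, size_t n)
{
  return (n > 0 && d[0] < 0x80) ? PROB_POSSIBLE : PROB_NONE;
}

/* Run the finders in their (already sorted) order.  Returns the index of
 * the best match or -1 if nothing matched; *n_tried is set to the number
 * of finders that were actually run. */
static int
run_typefinders (const TypeFinder *finders, size_t n,
    const unsigned char *data, size_t size, size_t *n_tried)
{
  int best = -1;
  int best_prob = PROB_NONE;
  size_t i;

  assert (finders != NULL || n == 0);

  for (i = 0; i < n; i++) {
    int prob = finders[i].probe (data, size);

    /* Strictly greater: on equal probability the earlier finder wins. */
    if (prob > best_prob) {
      best_prob = prob;
      best = (int) i;
    }
    /* MAXIMUM ends the search; later finders are never consulted. */
    if (prob >= PROB_MAXIMUM) {
      i++;
      break;
    }
  }

  if (n_tried != NULL)
    *n_tried = i;
  return best;
}
```

With { probe_ogg, probe_ascii } as the list, data starting with "OggS"
stops after one probe, while plain text runs both. For anything below
MAXIMUM every finder gets a turn, so the order only decides ties.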


> While looking at the primary ones I stumbled over a few things:
> * TYPE_FIND_REGISTER (plugin, "audio/iLBC-sh", GST_RANK_PRIMARY,
>    shouldn't that be "audio/x-iLBC" ?

Does it matter? You could call it "typefinder9983" if you wanted to (and
the factory name wasn't part of the API, which some may argue it is).


> * not really widely used types as primary
>    "audio/qcelp", "text/x-cmml"
>    shouldn't they be secondary
> * non container based type as primary
>    "audio/x-flac",  "audio/x-vorbis", "video/x-theora", "video/x-dirac",
>    "video/mpeg4", "audio/mpeg"
>    shouldn't those be secondary too
> * container based as secondary
>    "video/x-ms-asf"
>    shouldn't that be primary
>  (..)
> 
> Any comments?

Foremost: please don't make any big changes before the next -base
release. The current order and typefind probabilities have been
fine-tuned over quite some time, and are not something that should be
messed with before a release.


> Primary should be widely used container formats.
> Secondary should be the not-so-widely-used container formats and codecs.
> Marginal esoteric stuff

I think identifying the type correctly should be our primary concern
with efficiency a second(ary) concern. I'm not sure how ranking
typefind functions by popularity helps with either.  Most typefinders
for popular formats that can identify their type with MAXIMUM
probability are already PRIMARY rank as far as I can see (the ASF
typefinder being a notable exception, which should probably be
changed). For the others it doesn't really matter that much anyway.

This is not to say that everything that is there currently makes sense,
just that I'm not convinced your suggested metrics make more sense.

Cheers
 -Tim





