libebook filter sniffing cost ...

Kohei Yoshida libreoffice at kohei.us
Thu Dec 19 12:01:36 PST 2013


On Thu, 2013-12-19 at 13:11 +0100, David Tardon wrote:
> Hi,
> 
> On Thu, Dec 12, 2013 at 10:02:57AM +0000, Michael Meeks wrote:
> > 	Thoughts appreciated though; is there some ordering of sniffing such
> > that we can prioritize common formats over less common ones ? and has
> > perhaps libebook got into that stack too high up ?
> 
> I think the filters are tried by their occurrence in the configuration.

Not exactly. Here is how the ranking of format types is done:
http://opengrok.libreoffice.org/xref/core/filter/source/config/cache/typedetection.cxx#108

If the format type is not listed there, then it will be ranked higher
because the format detector may be defined externally inside a foreign
extension which we do not know about at compile time.
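
To illustrate the rule above, here is a minimal sketch in Python (not the
actual C++ code). The type names and the order within the list are made up
for illustration; the real list is the one at the link above. The only
behaviour taken from the description is that a type missing from the list
is placed ahead of every listed type.

```python
# Hypothetical type names; the real ranking list lives in typedetection.cxx.
KNOWN_TYPES_BY_RANK = [
    "example_complex_type",   # assumed: most complex known format first
    "example_medium_type",
    "example_generic_text",   # assumed: most generic known format last
]


def type_rank(type_name):
    """Return a sort rank; a smaller rank means the type is tried earlier."""
    try:
        # Listed types get ranks 1..n in list order.
        return KNOWN_TYPES_BY_RANK.index(type_name) + 1
    except ValueError:
        # Unlisted types (e.g. from an extension unknown at compile time)
        # are placed ahead of every listed type.
        return 0


candidates = ["example_generic_text", "vendor_ext_type", "example_complex_type"]
ordered = sorted(candidates, key=type_rank)
```

Here `vendor_ext_type` stands in for a type shipped by an extension, so it
sorts to the front of `ordered`.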

> If that is really the case, we could implement simple ordering based on
> filter flags, e.g., decrease priority for ALIEN and 3RDPARTYFILTER and
> increase it for filters that are both IMPORT and EXPORT. That would put
> .odt before .docx and that before .fb2 (which is the one from libe-book
> that checks for zip content).

Well, this kind of complex ordering was how it was before, which
unfortunately created unpredictability and an unreasonable degree of
randomness in deciding the order of filter detection services.  That
caused us to put in a staggering amount of ugly local hacks so that
everyone could get his/her favorite filter to be "properly" detected...
I'd hate to go back to that era.

At present, we sort format types by file extension, by a pre-defined
list ordered by complexity (the list linked above), and by which
application is trying to open the file (if that's available).  Other
than that, we rely on each individual detection service to do the
"right thing", which is to detect its own format correctly, reject
formats that are not its own, and do so at a reasonable cost.  So, my
preferred solution in this case is to optimize libebook's detection
service, rather than playing around with the order of detection
services to avoid this particular scenario.
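
As a rough sketch of that scheme (in Python, not the real implementation):
the class, the type names, the rank values and the relative priority of
the three sort keys below are all assumptions for illustration.  The point
is that the ordering stays simple and each detector only accepts or
rejects based on a cheap look at the content.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FormatType:
    name: str                        # hypothetical type name
    extensions: tuple                # file extensions claimed by the type
    rank: int                        # position from the pre-defined list
    app: str                         # application that handles the type
    detect: Callable[[bytes], bool]  # cheap accept/reject on leading bytes


def order_types(types, file_ext, opening_app=None):
    """Sort by extension match, then rank, then opening application."""
    def key(t):
        return (
            file_ext not in t.extensions,                 # matches sort first
            t.rank,                                       # lower rank first
            opening_app is None or t.app != opening_app,  # preferred app first
        )
    return sorted(types, key=key)


def detect_format(types, file_ext, header, opening_app=None):
    """Return the name of the first type whose detector accepts the header."""
    for t in order_types(types, file_ext, opening_app):
        if t.detect(header):
            return t.name
    return None


TYPES = [
    # Accepts only content starting with the ZIP local file header signature.
    FormatType("example_zip_text", ("odt",), 1, "writer",
               lambda h: h.startswith(b"PK\x03\x04")),
    # Accepts only content starting with an XML declaration.
    FormatType("example_xml_book", ("fb2",), 2, "writer",
               lambda h: h.lstrip().startswith(b"<?xml")),
]

result = detect_format(TYPES, "fb2", b"<?xml version='1.0'?>")
```

With this arrangement a slow detector hurts every file it is asked about,
wherever it sits in the order, which is why making the detector itself
cheap is the more robust fix.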

Best,

Kohei



