Testing xdg-mime on large number of files

Bastien Nocera hadess at hadess.net
Tue Sep 25 13:30:43 PDT 2012


Hello Sergey,

On Tue, 2012-09-25 at 23:52 +0400, Сергей Давыдов wrote:
> Hello XDG people,
> 
> I've discovered some files which were not recognized by xdg-mime
> unless they have a correct extension. After filing a bug against
> shared-mime-info, I wondered whether I had any other files that are
> not recognized and how I could check that. I wrote a shell script for
> automatic testing, and it worked pretty well: after feeding about
> 150,000 files to it, I discovered five more unrecognized file groups
> and reported bugs about them. I'm now sharing the script to simplify
> large-scale testing of the shared-mime-info database; you can find it
> attached to this email. I hope this will help improve the
> shared-mime-info database.
> 
> The script accepts only one parameter: the directory in which to look
> for files. The directory will be walked recursively. The script works
> by hardlinking* the files into a temporary directory without
> extensions and comparing xdg-mime output for both files. If the
> mimetype becomes "application/octet-stream", it considers the file
> unrecognized and outputs its name and original mimetype to a file; if
> the mimetype changes but has no "octet-stream" in it, it considers the
> mimetype incompletely detected and writes out the file name to another
> file along with the normal and magic-based mimetypes as reported by
> xdg-mime.
> 
> *xdg-mime is clever - it follows symlinks and uses the name of the
> actual file for matching, so I had to use hard links instead of
> symbolic.
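
For readers without the attachment, the procedure described above can be
sketched roughly as follows. This is a Python illustration, not the
attached shell script. The names classify, query_filetype, scan and
workdir are made up for the example, and workdir is assumed to be an
empty directory outside the scanned tree but on the same filesystem
(hard links cannot cross filesystems). The only external command used is
"xdg-mime query filetype FILE".

```python
import os
import subprocess

OCTET_STREAM = "application/octet-stream"


def classify(named_type, bare_type):
    """Compare the type found with the extension to the type found without it."""
    if named_type == bare_type:
        return "ok"
    if bare_type == OCTET_STREAM:
        # Detection relied entirely on the file name.
        return "unrecognized"
    # The type changed, but content sniffing still found something.
    return "incomplete"


def query_filetype(path):
    """Ask xdg-mime for the MIME type of a single file."""
    result = subprocess.run(
        ["xdg-mime", "query", "filetype", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def scan(root, workdir, query=query_filetype):
    """Walk root recursively; return (unrecognized, incomplete) lists.

    Assumption: workdir is empty, lies outside root, and is on the same
    filesystem as root.
    """
    unrecognized, incomplete = [], []
    index = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # only regular files are hard-linked here
            # The link name carries no extension, so only content can match.
            link = os.path.join(workdir, f"f{index:08d}")
            index += 1
            os.link(path, link)
            try:
                named, bare = query(path), query(link)
            finally:
                os.unlink(link)
            verdict = classify(named, bare)
            if verdict == "unrecognized":
                unrecognized.append((path, named))
            elif verdict == "incomplete":
                incomplete.append((path, named, bare))
    return unrecognized, incomplete
```

Calling scan("/some/dir", "/some/empty-workdir") returns the two lists
instead of writing them to files; the query parameter exists only so the
logic can be exercised without xdg-mime installed.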

I've added a test program to xdgmime (that's where the code lives) to
print out the mime-type of files gathered "by name", "by data" and "by
file". Expanding it to print out mismatches to a separate file would be
fairly straightforward, and would avoid having to use hard links at all.

I used this script to check the validity of my changes to the database
this morning, against the test cases you gave. I then added one of your
test files to our test suite, which should help catch regressions.

Thanks for your bug reports, feel free to keep them coming.

Cheers


