<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-15">
</head>
<body text="#000000" bgcolor="#ffffff">
Hi,<br>
I'm writing a MIME implementation for the D programming language. Since
I want to eventually submit it for inclusion in the D standard
library, it must be Boost licensed, and I therefore can't look at
existing implementations. This means I can only use the
specification and the MIME test database to verify the correctness
of my implementation.<br>
<br>
Reading the specification, some questions occurred:<br>
<br>
The LiteralList:<br>
Is it safe to assume that the list is consistent?<br>
I.e. could there be a case insensitive "Hello" entry and another,
conflicting, case insensitive "hello" entry?<br>
Can the list contain two entries which are only different in casing,
one case-sensitive, one not? I.e. can the list contain a case
sensitive "Hello" entry and a case insensitive "hello" entry? If so,
should "Hello" match the case sensitive variant or the case
insensitive one?<br>
<br>
The Reverse Suffix Tree:<br>
Docs are very sparse here, took me some time to figure out what a
suffix tree is and how to search it ;-)<br>
<br>
Can the RST contain literals, or are those guaranteed to be in
the LITERALS list?<br>
<br>
ReverseSuffixTreeNode.CHARACTER: What encoding is used? I guess
UTF32?<br>
Can all characters be matched literally? Or could the tree contain
special glob characters like '*'? (guess no special characters)<br>
<br>
MagicList.MAX_EXTENT:<br>
I finally found out what that's meant to be, but one sentence to
explain the meaning of this field wouldn't hurt.<br>
<br>
The whole Magic/Match/Matchlet stuff could use some documentation.<br>
<br>
Regarding the recommended checking order:<br>
"Otherwise, start by doing a glob match of the filename."<br>
In which order should the LITERALS, the RST and the GLOBS be checked?<br>
Should all 3 of those always be checked?<br>
For example if a LITERAL match is found, should the RST/GLOBS still
be checked? (guess not)<br>
<br>
"If any of the mimetypes resulting from a glob match is equal to or
a subclass of the result from the magic sniffing, use this as the
result."<br>
Should this check be done against _all_ matching GLOBS/RST entries,
or only against the list obtained in step 2 ("only biggest weight. If the
patterns are different, keep only globs with the longest pattern")?<br>
<br>
"Otherwise use the result of the glob match that has the highest
weight."<br>
What if there are multiple, different matches with the same length and
weight?<br>
Return "application/octet-stream" or the first match?<br>
<br>
The spec assumes there's at most one MAGIC match. What if there are
multiple matches? Use the one with the highest PRIORITY? What if
there are multiple matches with the same PRIORITY?<br>
<br>
I also had some issues with the test suite:<br>
<br>
My "by-Name" implementation fails for the following files:<br>
test-template.dot, aportis.pdb, sqlite2.kexi, subtitle-microdvd.sub,
simple-obj-c.m, linguist.ts, test.ogg<br>
<br>
There's a common pattern with all those tests: My implementation
finds multiple equal matches in the tree and bails out according to
the spec:<br>
"If a matching pattern is provided by two or more MIME types,
applications SHOULD not rely on one of them. They are instead
supposed to use magic data (see below)"<br>
<br>
Example for test-template.dot:<br>
Tree: '[{Type: 'application/msword-template' Weight: '50'
CaseSensitive: 'false' Flags: '[0,0,0]'},{Type: 'text/vnd.graphviz'
Weight: '50' CaseSensitive: 'false' Flags: '[0,0,0]'}]' Expected:
'application/msword-template'<br>
Does the test suite always assume that the first of those results is
returned? Which implementation is correct in this case? I see no
reason why 'application/msword-template' should be used here,
'text/vnd.graphviz' has exactly the same weight, flags and matching
pattern.<br>
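<br>
To make the ambiguity concrete, here is a minimal Python sketch (not my actual D code) of how I read the spec. The function name pick_by_name, the tuple layout and the pattern length of 5 (assuming the matching glob is "*.dot") are only illustrative assumptions:<br>
<pre>
```python
def pick_by_name(matches):
    """Pick a MIME type from glob matches, or None if the result is ambiguous.

    matches is a list of (mime_type, weight, pattern_length) tuples.
    This layout is hypothetical and only serves the illustration.
    """
    if not matches:
        return None
    # Keep only the matches with the biggest weight.
    best_weight = max(m[1] for m in matches)
    best = [m for m in matches if m[1] == best_weight]
    # Among those, keep only the matches with the longest pattern.
    longest = max(m[2] for m in best)
    best = [m for m in best if m[2] == longest]
    # If two or more different types remain, do not rely on any of them.
    types = sorted({m[0] for m in best})
    if len(types) == 1:
        return types[0]
    return None


dot_matches = [
    ("application/msword-template", 50, 5),  # length 5 assumes "*.dot"
    ("text/vnd.graphviz", 50, 5),
]
result = pick_by_name(dot_matches)  # None, because both types tie
```
</pre>
With this reading, the by-name lookup alone cannot decide between the two types, whereas the test suite expects 'application/msword-template'.<br>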
<br>
Another question: why do the bug-30656-xchat.conf/menu.ini tests
return "application/octet-stream"? Those are text files, so if
text/binary guessing is used, the result should be "text/plain"? Is
the spec out of date, and is binary/text guessing obsolete?<br>
<br>
I also have issues understanding the test.jks MAGIC test:<br>
The magic value in the freedesktop.org.xml is "0xfeedfeed" ==>
[254, 237, 254, 237] type host32.<br>
My implementation reads that value from the cache file. I test on
x86, which is little-endian, and WORD_SIZE is 4, so I swap the bytes
of the magic value as indicated by the spec: [237, 254, 237, 254]<br>
The check fails, however, because test.jks starts with [254, 237, 254,
237].<br>
What's wrong here? I'm pretty sure I'm supposed to byteswap VALUE.<br>
<br>
I was also surprised that none of the other host32 magic tests
failed. It turns out all those tests are completely independent of byte
swapping:<br>
<br>
The application/x-java-jce-keystore magic value "0xcececece" is
exactly the same if swapped or not. (BTW: why is this then marked
host32? Doesn't this cause unnecessary byte swapping?)<br>
<br>
The application/vnd.tcpdump.pcap value is similar:<br>
The xml file contains these host32 values: "0xa1b2c3d4" and
"0xd4c3b2a1". Of course, one of these will match in any case. Again,
why aren't those both stored as big32/little32 to avoid the byte
swapping at runtime?<br>
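<br>
<br>
The three cases above can be checked with a small Python sketch of the word-wise byte swap as I understand it. The helper swap_words is a hypothetical name of mine, not something from the spec or from my D code:<br>
<pre>
```python
def swap_words(value, word_size):
    """Reverse the byte order inside each word_size-byte word of value."""
    if len(value) % word_size != 0:
        raise ValueError("value length must be a multiple of word_size")
    words = [value[i:i + word_size] for i in range(0, len(value), word_size)]
    return b"".join(word[::-1] for word in words)


jks = bytes.fromhex("feedfeed")     # test.jks magic, host32
jce = bytes.fromhex("cececece")     # application/x-java-jce-keystore
pcap_a = bytes.fromhex("a1b2c3d4")  # application/vnd.tcpdump.pcap
pcap_b = bytes.fromhex("d4c3b2a1")

# Swapping changes the jks value, so it no longer equals the file header.
swapped_jks = list(swap_words(jks, 4))      # [237, 254, 237, 254]

# The jce value is the same whether it is swapped or not.
jce_unchanged = swap_words(jce, 4) == jce   # True

# The two pcap values are byte swaps of each other.
pcap_pair = swap_words(pcap_a, 4) == pcap_b  # True
```
</pre>
So of all the host32 values mentioned here, only the test.jks one can actually detect whether an implementation swaps or not.<br>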
<pre class="moz-signature" cols="72">--
Johannes Pfau</pre>
</body>
</html>