protocol handling spec?

Tue Aug 10 15:14:28 EEST 2004

On Tue Aug 10 08:54:01 2004, Avery Pennarun wrote:
> I can tell you for sure that it doesn't do anything stupid like 
> download
> every single message in my mailbox.  Of course not.  It downloads a 
> list of
> messages, and to get other messages I download... other URIs.
> 
> 
It's possible, it just depends on how that folder gets accessed. If 
it's really downloading a list of messages, that's going to be very 
slow the first time. A lot of what I do involves reading very remote, 
very large, folders occasionally. My client doesn't ever bother 
reading in the whole message list, because that takes too long.

Enumerating the messages can involve quite a bit of data.

> > >The API you proposed is nice and simple, but if it were me, I might want
> > >to add a special function for specifically composing/decomposing lists.
> > >More than just imap: will want to create lists of objects that I can
> > >parse and navigate in a general way.  Of course, HTML and XML are both
> > >reasonable(*) ways of creating general types of lists.
> >
> > Erm. You're thinking that composing everything into XML is going to
> > be faster, somehow?
> 
> XML people like the idea that you could do it that way.  I, and 
> apparently
> you, don't particularly.  That's fine; encode it in any other 
> format, or
> even extend your API to understand the concept of a "folder 
> listing".  It
> can be as fast as you need, but more speed means less generality.
> 
> 
No, I mean, encoding something as an octet-stream automatically will 
slow things down if it wasn't an octet-stream to begin with. XML or 
not doesn't matter. I'm well aware that any information can be 
encoded as a stream of octets (or bits) - Shannon proved that quite 
convincingly a while ago.

What I'm not convinced about is that this is a good idea when a 
better interface can be made available.

Take an IMAP folder containing several thousand messages. (I have 
lots of them.) There's rarely, if ever, a need to enumerate all the 
messages in it. It's possible - it may even be desirable - to 
represent that folder as a collection of URIs, but in order to find 
those URIs, a substantial amount of data has to come down the pipe, 
which is really quite unpleasant. (I measured it at over five seconds 
before rewriting that bit away, and the folder held only around 34k 
messages at the time.) So that cost should be entirely avoidable.

I don't really think that's unusual.

> > >(BTW, a more general URI-type mechanism is the "moniker", which is simply
> > >a mapping of strings into objects.  When you request a moniker for a
> > >particular factory, it'll give you a corresponding object that implements
> > >a particular API.  A URI is a particular kind of moniker that returns
> > >objects implementing the URI API.)
> >
> > What I think you think, in these terms, is that:
> >
> > 1) One of:
> > a) Each possible interface to an object returned by a moniker
> > corresponds to a MIME type.
> 
> If you think I think that, my powers of expression must be way 
> worse than I
> ever realized, and I apologize.  Where to begin?  Whew.
> 
> > b) All objects support an interface which provides an octet-stream,
> > and has a property of a MIME type. (Sans parameters, because nobody
> > likes them, eh?)
> 
> Umm, well, all typical URIs can provide objects which support such 
> an
> interface, yes, and for each example you've asked me about, I've 
> explained
> how that is possible.  Feel free to ask about more types of URIs; I 
> can do
> them all.
> 
> 
No, I don't think you have. "Using" a telnet or mailto scheme doesn't 
give you an octet-stream. I'm far - very far - from convinced that an 
imap scheme URI referencing a folder can give you a useful one.

> The parameters are merely optional, not required to be absent, 
> although they
> must be uniform across all URI types (since the interface must be 
> uniform if
> you want to be accomplishing anything).
> 
> 
Well, no, because the parameters of a MIME type are not uniform. (I'm 
referring to the parameters of a MIME type, specifically.)

But in any case, a uniform interface to a URI, which can refer to any 
of a very wide range of resources, isn't impossible. It just won't be 
all that useful. I see that common interface as being largely:

"Can I get a reasonable octet-stream out of this?"
"Punt this to something else, I don't understand it."
"Give me an octet-stream."

Once you get an octet stream, if you wanted one, you should be able 
to hand it somewhere else as well.
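
As a rough illustration of how small that common interface is, here 
is a minimal Python sketch. The class name, the method names and the 
idea of passing the fallback in as a callable are all made up for 
this example; nothing here comes from an existing spec or library.

```python
import io
from typing import BinaryIO, Callable, Optional


class Resource:
    """Hypothetical result of dereferencing a URI (names are illustrative)."""

    def __init__(self, uri: str, data: Optional[bytes] = None):
        self.uri = uri
        self._data = data  # None means there is no sensible octet-stream

    def has_octet_stream(self) -> bool:
        # "Can I get a reasonable octet-stream out of this?"
        return self._data is not None

    def open_octet_stream(self) -> BinaryIO:
        # "Give me an octet-stream."
        if self._data is None:
            raise ValueError(f"{self.uri} has no octet-stream")
        return io.BytesIO(self._data)

    def punt(self, fallback: Callable[[str], str]) -> str:
        # "Punt this to something else, I don't understand it."
        return fallback(self.uri)
```

A caller would ask has_octet_stream() first, and only fall back to 
punt() when it has no better idea what to do with the URI.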

> > 2) All possible instances of all possible URIs return a uniform
> > interface. See above.
> 
> Certainly not.  There is *one* API which, for all supported URIs, 
> must
> provide an object instance that supports that API.  This is the API 
> your
> favourite browser apps, like Konqueror and Nautilus, might need.  
> When you
> type a URI into your Nautilus or Mozilla URL bar, this is the API 
> you're
> *expecting* the objects to provide, or you wouldn't have typed the 
> URI
> there.
> 
> 
Yeah, I was thinking about this last night. Forgive me if I haven't 
thought this through fully.

It seems to me that for every time you request the object attached to 
a URI - let's call it dereferencing it - then you need to provide a 
URI used for context.

Let's give some examples:

Supposing you're browsing the web, and you click on a link. The 
context URI is the address of the page. This indicates to the 
dereferencing engine that, in this case, you want to treat the URI as 
a normal webpage for preference. Now I should point out this is 
different to the base URI, which as we know cannot be determined 
without all of the context URI, the resource it points to, the 
metadata associated with that resource, and several degrees in 
philosophy.

But supposing you enter it into a file manager - or rather, you enter 
it into an application working within a file-like context, more 
generally - then you probably want the URI to be treated as a DAV 
item if possible. (If not, you punt it to the web browser.)

This notion of a context URI does seem to fit most cases, including 
the rather awkward cid and mid schemes, which can use the imap (or 
other) URI in order to figure out a good place to look for the part 
or message - that's much more base-URI-like, of course.

Actually, given the notion of a context URI with which to handle the 
current URI, I think it covers everything. There remains the problem 
of how to deal with URIs held on the desktop, for instance, but that 
can be handled by providing default behaviour for a particular class 
of URI, and extending current storage methods to include the selected 
context, probably abstracted into a set of URNs or something. (It 
might actually be better to separate out context and base URIs, and 
use magic URNs for contexts all the time.)
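
To make the context idea a little more concrete, here is a small 
Python sketch of picking a treatment from the pair (scheme, context). 
The URN strings, the table and the function name are assumptions 
invented for this example; I haven't defined any of them anywhere.

```python
from urllib.parse import urlsplit

# Hypothetical "magic URNs" naming the context a URI is dereferenced in.
WEB_CONTEXT = "urn:x-context:web"
FILE_CONTEXT = "urn:x-context:files"

# (scheme, context) -> preferred treatment. Purely illustrative entries.
PREFERENCES = {
    ("http", WEB_CONTEXT): "webpage",
    ("http", FILE_CONTEXT): "dav-item",
}


def choose_treatment(uri, context, dav_capable=True):
    """Pick how to treat `uri` when it is dereferenced within `context`."""
    scheme = urlsplit(uri).scheme.lower()
    treatment = PREFERENCES.get((scheme, context), "punt")
    if treatment == "dav-item" and not dav_capable:
        # The resource can't be treated as a DAV item, so hand it
        # to the web browser instead.
        return "webpage"
    return treatment
```

The same http URI then comes out as a webpage when clicked in a 
browser and as a DAV item when entered into a file manager, and 
anything the table doesn't know about is punted.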

> There are plenty of other potential APIs that might be supported by 
> any
> given subset of monikers, but if your calling app doesn't know 
> about that
> API, it doesn't help much.
> 
> 
No, so all you need is a 'give this to something which understands 
it'.

> > 3) There exists an interface for an octet-stream, with a MIME type
> > (with parameters, because they're useful for dispatch). This can be
> > passed to, and used by, an application without the application
> > needing to know about the specific method of access to the
> > octet-stream.
> 
> Agreed.
> 
> > 4) URIs may provide an interface as in (3), but may not. URIs do,
> > however, provide some interface with a single method which performs a
> > default action for them based purely on the scheme name.
> 
> Sure.  But the API of claim #4 is merely a subset of the API of 
> claim #3; if
> we say that every kind of URI has a single default method *that 
> takes
> optional input and output blobs*, then we can have it always (like 
> in #4)
> and flexible (like in #3).
> 
> 
It may be possible. I don't think it's a good idea, because I don't 
believe that it'll be efficient. Which suggests a further API to 
ascertain whether this is a sane thing to do or not.

Moreover, it's not always possible - 'mailto' and 'telnet' don't 
provide any octet-stream, nor does 'smtp' (which is only a draft, but 
you can probably guess the intent). If I "copy" a mailto URI, I don't 
want an empty file, I want an error. And I certainly don't want to 
actually send an email.
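
A sketch of the behaviour I'd want from a "copy", in Python. The 
function, the exception and the fetch callable are hypothetical names 
for this example only; the point is just that the stream-less schemes 
fail loudly before anything is written or sent.

```python
import io
from urllib.parse import urlsplit

# Schemes named above as having no octet-stream to offer.
NO_STREAM_SCHEMES = {"mailto", "telnet", "smtp"}


class NoOctetStreamError(Exception):
    """Raised when a URI cannot sensibly be read as an octet-stream."""


def copy_uri(uri, fetch, destination):
    """Copy the octets behind `uri` into `destination`.

    `fetch` is an assumed callable mapping a URI to bytes; it is never
    called for schemes that have no octet-stream.
    """
    scheme = urlsplit(uri).scheme.lower()
    if scheme in NO_STREAM_SCHEMES:
        # An error, not an empty file, and no side effects.
        raise NoOctetStreamError(f"'{scheme}' URIs have no octet-stream")
    destination.write(fetch(uri))
```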

> > You seem to add in MIME types somewhere for no very good reason, 
> > possibly using them as generalized interface names.
> 
> No, MIME types are separate from this, which is the point I'm 
> trying to get
> across; this discussion started out as an attempt to merge the MIME 
> and URI
> manager specs into a single spec, and I was pointing out that 
> they're
> different things with different uses, albeit similar-sounding ones. 
>  I'm not
> against merging the specs, only against misusing the actual 
> concepts.
> 
> To use your terminology, if a URI can retrieve an octet-stream, that
> octet-stream *does* have a MIME type, as all octet-streams do.  That
> MIME-type is really a moniker that will give you an object (the
> viewer/printer/editor/etc for that type of file), and that object 
> is the
> *user* of the URI itself.
> 
> 
Yeah, I follow that. But only for the case in which a URI refers to a 
resource which is sensibly encoded as an octet-stream. If it isn't an 
octet-stream, it doesn't have a MIME type.

(I've been a shade unclear, I admit - when I say "A URI does not have 
a MIME type", I mean "The resource pointed to by a URI does not have 
a MIME type", etc. A list of URIs formatted one per line, with CRLF 
EOLs, does of course have a MIME type no matter what the URIs are, 
but that's irrelevant.)

> > By the way, pop quiz, what should you do with a "dav" scheme URI?
> > (Hint, it's a registered one at IANA, and quite definitely doesn't
> > have a MIME type...)
> 
> This is an easy one: I handle it in almost the exact same way as I 
> would
> http (since it is, basically, http with a few extensions).  While 
> dav itself
> has no MIME type, the files retrieved or published via dav *do* 
> have a MIME
> type, and therefore I would load the application associated with 
> that MIME
> type, passing it the URI as the file to view/edit/print/etc.
> 
> 
Interesting. The resource that dav URIs refer to certainly doesn't 
have a MIME type. You'd be hard pressed to encode it as an 
octet-stream, too, since it falls outside Shannon's law, as it were. 
It's abstract, not concrete, referring to an XML namespace.

Some perfectly valid URIs are completely incapable of actually being 
processed in any useful way, it seems.

> > GNOME has all the functionality needed to launch arbitrary URIs. It
> > doesn't know, however, and I have no way of telling it, that some
> > URIs are reasonable to treat as filenames. (Or whichever your
> > preferred nomenclature is. Data addresses?)
> 
> Ah, so this is the heart of the problem.  We shouldn't be arguing 
> about
> whether URIs have MIME types; it's you that wants them to, not me.
> 
> 
Oh, not quite - I want the octet-stream and MIME type they have, if 
any, to be available, and usable.

I don't want to try to coerce everything into an octet-stream with a 
MIME type, however, that would be foolish.

> > >The MIME type of an email message retrieved from any URI, including
> > >imap:, should be message/rfc822.  When you display such a message, you
> > >should expect it to pop up in some sort of "normal" mail reading program
> > >or component, not just a text viewer.  Of course, you could also
> > >configure your "default viewer" for that type of object to be a text
> > >viewer, since it happens to be a text file.
> >
> > Reread the URI. It's pointing to the TEXT section of your message,
> > which was precisely the MIME type *you* set it to be, and I merely
> > quoted off the server. Not message/rfc822, text/plain;
> > charset="us-ascii".
> 
> Okay then; it should display in the default viewer for a text file. 
>  A URI
> that points into the actual body part of a message has nothing to 
> do with
> email, and therefore should not launch my mail reader, and it won't 
> as long
> as the imap: handler tells me the right MIME type.  Do you want it 
> to?
> 
> 
Yes. Because it does actually have a MIME type attached to the 
resource it points to.

> On the other hand, if I point at a message/rfc822, I probably 
> *would* expect
> it to launch my mail reader (or at least an embedded component of 
> it),
> because whenever I view an email on my desktop, I expect it to look 
> the
> same.
> 
> 
True, although I'd hope that the entire message wasn't downloaded 
unless the application wishing to use the URI really wanted an 
octet-stream. (Invariably, a well written IMAP client never downloads 
the full message, just the ENVELOPE, BODYSTRUCTURE, and those parts 
it wishes to display inline. A tiny handful of headers are useful, 
sometimes, depending on what else you want to do.)

So I see octet-stream in this case as a perfectly valid fallback - 
perhaps in order to save the message as a file - but not a suitable 
interface to use if you understand a better interface.
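
Roughly what I mean, as a Python sketch. The connection here is a 
stand-in that only records what would be asked of the server, and the 
class and method names are invented for the example; ENVELOPE, 
BODYSTRUCTURE and BODY.PEEK[] are the usual IMAP fetch items.

```python
class RecordingConnection:
    """Stand-in for an IMAP connection; it only records FETCH requests."""

    def __init__(self):
        self.requests = []

    def fetch(self, uid, items):
        self.requests.append((uid, items))
        return b""  # a real connection would return the server's data


class MessageResource:
    """Hypothetical resource behind an imap URI that names a message."""

    def __init__(self, conn, uid):
        self.conn = conn
        self.uid = uid

    def summary(self):
        # Richer interface: structure only, no message bodies.
        return self.conn.fetch(self.uid, "(ENVELOPE BODYSTRUCTURE)")

    def open_octet_stream(self):
        # Fallback: the whole message, e.g. to save it as a file.
        return self.conn.fetch(self.uid, "(BODY.PEEK[])")


def display(resource):
    """Use the richer interface when the resource offers one."""
    if hasattr(resource, "summary"):
        return resource.summary()
    return resource.open_octet_stream()
```

An application that understands messages never triggers the full 
download; one that only knows octet-streams still works, just more 
slowly.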

> > >This is why your IMAP client is the URI handler, and your mail reader is
> > >the MIME handler.
> >
> > I strongly suspect that would lead to a truly revoltingly slow mail
> > reader.
> 
> And yet it doesn't, and the KDE people proved it.  You should check 
> your
> assumptions.  I guess the most important thing to remember is that 
> the IMAP
> handler can do all the caching it wants, and just because you *ask* 
> it for a
> message doesn't mean it has to go and *retrieve* the message.

Depends on how generally it's been written. If it had to list all 
the messages in a folder, then it has to do that at least once, and 
that's an expensive operation. (If it enumerates all the messages 
*and* provides you with the stuff for the summary, that's a huge 
amount of bandwidth.)

In any case, some more specialized cases, such as kiosk mode usage 
and roaming, don't have the option of caching - and both of those are 
extremely useful in many sectors. I can deal with this given either 
control over the IMAP connection, or a suitably rich interface to 
the resource. (In effect, my client provides the latter to itself by 
having the former.) If the interface I get is solely concerned with 
encoding the entire folder into an octet-stream, it cannot possibly 
provide the level of speed this requires.

So to summarize:

1) Not all URIs resolve to an octet-stream. ('mailto', 'telnet')
2) Some URIs are more efficient to use if you don't resolve them to 
an octet-stream, but you can, and it makes sense. ('imap' URIs 
referring to a message or message/rfc822 part)
3) Some URIs can be coerced into giving you an octet-stream, but it's 
highly inefficient and non-standard. ('imap' scheme URIs referring to 
a folder)
4) Some URIs always resolve to an octet-stream assuming no error, and 
this is efficient. ('imap' URIs referring to most message parts, 
'http' URIs in all their myriad flavours, 'ftp' URIs, etc.)
5) For all the above, there may exist an interface which provides 
more functionality.
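
The five points above could be written down as data, something like 
this Python sketch. The enum, the example table and the decision 
function are my own illustration of the list, not part of any 
proposed API, and the decision rule is only one reasonable reading 
of it.

```python
from enum import Enum


class StreamSupport(Enum):
    NONE = 1      # point 1: no octet-stream at all
    POSSIBLE = 2  # point 2: works, but a richer interface is more efficient
    COERCED = 3   # point 3: can be forced, inefficient and non-standard
    NATURAL = 4   # point 4: always an octet-stream, and efficient


# Example classifications taken from the list above.
EXAMPLES = {
    "mailto": StreamSupport.NONE,
    "telnet": StreamSupport.NONE,
    "imap message": StreamSupport.POSSIBLE,
    "imap folder": StreamSupport.COERCED,
    "imap message part": StreamSupport.NATURAL,
    "http": StreamSupport.NATURAL,
    "ftp": StreamSupport.NATURAL,
}


def use_octet_stream(support, understands_richer_interface):
    """Decide whether a caller should ask for the octet-stream."""
    if support is StreamSupport.NONE:
        return False
    if support is StreamSupport.NATURAL:
        return True
    # Points 2 and 3: the stream is only a fallback (point 5).
    return not understands_richer_interface
```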

I think trying to force everything to use the lowest common 
denominator is a very bad thing, and a fundamental breakage in the 
model.

Dave.