protocol handling spec?

Wed Aug 11 12:41:16 EEST 2004

On Tue Aug 10 19:18:11 2004, Avery Pennarun wrote:
> On Tue, Aug 10, 2004 at 01:14:28PM +0100, Dave Cridland wrote:
> 
> > On Tue Aug 10 08:54:01 2004, Avery Pennarun wrote:
> > >[imap uri downloading]
> > >It downloads a list of messages, and to get other messages I
> > >download... other URIs.
> >
> > [...] A lot of what I do involves reading very remote, very large,
> > folders occasionally. My client doesn't ever bother reading in all
> > the message list, because that takes too long.
> 
> If you don't try to retrieve the URI corresponding to the folder 
> item list,
> there is no reason for it to do so.  If you have the URI of a 
> particular
> message, you can retrieve that directly, without going through the 
> folder
> list first.
> 
> 
But generating a folder message list is expensive. Generating part of 
one is cheap. Finding the URI for the nth item is very cheap indeed. 
But, the URI for the folder remains the same.

> > No, I mean, encoding something as an octet-stream automatically will
> > slow things down if it wasn't an octet-stream to begin with. XML or
> > not doesn't matter. I'm well aware that any information can be
> > encoded as a stream of octets (or bits) - Shannon proved that quite
> > convincingly a while ago.
> >
> > What I'm not convinced about is that this is a good idea when a
> > better interface can be made available.
> 
> I both agree and disagree with you here: I think that, to simplify 
> the model
> of the world seen by things like web browsers and Nautilus, 
> rendering to the
> lowest-common-denominator (an octet-stream) is very useful.
> 
> 
In some cases, though, I don't think it's suitable.

> (There are lots of so-called "component systems" that implement this
> kind of behaviour.  The nice thing about XPLC is it implements 
> little else.)
> 
> 
Nor does COM, strictly speaking. It's all the additional cruft to do 
with Microsoft incorporating the object broker into the OS that 
causes the problems. But anyway, so XPLC is a nice COM clone.

> Anyway, now imagine that you're Nautilus, and someone gives you an 
> imap uri. What do you do?  Well, you can't really do anything until 
> you resolve it to
> an object, and you don't understand objects other than IObject and 
> URI API. IObject is kind of useless, so you resolve(moniker, "URI 
> API").  
> 
It's this URI API business I have problems with. URIs are uniform, 
the resources are not. I don't see the point in coercing them all to 
be alike.

> Now you have an object.  Let's say the URI API gives you a MIME 
> type for
> that object (which is probably the case).  You can see that the 
> MIME type is
> something like message/rfc822 - okay, I know how to view those:
> resolve("message/rfc822", "MIME Viewer API").  If this returns 
> nothing, I
> simply can't deal with that type of object; offer to save the 
> bitstream to a
> file or something.  If it returns something, I *can* deal with that 
> type of
> object.  Good; pass the URI object to the MIME object, and let him 
> do the
> rest.
> 
> 
No problem with that. Hopelessly inefficient in some cases, but 
possible in some cases.

> Let's say the URI object is from an imap: URI, and the MIME object 
> is my
> mail reader (because one of the various possible MIME-types that 
> URI might
> point to is set to be viewable by my mail reader).  My mail reader 
> receives
> the URI object, and has two choices: treat it like a bitstream, 
> which will
> definitely work, or check if it supports the IMAP API, which might 
> work.  If
> my mail reader is smart, he'll try the second one first (optimizing 
> IMAP
> performance where possible), but fall back to the first one, 
> because things
> like message/rfc822 can come from places other than just IMAP 
> servers.
> 
> 
But in many cases the only way you're likely to determine the MIME 
type is by actually hitting the network. The only 
exceptions I can think of are for imap URIs where the URI is 
identifiable as a message, for data scheme URIs, and of course for 
local file URIs (although we don't actually know that's local).

> Now let's say I don't have a mail reader installed on my system at 
> all, but
> I try to retrieve the same message/rfc822 URI.  Nautilus will do
> resolve("message/rfc822", "MIME Viewer API") and still get an 
> object,
> because I have a low-priority viewer defined for message/rfc822: 
> it's the
> same as my text/plain viewer.
> 
> 
Well, I'd not imagine for a moment that a desktop system is going to 
understand IMAP without a mail reader available, but still. Yes, this 
is a potential fallback interface.

> Furthermore, if imap folder objects can render themselves as "list 
> of uri"
> objects, then Nautilus can display the messages in the folder as 
> literally a
> set of items in a file folder.  It's not the best way to view imap 
> - but if
> you don't have a mail reader installed, it's the *only* way.
> 
> 
This is where I start to have serious problems. Firstly, I can't 
actually imagine a set of circumstances where you'd have IMAP 
capability, yet no mail reader.

Secondly, I don't really see a problem with saying "This URI has no 
client available". You have to with some URIs anyway. It's not a big 
deal.

Thirdly, in the case where there's no mail reader, the only people 
who care that this kind of hack works are going to be more than 
capable of installing an IMAP client.

Fourthly, if you really did want this sort of thing, then surely 
you'd really want to have an interface defined as a core interface 
which provided a random access array interface to a set of other 
objects, which'd surely be more generally useful than serializing 
data into some internal format?
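
As a rough illustration of that fourth point, here is a minimal Python 
sketch of a random-access interface over a folder. The class name 
LazyFolder, the lookups counter, and the ";item=" URI layout are all 
hypothetical, invented for this example; the only point is that the 
nth item can be addressed without ever building the full list.

```python
from collections.abc import Sequence


class LazyFolder(Sequence):
    """Hypothetical random-access view of a folder's items.

    Only the item count is known up front. Each item URI is computed
    on demand, so the full listing is never generated.
    """

    def __init__(self, folder_uri, count):
        self.folder_uri = folder_uri
        self._count = count
        self.lookups = 0  # how many items were actually resolved

    def __len__(self):
        return self._count

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self[i] for i in range(*index.indices(self._count))]
        if index < 0:
            index += self._count
        if not 0 <= index < self._count:
            raise IndexError(index)
        self.lookups += 1
        # Assumed URI layout, for illustration only.
        return f"{self.folder_uri}/;item={index + 1}"


folder = LazyFolder("imap://mail.example.com/INBOX", 250000)
first_page = folder[0:3]  # resolves three items, not 250000
```

The folder URI stays the same throughout; a consumer that only wants 
a screenful of items pays only for that screenful.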

If this becomes a reality, I have visions of people saying "Look, my 
desktop can read my mail for me in my filemanager in a highly 
rudimentary form!", and people replying "Yes, but I don't care.".

> > "Using" a telnet or mailto scheme doesn't give you an octet-stream.
> 
> While this is true, it can still give you an object implementing 
> the URI
> API.  For that object, trying to retrieve the bitstream will launch 
> a
> subprogram (telnet or your mail composer) and return NULL.
> 
> If it makes you feel better, we could have it return an empty 
> bitstream with
> a MIME type that resolves to your mail composer, but that's just a 
> longer
> route to the same thing.
> 
> 
s/route/hack/

No, neither makes sense to me. There's no bitstream, there cannot be, 
we can see that from the URI. So why not just explain that in the 
interface, rather than pretending there is one?

The only reason I can think of is that current desktop systems in 
xdg-land are only capable of dispatch based on MIME type, and hence 
the only way to dispatch a protocol is to pretend it's got a MIME 
type somewhere. This is not an insurmountable problem.

> > [...] the parameters of a MIME type are not uniform. (I'm referring
> > to the parameters of a MIME type, specifically.)
> 
> An easy way to deal with this is to have a hash table of key:value 
> pairs for
> your MIME type.  Monikers are actually very powerful, though, and 
> allow the
> resolved object to continue parsing the moniker to see what it 
> wants to do. (For example, "http://foo/blah/x/y?a=b&c=d" could be 
> said to have
> non-uniform parameters; but the "http" moniker handler is 
> responsible for
> parsing the rest of the string.  Even though it's non-uniform, it's 
> still
> all one string.)
> 
> 
No, you've still misunderstood. It doesn't really matter anyway. MIME 
types have:

1) A type. (Variously called the "media type", or simply "the type". 
The whole nomenclature of it all is very, erm, fluid - MIME types are 
often referred to as "content types" and "media types" as well.)
2) A subtype. (Always called this, I think.)
3) Zero or more parameters.

Now, normally, the parameters don't matter - if they exist at all. 
For the top-level type "text", there's always a charset parameter. 
For text/plain, there's format, too. (This message should end up 
being text/plain; charset="us-ascii"; format="flowed" I think.)
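
To make the three parts concrete, here is a small Python sketch that 
splits a Content-Type value using the standard library's 
email.message.Message. The helper name parse_content_type is made up 
for this example.

```python
from email.message import Message


def parse_content_type(value):
    """Split a Content-Type value into (type, subtype, parameters)."""
    msg = Message()
    msg["Content-Type"] = value
    # get_params() returns (name, value) pairs; the first pair is the
    # type/subtype itself, so it is skipped here.
    params = dict(msg.get_params()[1:])
    return msg.get_content_maintype(), msg.get_content_subtype(), params


parsed = parse_content_type('text/plain; charset="us-ascii"; format="flowed"')
```

Here parsed holds the type "text", the subtype "plain", and a 
dictionary with the charset and format parameters.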

For some, it starts to make a real difference - text/icalendar, for 
instance, uses parameters to indicate what the iCalendar object is 
doing there - if it's a request or not, for instance. So the 
parameters are intended to be useful to the recipient, allowing them 
to make decisions about how to handle the content *without* having to 
examine it first.

All current desktop MIME dispatching, as far as I'm aware, uses 
solely the top-level type and subtype, and never the parameters. 
Technically, it needs to check the charset for text parts, and switch 
to application/octet-stream handling for unknown charsets, but we've 
had this argument before.
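
A minimal sketch of what that charset check could look like, in 
Python. The handler table and handler names are placeholders, and 
using Python's codec registry as the definition of a "known" charset 
is an assumption made purely for illustration.

```python
import codecs
from email.message import Message

# Hypothetical handler table; the handler names are placeholders.
HANDLERS = {
    "text/plain": "text-viewer",
    "message/rfc822": "mail-reader",
    "application/octet-stream": "save-to-file",
}


def pick_handler(content_type):
    """Dispatch on type/subtype, but distrust text in unknown charsets."""
    msg = Message()
    msg["Content-Type"] = content_type
    mime = msg.get_content_type()
    if msg.get_content_maintype() == "text":
        charset = msg.get_param("charset", "us-ascii")
        try:
            codecs.lookup(charset)
        except LookupError:
            # Unknown charset: treat the content as opaque octets.
            mime = "application/octet-stream"
    return HANDLERS.get(mime, HANDLERS["application/octet-stream"])
```

With this table, text/plain in us-ascii goes to the text viewer, 
while text/plain in an unrecognised charset is handled like 
application/octet-stream.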

Parameters on the URI are much weirder, applying to, potentially, 
every level of the path and other horrors. There will almost always 
be forms that throw us - I'd be willing to bet that smtp scheme URIs 
will bewilder some parsers. But it doesn't matter of course, because 
we don't understand them anyway, so we won't even attempt to make 
sense of anything past the colon.

[Aside: smtp URIs, being defined in a draft which, I think, has 
probably expired, need a token bit of explanation. They are of the 
form:
smtp:user;AUTH=*@smtp.example.com
In other words, they have the username, SASL mech, and hostname of 
the SMTP server (and port too if you want), but they don't have the 
normal "//" you'd expect. I have no idea why, not having been 
involved in their creation, but I suspect it relates to the fact the 
resource they address is effectively an abstraction - they're used as a 
place to submit email, and you can't make a local copy of an 
SMTP/Submission server.]
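
For illustration, this is what Python's generic urllib.parse.urlsplit 
makes of that form, next to a hand-rolled split. The function 
split_smtp_uri is hypothetical and based only on the single example 
above, not on the draft itself.

```python
from urllib.parse import urlsplit

uri = "smtp:user;AUTH=*@smtp.example.com"

# A generic parser finds no "//", so it sees no authority at all:
# everything after the colon ends up in the path.
parts = urlsplit(uri)


def split_smtp_uri(value):
    """Hand-rolled split for the form shown above; not a full parser."""
    scheme, _, rest = value.partition(":")
    userinfo, _, hostport = rest.rpartition("@")
    user, _, mech = userinfo.partition(";AUTH=")
    return scheme, user, mech, hostport


smtp_parts = split_smtp_uri(uri)
```

The generic result has the scheme "smtp", an empty netloc, and no 
hostname, which is the sort of thing that will bewilder code 
expecting the usual "//" layout.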

> > It seems to me that for every time you request the object attached
> > to a URI - let's call it dereferencing it - then you need to provide
> > a URI used for context.
> 
> This doesn't seem necessary to me.  Where a file is linked *from* 
> doesn't
> have much effect on what I want the file to do, in general.  What
> application I'm running might have an effect, but that's fine; each 
> app
> should be able to provide overrides for how it wants to view 
> certain MIME
> types.  (I doubt that letting apps override how they want to deal 
> with
> certain URIs will be much use.)
> 
> 
It's essentially required for schemes like 'cid', and useful for 
'mid' - although I don't know of any uses of the latter. I thought it 
might be of use for figuring out what 'http' scheme URIs are worth 
probing for. I think I've seen them used for almost every application 
level protocol you can think of, and several you can't. WebDAV, 
subversion, XCAP... They all behave like the base http, so they're 
usable in a webbrowser - just not useful.

DAV, XCAP, Subversion, et al, don't have their own scheme at all. So 
it's essentially impossible to do dispatching to a sane client 
without performing some kind of a probe.
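
A sketch of what the cheapest such probe might inspect, in Python. It 
assumes the caller has already sent an OPTIONS request and collected 
the response headers; the function name and the return labels are 
made up. WebDAV servers advertise themselves with a DAV response 
header; detecting Subversion or XCAP would need further checks that 
are not shown here.

```python
def classify_http_resource(options_headers):
    """Guess what is layered on an http URI from OPTIONS response headers."""
    headers = {name.lower(): value for name, value in options_headers.items()}
    if "dav" in headers:
        # The DAV header lists compliance classes, e.g. "1, 2".
        classes = [item.strip() for item in headers["dav"].split(",")]
        return ("webdav", classes)
    return ("plain-http", [])


probe = classify_http_resource({"DAV": "1, 2", "Allow": "OPTIONS, GET, PROPFIND"})
```

The URI itself is identical in both cases; only the probe tells a 
web folder apart from an ordinary page.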

> If it were me, and I dragged an http: URI onto my desktop, I'd want 
> it to
> act the same regardless of whether I dragged it there from my mail 
> reader,
> my file browser, or my web browser.  Thus attaching a context to it 
> wouldn't
> help me very much.
> 
> 
The problem is that if you grabbed the URI from subversion, or a web 
folder, and slung it onto your desktop, then a reasonable expectation 
would be to reopen it in the same way later.

Problem being, 'http' means lots of different things layered on top 
of HTTP. There is no such thing as a DAV URI, nor an XCAP URI, nor a 
DeltaV URI. There are merely web servers with a bunch of funny stuff 
tacked on. The funny stuff is, or may be, the only part of what you 
want. Plenty of proposed services over DAV (Yes! No more overloading 
HTTP, we can overload DAV, now!) involve the actual resource being 
essentially a dummy, placed there because you need one, and to 
appease webbrowsers inadvertently using them.

In other words, the actual resource may be partly or purely present 
for legacy support purposes. I don't want my desktop cunningly 
designed to handle legacy support stuff. :-)

> Similarly, if you click on a mailto: URI in your web browser, you 
> don't want
> it to pop up an empty browser window.  It's the same problem in 
> both cases. 
> But we can deal with it.  For example, the URI API might be defined 
> to
> ignore the output entirely if the object returns a NULL blob, but 
> to display
> a blank page if it returns a "" (empty string) blob.  Or we could 
> just add a
> function to the API, is_actually_viewable().  Or whatever we want.

Or you could make the interface fit the model, instead of trying to 
make it the other way around.

We know, a priori, that some URIs are not going to produce an 
octet-stream. We don't need to do anything more than look at the 
scheme in many cases. "imap" is tricky because we need to do more 
than that - it's a case where you can sometimes get an octet-stream, 
sometimes not - but if we don't have support for "imap" at all, then 
we fall back to the default case, which is a URI which you can't punt 
onto something else, and that cannot produce an octet-stream for us.

That's going to be the case for many URIs anyway - anything we don't 
handle, or don't recognise, or that simply doesn't make sense (such 
as 'dav').
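
A minimal Python sketch of that idea: decide from the scheme alone 
whether an octet-stream is even on offer. The table contents, the 
Stream enum, and can_stream are all illustrative assumptions, not a 
proposed API.

```python
from enum import Enum
from urllib.parse import urlsplit


class Stream(Enum):
    YES = "yes"      # dereferencing yields an octet-stream
    NO = "no"        # there is no octet-stream to be had
    MAYBE = "maybe"  # depends on more than the scheme


# Illustrative table; a real one would be registered per scheme handler.
SCHEME_STREAMS = {
    "http": Stream.YES,
    "file": Stream.YES,
    "data": Stream.YES,
    "imap": Stream.MAYBE,
    "mailto": Stream.NO,
    "telnet": Stream.NO,
}


def can_stream(uri):
    """Unknown or unhandled schemes fall back to the default: no stream."""
    scheme = urlsplit(uri).scheme.lower()
    return SCHEME_STREAMS.get(scheme, Stream.NO)
```

With this table, can_stream("mailto:someone@example.com") gives 
Stream.NO without touching the network, and an unrecognised scheme 
such as 'dav' gets the same default.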

Dave.