New handling for URI scheme handlers

Tue Dec 3 01:10:55 PST 2013

On Tue, Dec 3, 2013 at 8:32 AM, David Faure <faure at kde.org> wrote:
> On Monday 02 December 2013 09:40:00 Jerome Leclanche wrote:
>> All the client does is, for http:
>>  - Parse the url given to it
>>  - Identify whether it's potentially a file through its name (get a
>> filename if there's one)
>
> That's broken.   http://www.davidfaure.fr/kde, is that a file or a directory?
> In a way that one is both, actually. It returns HTML so it's a file, but it
> contains other files so it's a directory :)

Neither; it's an HTML document. Just to be clear, in the vocabulary I
used, a "file" is anything that is not an HTML document (and that
should be read by something other than the browser). HTTP has no
concept of directories.

>
>>  - If it is, run a match against the xdg patterns on it
>
> That's broken. http://example.org/cgi-bin/script.pl looks like a perl script
> but actually returns HTML. There are a thousand more examples like that.
> Extensions over HTTP cannot be trusted.

My apologies, I see that my reply to that point was only sent to
Simon; I'm pasting it below. The gist of it is that false positives
are fine.

"""
I'm aware, hence the "potentially".
There will be false positives, and the false positives will go one of two ways:
 - URL wrongly identified as a non-file when it is one
(show-image.php?id=123): the browser will end up being the middleman,
as per current behaviour
 - URL wrongly identified as a file when it's not (/wiki/favicon.ico):
non-ideal as we have one extra request, but HEAD takes care of that.
In your case, Content-Type: text/html; charset=UTF-8

As I said, the worst case is when a potential match
(/wiki/favicon.ico) rejects HEAD for whatever reason. In this case,
it's passed to the browser and the usual fallback happens.

This is no different from using the mime db to match only by name and
not content because of performance concerns. As we all know, foo.txt
may well be an application/zip, but it's rare enough that it
doesn't warrant reading everything just to display a file type. It
only becomes"
"""
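
A minimal sketch of the name-based step described above, in Python.
Assumptions: the standard-library mimetypes module stands in for the
xdg glob patterns (it is not the shared-mime-info database), and the
function name guess_from_url is hypothetical.

```python
import mimetypes
import posixpath
from urllib.parse import unquote, urlsplit


def guess_from_url(url):
    """Return (filename, mime_guess) using only the URL path.

    Returns (None, None) when the path does not end in something that
    looks like a filename with an extension. The query string is ignored,
    so show-image.php?id=123 is seen as "show-image.php".
    """
    path = unquote(urlsplit(url).path)
    filename = posixpath.basename(path)
    if not filename or "." not in filename:
        return None, None
    # Stand-in for an xdg glob match; the result may be None for
    # extensions the local tables do not know.
    mime, _encoding = mimetypes.guess_type(filename)
    return filename, mime


if __name__ == "__main__":
    for url in (
        "http://www.davidfaure.fr/kde",
        "http://example.org/show-image.php?id=123",
        "http://example.org/wiki/favicon.ico",
    ):
        print(url, guess_from_url(url))
```

The guess is only a hint, which is the point of "potentially": a wrong
guess costs one extra request or a trip through the browser.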

>
>>  - If there's any match that is not an app that is associated with
>> x-scheme-handler/http(s), do HTTP HEAD on the url
>
> That's broken. HTTP HEAD is badly implemented by many many webservers.
> We used to use it, but we don't any longer. Instead we start a HTTP GET, to
> get the headers, extract the mimetype, put the download on hold, and resume it
> from the launched app. Works great for one-time urls too -- but yeah, it
> relies on using the same underlying http technology and being able to resume a
> transfer started by another app [which we can do in KIO since the transfer is
> handled by a separate process].

HEAD support is a bit of a concern, I agree (if nothing supports HEAD,
we always end up on the fallback case). One-time URLs should not break
with GET (since they don't return their body), but web devs doing
stupid things is not something we can really prevent.
What I described will work without issues for a lot of cases. It will
work for any webserver that serves files directly (and hasn't been
specifically configured to reject HEAD). It will work as expected for
99% of document URLs (text/plain and all its children will have to be
special-cased to always open in a web browser, of course, otherwise
every .php URL will trigger the HEAD).

So all in all, it theoretically works for most scenarios. If I get
some free time I'll write and use a prototype handler over the next
few days to see how well this theory holds up.
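
For illustration, the decision flow could be sketched roughly like
this in Python. Everything here is an assumption rather than an
existing implementation: the names choose_target and
head_content_type, the handlers dict (MIME type to application name),
and treating every text/* type as "text/plain and its children".

```python
import urllib.request

BROWSER = "browser"


def head_content_type(url, timeout=5):
    """Return the Content-Type of a HEAD response, or None on any failure."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # Parameters such as charset are stripped; a missing header
            # comes back as "text/plain", which falls through to the browser.
            return response.headers.get_content_type()
    except (OSError, ValueError):
        # Covers URLError/HTTPError (rejected HEAD), timeouts, bad URLs.
        return None


def choose_target(url, name_guess, handlers, head=head_content_type):
    """Pick an application for url, falling back to the browser.

    name_guess is the MIME type guessed from the filename (or None);
    handlers is a hypothetical mapping of MIME type -> application name.
    """
    # No name match, or a text type: never issue the HEAD at all.
    if name_guess is None or name_guess.startswith("text/"):
        return BROWSER
    if name_guess not in handlers:
        return BROWSER
    content_type = head(url)
    if content_type is None or content_type.startswith("text/"):
        return BROWSER  # HEAD rejected, or a false positive (e.g. text/html)
    return handlers.get(content_type, BROWSER)
```

The head parameter is injectable so the flow can be exercised without
a network, e.g. choose_target(url, "image/png", handlers,
head=lambda u: "text/html") takes the false-positive path.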

>
>>  - If the HEAD was successful and returns a mime type in the mime type
>> db and the mime type is associated with an app, *download* the file
>>  -> Hand off downloaded file to the associated app
>
> Downloading into a temporary file means no incremental rendering
> (e.g. for a long text, or image, or worse, movie).

On anything other than NTFS, streamable formats are perfectly readable
while they're being written. If you download a large video file and
start playing it immediately, it will stream just fine.
Of course it's a different concern for other file types. Text editors
will spam you with "this file has been modified by an external
process". Image viewers will render half an image and not care
anymore. That's down to expectations. But all that is something that
can be handled in the download manager itself.
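
A small Python sketch of the "readable while being written" behaviour,
assuming a plain local file and two independent handles; the function
name simulate_streaming is hypothetical, and it says nothing about how
any particular player or viewer copes with a growing file.

```python
import os
import tempfile


def simulate_streaming(chunks):
    """Append chunks with one handle and read them back with another.

    Returns the list of byte strings the reader saw after each write.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    seen = []
    try:
        with open(path, "wb") as writer, open(path, "rb") as reader:
            for chunk in chunks:
                writer.write(chunk)
                writer.flush()  # make the bytes visible to other handles
                # The reader picks up whatever was appended since its
                # last read; an earlier end-of-file is not sticky.
                seen.append(reader.read())
    finally:
        os.remove(path)
    return seen


if __name__ == "__main__":
    print(simulate_streaming([b"first part ", b"second part"]))
```

A consumer that stops at the first end-of-file (the image viewer case)
would only ever see the first chunk, which is where the download
manager would have to help.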

I love what you guys did with KIO but it's non-ideal for other reasons.
I need to think about all this, because it actually conflicts with an
unrelated component I wanted to work on (binding apps to domains).

>
> --
> David Faure, faure at kde.org, http://www.davidfaure.fr
> Working on KDE, in particular KDE Frameworks 5
>