A common VFS and a Common conf-system [Part II]

Tue Mar 1 17:48:02 EET 2005

As promised, here are some more notes/ideas.  Still dead tired due to only
four hours of sleep, but maybe I won't babble too uselessly.

I'm not going to expand on the API as originally planned - I'd rather do
that in the Wiki (and repost the below stuff there).  Can someone give
me an OK to do so, and let me know where exactly (path-wise) I should
put it?  I'd really appreciate it.

== Daemon ==

Regarding get_poll_fd() and its friend functions below, let me clarify
the original intent, and then amend things a bit.

My original idea was to *NOT* have the backends run in the library or
host app itself, but instead to have a per-user daemon that handled all
of the actual VFS work, and then the library would simply be an API that
made it easy for an app to talk to the daemon.  There is one major
reason why I think the daemon approach is the best way to go, and that
reason is security.

If the backends were dlopen'd into the VFS library, the backend would
run in the same process as the application, which means that every
application would need access to all of your credentials for accessing a
service, such as FTP passwords.  Malicious apps could steal your
credentials.  With a daemon, only the daemon process needs access, and
you can set the system up in a way that enforces that.

Also, on systems with SELinux or the Solaris/BSD equivalents, a
separate daemon makes it far easier to lock down the system.  If the
backends ran in-process with the app, then every app would need
permission to connect to various network locations and so on in order to
operate, and it would be impossible to provide useful lockdown.  With a
daemon, you can restrict most apps to have no external network access
and force them to go through the daemon.  The daemon can then provide
additional administrator-defined access control, such as refusing to
connect to an authenticated DAV share without SSL, refusing to connect
to SMB shares outside the local network, or whatever the admin wants.

== Polling API ==

That said, the daemon itself should be an implementation detail of the
library.  Users of the library should not be forced to even consider
whether there's a daemon doing the real work or not; all they should
think about is their app.  Additionally, there are backends where the
daemon might not make a lot of sense, or cases where admins or
developers might prefer in-process backends over the daemon.  For this
reason, the vfs_get_poll_fd function isn't sufficient at all, and should
be replaced with something more versatile.  Doing so will not hinder the
development or use of a daemon, but it will make it possible to do
without one.  A system that allows two valid solutions seems better than a
system that is locked into a single solution, especially one that may be
disadvantageous in certain contexts.

== Type Naming ==

With regard to D-BUS-style type naming (CamelCaseTypeNames), I'm cool with
that.  It's pretty irrelevant at this point though, so I won't worry
about it.  When real code is written, the vfs_foo_t names can be changed
to DVfsFoo names.

== Backends ==

While maintaining interface stability for the library/application is
obviously integral, it's worth explaining how the backend interface
should work.

First off, compatibility is essential.  A backend written for D-VFS 1.0
should work in D-VFS 1.1.  If 1.1 added some new features backends can
use, those features must be optional, and older backends should work,
just with potentially less functionality than they might offer if they
were updated.

Second, I think it's important to let newer backends work with older
versions of the library.  If I write a backend for D-VFS 1.3, it should
work on 1.1.  Any features the backend provides that aren't in 1.1 would
just go unused.  This makes distribution and use of backends absolutely
painless.

The backends themselves will just export a vtable to the library.  The
vtable will have a length set by the backend so that newer libraries can
detect that the backend is missing newer vtable entries and provide a
default.  A backend can specify for each vtable entry whether to use its
own implementation of the feature (function pointer), whether to
explicitly disable the feature (function pointer to a standard function
that always returns VFS_ENOTSUPPORTED), or whether to use the default
(NULL).
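
To make the vtable idea a bit more concrete, here's a rough sketch in C
of how the library might pick the function to call for one operation.
Only VFS_ENOTSUPPORTED comes from the notes above; the struct layout,
the slot numbers, and names like vfs_resolve_op are made up purely for
illustration, so treat this as one possible shape rather than a design.

```c
#include <stddef.h>

/* Hypothetical names throughout; only VFS_ENOTSUPPORTED is from the text. */
#define VFS_OK            0
#define VFS_ENOTSUPPORTED (-1)

typedef int (*vfs_op_fn)(const char *src, const char *dst);

/* Slots this (hypothetical) library version knows about. */
enum { VFS_OP_COPY = 0, VFS_OP_MOVE = 1, VFS_OP_COUNT = 2 };

/* Vtable exported by a backend.  `length` is the number of slots the
 * backend was built with, which may be fewer or more than VFS_OP_COUNT. */
typedef struct {
    size_t length;
    const vfs_op_fn *ops;
} vfs_backend_vtable_t;

/* Standard stub a backend points at to explicitly disable a feature. */
int vfs_not_supported(const char *src, const char *dst)
{
    (void)src; (void)dst;
    return VFS_ENOTSUPPORTED;
}

/* Placeholder library default and backend implementation for the demo. */
int vfs_default_op(const char *src, const char *dst)
{
    (void)src; (void)dst;
    return VFS_OK;
}

int demo_backend_copy(const char *src, const char *dst)
{
    (void)src; (void)dst;
    return VFS_OK;
}

/* Choose what to call for `slot`:
 *   - slot beyond the backend's length (older backend) -> library default
 *   - NULL entry                                       -> library default
 *   - anything else (own code or the disable stub)     -> the backend's entry
 * The library only ever asks for slots below its own VFS_OP_COUNT, so
 * extra entries in a newer backend are simply never looked at. */
vfs_op_fn vfs_resolve_op(const vfs_backend_vtable_t *vt, size_t slot,
                         vfs_op_fn library_default)
{
    if (slot >= vt->length || vt->ops[slot] == NULL)
        return library_default;
    return vt->ops[slot];
}
```

With something like this, a backend built with only a copy slot still
gets a working move through the library default, and a backend that
points a slot at vfs_not_supported opts out of that operation entirely.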

The differentiation between not-supported and default opens up an
interesting possibility.  Many VFS operations are in truth specialized
combinations of other operations.  Move could be implemented as a copy
followed by a delete.  Copy could be implemented as a read from the
source paired with a write to the destination.  In many cases, a backend
can provide a more efficient version: network file systems often have a
copy or move operation that doesn't need any round trips to the client
and can possibly use fast paths on the server, like the rename syscall.

However, some backends don't have the ability to provide those services,
or developers of backends may not have the time or inclination to write
code for those operations.  In these cases, I think it would be best
that the library provide default versions of the operations that
aggregate simpler operations.  The default move function would do copy
and delete.  The default copy operation would start up a read/write pair
and essentially stream the file to the destination.

At the very least, providing these defaults allows code sharing between
all backends that have no other option but to use the aggregate
functions.
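
As a rough illustration of those aggregate defaults, here's a small C
sketch.  It uses plain stdio on local files as a stand-in for whatever
read, write, and delete primitives a backend would really provide; the
function names and the VFS_EIO error code are assumptions of mine, not
part of any agreed API.

```c
#include <stdio.h>

/* Hypothetical result codes for this sketch. */
#define VFS_OK  0
#define VFS_EIO (-2)

/* Default copy: a read/write pair that streams the source to the
 * destination in fixed-size chunks.  Copying a file onto itself is not
 * handled here. */
int vfs_default_copy(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");
    if (in == NULL)
        return VFS_EIO;

    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return VFS_EIO;
    }

    char buf[4096];
    size_t n;
    int rc = VFS_OK;

    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            rc = VFS_EIO;
            break;
        }
    }
    if (ferror(in))
        rc = VFS_EIO;

    fclose(in);
    if (fclose(out) != 0)
        rc = VFS_EIO;
    if (rc != VFS_OK)
        remove(dst);            /* don't leave a partial copy behind */
    return rc;
}

/* Default move: copy, then delete the source only if the copy worked. */
int vfs_default_move(const char *src, const char *dst)
{
    int rc = vfs_default_copy(src, dst);
    if (rc != VFS_OK)
        return rc;
    return remove(src) == 0 ? VFS_OK : VFS_EIO;
}
```

The point of the ordering in the move is that a failed copy never
destroys the source; a backend with a native rename would skip all of
this and supply its own entry instead.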

It is also quite possible, and beneficial, to make a backend usable both
by the library (in-process) and by the daemon.  Write the code once,
distribute one object, and let the administrator decide how the backend
is to be used.  In truth, the daemon wouldn't be anything more than a
multiplexer of commands from multiple clients to the backends, basically
just facilitating connection sharing where possible and allowing
restricted access to credentials.  Making it use the same backends is
super simple.

The D-VFS configuration file can easily specify how to handle a
particular backend and protocol mapping.  For example, something like:

  process http neon
  process file file
  process * daemon

  daemon ssh gnomevfs

When the library is used, it looks only for the process lines.  The
config file says that http should be handled by the neon backend, that
file URIs should be handled by the file backend, and that all others
should be handled by the daemon backend.

When the daemon gets a request, it looks at the daemon lines.  It sees
that ssh URIs should be handled by a gnomevfs backend.  (Yes, guys,
that's possible, for those of you that want to leverage gnome-vfs or
KIOSlaves, especially early on.)  No others are specified, so they'd use
the default, which is whichever backend advertises support.  If more
than one backend advertises support, the one chosen is undefined; if you
want a specific backend to be used, specify it in the config file.
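
Here's a quick C sketch of how that lookup could work for the example
config above.  The function name is hypothetical, and two details are
assumptions on my part since the notes don't pin them down: an exact
scheme match beats a "*" line regardless of order, and "no match" is
reported to the caller so it can fall back to backend advertisement.

```c
#include <stdio.h>
#include <string.h>

/* Find the backend for `scheme` among config lines whose first word is
 * `role` ("process" or "daemon").  Lines look like "role scheme backend".
 * An exact scheme match wins over "*" (an assumption of this sketch).
 * Returns 1 and writes the backend name to `out`, or 0 if no line applies. */
int dvfs_lookup_backend(const char *config, const char *role,
                        const char *scheme, char *out, size_t out_size)
{
    char wildcard[64] = "";
    const char *line = config;

    while (*line != '\0') {
        size_t len = strcspn(line, "\n");
        char buf[256], r[64], s[64], b[64];

        /* Parse one line at a time so tokens never spill across lines. */
        if (len < sizeof buf) {
            memcpy(buf, line, len);
            buf[len] = '\0';
            if (sscanf(buf, "%63s %63s %63s", r, s, b) == 3 &&
                strcmp(r, role) == 0) {
                if (strcmp(s, scheme) == 0) {
                    snprintf(out, out_size, "%s", b);
                    return 1;
                }
                if (strcmp(s, "*") == 0 && wildcard[0] == '\0')
                    strcpy(wildcard, b);   /* remember the first "*" line */
            }
        }
        line += len;
        if (*line == '\n')
            line++;
    }

    if (wildcard[0] != '\0') {
        snprintf(out, out_size, "%s", wildcard);
        return 1;
    }
    return 0;
}
```

Against the example config, the library asking for role "process" would
get neon for http and daemon for anything unlisted, while the daemon
asking for role "daemon" would get gnomevfs for ssh and "no match" for
everything else.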

== Interactivity ==

A final cool feature that the library can handle would be interactivity.
The biggest use-case here is authentication information.  Again, I
believe it best that a daemon be used such that the individual apps
don't have to handle the GUI, or have access to the information entered
into the GUI (like passwords).

The D-VFS daemon could get a request to access, say,
http://idisk.mac.com.  That requires authentication.  It uses D-BUS to
ask the authentication helper to ask for a username and password for the
URI.  User enters it, it gets sent to the daemon, access continues.  Due
to the asynchronous nature of the client-side API, the application sees
no difference between waiting for authentication and a really slow
network connection.

We can take this a step further, though, if desired.  Look back to the
idea above about restricting what apps can do and access to sensitive
data.  Sure, with a daemon, we can be sure that Malicious App Foo can't
get my password for idisk.mac.com, but it can still grab, delete, or
edit files there.  With interactivity, we can specify that apps outside
of /usr must get user permission to access shares; maybe even that apps
in, say, /tmp are just flat out denied.  The interactivity support could
pop up a dialog saying something sort-of like, "App Foo is attempting to
access idisk.mac.com/sensitiveinfo.sxw.  The application may have
malicious intent.  If you did not request that App Foo access that file,
you may Deny the access to protect your data.  [Deny] [Allow]"  (I agree
that once you ask the user, you're already likely doomed with 99% of
users who will just click Allow anyhow; on administered networks, admins
can just deny access entirely to any app not in /usr, and solve that
problem there, at least.)
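
For what it's worth, the path-based policy in that example boils down to
something like the C sketch below.  The names are invented, the /usr and
/tmp rules are hard-coded only to mirror the example (a real daemon
would read them from admin configuration), and it assumes the daemon has
already obtained a canonical, absolute path for the requesting app,
which is a problem of its own that this doesn't address.

```c
#include <string.h>

/* Hypothetical decision values for this sketch. */
typedef enum { DVFS_ALLOW, DVFS_ASK_USER, DVFS_DENY } dvfs_access_t;

/* True if `path` lies inside directory `dir` (e.g. "/usr/bin/x" in "/usr"). */
static int has_dir_prefix(const char *path, const char *dir)
{
    size_t n = strlen(dir);
    return strncmp(path, dir, n) == 0 && path[n] == '/';
}

/* Decide how to treat an app based on where its executable lives.
 * `exe_path` is assumed to be canonical (no "..", no symlinks). */
dvfs_access_t dvfs_check_app(const char *exe_path)
{
    if (has_dir_prefix(exe_path, "/tmp"))
        return DVFS_DENY;       /* flat out denied */
    if (has_dir_prefix(exe_path, "/usr"))
        return DVFS_ALLOW;      /* installed apps pass without a prompt */
    return DVFS_ASK_USER;       /* everything else needs user permission */
}
```

An admin-locked setup would simply map the "ask" case to deny.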

-- 
Sean Middleditch <elanthis at awesomeplay.com>