[Clipart] XML hierarchies, the DMS, daemons, and Debian

Tue Oct 12 18:38:11 PDT 2004

Bryce Harrington <bryce at bryceharrington.com> writes:

> The freedesktop.org system is Debian.  

Well, that explains why it has ancient versions of some things.

> I don't have a debian system I can develop on at present, but I'm
> sure I can scrounge one up if it becomes necessary.

Debian was the first Linux distro I ever used, but I haven't
experimented with it for quite some time now -- early 1999, IIRC.  The
version I had wasn't even current then, so I have never used apt (back
then, there was only something called dselect).

I really ought to get myself a spare Pentium/90 for playing with, so I
can have a system to install various distros on without having to fool
with my main workstation.  I want to do that anyway, so I can
experiment with BSD, and it would let me mess with Debian too.

I don't like learning to do things on the freedesktop.org server
because the lag time is prohibitive for many things.

>> The DMS is going to be called by various code, most of it running
>> in a CGI interface (or later maybe mod_perl).  The code that calls
>> it is going to get auth credentials from a user _once_ and then
>> send the user a magic cookie, which the browser will store.  The
>> cookie shouldn't contain the auth credentials themselves for
>> security reasons, so subsequently something will look up the cookie
>> in a database or something to see that it's valid and which user it
>> represents.  The DMS could serve as the "database or something"
>> that everything looks up the cookie in.  Or that could be separate
>> from the DMS.  Not sure which approach is better there.
>
> I was planning to investigate auth for SOAP daemons a bit today but
> didn't get a chance.

I don't know anything about SOAP, so I can't comment meaningfully here.

>> We'd like all the pieces of the site to use the _same_
>> authentication mechanism, so that the user can log in once and use
>> all parts of the site.  This means both PHP and Perl code need to
>> be able to call code that looks up the cookie.  (However, the thing
>> that takes auth info from the user and sends it to the auth
>> thingydo to _create_ the cookie and send it back to the user can be
>> written in one language.)  One of the easiest ways to achieve this
>> would be if the cookies were stored in a relational database (e.g.,
>> MySQL), since both PHP and Perl have good database interfaces and
>> could easily look up the cookie there.
>
> Totally agree, this is the best possible case.  Storing the auth
> info in a rdms would also enable the remote commandline approach
> mentioned above to be able to use the same authentication.

Makes sense to me.

>> Since the DMS is one of the things that needs to be able to check
>> that the user is valid, the calling code can pass it the cookie,
>> and it can look up the cookie in the database for itself.  That way
>> the DMS doesn't have to trust the calling code -- though it does
>> trust the code that checks the user's credentials and creates the
>> cookies and stores them in the database; everything has to trust
>> that.
>
> Yeah, that'd be cool.  I'm ok with the dms trusting the calling
> code, although you're right that it's less secure.

There's no real need for the DMS to trust the calling code if we store
the auth info in the RDBMS; then it can look up the cookie itself
easily.  The calling code just passes it the same cookie the user's
browser sends.

> But as long as we configure things carefully it could work ok.  But
> yeah, if we could provide a common mechanism for handling all the
> auth, that'd be preferable.

I'm thinking this:

We have one trusted script that handles logging the user in.  The
login script has to have write access to the cookies table in the
RDBMS, but everything else only needs read access to the cookies
table.  The login script is also the only thing that needs to be able
to check a user's password, so it could have a private passwords table
that only it has any access to at all.

So when login.cgi (or whatever) is called with no input (i.e., no
query string and no POST input), it sends a blank login form, which
points back to the same script; when the user submits the form, the
login script checks the password and authenticates the user.  If the
username and password are a match, it creates a thirty-byte random
string of alphanumeric characters, which it sends to the browser in a
cookie; it _also_ creates a new entry in the cookies table keyed on
the _same_ thirty-byte string; this record will also contain the
user's username and possibly other info, such as IP address, an
expiration timeframe, and so on.
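
A minimal sketch of that login step, written in Python purely for illustration (the real thing would presumably be Perl or PHP), with an in-memory SQLite database standing in for MySQL.  The table layout, the column names, the one-day lifetime, and the function names are all assumptions, not anything we've agreed on.  The password check itself is left out; this only shows what happens once the username and password have matched.

```python
import secrets
import sqlite3
import string
import time

ALPHABET = string.ascii_letters + string.digits
TOKEN_LENGTH = 30            # the thirty-character alphanumeric string
LIFETIME_SECONDS = 86400     # assumption: cookies expire after one day


def new_token():
    """Return a random alphanumeric string from a cryptographic source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(TOKEN_LENGTH))


def open_cookie_db():
    """Create a throwaway cookies table (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE cookies ("
        " token TEXT PRIMARY KEY,"
        " username TEXT NOT NULL,"
        " ip_address TEXT,"
        " expires INTEGER)"
    )
    return db


def create_session(db, username, ip_address):
    """Record a fresh cookie for an already-authenticated user.

    Older cookies for the same user are deleted first, so a new login
    invalidates any previous one.
    """
    token = new_token()
    db.execute("DELETE FROM cookies WHERE username = ?", (username,))
    db.execute(
        "INSERT INTO cookies (token, username, ip_address, expires)"
        " VALUES (?, ?, ?, ?)",
        (token, username, ip_address, int(time.time()) + LIFETIME_SECONDS),
    )
    db.commit()
    return token  # the caller sends this to the browser as the cookie value
```

The value returned by create_session() is the only thing the browser ever holds; the username and everything else stay in the table.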

Now, when any other script (including the DMS) wants to check the
user's authentication, it just takes the cookie string the browser
sent and looks that up in the cookies table, to which everything
running on the site has read access.
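
The lookup side might look roughly like this.  Again this is a Python sketch against SQLite rather than the Perl or PHP against MySQL we'd actually use, and the schema and the name lookup_user are hypothetical.  The point is only that the caller needs nothing but the cookie string and read access to one table.

```python
import sqlite3
import time


def lookup_user(db, token, now=None):
    """Return the username for a known, unexpired cookie, else None."""
    if now is None:
        now = int(time.time())
    row = db.execute(
        "SELECT username, expires FROM cookies WHERE token = ?", (token,)
    ).fetchone()
    if row is None:
        return None          # no such cookie: not logged in
    username, expires = row
    if expires is not None and expires <= now:
        return None          # cookie exists but has expired
    return username


# Tiny demonstration table (assumed layout, made-up data).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE cookies (token TEXT PRIMARY KEY, username TEXT, expires INTEGER)"
)
db.execute("INSERT INTO cookies VALUES (?, ?, ?)", ("a" * 30, "alice", 2000))
db.commit()
```

The DMS would call something like lookup_user() itself with the cookie the calling code hands it, which is why it never has to take the caller's word for who the user is.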

At the RDBMS level, this means creating one extra database user
account that has almost no privileges but can read the cookies table,
and probably also the users table and maybe certain other tables
(userprefs, ...).  Only scripts that actually *need* write access
would use the account with write access.  (Even then, they'd only have
access to the clipartweb database, not the databases of every project
on the server.  I don't even want to know passwords that would have
access to all the databases, much less put them in cgi scripts.)

In addition to the login script, there would be one other secure
script that would use the same credentials as the login script to get
write access: the script that registers users in the first place.  It
would presumably ask the user for a proposed username, an email
address, and any other information we want.  It would create an entry
in the users table with a "notenabled" field set to a magic token, and
send the user an email with a magic URI pointing back to the same
script, with a query string containing that token.  On receiving the
token, the script would enable the user account, and the user could
then log in any time using the login script.  This guarantees we have
a working email address (initially) for every user, but users can
create their own accounts without any manual intervention.  If a
problem with bots develops we could put a CAPTCHA in place, but that
will *hopefully* not be necessary.
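
The registration round trip could be sketched like so, again in Python with SQLite as a stand-in.  The users table layout, the example.org address, and every function name here are assumptions for illustration; actually sending the email is left out.

```python
import secrets
import sqlite3
from urllib.parse import urlencode


def open_users_db():
    """Create a throwaway users table (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE users ("
        " username TEXT PRIMARY KEY,"
        " email TEXT NOT NULL,"
        " notenabled TEXT)"      # holds the magic token until confirmed
    )
    return db


def register(db, username, email):
    """Create a disabled account and return the token to mail to the user."""
    token = secrets.token_urlsafe(24)
    db.execute(
        "INSERT INTO users (username, email, notenabled) VALUES (?, ?, ?)",
        (username, email, token),
    )
    db.commit()
    return token


def confirmation_uri(token):
    """Build the magic URI; the host and script name are placeholders."""
    return "https://example.org/register.cgi?" + urlencode({"token": token})


def confirm(db, token):
    """Enable the account whose notenabled field matches the token."""
    if not token:
        return False
    cur = db.execute(
        "UPDATE users SET notenabled = NULL WHERE notenabled = ?", (token,)
    )
    db.commit()
    return cur.rowcount == 1


def is_enabled(db, username):
    """True once the account exists and its notenabled field is cleared."""
    row = db.execute(
        "SELECT notenabled FROM users WHERE username = ?", (username,)
    ).fetchone()
    return row is not None and row[0] is None
```

Clearing the field on confirmation means the same token cannot be replayed, and the login script only has to refuse any user whose notenabled field is still set.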

The main security issue left is that the cookie can be sniffed in
transit by an eavesdropper (or a man-in-the-middle), but that still
only allows the attacker to impersonate the sniffed user, and only
until the user logs in again, at which point the login script will
delete any previous cookies belonging to the same user.  The only way
to solve that one is for the whole site to be https, which I think is
really unnecessary for us.

The only question left is whether the login script should be https; if
it's not, the same kind of attacker could sniff an actual password.
Whether we consider that a likely enough or serious enough threat to
worry about is an open question.

>> This brings us down to that code, the code that examines the user's
>> credentials and, upon determining that they're valid, creates a
>> cookie to give the user, and stores the cookie in the database
>> along with the username of the user.  How does this code determine
>> whether the user's credentials are valid?
>> 
>> I don't think it should check based on OS-level credentials,
>> because I think it will be useful for people to have an account for
>> contributing content without having a shell account on the server.
>
> correct

Okay, so we'll have our own clipartweb user accounts, not related to
OS accounts.  So far so good...

>> We could store the usernames and passwords in another database
>> table, but then anyone with database-level access to it could read
>> them out, and since people tend to use the same password for
>> multiple things, we might ought to avoid that.  We could hash it or
>> something, but I know a lot less about that -- although there are
>> modules on the CPAN that could do the hashing for us.
>>
>> This raises also the question of account creation:  do we want
>> people to be able to easily create accounts?  Should they have to
>> have a valid email address to do so, and receive a message with a
>> key to unlock the account?
>> 
>> This is the part of authentication we most need to talk about, I
>> think.
>
> I've actually developed such a system at OSDL when the company first
> started - and it's still in regular use.

I'm suitably impressed.

> However, I've got a plan here.  One of the things I was looking for
> when I picked out mantis was its authentication system.  Having
> developed auth systems before, I really didn't want to have to do it
> yet again.  Most bug trackers have an auth system built into them,
> for obvious reasons, and I figured regardless of what we did for
> auth, we'd want to tie it in with bug tracking, so users wouldn't
> need separate accounts for submitting bugs from doing other stuff.
>
> What I would suggest is this - log into freedesktop.org and browse
> through the mantis source code to learn how it does the
> authentication.  I think the mysql account info is listed there; use
> that to log into mysql and look directly at the database if you're
> comfortable doing that.  

I'm comfortable with MySQL.  Mantis is what, PHP?  I can probably read
PHP well enough to figure out what it's doing.

> Doublecheck that it produces login cookies that we can reuse (I
> believe its a ticket-based system iirc.)

Ticket-based is good.

I'll try to look into this, after I get the other thing done I'm
working on.

If this will work, we might even be able to get out of creating a
login.cgi, since we could just point to the Mantis one.  Maybe.

-- 
$;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}}
split//,"ten.thgirb\@badanoj$/ --";$\=$ ;-> ();print$/