[Roadster] sqlite database spec

Fri Aug 18 12:05:50 PDT 2006

On Fri, 2006-08-18 at 14:48 -0300, Ian McIntosh wrote:
> In response to Adam Renner's post, here's one possible database spec
> that has been developed jointly by Jeremy Cole, Ofer Achler, and me.

It appears you're storing binary integers in the database.  I agree with
this, but want to bring up the cross-platform issue:

   http://www.sqlite.org/faq.html#q6

Right now I can copy a sqlite database from any machine to any machine
and it all just works.  I hope you define the binary lobs so that that
continues to work.  A single Roadster database should be able to be used
on any architecture and any endian machine.  (You would need to
recompile the binary of course.  :)

Also, it would be nice if everything didn't have to be in one gigantic,
monolithic database.  Would it be possible to split the country up into,
say, eight 100 MB databases, instead of one 800 MB database?  That way
people wouldn't have to be burdened with areas that didn't interest
them.

Finally, what about about version numbers or dates or something?  The
Tiger data includes a lot of mistakes.  It would be nice to release
updates.  Sqlite is fairly rsync-friendly so releasing periodic updates
doesn't have to be a gigantic bandwidth hit.

     - Scott

> It doesn't require any "spatial" features in the database.
> 
> The main problems with our current approach using the MySQL Spatial
> Index are:
> 
> 1) Every point takes 16 bytes.  Storing the roads in the US would take
> several gigabytes this way.
> 
> 2) It doesn't let you select by area AND by type (eg "highway") in any
> optimized way, which is related to 3, below.. 
> 
> 3) For every road segment read in (or just examined), MySQL potentially
> does a 'seek' operation on the harddrive.  Seeks are slow, and hundreds
> of seeks are unacceptably slow.
> 
> Put simply, we need fewer seeks and smaller storage on disk.
> 
> ======================
> 
> Here's the proposal:
> 
> * Split the world up into square tiles.
> * All objects within that part of the world are stored in the tile.
> * Objects that span 2+ tiles are cut.
> * Object coordinates are stored as integer offsets from the tile's
> origin (with 1 unit = ~5 feet).
> * A tile number is derived from its Lan/Lon position in the world,
> making it possible to generate a list of tile IDs that cover any given
> area of Earth.
> 
> CREATE TABLE tile (
>   id         	INT4            NOT NULL,
>   data          MEDIUMBLOB      NOT NULL,
>   PRIMARY KEY (id)
> );
> 
> CREATE TABLE object_name (
>   id		INT3 UNSIGNED   NOT NULL,
>   name		VARCHAR(30)     NOT NULL,
>   tile_ids	MEDIUMBLOB	NOT NULL,
> 
>   PRIMARY KEY (id),
>   INDEX (name(7))
> );
> 
> (The object_name.tile_ids field lists tiles where this roadname shows up
> and is used for search.)
> 
> The size of each tile is chosen so that each point within the tile only
> requires 3 bytes (instead of MySQL's 16) while maintaining 5 foot
> precision.
> 
> We have the format of tile.data spec'd out, but I'd like to hear
> feedback on this approach before diving deeper.
> 
> -Ian
> 
> _______________________________________________
> roadster mailing list
> roadster at cairographics.org
> http://lists.freedesktop.org/mailman/listinfo/roadster