[Roadster] Converting TIGER data to a MySQL database

Jeff Garrett jeff at jgarrett.org
Mon Feb 13 14:37:34 PST 2006


Adam Renner wrote:
> >Well, Roadster already uses MySQL to store the TIGER data.  The
> >application imports the TIGER files to a MySQL database, and does take
> >advantage of spatial indexing.  (The importing happens in import_tiger.c
> >and the database description can be found near the bottom of db.c.)

> Checked out the source; I am somehow on the same wavelength with 
> whomever created the database layout. Mine was extremely similar. 4 road 
> tables for varying size roads, a table for road names, a table for 
> cities, one for states, one for zipcodes. There is some differences in 
> how we were handling points of interest.
> 
> I noticed in the screenshots that parks and rivers were also displayed 
> on some of the maps, but I did not see tables for this data anywhere in 
> db.c....   Did I miss something?

Roadster stores the polygon boundaries in the roads databases.  It has a
field to tell which type of feature (road, lake boundary, etc.) a given
"road" is.  (Similar to TIGER in this regard...)

> Another table that I recommend including is one that I refer too as my 
> "Glossary". I use it for lookups, so in case somebody enters something 
> like "boulevard", I can determine that this is equivalent to "Blvd" and 
> is a road type.

Roadster has a list of road suffixes with "proper names" and
abbreviations.  (This is in data/road-suffix-list.txt.)  It can thus use
the aliases in searching.  (To be more precise, on import and on search,
it translates the suffix to a number -- the index of the suffix in this
list.  It tries to match the resulting integers.)

> >Personally I think it'd be nice to have more POI data (esp. for
> >Chicago :-)).  Also, having a way to get decent driving (or travel)
> >directions would be good.  To start, Roadster has no way to tell which
> >streets are one-way.  And there is little weighting information: no
> >knowledge of speed limits or road conditions or traffic.  The BTS has
> >some data (e.g. from the Highway Performance Monitoring System) that
> >could be useful...

> POI data is what I have been working on. Right now I am using 
> chefmoz.org data for restaurants. Shapefiles for a lot of things can be 
> downloaded from nationalatlas.gov:
> http://www.nationalatlas.gov/atlasftp.html

Nice.  Thanks.

> I have also recently discovered the google local for mobile phones site. 
> It would be good to build a tool that could retrieve all the chinese 
> restaurants within a 20 mile radius of some location:
> http://www.google.com/xhtml?site=local&dm=none&latlng=39900000,-83200000&radius=20&q=category:+Restaurant+Chinese

I've had mixed results with Google local, but maybe I just don't know
how to use it yet.  Personally, I think it'd be nice if I had a list of
POIs important to me (restaurants, bars, pool halls, ...) near me.
Since I don't always have wifi access (when out), it'd be nice to crack
open the laptop and spontaneously choose a restaurant near me.  (This
lacks some of the flavor of wandering around for miles though...)  Even
more, it'd be nice if I could get directions to wherever it is I wish to
go (even disconnected), esp. (for now) on public transportation.  But
this may be a project for further down the line.  :)

Another advantage of having a Google-free POI set is the possibility of
sharing POI sets and even implementing some sort of ratings/reviews...

> >>The Pros to using a MySQL database are: spatial indexing is very fast, 
> >>MySQL is crossplatform, SQL is easy to work with and lots of developers 
> >>have experience, MySQL can be used with many programming languages, if 
> >>foreign country data ever becomes available, it will not be in TIGER 
> >>format and will need converted anyhow.

> >The last point is salient, but has little to do with MySQL.  The
> >importation is rather simple, and there's indications (see README) that
> >it may move to a separate program.  There might be interest in importing
> >ArcView shapefiles or Arc/Info files or ...
 
> Creating various wizard dialogues for importing various sorts of data 
> and setting up the database is the direction I was headed. Then the 
> wizard could either be run individually, as part of a separate app, or 
> within the mapping application depending on the users needs. My final 
> goal, however, was that end users would receive most data, especially 
> tiger data from an alternate source such as an SQL dump. Then it lessens 
> the headaches and complications of dealing with a large number of 
> obscure data sources.

I totally agree.  Users shouldn't have to deal with obscure data
sources, or know the difference between TIGER/Line files of various
ages...  What I was thing about was (I'm on a Debian-based distro)
making packages, like roadster-data-ca, roadster-data-il, etc.  Then a
simple "apt-get install roadster-data-il" would get you the data.
(Separate packages probably for POIs.  These could get out of hand
quickly.)  The way I was thinking about doing this was:

(1) Make Roadster use SQLite instead of MySQL
(2) Make Roadster get its information from more than one source:
    e.g. first search for db files in a system directory, and then for
    local ones (or changes) under ~/.roadster/data
(3) Import the TIGER/Line data into SQLite databases (with a script :))
(4) Massage the data (also with script)
    This could include polygon stitching, maybe other things?
(5) Make the deb's and rpm's (for those RedHat people :))

What do you think? :)  I've done very little of this so far.

(A wizard is fine, but I think it could be hard to make it
non-confusing.  And the data import shouldn't so much be a user
concern.)

> >>The Cons to using a MySQL database are: Spatial Index creation is slow, 
> >>Table sizes can get very large.

> >You miss one important point: MySQL databases are not (as far as i know)
> >portable.  Correct me if I'm wrong, but taking a database to a different
> >machine (possibly with different type sizes and byte order) may  not
> >work.  This would be a boon for packagers who could run the import on
> >the TIGER or other datasets, and publish the resulting databases.
> >SQLite doesn't have this problem, and there is a note in TODO suggesting
> >moving to SQLite.  (It doesn't have spatial indexing, but this is
> >probably not be such a slowdown.)
  
> The tables and indexes can be directly copied from one machine to 
> another, at least to a certain extent. I have copied from a celeron 
> machine to an athlon xp machine with no problems. I was using the same 
> operating system and same MySQL version. If I had tried the copy with a 
> new version of MySQL or a 64 bit processor, I have no idea what would 
> have happened.

[solutions omitted]

Fair enough.  Barry has pointed out this is a non-issue.  Sorry for the
error.  I'm still a little bit worried though.  In the proposed
packaging thing above, it'd mean we would need at least two MySQL
embedded servers (one for the system db and one for the per-user) if I
understand things and maybe more (if you have data for multiple
states...).

In the scheme I suggested, we essentially want to process the TIGER data
and split it into different parts (by state or whatever).  The advantage
of having an SQL-like interface is that we can index the data, but my
scheme would be unreasonable if that interface has a large overhead...
I'm thinking SQLite would be better.  What do you think?

> I hope that we can find a way to merge our efforts. It appears that I am 
> repeating a lot of the work that you have already done on this project.

"you" is not be the right pronoun.  I've contributed nothing (just
lurked on the list). My understanding is that this is practically all
Ian's work.  I am interested mainly for the reasons you can infer from
this e-mail.  I would like a decent mapping/POI/direction thing.  And
it'd be really cool if I could use this to explore more of this
city.  ;)

-- 
Jeff
JID: jeff at jgarrett.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : http://lists.freedesktop.org/archives/roadster/attachments/20060213/4d995fd7/attachment.pgp


More information about the roadster mailing list