[Roadster] Importing Tiger issue

Arne Götje ( 高盛華 ) arne at goetje-online.de
Mon Aug 1 16:01:01 EST 2005


Note: I changed my subscription to another e-mail address... hopefully 
Ian's mails now come through...

I'm copying from the mailinglist archive now...

-------------------------
> Hi Arne,
>
> > I'm not aware of libmygis' capabilities
>
> libmygis is a work in progress.  The author, Jeremy Cole, is subscribed
> to this list.  I think his plan is to make it support whatever Roadster
> needs (at the least), but don't take my word for it.  The only problem I
> have with libmygis is that I don't know when it'll be ready. :)
>
> I just learned of GDAL.  I'd like to look into it.
>
> > not sure if it's a good idea to rely on a third party library for that
>
> If it's an open-source library and we can fork it if necessary, what is
> the danger?

ok... then it's fine. :)

> > I think the sorting preferences should be configurable by the user... :)

> What would the options be?  Like the Wiki hints at[1], I think it might
> make sense to have an Advanced Search dialog at some point.  It would
> let you specify all parts of your search, just like Google's Advanced
> Search[2].

Well, the single search box is a nice idea... however address formats 
used in other parts of the world would be problematic to use with it... 
I show some examples below.

> > which tables are looked up in which order

> It's all in search_road.c and search_location.c ('location' being the
> name for POI in the code).  I think the code is pretty well documented,
> but I'd be happy to answer any specific questions about it.

Ok... :)

> > I'm keen to put in some effort to make Roadster able to support other
> > countries' mapping data and to be able to search for their addresses

> I decided to not worry too much about internationalization of the
> storage because a) data isn't available yet and b) I didn't know how to
> do it. :)

> It would be helpful if you could provide as many real-world addressing
> examples as possible that don't fit the "STREET, CITY, STATE, COUNTRY"
> pattern.

a) not available for free, that's correct. Usually one has to buy the 
data for a lot of $$$.

b) Yes, I can help with that. The problem however is not the STREET, 
CITY, STATE, COUNTRY pattern, as this is more or less the same 
everywhere... the problem lies in the usage of those patterns. :)

I would say that to successfully distinguish the different parts of an 
address we need separate search fields, one for each address 
part:
* COUNTRY
* STATE/PROVINCE
* COUNTY
* CITY
* ZIP-CODE
* STREET
* NUMBER
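
A minimal sketch of how those separate search fields could be modelled, 
assuming one optional text value per address part. The class name 
AddressQuery and its filled() helper are hypothetical and not part of 
Roadster; the sample values are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AddressQuery:
    """One optional search field per address part (hypothetical structure)."""
    country: Optional[str] = None
    state: Optional[str] = None     # STATE/PROVINCE
    county: Optional[str] = None
    city: Optional[str] = None
    zip_code: Optional[str] = None
    street: Optional[str] = None
    number: Optional[str] = None    # kept as text: "11-15", "110c", "53-1"

    def filled(self) -> dict:
        """Return only the fields the user actually entered."""
        return {name: value for name, value in vars(self).items() if value}


# Made-up query: only four of the seven fields are filled in.
query = AddressQuery(city="Berlin", zip_code="10115",
                     street="Landstraße", number="110c")
print(query.filled())
```

Keeping NUMBER as text rather than an integer leaves room for ranges, 
letter suffixes and sub-numbers.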

Here are some address patterns as examples (none of the examples are 
real addresses, but the patterns match real ones):

NOTE: in all examples the odd and even house numbers are often not in 
sync across the street, and in some places they are not even separated 
(in Berlin, for example). There, house numbers run consecutively along 
one side of the street, from 1 up to the highest number at the end of 
the road, and then continue on the other side of the street all the way 
back.
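
A rough sketch of the two numbering schemes, assuming the data tells us 
which scheme a street uses and, for consecutive numbering, the last 
number on the first side. The function name, the scheme labels, the 
side names "A"/"B" and the turnaround parameter are all assumptions 
made for illustration.

```python
def side_of_street(number: int, scheme: str, turnaround: int = 0) -> str:
    """Return "A" or "B" for the side of the street a house number is on.

    scheme "odd_even":    odd numbers on side A, even numbers on side B.
    scheme "consecutive": 1..turnaround run up side A; every number above
                          turnaround comes back down side B.
    """
    if scheme == "odd_even":
        return "A" if number % 2 == 1 else "B"
    if scheme == "consecutive":
        return "A" if number <= turnaround else "B"
    raise ValueError(f"unknown numbering scheme: {scheme!r}")


print(side_of_street(7, "odd_even"))
print(side_of_street(7, "consecutive", turnaround=60))
print(side_of_street(61, "consecutive", turnaround=60))
```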

-----------------
Germany and most of continental Europe:

STREET NUMBER
ZIP CITY

(no STATE or COUNTY used in postal addresses, but can be used optionally 
to narrow a search).

Street names can be divided into different classes, much like in the 
US. For example, in Germany:
* Straße (street), sometimes used as Strasse or Str.
* Allee (Boulevard)
* Weg (Way)
* Gasse (Alley)
and others

Those street classifiers do not necessarily stand as separate words 
(example: Landstraße or Landstr.). Therefore we need to classify the 
type of street manually.
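
A suffix check could serve as a first guess before any manual 
classification. This is only a heuristic sketch: the table covers just 
the classes listed above, the names STREET_TYPES and classify_street 
are hypothetical, and a name that merely happens to end in one of 
these letter sequences would be misclassified.

```python
from typing import Optional

# Suffix (lowercase) -> street class; only the classes named above.
STREET_TYPES = {
    "straße": "Straße",
    "strasse": "Straße",
    "str.": "Straße",
    "allee": "Allee",
    "weg": "Weg",
    "gasse": "Gasse",
}


def classify_street(name: str) -> Optional[str]:
    """Guess the street class from the end of the name, or return None."""
    lowered = name.strip().lower()
    # Try longer suffixes first so the most specific one wins.
    for suffix in sorted(STREET_TYPES, key=len, reverse=True):
        if lowered.endswith(suffix):
            return STREET_TYPES[suffix]
    return None


for street in ("Landstraße", "Landstr.", "Lindenallee", "Am Markt"):
    print(street, "->", classify_street(street))
```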

The NUMBER comes after the street name and can also include number 
ranges (e.g. 11-15) or letters that identify subordinate buildings 
(letters a - z, e.g. 110c).
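
A small sketch of parsing such a NUMBER field, assuming a range never 
carries a letter on its second number. The function name and the 
(first, last, letter) return shape are hypothetical.

```python
import re
from typing import Optional, Tuple

# number, optional single letter, optional "-number" range end
HOUSE_NUMBER = re.compile(r"^(\d+)\s*([a-z])?(?:\s*-\s*(\d+))?$", re.IGNORECASE)


def parse_house_number(text: str) -> Optional[Tuple[int, int, Optional[str]]]:
    """Return (first, last, letter), or None if the text does not fit."""
    match = HOUSE_NUMBER.match(text.strip())
    if match is None:
        return None
    first, letter, last = match.groups()
    return (int(first),
            int(last) if last else int(first),
            letter.lower() if letter else None)


print(parse_house_number("11-15"))
print(parse_house_number("110c"))
```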

STATE and COUNTY are not used in postal addresses, but can be used to 
narrow a search, as some cities have similar names. COUNTIES in Germany 
have abbreviations of 1 to 3 letters, which are also used on car number 
plates.
STATEs in Germany also have abbreviations, which are two letters long.

ZIP codes differ in each country. Germany, Italy, Spain, France and 
maybe others use 5 numerical digits only, even to distinguish different 
regions within a city. Austria, Switzerland, Luxembourg, Denmark and 
Belgium use 4 digits only.
The Netherlands uses 4 digits plus 2 uppercase letters. These are NOT 
state abbreviations.
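
These formats could be checked with one pattern per country. A sketch 
covering only the countries named above, keyed by ISO 3166 two-letter 
codes; the table name, the function name, and the optional space in the 
Dutch pattern are assumptions.

```python
import re

FIVE_DIGITS = r"\d{5}"
FOUR_DIGITS = r"\d{4}"

ZIP_PATTERNS = {
    "DE": FIVE_DIGITS, "IT": FIVE_DIGITS, "ES": FIVE_DIGITS, "FR": FIVE_DIGITS,
    "AT": FOUR_DIGITS, "CH": FOUR_DIGITS, "LU": FOUR_DIGITS,
    "DK": FOUR_DIGITS, "BE": FOUR_DIGITS,
    "NL": r"\d{4} ?[A-Z]{2}",   # 4 digits plus 2 uppercase letters
}


def is_valid_zip(country: str, zip_code: str) -> bool:
    """Check a ZIP code against its country's pattern; unknown countries fail."""
    pattern = ZIP_PATTERNS.get(country)
    return pattern is not None and re.fullmatch(pattern, zip_code) is not None


print(is_valid_zip("DE", "10115"))
print(is_valid_zip("NL", "1012 AB"))
print(is_valid_zip("AT", "10115"))
```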

-----------------------
UK: a system similar to the one in Canada

-----------------------
Taiwan, China (excluding Hong Kong and Macao):

There are two different systems:
1. native encoding (Chinese characters)
2. westernized transcription system
The transcription system is standardized in China (Hanyu Pinyin), but 
not in Taiwan, where multiple transcription systems exist. We would 
need a translation table for all possible transcriptions for the 
STREET, CITY and COUNTY fields.

In the native encoding it is not uncommon to abbreviate the STATE, 
COUNTY or CITY to a single Chinese character.
In the romanized address, COUNTY, STATE and CITY usually do not 
include "City", "County" or "Province".

Examples:
a) China
1. native encoding:
ZIP
(STATE) (COUNTY) CITY STREET NUMBER (all in one line without spaces)

100011
北京市西直门外大街100号

ZIP is 6 digits and also distinguishes postal areas within a city.
In the above case STATE and COUNTY are missing, as Beijing City is big 
enough to be recognized... :) For smaller villages or cities, however, 
STATE and COUNTY may be used
(e.g. 福建省厦门市 -- Fujian Prov. XiaMen City).
STATE, COUNTY, CITY, STREET and NUMBER (as well as extensions) can be 
distinguished by their marker characters
(e.g.: 省 = Province, 市 = City, 大街 = Boulevard, 街 = street, 路 = road, 号 = 
number, 之 = subordinate number, 楼 = floor, etc.).
Streets can also have small alleys and lanes, which are numbered in 
sequence together with the house numbers (up to three levels: 巷 = lane, 
弄 = alley, 衖 = subordinate alley).

For sub-ordinate house numbers: 101之1号, 101之2号, etc.
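
Because the parts are marked by characters, a one-line native address 
can be split with a regular expression. A simplified sketch that 
handles only the optional province, the city, the street and the 
number (no county, lane or floor); the pattern and function names are 
hypothetical.

```python
import re
from typing import Optional

CN_ADDRESS = re.compile(
    r"^(?P<province>[^省]+省)?"        # optional, ends in 省
    r"(?P<city>[^市]+市)"              # ends in 市
    r"(?P<street>.+?(?:大街|街|路))"   # ends in a street-type marker
    r"(?P<number>\d+(?:之\d+)?)号$"    # e.g. 100 or 101之2, then 号
)


def parse_cn_address(line: str) -> Optional[dict]:
    """Split a one-line native address, or return None if it does not fit."""
    match = CN_ADDRESS.match(line)
    return match.groupdict() if match else None


print(parse_cn_address("北京市西直门外大街100号"))
```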

2. romanized transcription system:

Address pattern follows the US style:
NUMBER STREET
CITY (, COUNTY)
(PROVINCE)
ZIP

In China the transcription is standardized:
No. 100, Xizhimenwaidajie (usually no spaces in the streetnames)
Beijing City
100011

No abbreviations are available at the COUNTY and STATE level.

b) Taiwan:
1. native encoding:

Same as in China, but using traditional Chinese characters and 
sometimes different vocabulary.

No STATE is used in postal addresses (there are only 2 provinces in 
Taiwan: Taiwan and Fujian), but COUNTY is used frequently.
The ZIP code has 3 or 5 digits: 3 digits for the city (or the borough 
in bigger cities), and the last 2 digits for postal areas within a city 
(borough).

10358 (The zip code is not correct here, it's just an example)
台北市中山北路3段125巷1弄3衖53之1號

In Taiwan long streets are divided into sections (段), ranging from 
Section 1 (downtown) to Section 7 or 8 (far far away), each section 
having approx. 1000 house numbers, sometimes less, sometimes more.
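
The extra Taiwanese levels (section, lane, alley, subordinate alley) 
can be picked out by their marker characters as well. A sketch, 
assuming the levels always appear in the order of the example above and 
that the street name ends in 大街, 街 or 路; the pattern and function 
names are hypothetical.

```python
import re
from typing import Optional

TW_ADDRESS = re.compile(
    r"^(?P<county>[^縣]+縣)?"          # optional COUNTY
    r"(?P<city>[^市]+市)"
    r"(?P<street>.+?(?:大街|街|路))"
    r"(?:(?P<section>\d+)段)?"
    r"(?:(?P<lane>\d+)巷)?"
    r"(?:(?P<alley>\d+)弄)?"
    r"(?:(?P<sub_alley>\d+)衖)?"
    r"(?P<number>\d+(?:之\d+)?)號$"
)


def parse_tw_address(line: str) -> Optional[dict]:
    """Split a one-line native Taiwanese address; values stay as text."""
    match = TW_ADDRESS.match(line)
    return match.groupdict() if match else None


for key, value in parse_tw_address("台北市中山北路3段125巷1弄3衖53之1號").items():
    print(key, "=", value)
```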

Another example with COUNTY (縣) names:
桃園縣八德市

2. romanized transcription systems:
There is no standard in Taiwan; multiple concurrent systems are in use 
(with or without spelling mistakes... *sic*).
Spellings usually use US street classifiers (Road, Street, Lane, 
Alley, Blvd). Directions in the street names (北 = North, 南 = South, 西 = 
West, 東 = East) are an integral part of the street name and not a pure 
direction. They are usually abbreviated to one letter:

No. 53-1, Alley 1-3, Lane 125, ZhongShan N Rd. Sec. 3
Taipei City
10358 (The zip code is not correct here, it's just an example)

Lanes are numbered together with house numbers. So, Lane 125 would be 
between numbers 123 and 127. Alley 1 is the first cross-alley on this 
lane and Alley 1-3 is the 3rd cross-alley of the 1st cross-alley of 
Lane 125... :)
Taiwan's cities are jam-packed with lanes and alleys. They didn't 
bother to give each small alley a separate name, which is why they just 
numbered them in sequence... still better than Japan though... (see 
below).
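
A sketch of reading the romanized first line and locating a lane 
between its neighbouring house numbers. It assumes comma-separated 
parts with the "No.", "Alley", "Lane" and "Sec." labels exactly as in 
the example, and that numbers on one side of the street step by 2; all 
names are hypothetical.

```python
import re
from typing import Tuple

STREET_RE = re.compile(r"^(?P<street>.+?)(?:\s+Sec\.\s*(?P<section>\d+))?$")


def parse_tw_romanized(line: str) -> dict:
    """Split 'No. …, Alley …, Lane …, STREET Sec. N' into labelled parts."""
    result = {}
    for part in (p.strip() for p in line.split(",")):
        if not part:
            continue
        if part.startswith("No."):
            result["number"] = part[3:].strip()
        elif part.startswith("Alley"):
            result["alley"] = part[5:].strip()
        elif part.startswith("Lane"):
            result["lane"] = int(part[4:])
        else:
            match = STREET_RE.match(part)
            section = match.group("section")
            result["street"] = match.group("street")
            result["section"] = int(section) if section else None
    return result


def lane_neighbours(lane: int) -> Tuple[int, int]:
    """House numbers on either side of a lane, assuming a step of 2."""
    return (lane - 2, lane + 2)


parsed = parse_tw_romanized("No. 53-1, Alley 1-3, Lane 125, ZhongShan N Rd. Sec. 3")
print(parsed)
print(lane_neighbours(parsed["lane"]))
```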

Because of the lack of a standard transcription system, multiple ways 
exist for transcribing counties, cities and streets:
中山路 could be written:
* ZhongShan Rd.
* Zhong Shan Rd.
* Zhongshan Rd.
* ChungShan Rd.
* Chungshan Rd.
* JhongShan Rd.
* Jhongshan Rd.
as well as all of these combined with spelling errors... (like the h 
missing) and with or without spaces. :(((

Popular spellings for 新竹:
* XinZhu
* Xin Zhu
* Xinzhu
* Hsin-Chu
* Hsinchu
* Hsin chu
* Shinchu
* Shin chu
* Sinchu
* Hsinjhu
etc...
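
The translation table could map every known spelling back to the 
native name. A sketch using only the variants listed above; it ignores 
case, spaces, hyphens and dots, and falls back to fuzzy matching from 
Python's difflib for spelling errors. The table layout, the function 
names and the 0.85 cutoff are assumptions.

```python
import difflib
import re
from typing import Optional

ROMANIZATIONS = {
    "中山路": ["ZhongShan Rd.", "Zhong Shan Rd.", "Zhongshan Rd.",
               "ChungShan Rd.", "Chungshan Rd.",
               "JhongShan Rd.", "Jhongshan Rd."],
    "新竹": ["XinZhu", "Xin Zhu", "Xinzhu", "Hsin-Chu", "Hsinchu",
             "Hsin chu", "Shinchu", "Shin chu", "Sinchu", "Hsinjhu"],
}


def normalize(text: str) -> str:
    """Drop spaces, hyphens and dots, and lowercase the rest."""
    return re.sub(r"[\s\-.]", "", text).lower()


LOOKUP = {normalize(variant): native
          for native, variants in ROMANIZATIONS.items()
          for variant in variants}


def to_native(text: str) -> Optional[str]:
    """Map a romanized spelling to the native name, tolerating small typos."""
    key = normalize(text)
    if key in LOOKUP:
        return LOOKUP[key]
    close = difflib.get_close_matches(key, list(LOOKUP), n=1, cutoff=0.85)
    return LOOKUP[close[0]] if close else None


print(to_native("hsin chu"))
print(to_native("Zongshan Rd."))   # the h is missing
```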

-------------------
Japan:

The transcription system is standardized, similar to China.
1. native encoding

ZIP
COUNTY CITY AREA-NUMBERS (without spaces)

No Streetnames in use

ZIP codes are 7 digits: 3 digits for the city, then 4 digits for the 
area, separated from the city digits by a -:

243-0041
神奈川県茅ヶ崎市茅ヶ崎2-1-30-205

County is 県, City is 市.
After the CITY comes the area. Areas have a name (and a number in many 
cases). Often the area name is the same as the city name, as in this 
case (茅ヶ崎2). There is no real pattern in naming such areas. After the 
area comes a house block code (here: 1-30-205), which means block 1, 
house number 30, room 205. The blocks are numbered at random, and so 
are the house numbers within those blocks. I'm not sure what the 
maximum number of levels in this numbering scheme is.
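
A sketch of splitting such an address. Since the number of levels is 
not fixed, the numeric codes are returned as a plain list instead of 
being forced into named fields; in the example they read as area 
number, block, house number and room. The pattern and function names 
are hypothetical.

```python
import re
from typing import Optional

JP_ZIP = re.compile(r"^\d{3}-\d{4}$")
JP_ADDRESS = re.compile(
    r"^(?P<county>[^県]+県)"        # ends in 県
    r"(?P<city>[^市]+市)"           # ends in 市
    r"(?P<area>\D+)"                # area name, up to the first digit
    r"(?P<codes>\d+(?:-\d+)*)$"     # dash-separated numbers
)


def parse_jp_address(zip_code: str, line: str) -> Optional[dict]:
    """Split ZIP plus one-line native address, or return None if it does not fit."""
    match = JP_ADDRESS.match(line)
    if match is None or JP_ZIP.match(zip_code) is None:
        return None
    parsed = match.groupdict()
    parsed["codes"] = [int(n) for n in parsed["codes"].split("-")]
    parsed["zip"] = zip_code
    return parsed


print(parse_jp_address("243-0041", "神奈川県茅ヶ崎市茅ヶ崎2-1-30-205"))
```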

-------------
Hong Kong:

no ZIP codes, the rest of the addresses follow a pattern similar to the 
UK and Canada

Example:
Rm. 5, 48 Fl., Tower E, Kings Plaza,
No. 38 Kings Road East 
Kowloon

The Chinese version would be all in one line without spaces, like in 
China and Taiwan, but with different characters (vocabulary is 
different). The romanization system is standardized.

------------------------

> > Is roadster UTF-8 safe yet? :p

> The GUI is.  I'm not so sure about the database. :)

This can be solved easily if we stick to MySQL... just force UTF-8 as 
encoding.

> > the huge overhead of the embedded mysql server...

> What overhead are you referring to?  Memory, disk space, disk access
> time, CPU time, or what?

Memory and disk space...
I wonder if we can store the database in a binary form to save 
disk space... (currently the (incomplete) database of California takes 
more than 500 MB on my system!)

> I do have some big problems with the way MySQL's Spatial Extensions
> work.  The biggest problem is that we incur one disk seek for each road
> segment read in.  It's also a bit heavy on the on-disk storage size.

> I'd be happy to work with you to come up with a new storage scheme!

Let me take a look at the source code first... :)

Cheers
Arne
----------------------------------- 
-- 
Arne Götje (高盛華) <arne at goetje-online.de>
PGP/GnuPG key: 1024D/685D1E8C
Fingerprint: 2056 F6B7 DEA8 B478 311F  1C34 6E9F D06E 685D 1E8C
Key available at wwwkeys.pgp.net.   Encrypted e-mail preferred.
