[Roadster] Importing Tiger issue
Arne Götje ( 高盛華 )
arne at goetje-online.de
Mon Aug 1 16:01:01 EST 2005
Note: I changed my subscription to another e-mail address... hopefully
Ian's mails now come through...
I'm copying from the mailinglist archive now...
-------------------------
> Hi Arne,
>
> > I'm not aware of libmygis' capabilities
>
> libmygis is a work in progress. The author, Jeremy Cole, is subscribed
> to this list. I think his plan is to make it support whatever Roadster
> needs (at the least), but don't take my word for it. The only problem I
> have with libmygis is that I don't know when it'll be ready. :)
>
> I just learned of GDAL. I'd like to look into it.
>
> > not sure if it's a good idea to rely on a third party library for that
>
> If it's an open-source library and we can fork it if necessary, what is
> the danger?
ok... then it's fine. :)
> > I think the sorting preferences should be configurable by the user... :)
> What would the options be? Like the Wiki hints at[1], I think it might
> make sense to have an Advanced Search dialog at some point. It would
> let you specify all parts of your search, just like Google's Advanced
> Search[2].
Well, the single search box is a nice idea... however, address formats
used in other parts of the world would be hard to handle with it.
I show some examples below.
> > which tables are looked up in which order
> It's all in search_road.c and search_location.c ('location' being the
> name for POI in the code). I think the code is pretty well documented,
> but I'd be happy to answer any specific questions about it.
Ok... :)
> > I'm keen to put some effort into making roadster support other
> > countries' mapping data and search for their addresses
> I decided to not worry too much about internationalization of the
> storage because a) data isn't available yet and b) I didn't know how to
> do it. :)
> It would be helpful if you could provide as many real-world addressing
> examples as possible that don't fit the "STREET, CITY, STATE, COUNTRY"
> pattern.
a) Not available for free, that's correct. Usually one has to buy the
data for a lot of $$$.
b) Yes, I can help with that. The problem, however, is not the STREET,
CITY, STATE, COUNTRY pattern, as this is more or less the same
everywhere... the problem lies in how those patterns are used. :)
I would say that, to reliably distinguish the different parts of an
address, we need separate search fields, one for each address part:
* COUNTRY
* STATE/PROVINCE
* COUNTY
* CITY
* ZIP-CODE
* STREET
* NUMBER
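To make the one-field-per-part idea concrete, here is a minimal sketch of a structured query. The class name AddressQuery, its field names and the exact-match rule are hypothetical illustrations, not Roadster's actual search code (which lives in search_road.c and search_location.c).

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class AddressQuery:
    """Hypothetical structured search: one optional field per address part."""
    country: Optional[str] = None
    state: Optional[str] = None      # STATE/PROVINCE
    county: Optional[str] = None
    city: Optional[str] = None
    zip_code: Optional[str] = None
    street: Optional[str] = None
    number: Optional[str] = None

    def filled(self) -> dict:
        """Return only the parts the user actually entered."""
        return {f.name: getattr(self, f.name)
                for f in fields(self) if getattr(self, f.name)}

    def matches(self, record: dict) -> bool:
        """True if every entered part equals the record's part (case-insensitive).

        Parts left empty are ignored, so e.g. COUNTY can be given
        optionally, just to narrow the search.
        """
        return all(str(record.get(name, "")).casefold() == value.casefold()
                   for name, value in self.filled().items())
```

For example, AddressQuery(city="Berlin", street="Landstraße") ignores COUNTRY, STATE and the other empty fields when matching.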
Here are some address patterns as examples (all examples are fictitious
addresses, but the patterns are similar to real ones):
NOTE: in all of these countries, odd and even house numbers are often
not in sync, and in some cases not even separated onto opposite sides of
the street (as in Berlin, for example). There, house numbers on one side
of the street are counted consecutively from 1 up to the highest number
at the end of the road, and then continue on the other side of the
street all the way back.
-----------------
Germany and most of continental Europe:
STREET NUMBER
ZIP CITY
(no STATE or COUNTY used in postal addresses, but can be used optionally
to narrow a search).
Street names can be divided into different classes, similar to the
US. For example, in Germany:
* Straße (street), sometimes written as Strasse or Str.
* Allee (Boulevard)
* Weg (Way)
* Gasse (Alley)
and others
Those street classifiers are not necessarily separate words (example:
Landstraße or Landstr.). Therefore we need to classify the type of
street manually.
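A minimal sketch of why splitting on whitespace is not enough, and where manual classification would plug in. The suffix table and the (empty) override table are hypothetical; a suffix match is only a heuristic and will misfire on some names, which is exactly why hand-made corrections are still needed.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical suffix table. Keys are casefolded; casefold() maps "ß" to "ss",
# so "Straße" and "Strasse" both end in "strasse".
SUFFIXES: List[Tuple[str, str]] = [
    ("strasse", "Straße"),
    ("str.", "Straße"),
    ("allee", "Allee"),
    ("gasse", "Gasse"),
    ("weg", "Weg"),
]

# Manually maintained corrections: casefolded full street name -> class.
OVERRIDES: Dict[str, str] = {}


def street_class(name: str) -> Optional[str]:
    """Guess the street class of a German street name, or None if unknown."""
    key = name.strip().casefold()
    if key in OVERRIDES:          # a human decision always wins
        return OVERRIDES[key]
    for suffix, cls in SUFFIXES:  # the classifier may be glued to the name
        if key.endswith(suffix):
            return cls
    return None
```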
The NUMBER stands after the street name and can also include number
ranges (e.g. 11-15) or letters to mark subordinate buildings
(letters a-z, e.g. 110c).
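A minimal parsing sketch for the NUMBER field, assuming only the forms mentioned above: a plain number, a range like 11-15, or a number with a single trailing letter like 110c. The HouseNumber type is a hypothetical name.

```python
import re
from typing import NamedTuple, Optional


class HouseNumber(NamedTuple):
    first: int    # 11 for "11-15", 110 for "110c"
    last: int     # equals first unless a range was given
    suffix: str   # sub-building letter a-z, or "" if none


# A number, an optional letter, then an optional "-<number>" range end.
_HOUSE_NUMBER = re.compile(r"([0-9]+)\s*([a-z]?)(?:\s*-\s*([0-9]+))?", re.IGNORECASE)


def parse_house_number(text: str) -> Optional[HouseNumber]:
    """Parse '7', '110c' or '11-15'; return None for anything else."""
    m = _HOUSE_NUMBER.fullmatch(text.strip())
    if m is None:
        return None
    first = int(m.group(1))
    last = int(m.group(3)) if m.group(3) else first
    return HouseNumber(first, last, m.group(2).lower())
```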
STATE and COUNTY are not used in postal addresses, but can be used to
narrow a search, as some cities have similar names. COUNTIES in Germany
have an abbreviation of 1 to 3 letters, which is also used on car number
plates.
STATEs in Germany also have abbreviations, which are two letters.
ZIP codes differ in each country. Germany, Italy, Spain, France and maybe
others use 5 numerical digits only, even to distinguish different
regions within a city; Austria, Switzerland, Luxembourg, Denmark and
Belgium use 4 digits only.
The Netherlands use 4 digits plus 2 uppercase letters. These letters are
NOT state abbreviations.
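The ZIP formats above fit naturally into a per-country validation table. A minimal sketch; keying by ISO 3166 country code is our own choice, and accepting an optional space in Dutch codes is an assumption.

```python
import re

# ZIP formats as described above, keyed by ISO 3166 country code (our choice).
ZIP_FORMATS = {
    "DE": r"[0-9]{5}", "IT": r"[0-9]{5}", "ES": r"[0-9]{5}", "FR": r"[0-9]{5}",
    "AT": r"[0-9]{4}", "CH": r"[0-9]{4}", "LU": r"[0-9]{4}",
    "DK": r"[0-9]{4}", "BE": r"[0-9]{4}",
    # 4 digits plus 2 uppercase letters (not a state abbreviation);
    # the optional space between them is an assumption.
    "NL": r"[0-9]{4} ?[A-Z]{2}",
}


def is_valid_zip(country: str, zip_code: str) -> bool:
    """Check a ZIP code against the format of the given country."""
    pattern = ZIP_FORMATS.get(country)
    if pattern is None:
        raise ValueError(f"no ZIP format known for {country!r}")
    return re.fullmatch(pattern, zip_code.strip()) is not None
```

Note that a bare five-digit ZIP cannot tell Germany from France, so the COUNTRY field is still needed to interpret it.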
-----------------------
UK: similar to the system in Canada
-----------------------
Taiwan, China (excluding Hong Kong and Macao):
two different systems:
1. native encoding (Chinese characters)
2. westernized transcription system
The transcription system is standardized in China (Hanyu Pinyin), but
not in Taiwan, where multiple transcription systems exist. We would
need a translation table for all possible transcriptions of the
STREET, CITY and COUNTY fields.
In the native encoding it is not uncommon to abbreviate the STATE, COUNTY
or CITY with a single Chinese character.
In the romanized address, COUNTY and STATE and CITY usually do not
include "City", "County" or "Province".
Examples:
a) China
1. native encoding:
ZIP
(STATE) (COUNTY) CITY STREET NUMBER (all in one line without spaces)
100011
北京市西直门外大街100号
ZIP is 6 digits and also distinguishes postal areas within a city.
In the above case STATE and COUNTY are missing, as Beijing City is big
enough to be recognized... :) For smaller villages or cities, however,
STATE and COUNTY may be used
(e.g. 福建省厦门市 -- Fujian Prov. XiaMen City).
STATE, COUNTY, CITY, STREET and NUMBER (as well as extensions) can be
distinguished by characters.
(e.g.: 省 = Province, 市 = City, 大街 = Boulevard, 街 = street, 路 = road, 号 =
number, 之 = sub-ordinate number, 楼 = floor, etc.)
Streets can also have small lanes and alleys, which are numbered in
sequence together with the house numbers (up to three levels: 巷 = lane,
弄 = alley, 衖 = subordinate alley).
Subordinate house numbers are written as 101之1号, 101之2号, etc.
2. romanized transcription system:
Address pattern follows the US style:
NUMBER STREET
CITY (, COUNTY)
(PROVINCE)
ZIP
In China the transcription is standardized:
No. 100, Xizhimenwaidajie (usually no spaces in the street names)
Beijing City
100011
No abbreviations are available at the COUNTY and STATE level.
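Because the Chinese native encoding closes each address part with a marker character (省, 市, 大街, 街, 路, 号, ...), a naive tokenizer can split an address at those markers. A sketch, assuming only the markers listed for China above; the names MARKERS and split_address are hypothetical. It will mis-split place or street names that happen to contain a marker character, and it deliberately leaves 之 inside the number token.

```python
from typing import List, Tuple

# Marker characters listed above, mapped to the address part they close.
MARKERS = {
    "省": "province", "市": "city",
    "大街": "boulevard", "街": "street", "路": "road",
    "巷": "lane", "弄": "alley", "衖": "sub-alley",
    "号": "number", "楼": "floor",
}
# Try longer markers first so that 大街 wins over 街.
_ORDERED = sorted(MARKERS, key=len, reverse=True)


def split_address(address: str) -> List[Tuple[str, str]]:
    """Split a native-encoded address into (text, part) tokens at markers."""
    parts: List[Tuple[str, str]] = []
    start = i = 0
    while i < len(address):
        for marker in _ORDERED:
            if address.startswith(marker, i):
                end = i + len(marker)
                parts.append((address[start:end], MARKERS[marker]))
                start = i = end
                break
        else:  # no marker starts here, keep scanning
            i += 1
    if start < len(address):
        parts.append((address[start:], "unknown"))
    return parts
```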
b) Taiwan:
1. native encoding:
Same as in China, but using traditional Chinese characters and
sometimes different vocabulary.
No STATE used in postal addresses (there are only 2 provinces in Taiwan:
Taiwan and Fujian), but COUNTY is used frequently.
ZIP code has 3 or 5 digits: the first 3 digits identify the city (or the
borough in bigger cities), the last 2 digits the postal area within that
city (borough).
10358 (The zip code is not correct here, it's just an example)
台北市中山北路3段125巷1弄3衖53之1號
In Taiwan, long streets are divided into sections (段), ranging from
Section 1 (downtown) to Section 7 or 8 (far, far away), each section
having approx. 1000 house numbers, sometimes fewer, sometimes more.
Another example with COUNTY (縣) names:
桃園縣八德市
2. romanized transcription systems:
There is no standard in Taiwan; multiple concurrent systems are in use
(with or without spelling mistakes... *sic*).
Romanized addresses usually use US street classifiers (Road, Street, Lane,
Alley, Blvd). Directions in the street names (北 = North, 南 = South,
西 = West, 東 = East) are an integral part of the street name, not a pure
direction. They are usually abbreviated to one letter:
No. 53-1, Alley 1-3, Lane 125, ZhongShan N Rd. Sec. 3
Taipei City
10358 (The zip code is not correct here, it's just an example)
Lanes are numbered together with house numbers. So, Lane 125 would be
between numbers 123 and 127. Alley 1 is the first cross-alley on this
lane and Alley 1-3 is the 3rd cross-alley of the 1st cross-alley of
Lane 125... :)
Taiwan's cities are jam-packed with lanes and alleys; they didn't bother
to give each small alley a separate name, which is why they just numbered
them in sequence... still better than Japan though... (see below).
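The native Taiwanese form (台北市中山北路3段125巷1弄3衖53之1號) maps onto the numeric part of the romanized form (No. 53-1, Alley 1-3, Lane 125, ... Sec. 3) in a fixed way. A rough sketch: the regular expression is hypothetical and assumes a 市 city, a road ending in 路 or 街, and ASCII digits. The road name itself (中山北路 → ZhongShan N Rd.) is left untranslated, because that needs the transcription table mentioned above.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern: city, road, then optional section/lane/alley/sub-alley,
# then the house number with an optional 之 sub-number.
TW_ADDRESS = re.compile(
    r"(?P<city>.+?市)"
    r"(?P<road>.+?[路街])"
    r"(?:(?P<section>[0-9]+)段)?"
    r"(?:(?P<lane>[0-9]+)巷)?"
    r"(?:(?P<alley>[0-9]+)弄)?"
    r"(?:(?P<suballey>[0-9]+)衖)?"
    r"(?P<number>[0-9]+)(?:之(?P<subnumber>[0-9]+))?[號号]"
)


def parse_tw(address: str) -> Optional[Dict[str, Optional[str]]]:
    """Split a native Taiwanese address into named parts, or return None."""
    m = TW_ADDRESS.fullmatch(address)
    return m.groupdict() if m else None


def romanize_numbers(parts: Dict[str, Optional[str]]) -> str:
    """Build the numeric part of the romanized address, innermost first."""
    number = "No. " + parts["number"]
    if parts["subnumber"]:
        number += "-" + parts["subnumber"]
    out = [number]
    if parts["alley"]:
        alley = parts["alley"]
        if parts["suballey"]:
            alley += "-" + parts["suballey"]  # 1弄3衖 -> Alley 1-3
        out.append("Alley " + alley)
    if parts["lane"]:
        out.append("Lane " + parts["lane"])
    if parts["section"]:
        out.append("Sec. " + parts["section"])
    return ", ".join(out)
```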
Because of the lack of a standard transcription system, there are
multiple ways of transcribing counties, cities and streets:
中山路 could be written:
* ZhongShan Rd.
* Zhong Shan Rd.
* Zhongshan Rd.
* ChungShan Rd.
* Chungshan Rd.
* JhongShan Rd.
* Jhongshan Rd.
as well as all of these combined with spelling errors (like a missing
h) and with or without spaces. :(((
Popular spellings for 新竹:
* XinZhu
* Xin Zhu
* Xinzhu
* Hsin-Chu
* Hsinchu
* Hsin chu
* Shinchu
* Shin chu
* Sinchu
* Hsinjhu
etc...
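A minimal sketch of such a translation table, assuming a hypothetical alias list built from the spellings above. Normalizing case, spaces and hyphens collapses most variants; a fuzzy fallback using Python's difflib.get_close_matches (the 0.8 cutoff is an arbitrary choice) absorbs small spelling errors. Inputs are bare names without the street classifier.

```python
import difflib
import re
from typing import Dict, List, Optional

# Hypothetical translation table: native name -> romanized variants seen in the wild.
ALIASES: Dict[str, List[str]] = {
    "新竹": ["XinZhu", "Hsin-Chu", "Shinchu", "Sinchu", "Hsinjhu"],
    "中山": ["ZhongShan", "ChungShan", "JhongShan"],
}


def normalize(name: str) -> str:
    """Drop spaces, hyphens, apostrophes and dots, and ignore case."""
    return re.sub(r"[\s\-'.]", "", name).casefold()


# normalized variant -> native name
LOOKUP: Dict[str, str] = {normalize(v): native
                          for native, variants in ALIASES.items()
                          for v in variants}


def to_native(romanized: str, cutoff: float = 0.8) -> Optional[str]:
    """Map a romanized spelling back to its native name, or None."""
    key = normalize(romanized)
    if key in LOOKUP:
        return LOOKUP[key]
    # Fall back to fuzzy matching to absorb small spelling errors.
    close = difflib.get_close_matches(key, LOOKUP.keys(), n=1, cutoff=cutoff)
    return LOOKUP[close[0]] if close else None
```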
-------------------
Japan:
The transcription system is standardized, as in China.
1. native encoding
ZIP
COUNTY CITY AREA-NUMBERS (without spaces)
No Streetnames in use
ZIP codes are 7 digits: 3 digits for the city, then 4 digits for the
area, separated from the city digits by a hyphen:
243-0041
神奈川県茅ヶ崎市茅ヶ崎2-1-30-205
County is 県, City is 市.
After the CITY comes the area. Areas have a name (and, in many cases, a
number). Often the area name is the same as the city name, as in this
case (茅ヶ崎2). There is no real pattern in how such areas are named.
After the area comes a house block code (here: 1-30-205), which means
block 1, house number 30, room 205. The blocks are numbered arbitrarily,
and so are the house numbers within those blocks. I'm not sure what the
maximum number of levels used in this numbering scheme is.
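A minimal parsing sketch for the Japanese native form, assuming (hypothetically) that the line consists of an optional 県 part, a 市 city, an area name, and hyphen-separated ASCII numbers, and that the first number belongs to the area as in 茅ヶ崎2. Since the depth of the block code is unknown, it is kept as a list of any length.

```python
import re

JP_ZIP = re.compile(r"[0-9]{3}-[0-9]{4}")   # 3 city digits, hyphen, 4 area digits
JP_LINE = re.compile(
    r"(?P<county>.+?県)?"                    # optional 県 (County) part
    r"(?P<city>.+?市)"                       # 市 (City)
    r"(?P<area>[^0-9]+)"                     # area name, up to the first digit
    r"(?P<codes>[0-9]+(?:-[0-9]+)*)"         # area number plus block code
)


def parse_jp(zip_code: str, line: str) -> dict:
    """Split a Japanese ZIP and address line into named parts."""
    if not JP_ZIP.fullmatch(zip_code):
        raise ValueError("expected a ZIP like 243-0041")
    m = JP_LINE.fullmatch(line)
    if m is None:
        raise ValueError("unrecognized address line")
    numbers = [int(n) for n in m.group("codes").split("-")]
    return {
        "zip": zip_code,
        "county": m.group("county"),
        "city": m.group("city"),
        "area": m.group("area"),
        "area_number": numbers[0],   # assumption: first number belongs to the area
        "block_code": numbers[1:],   # block, house, room... any depth
    }
```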
-------------
Hong Kong:
no ZIP codes, the rest of the addresses follow a pattern similar to the
UK and Canada
Example:
Rm. 5, 48 Fl., Tower E, Kings Plaza,
No. 38 Kings Road East
Kowloon
Chinese version would be all in one line without spaces, like in China
and Taiwan, but with different characters (vocabulary is different).
Romanization system is standardized.
------------------------
> > Is roadster UTF-8 safe yet? :p
> The GUI is. I'm not so sure about the database. :)
This can be solved easily if we stick to MySQL... just force UTF-8 as
encoding.
> > the huge overhead of the embedded mysql server...
> What overhead are you referring to? Memory, disk space, disk access
> time, CPU time, or what?
Memory and diskspace...
I wonder if we can store the database in a binary form to save
disk space... (currently the (incomplete) database of California takes
more than 500 MB on my system!)
> I do have some big problems with the way MySQL's Spatial Extensions
> work. The biggest problem is that we incur one disk seek for each road
> segment read in. It's also a bit heavy on the on-disk storage size.
> I'd be happy to work with you to come up with a new storage scheme!
Let me take a look at the source code first... :)
Cheers
Arne
-----------------------------------
--
Arne Götje (高盛華) <arne at goetje-online.de>
PGP/GnuPG key: 1024D/685D1E8C
Fingerprint: 2056 F6B7 DEA8 B478 311F 1C34 6E9F D06E 685D 1E8C
Key available at wwwkeys.pgp.net. Encrypted e-mail preferred.