Difference between revisions of "Nominatim for Traffic Accident Database"

From stgo
Jump to: navigation, search
(Exporting Accident Data to CVS)
(Exporting Accident Data to CVS)
Line 509: Line 509:
  
 
I exported 3 databases:
 
I exported 3 databases:
* Accidentes - this DB had about 536k entries for all of Chile.
+
* Accidentes - this DB had about 536k entries for all of Chile (csv ~ 208 MB)
* Personas
+
* Personas (csv ~ 147 MB)
* Vehiculos
+
* Vehiculos (csv ~ 123 MB)
 
and
 
and
* Categoria Ingreso (in CLP per Comuna, year 200x)
+
* Categoria Ingreso (in CLP per Comuna, year 200x) (csv ~ 9KB)

Revision as of 13:21, 9 March 2015

>> return to Cedeus IDE


Objective

Develop a script for automated geo-referencing of Chiles transit accident database.

Tools

  • Nominatim
  • MS-Access / LibreOffice : data comes in an MS Access DB from Carabineros de Chile
  • iPython + Notebook

Installing Nominatim

Setup of Nominatim VM

Setup of VM based on Basic Ubuntu1404 VM

  1. copying basicubunutu1404.vdi file and renaming to nominatim.vdi
  2. creating the nominatim VM on CedeusDB (ip.18):
    • VBoxManage createvm --name nominatim --ostype Ubuntu_64 --register
    • VBoxManage modifyvm nominatim --memory 4096
    • VBoxManage modifyvm nominatim --cpus 4
    • VBoxManage modifyvm nominatim --nic1 nat
    • VBoxManage storagectl nominatim --name "SATA Controller" --add sata --controller IntelAhci
    • assign the (old) disk image a new uuid
      VBoxManage internalcommands sethduuid nominatim.vdi
    • attach the (old) disk image:
      VBoxManage storageattach "nominatim" --storagectl "SATA Controller" --port 0 --device 0 --type hdd --medium nominatim.vdi
    • VBoxManage storagectl nominatim --name "IDE Controller" --add ide --controller PIIX4
    • set the nat rules (ports):
      VBoxManage modifyvm nominatim --natpf1 "ssh,tcp,,20022,,22"
      VBoxManage modifyvm nominatim --natpf1 "apache,tcp,,20080,,80"
      VBoxManage modifyvm nominatim --natpf1 "pgsql,tcp,,20432,,5432"
  3. optional - set VRDE port:
    • VBoxManage modifyvm nominatim --vrdeport 7761 (Note: 3389 is the default port anyway)
  4. check all settings with VBoxManage showvminfo nominatim
  5. start the VM
    VBoxHeadless -s nominatim --vrde on &
    the VM will listen on port 7761 - connect with GTKvncViewer to check
    • check if ssh connection works
  6. change the computers name in /etc/hosts and /etc/hostname
  7. restarting Apache gives the error message: "apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.1.1 ..."
    => solve this by adding ServerName nominatim as the last line in /etc/apache2/apache2.conf file

Resize the Nominatim VM

  • shut the VM down to resize the HDD if necessary
  • check the HDD size: VBoxManage showhdinfo nominatim.vdi returns:
UUID:           e148d758-00ac-4b60-a067-d5c97316069d
Parent UUID:    base</br>
State:          created
Type:           normal (base)
Location:       /home/ssteinig/nominatim.vdi
Storage format: VDI
Format variant: dynamic default
Capacity:       100000 MBytes
Size on disk:   6224 MBytes
In use by VMs:  nominatim (UUID: be795392-3786-4f3b-9714-a9445f91855d)

Setup of Nominatim itself

Install needed libraries

  • install PHP 5
  • install Postgres
  • install PEAR
  • install GCC, GIT, protbuf, wget, etc.
  • install Osmosis

=> see https://wiki.openstreetmap.org/wiki/Nominatim/Installation#Software

sudo apt-get install build-essential libxml2-dev libgeos-dev libpq-dev libbz2-dev libtool automake libproj-dev
sudo apt-get install gcc proj-bin libgeos-c1 git osmosis libgeos++-dev
sudo apt-get install php5 php-pear php5-pgsql php5-json php-db
instead of: >> sudo apt-get install postgresql postgis postgresql-contrib postgresql-9.1-postgis postgresql-server-dev-9.1
  use: >> sudo apt-get install postgresql postgis postgresql-contrib postgresql-9.3-postgis-2.1 postgresql-server-dev-9.3
sudo apt-get install libprotobuf-c0-dev protobuf-c-compiler

and perhaps

sudo apt-get install git autoconf-archive

Edit PostreSQL settings

perhaps edit Postgres settings (before data import) sudo nano /etc/postgresql/9.3/main/postgresql.conf

=> see https://wiki.openstreetmap.org/wiki/Nominatim/Installation#Software

My machine has only 4GB, so I changed:

  • shared_buffers = 128MB => 512MB
  • work_mem = 50MB (previously uncommented, and set to 1MB)
  • maintenance_work_mem = 2GB (previously commented, and set to 16MB) //=> This value should be reduced again afterwards to avoid swapping when autovacuum runs.
  • effective_cache_size = 2GB (previously commented, and set to 128MB)
  • synchronous_commit = off (previously commented, and set to on)
  • checkpoint_segments = 100 (previously commented, and set to 3)
  • checkpoint_timeout = 10min (previously commented, and set to 5min)
  • checkpoint_completion_target = 0.9 (previously commented, and set to 0.5)

and for initial import only:

  • fsync = off (previously commented, and set to 'on')
  • full_page_writes = off (previously commented, and set to 'on')
=> switch both values back to 'on' after import, to avoid database corruption.

Note, I did a restart of the postgresql service after to have the settings applied.

First Nominatim Installation

Get the source code

I used the last stable release 2.3.1:

wget http://www.nominatim.org/release/Nominatim-2.3.1.tar.bz2
tar xvf Nominatim-2.3.1.tar.bz2

and rename the folder

mv Nominatim-2.3.1 Nominatim
Compiling

now compile the software

cd Nominatim
./configure
make

Note, the warning about missing lua libraries can be ignored. Nominatim does not make use of osm2pgsql's lua extension.

Configuration

create the configuration file local.php in the sub folder settings with the following content:

<?php
  // Paths
  @define('CONST_Postgresql_Version', '9.3');
  @define('CONST_Postgis_Version', '2.1');
  // Website settings
  //@define('CONST_Website_BaseURL', 'http://mysite/nominatim/');
  @define('CONST_Website_BaseURL', 'http://146.155.17.18:20080/nominatim/');

for further params see the file settings/settings.php.

Download (optional) data

Wikipedia rankings (downloading took me 20+ mins @PUC)

Wikipedia can be used as an optional auxiliary data source to help indicate the importance of osm features. Nominatim will work without this information but it will improve the quality of the results if this is installed. This data is available as a binary download.

wget --output-document=data/wikipedia_article.sql.bin http://www.nominatim.org/data/wikipedia_article.sql.bin
wget --output-document=data/wikipedia_redirect.sql.bin http://www.nominatim.org/data/wikipedia_redirect.sql.bin

(=> requires to be in the Nominatim folder). wikipedia_article.sql.bin had a size of 1.2GB and redirect 240MB. Combined the 2 files are around 1.5GB and add around 30GB to the install size of nominatim. They also increase the install time by an hour or so.

Creating postgres accounts

Creating the importer account

The import needs to be done with a postgres superuser with the same name as the account doing the import.

sudo -u postgres createuser -s <your user name>

You should ensure that this user can log in to the database without requiring a password (e.g. using ident authentication). This is the default on most distributions. See trust authentication for more information. (sstein: it looks like it is not mean ident but trust identification... see below)

Some more PostgreSQL Access configs

Note: To have access from outside, I also changed in /etc/postgresql/9.3/main/postgresql.conf':

listen_adresses = '*'

And in /etc/postgresql/9.3/main/pg_hba.conf I did set

local   all             all                                     peer

to

local   all             all                                     ident

as well as:

host    all             all             127.0.0.1/32            md5

to

host    all             all       0.0.0.0       0.0.0.0         md5

(perhaps I also need to modify /etc/postgresql/9.3/main/pg_ident.conf at some point, if I want to map a system user name to a certain db user name)

Do a sudo service postgresql restart

Then I logged in and changed my ssteinig user password this way for the standard db postgres:

sudo -u ssteinig psql postgres
alter user ssteinig with password 'myNewPassWord';

Now I tested if I can connect to the DB:

  • connection from command line to DB postgres: psql -U ssteinig -d postgres -h localhost
=> This worked, but it asked me for a password. I thought this should not happen after setting local to ident?
However, when I changed from ident/md5 to trust, I could login without password:
# "local" is for Unix domain socket connections only
local   all             all                                      trust
# IPv4 local connections:
host    all             all        0.0.0.0     0.0.0.0           trust
# IPv6 local connections:
host    all             all             ::1/128                  trust
got this from here: http://suite.opengeo.org/4.1/dataadmin/pgGettingStarted/firstconnect.html
  • connect from outside (cedeusgis1) with pgAdmin (ssteinig, db: postgres, 20432)
    => This worked as well.
Create website user

Create the website user www-data as a PostgreSQL database role

createuser -SDR www-data

For the installation process, you must have this user. If you want to run the website under another user, see comment in section Set up the website. You must not run the import as user www-data or root.

Nominatim module reading permissions

Some Nominatim Postgres functions are implemented in the nominatim.so C module that was compiled in one of the earlier steps. In order for these functions to be successfully created, PostgreSQL server process has to be able to read the module file. Make sure that directory and file permissions allow the file to be read. For example, if you downloaded and compiled Nominatim in your home directory, you will need to issue the following commands:

chmod +x ~/Nominatim
chmod +x ~/Nominatim/module

Note, the instructions say chmod +x ~/src/Nominatim ...but this must be for the installation from git source?

Import and index OSM data

First download a Planet File or a planet extract, for example from Geofabrik. Using a file in PBF format is recommended. I downloaded the Luxemburg Dataset for testing and then Chile. I used a cache of 1.2GB

Now start the import:

./utils/setup.php --osm-file /home/ssteinig/luxembourg-2015_2mar.osm.pbf --all --osm2pgsql-cache 1200 2>&1 | tee setup.log

=> This will create a new DB "nominatim" (with PostGIS extension enabled automatically). Importing the wikipedia articles was most time consuming. Together approx. 1 hour.

ToDo afterwards : reverse data import settings for PostgreSQL (see above).

Add special phrases

Add country codes and country names to the search index:

./utils/specialphrases.php --countries > specialphrases_countries.sql
psql -d nominatim -f specialphrases_countries.sql

If you want to be able to search for special amenities like pubs in Dublin, you need to import special phrases from this wiki like this:

./utils/specialphrases.php --wiki-import > specialphrases.sql
psql -d nominatim -f specialphrases.sql

This may be repeated from time to time when there are changes in the wiki. There is no need to repeat it after each update.
If you do not need phrases for all languages, edit utils/specialphrases.php and delete unneeded languages at the beginning of the file.

Set up the website

The following instructions will make Nominatim available at http://localhost/nominatim - or - http://146.155.17.18:22022/nominatim

Create the directory for the website and make sure it is writable by you and readable by apache:

sudo mkdir -m 755 /var/www/html/nominatim
sudo chown <your username> /var/www/html/nominatim

Populate the website directory with the necessary symlinks:

./utils/setup.php --create-website /var/www/html/nominatim

You will need to make sure settings/local.php is configured with correct values for CONST_Website_BaseURL. e.g.

@define('CONST_Website_BaseURL', 'http://146.155.17.18.20080/nominatim');
//or: @define('CONST_Website_BaseURL', 'http://localhost/nominatim');

=> did this already above.

Setting up Apache

Make sure your Apache configuration (i.e. /etc/apache2/sites-enabled/000-default in a standard Ubuntu/Debian installation) contains the following settings for the directory:

<Directory "/var/www/nominatim/">
   Options FollowSymLinks MultiViews
   AddType text/html   .php     
</Directory>

So my 0000-default.conf file looks now like this:

<VirtualHost *:80 146.155.17.18:20022>
       # The ServerName directive sets the request scheme, hostname and port that
       # the server uses to identify itself. This is used when creating
       # redirection URLs. In the context of virtual hosts, the ServerName
       # specifies what hostname must appear in the request's Host: header to
       # match this virtual host. For the default virtual host (this file) this
       # value is not decisive as it is used as a last resort host regardless.
       # However, you must set it for any further virtual host explicitly.
       #ServerName www.example.com

       ServerAdmin webmaster@localhost
       DocumentRoot /var/www/html

       # Available loglevels: trace8, ..., trace1, debug, info, notice, warn,
       # error, crit, alert, emerg.
       # It is also possible to configure the loglevel for particular
       # modules, e.g.
       #LogLevel info ssl:warn

       ErrorLog ${APACHE_LOG_DIR}/error.log
       CustomLog ${APACHE_LOG_DIR}/access.log combined

       <Directory "/var/www/html/nominatim/">
          Options FollowSymLinks MultiViews
          AddType text/html   .php
       </Directory>

       # For most configuration files from conf-available/, which are
       # enabled or disabled at a global level, it is possible to
       # include a line for only one particular virtual host. For example the
       # following line enables the CGI configuration for this host only
       # after it has been globally disabled with "a2disconf".
       #Include conf-available/serve-cgi-bin.conf
</VirtualHost>

# vim: syntax=apache ts=4 sw=4 sts=4 sr noet


After making changes in the apache config restart apache.

sudo apache2ctl graceful

Note: The name of the website user is hard-coded into Nominatim. In most Linux distributions, apache will run as www-data, so this will work without any further modifications. If your web server runs under a different name (e.g. in Fedora and CentOS apache runs as user apache), simply alter the name of the www-data user in postgresql after the import has finished, e.g. psql -d nominatim -c 'ALTER USER "www-data" RENAME TO "apache".

Test Webpage

Now go to: http://146.155.17.18:20080/nominatim/ to see if there is an OSM map shown. Yey!

I tested the Luxemburg dataset with writing "Pfaffenthal" into the text field. Nominatim found two places with that name: (i) city district, and (ii) suburb.

Load Chile Dataset

The setup.php command does not work a second time and stops with "ERROR: database already exists (pgsql://@/nominatim)". So one can not load 2 country datasets this way, as described here: http://github.com/twain47/Nominatim/issues/224

An option to load a second country is using the update function (but this is very slow... maybe importing all of South America is even faster when deleting the db and using setup again). So, I download from http://download.geofabrik.de/south-america.html as xml/osm.bz2 file, since "update" requires xml.

add Chile via update:

./utils/update.php --import-file /home/ssteinig/chile-2014_5mar.osm.bz2 --osm2pgsql-cache 1200 2>&1 | tee setupchile.log

then run:

./utils/update.php --index

Perhaps use option --index-instances for indexing, for running with multiple threads. Doing the update with these two commands is really really slow: 15:25-20:15 (4.7h) + 3.5h for indexing - for a 120 MB osm.bz2 file.

Updates of the Database

Haven't done this so far, but see: http://wiki.openstreetmap.org/wiki/Nominatim/Installation#Updates

Using Nominatim

Geocoding

Geocoding can return none, one, or many results. In the examples below it is returned only one result. Example 3 shows output for 2 results.

Example 1

  • "135 pilkington avenue, birmingham"
  • result in xml format (can be also html|xml|json|jsonv2) : html will return the map-web page
  • with polygon geometry : polygon_geojson=1
  • with address details : Include a breakdown of the address into elements : addressdetails=1
http://nominatim.openstreetmap.org/search?q=135+pilkington+avenue,+birmingham&format=xml&polygon_geojson=1&addressdetails=1

=> For more search options, see http://wiki.openstreetmap.org/wiki/Nominatim
For instance, search can be restricted to a country with e.g. countrycodes=cl or countrycodes=de

Answer in XML

<searchresults 
  timestamp="Thu, 05 Mar 15 19:50:26 +0000" 
  attribution="Data © OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright" 
  querystring="135 pilkington avenue, birmingham" 
  polygon="true" 
  exclude_place_ids="73723099" 
  more_url="http://nominatim.openstreetmap.org/search?format=xml&exclude_place_ids=73723099&accept-language=en-US,en;q=0.5&polygon=1&addressdetails=1&q=135+pilkington+avenue%2C+birmingham">
  <place place_id="73723099" 
    osm_type="way" 
    osm_id="90394480" 
    place_rank="30" 
    boundingbox="52.5487473,52.5488481,-1.8165129,-1.8163463" 
    geojson="{"type":"Polygon","coordinates":[[[-1.8165129,52.5487566],[-1.8164912,52.548824],
       [-1.8164684,52.5488213],[-1.8164598,52.5488481],
       [-1.8163463,52.5488346],[-1.8163716,52.5487561],
       [-1.8164289,52.5487629],[-1.8164339,52.5487473],
       [-1.8165129,52.5487566]]]}" 
    lat="52.5487921" 
    lon="-1.8164307339635" 
    display_name="135, Pilkington Avenue, Castle Vale, Maney, Birmingham, West Midlands, England, B72 1LH, United Kingdom" 
    class="building" 
    type="yes" 
    importance="0.411">
    <house_number>135</house_number>
    <road>Pilkington Avenue</road>
    <suburb>Castle Vale</suburb>
    <hamlet>Maney</hamlet>
    <city>Birmingham</city>
    <state_district>West Midlands</state_district>
    <state>England</state>
    <postcode>B72 1LH</postcode>
    <country>United Kingdom</country>
    <country_code>gb</country_code>
  </place>
</searchresults>

Answer in JSON

[{"place_id":"73723099",
  "licence":"Data \u00a9 OpenStreetMap contributors, ODbL 1.0. http:\/\/www.openstreetmap.org\/copyright",
  "osm_type":"way",
  "osm_id":"90394480",
  "boundingbox":["52.5487473","52.5488481","-1.8165129","-1.8163463"],
  "lat":"52.5487921","lon":"-1.8164307339635",
  "display_name":"135, Pilkington Avenue, Castle Vale, Maney, Birmingham, West Midlands, England, B72 1LH, United Kingdom",
  "class":"building",
  "type":"yes",
  "importance":0.411,
  "address":{"house_number":"135",
             "road":"Pilkington Avenue",
             "suburb":"Castle Vale",
             "hamlet":"Maney",
             "city":"Birmingham",
             "state_district":"West Midlands",
             "state":"England",
             "postcode":"B72 1LH",
             "country":"United Kingdom",
             "country_code":"gb"},
  "geojson":{"type":"Polygon","coordinates":[[
   [-1.8165129,52.5487566],
   [-1.8164912,52.548824],
   [-1.8164684,52.5488213],
   [-1.8164598,52.5488481],
   [-1.8163463,52.5488346],
   [-1.8163716,52.5487561],
   [-1.8164289,52.5487629],
   [-1.8164339,52.5487473],
   [-1.8165129,52.5487566]]]
  }
}]

Example 2 for Chile

  • "Av. Holanda 2607, Providencia"
  • result in json format: format=json
  • restriction to Addresses in Chile: countrycodes=cl

with OSM Nominatim

http://nominatim.openstreetmap.org/search?q=av.+holanda+2607,+providencia&format=json&countrycodes=cl&polygon_geojson=1&addressdetails=1

with Own Nominatim

http://146.155.17.18:20080/nominatim//search?q=av.+holanda+2607,+providencia&format=json&countrycodes=cl&polygon_geojson=1&addressdetails=1

Answer in JSON:

[{"place_id":"142577360",
  "licence":"Data \u00a9 OpenStreetMap contributors, ODbL 1.0. http:\/\/www.openstreetmap.org\/copyright",
  "osm_type":"node",
  "osm_id":"1245479266",
  "boundingbox":["-33.442225304494","-33.442125304494","-70.597784810112","-70.597684810112"],
  "lat":"-33.4421753044944",
  "lon":"-70.5977348101124",
  "display_name":"2607, Avenida Holanda, Providencia, Provincia de Santiago, XIII Regi\u00f3n Metropolitana de Santiago, 7500000, Chile",
  "class":"place",
  "type":"house",
  "importance":0.511,
  "address":{"house_number":"2607",
             "road":"Avenida Holanda",
             "suburb":"Providencia",
             "city":"Providencia",
             "county":"Provincia de Santiago",
             "state":"XIII Regi\u00f3n Metropolitana de Santiago",
             "postcode":"7500000",
             "country":"Chile",
             "country_code":"cl"}
}]

Example 3 - Luxemburg with 2 Results

Request

http://146.155.17.18:20080/nominatim/search?q=Rue+de+Neudorf+27,+Luxembourg&format=json

Answer with 2 results

The 2 results have a distinct place-id and a different location, so they are indeed different (see the maps).

Note that the first result is in first place, due to its higher importance value.

[
 {"place_id":"212907",
  "licence":"Data \u00a9 OpenStreetMap contributors, ODbL 1.0. http:\/\/www.openstreetmap.org\/copyright",
  "osm_type":"way",
  "osm_id":"315314115",
  "boundingbox":["49.6162725","49.6163748","6.1485573","6.1487066"],
  "lat":"49.6163233","lon":"6.14863161551865",
  "display_name":"27, Rue de Neudorf, Neudorf, Neudorf-Weimershof, Luxembourg, Canton Luxembourg, District Luxembourg, 1257, Luxemburg",
  "class":"place",
  "type":"house",
  "importance":0.531},
 {"place_id":"102284",
  "licence":"Data \u00a9 OpenStreetMap contributors, ODbL 1.0. http:\/\/www.openstreetmap.org\/copyright",
  "osm_type":"node",
  "osm_id":"360916144",
  "boundingbox":["49.5341927","49.5342927","5.9915497","5.9916497"],
  "lat":"49.5342427","lon":"5.9915997",
  "display_name":"27, Rue de Neudorf, Mondercange, Canton Esch-sur-Alzette, District Luxembourg, 3937, Luxemburg",
  "class":"place",
  "type":"house",
  "importance":0.511}
]

Reverse Geocoding

TODO ... see https://wiki.openstreetmap.org/wiki/Nominatim#Reverse_Geocoding_.2F_Address_lookup

Exporting Accident Data to CVS

The transit accident data come in an MS Access Database.

Fortunately I had MS Access installed. In case not, there seems also to be a way to get it loaded into LibreOffice/OpenOffice. This is described here: http://askubuntu.com/questions/187389/is-it-possible-to-open-an-access-2010-database-file-without-using-wine-or-virtua/519571#519571

A guide on how to export from Access to CVS is here: http://support.spatialkey.com/export-data-from-database-to-csv-file/ In short the process is as follows:

  • select the table and chose "External Data" tab
  • chose export to "Text File" which will start a wizard
  • put a name, but change ending from "txt" to "csv"! (ending may need to be necessary to able to change the encoding to UTF-8 later), then click "ok" ... which brings us to a otpions panel.
  • set options:
    • delimiter ";"
    • first line with column names
    • no colons to indicate text/string fields
  • click on "Advanced..." button in the lower left and chose UTF-8 encoding for the export


I exported 3 databases:

  • Accidentes - this DB had about 536k entries for all of Chile (csv ~ 208 MB)
  • Personas (csv ~ 147 MB)
  • Vehiculos (csv ~ 123 MB)

and

  • Categoria Ingreso (in CLP per Comuna, year 200x) (csv ~ 9KB)