Export .osm from a PostgreSQL openstreetmap database
You have access to a PostgreSQL+PostGIS database which contains openstreetmap
data imported by
osm2pgsql
for rendering maps. How can you generate an extract
to use with other cartography tools ?
pgsql2osm is a python script that can generate the XML in .osm format
for all entities within given boundaries.
It attempts to preserve all data attributes from original osm data:
nodes, ways and relations and all their osm_ids, all tags, and
interreferences (references to the outside of the extract you choose will be broken,
unless you use --bbox='-180,-89.99,180,89.99' to select all available data).
The main usecase I have for this is to generate an .obf file using
osm2obf
for the Android
OsmAnd
app offline maps.
- 
was imported with osm2pgsql- in --slimmode (this allows incremental updates and is also leveraged bypgsql2osm.py)
- Recommended : with --hstorecontaining all remaining tags not saved as columns in thetags::hstore. If not, the produced.osmwill have incomplete tags (but will be geometrically sound)
- with --flat-nodes, and access to the--flat-nodes <FILE>cache binary file (for now, must be readwrite)
 
- in 
- 
The geometry tables are the first with names that end in _point,_line,_polygonlisted in the following query:
SELECT f_table_name,f_geometry_column,srid,pg_size_pretty(pg_table_size(f_table_name::text))
FROM geometry_columns WHERE srid!=0 ORDER BY pg_table_size(f_table_name::text) DESC LIMIT 10;
- Same for _waysand_relstables:
SELECT relname,relkind,pg_size_pretty(pg_table_size(relname::text)),
    (SELECT nspname FROM pg_namespace WHERE oid=relnamespace) AS namespace
FROM pg_class WHERE relkind IN ('r','m','v') AND relnamespace NOT IN (SELECT oid FROM pg_namespace
        WHERE nspname IN ('pg_toast','pg_catalog','information_schema'))
    AND (relname~'_rels' OR relname~'_ways')
ORDER BY pg_table_size(relname::text) DESC;
- XML streaming output, allows to only store compressed .osm.bz2 on disk and not waste RAM. Example:
pgsql2osm --dsn 'dbname=gis' --iso fr --output -|bzip2 > France.osm.bz2
- 
Attempt to lower RAM footprint with generators for database queries: streaming all the way from database to XML 
- 
Automatic detection of table names, referred to here as planet_osm_*, but they can also be named anything with the correct suffixes added byosm2pgsql:_point,_line,_polygon,_waysand_rels
- 
Automatic detection of the middle database format (legacy text[] or new jsonb, available in databases created by osm2pgsql>=1.9). WARNING: very old versions <1.3 may use a different middle database format, it was undocumented at that time. In that case it is recommended to re-import with a newosm2pgsqlversion.
- 
Automatic detection of planet_osm_point,planet_osm_lineandplanet_osm_polygoncolumns: a specific.styleat import is not required- Warning : The column names you choose will be the keys in the .osmoutput regardless of the original tag's key. When not using default tag transforms, a.osmfile will be produced that may not have the tags like they are expected by other software.
- Note : key-values in tagsoverride column key-values
 
- Warning : The column names you choose will be the keys in the 
- 
Bounds specifiable as either : - iso country or region code --iso de-byfor Germany/Bayern- grep -i MY regions.csvcan show you recognized iso codes for Malaysia and subregions
 
- osm relation id --osm-rel-id N, for any relation in the database.
- geojson polygon input file --geojson file.geojson
- bounding box --bbox='<lon_from>,<lat_from>,<lon_to>,<lat_to>'
 
- iso country or region code 
- 
Bounds intersection can also be specified as one non- bboxof the above and a--bbox. The extracted region will then be the intersection (logical AND) of the shape with the bbox.
- 
Anti-Feature: unsorted ids, see Unsorted ids 
| Extract | File size | time | elements count | RAM | 
|---|---|---|---|---|
| Austria-left15_0.osm.bz2 --bbox='-180,-89,15.01,89' | 681MB | real 2h18 user 37min sys 42min | 46.7M n 4.54M w 75K r | 10.6GB | 
| Austria-right15_0.osm.bz2 --bbox='14.99,-89,180,89' | 502MB | real 1h23 user 23min sys 17min | 33.1M n 4.01M w 67K r | - | 
| Baden-württemberg.osm.bz2 --bbox='-180,-89,180,48.31' | 274MB | real 49min user 16min sys 5min | 17.2M n 2.53M w 32.6K r | - | 
| Baden-württemberg.osm.bz2 --bbox='-180,48.29,180,89' | 560MB | real 2h08 user 34min sys 21min | 34.0M n 5.84M w 63K r | 6.1GB | 
| Austria-left14.25.osm.bz2 --bbox='-180,-89,14.26,89' | 509MB | real 1h21 user 21min sys 22min | 34.6M n 3.31M w 53K r | 5.7GB | 
| Austria-right14.25.osm.bz2 --bbox='14.24,-89,180,89' | 676MB | real 2h11 user 30min sys 46min | 45.4M n 5.27M w 89K r | 10.7GB | 
| Switzerland.osm.bz2 | 692MB | real 2h35 user 39min sys 45min | 47.1M n 5.3M w 94.5K r | - | 
| Germany-bavaria.osm.bz2 | 1.1GB | real 5h17 user 1h06 sys 1h35 | 70.1M n 10.5M w 95.8K r | 12.1GB | 
Note : The File size is for the compressed extract, bzip2 default settings used unless otherwise noted
I just run the following to get the max RAM used by that process
maxmemkb=0;while sleep 1;do
  memkb="$(ps -Ao rss,cmd|grep python3|grep pgsql2osm|cut -d' ' -f1)";
  if [ "${maxmemkb}" -lt "${memkb}" ];then maxmemkb="${memkb}";fi;
  printf '\033[2K\r%s max=%s' "${memkb}" "${maxmemkb}";done
The bounding region will include every entity and all its dependent entities. This also means some entities outside the boundary will be inlcuded, as long as they have any reference to entities within the boundary. No geometry clipping is taking place, entities are exported as-is and can extend hundreds of kilometers beyond the boundary if that is a continuous entity (eg a long river).
The output ids are streamed in the order they come from the database, and for speed reasons
this is unsorted. If you need a sorted .osm, use
osmium after exporting:
osmium sort in.osm.bz2 --output=out.osm.bz2
The output data may have some missing tags, because the database import
is not a lossless operation: importing a .osm.pbf to database, then exporting
it to .osm and then converting it back to .osm.pbf will most probably
(on real world data) produce a different file.
pip install pgsql2osm
- Need to compile
osm2pgsql(instructions)
- then copy get_lonlat.cpptoosm2pgsql/src/get_lonlat.cpp
- and in the osm2pgsql/build/directory:
make
g++ -std=c++17 ../src/get_lonlat.cpp ../src/node-persistent-cache.cpp \
-I ../contrib/fmt/include/ ../build/src/libosm2pgsql.a -Wall -g -o get_lonlat
- Test:
./get_lonlat /path/to/planet.bin.nodes <<< 3546766428
- Should output 39.188565;49.885284;3546766428if that node was covered by database import
pgsql2osm /path/to/get_lonlat /path/to/planet.bin.nodes --help
For an overview of options
Made easier with the ModuleSettings class, example:
import pgsql2osm
m=pgsql2osm.settings.ModuleSettings(
  bounds_rel_id=osm_rel_id,
  get_lonlat_binary='/path/to/get_lonlat',
  nodes_file='/path/to/planet.bin.nodes',
  # no need for the --dsn if access already from somewhere else
  access=already_previously_obtained_dbaccess)
with bz2.open('test.osm.bz2','wb') as f:
  m.out_file=f #set the output file object
  m.main()
Note: will still print progress reports to stderr
- 
osm2pgsql's--slimmode means: has the "middle" layout tables containing all the raw osm references betweenways->nodesandrels->nodes,ways,rels, more specifically:
- 
The "middle" database tables have names ending in _waysand_rels(osm2pgsqldefaults), and are the biggest with that suffix in the database, by used space
- 
The *_waystable has columns- id::bigintfor the osm ID,
- nodes::bigint[]for a list of the way's child nodes,
- and tags::text[]for that way's tags in hstore format,
- with tags::jsonbin the case of the new jsonb format.
 
- 
The *_relstable has columns- id::bigintfor the osm ID,
- in the case of legacy text[] format:
- parts::bigint[]any ids of (mixed) nodes, ways or relations that are children of that relation,
- tags::text[]for the relation's tags in hstore format,
- members::text[]that more precisely describes the children, example- {'n123','admin_centre','w345','outer','w567','inner'}having the node- 123as a role- admin_centreand the ways- 345and- 567, as roles- outerand- innerboundaries respectively
 
- in the case of new jsonb format:
- members::jsonba list of json objects describing the members, the example continued- [{"ref":"123","role":"admin_centre","type":"N"},{"ref":"345","role":"outer","type":"W"}, {"ref":"567","role":"inner","type":"W"}],
- and tags::jsonbfor the relation's tags in{"key1":"value1","key2":"value2"}format,
 
 
- 
Access needed: SELECTgranted on the above mentioned tables and also (for the autodetections of table names and column names)pg_attribute,pg_tables,pg_type,pg_class, andgeometry_columns.
To generate regions.csv, run in a planet database:
psql -d gis -p 5432 --csv -c "select -osm_id as osm_id,coalesce(tags->'name:en',name) as name,
    tags->'ISO3166-1' as iso_country,tags->'border_type' as border_type,
    tags->'ISO3166-2' as iso_subcountry,admin_level,tags->'wikipedia' as wikipedia
  from planet_osm_polygon where osm_id<0 and admin_level in ('1','2','3','4')
    and boundary='administrative'" > regions.csv
Note : only admin_level<=4 is loaded into regions.csv.
Use a search on openstreetmap.org for smaller regions.
And then extract the relation id from the url:
https://www.openstreetmap.org/relation/51701 51701,
eg Switzerland