
📊 Geospatial Performance Benchmark

This document tracks the performance and characteristics of different geospatial software stacks for the use cases defined in the project.

Use Case 1: Data Ingestion

Goal: To measure the one-time cost of preparing OSM data (.pbf) for analysis, either by loading it into a database or by creating a smaller subset.

| Operation Performed | Technology | Test Dataset | Key Metrics | Notes / Qualitative Observations |
| --- | --- | --- | --- | --- |
| Load entire .pbf into a database | PostGIS + osm2pgsql | italy-latest.osm.pbf | Import Time: ~1840s (~31 min)<br>Disk Space: 19 GB | Represents the high upfront cost of creating a fully indexed, queryable database. Ideal for many subsequent, fast queries. |
| Extract a regional .pbf from a national .pbf | osmium-tool | italy-latest.osm.pbf | Execution Time: ~21s | Represents the cost of creating a smaller, more manageable file for file-based workflows. A pre-processing step for tools like PyOsmium. |

(Note: for file-based tools like QuackOSM, the "ingestion" and "filtering" steps happen simultaneously; their performance is therefore measured in Use Case 2.)
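
For reference, a minimal sketch of the regional-extract step, driven from Python via `subprocess`. It assumes osmium-tool is installed and on the PATH; the bounding box (roughly Lombardy) is illustrative, not the exact extent used for the benchmark.

```python
import subprocess

# Hedged sketch: cut a regional extract out of the national PBF with
# osmium-tool. The bounding box is illustrative only.
subprocess.run(
    [
        "osmium", "extract",
        "--bbox", "8.5,44.7,11.4,46.6",  # LEFT,BOTTOM,RIGHT,TOP (WGS84)
        "--overwrite",
        "-o", "lombardy-latest.osm.pbf",
        "italy-latest.osm.pbf",
    ],
    check=True,
)
```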


Use Case 2: Data Filtering and Extraction

Goal: To measure the efficiency of extracting a specific subset of data (buildings in Milan) from a larger source, analyzing the impact of system caches.

| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
| --- | --- | --- | --- | --- | --- |
| Extract buildings from .pbf and save to .geoparquet | DuckDB + QuackOSM | Cold Start | italy-latest.osm.pbf | Execution Time: ~1.34s<br>Output: 6.97 MB | Performance is I/O-bound. This time reflects a true first-time run, limited by reading the large PBF from disk. |
| Extract buildings from .pbf and save to .geoparquet | DuckDB + QuackOSM | Hot Cache Avg. | italy-latest.osm.pbf | Execution Time: ~0.22s<br>Runs: 99 | Pure in-memory performance. The ~6x speedup is due to the operating system's file cache, proving the bottleneck is disk I/O, not CPU. |
| Extract buildings from database | PostGIS | Cold Start | planet_osm_polygon table | Execution Time: ~1.10s | Represents a first-time query, limited by reading data pages and indexes from disk into PostgreSQL's internal cache. |
| Extract buildings from database | PostGIS | Hot Cache Avg. | planet_osm_polygon table | Execution Time: ~0.40s<br>Runs: 99 | Pure database query performance. The ~2.75x speedup is due to the database's internal caching mechanism. |
| Read .pbf, build GeoDataFrame and filter by area | PyOsmium + GeoPandas | Cold Start | lombardy-latest.osm.pbf | Execution Time: ~358.9s<br>Output: 7.61 MB | Performance is heavily CPU-bound. This time reflects the high computational cost of two-pass parsing and geometry construction in Python. |
| Read .pbf, build GeoDataFrame and filter by area | PyOsmium + GeoPandas | Hot Cache Avg. | lombardy-latest.osm.pbf | Execution Time: ~360.1s<br>Runs: 49 | Nearly identical to the cold start, confirming the bottleneck is CPU processing, not disk I/O. The OS cache provides minimal benefit. |
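
A hedged sketch of the QuackOSM run measured above, assuming its `convert_pbf_to_parquet` entry point and its `tags_filter`/`geometry_filter` keyword arguments (check the installed version's API); the Milan boundary is fetched with osmnx purely for illustration.

```python
import osmnx
import quackosm

# Boundary polygon used to clip the extract (illustrative).
milan = osmnx.geocode_to_gdf("Milan, Italy").geometry.iloc[0]

# One step: parse the PBF with DuckDB, keep building features inside
# the boundary, and write the result straight to GeoParquet.
result_path = quackosm.convert_pbf_to_parquet(
    "italy-latest.osm.pbf",
    tags_filter={"building": True},
    geometry_filter=milan,
)
print(result_path)  # location of the resulting .geoparquet file
```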

Use Case 3: Single Table Analysis (Geometric Properties)

Goal: To evaluate performance on calculations that operate on geometries without requiring joins, across different data sizes.
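
As a concrete reference for query 3.1 below, a minimal DuckDB Spatial sketch. The EPSG:4326 storage CRS and EPSG:32632 (UTM 32N) metric CRS are assumptions; recent DuckDB versions may expose the GeoParquet column as GEOMETRY directly, in which case `ST_GeomFromWKB` is unnecessary.

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL spatial;")
con.execute("LOAD spatial;")

# Query 3.1: the ten largest building footprints by projected area.
rows = con.execute("""
    SELECT ST_Area(
               ST_Transform(ST_GeomFromWKB(geometry),
                            'EPSG:4326', 'EPSG:32632')
           ) AS area_m2
    FROM 'pinerolo_buildings.geoparquet'
    ORDER BY area_m2 DESC
    LIMIT 10
""").fetchall()
print(rows)
```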

| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
| --- | --- | --- | --- | --- | --- |
| 3.1. Top 10 Largest Areas (Pinerolo) | DuckDB Spatial | Cold Start | pinerolo_buildings.geoparquet | Execution Time: ~0.032s | Found 10 results. Minimal time difference from the hot run. |
| 3.1. Top 10 Largest Areas (Pinerolo) | DuckDB Spatial | Hot Cache Avg. | pinerolo_buildings.geoparquet | Execution Time: ~0.017s<br>Runs: 99 | The ~1.9x speedup on the small dataset suggests a file-metadata caching benefit. |
| 3.1. Top 10 Largest Areas (Pinerolo) | PostGIS | Cold Start | pinerolo_buildings table | Execution Time: ~0.044s | Found 10 results. Performance comparable to the other technologies. |
| 3.1. Top 10 Largest Areas (Pinerolo) | PostGIS | Hot Cache Avg. | pinerolo_buildings table | Execution Time: ~0.017s<br>Runs: 99 | Significant speedup from database caching on the small dataset. |
| 3.1. Top 10 Largest Areas (Pinerolo) | GeoPandas | Cold Start | pinerolo_buildings.geoparquet | Execution Time: ~0.071s | Found 10 results. Slower on cold start due to library overhead. |
| 3.1. Top 10 Largest Areas (Pinerolo) | GeoPandas | Hot Cache Avg. | pinerolo_buildings.geoparquet | Execution Time: ~0.053s<br>Runs: 99 | The slowest of the three on this task, but still very fast. |
| 3.2. Total Buffered Area (Pinerolo) | DuckDB Spatial | Cold Start | pinerolo_buildings.geoparquet | Execution Time: ~0.122s | Result ~7.05M m². With the CRS fix, the result is numerically consistent with PostGIS/GeoPandas. |
| 3.2. Total Buffered Area (Pinerolo) | DuckDB Spatial | Hot Cache Avg. | pinerolo_buildings.geoparquet | Execution Time: ~0.120s<br>Runs: 99 | Performance is consistently CPU-bound. |
| 3.2. Total Buffered Area (Pinerolo) | PostGIS | Cold Start | pinerolo_buildings table | Execution Time: ~0.144s | Result ~7.05M m². Performance is CPU-bound. |
| 3.2. Total Buffered Area (Pinerolo) | PostGIS | Hot Cache Avg. | pinerolo_buildings table | Execution Time: ~0.144s<br>Runs: 99 | The result is consistent with GeoPandas, suggesting it is the correct baseline. |
| 3.2. Total Buffered Area (Pinerolo) | GeoPandas | Cold Start | pinerolo_buildings.geoparquet | Execution Time: ~0.174s | Result ~7.05M m². Performance is CPU-bound. |
| 3.2. Total Buffered Area (Pinerolo) | GeoPandas | Hot Cache Avg. | pinerolo_buildings.geoparquet | Execution Time: ~0.176s<br>Runs: 99 | The result confirms the PostGIS calculation. |
| 3.3. Restaurants NOT near Bus Stops (Pinerolo) | DuckDB Spatial | Cold Start | pinerolo_restaurants.geoparquet and pinerolo_bus_stops.geoparquet | Execution Time: ~0.070s | Found 15 results. Numerically consistent with the other tools. |
| 3.3. Restaurants NOT near Bus Stops (Pinerolo) | DuckDB Spatial | Hot Cache Avg. | pinerolo_restaurants.geoparquet and pinerolo_bus_stops.geoparquet | Execution Time: ~0.062s<br>Runs: 99 | Performance is highly competitive. |
| 3.3. Restaurants NOT near Bus Stops (Pinerolo) | PostGIS | Cold Start | pinerolo_restaurants and pinerolo_bus_stops tables | Execution Time: ~0.003s | Found 15 results. Extremely fast due to database optimizations. |
| 3.3. Restaurants NOT near Bus Stops (Pinerolo) | PostGIS | Hot Cache Avg. | pinerolo_restaurants and pinerolo_bus_stops tables | Execution Time: ~0.0008s<br>Runs: 99 | The fastest technology for this small-scale join. |
| 3.3. Restaurants NOT near Bus Stops (Pinerolo) | GeoPandas | Cold Start | pinerolo_restaurants.geoparquet and pinerolo_bus_stops.geoparquet | Execution Time: ~0.0045s | Found 15 results. The result now matches PostGIS and DuckDB, using the more accurate buffer+sjoin method. |
| 3.3. Restaurants NOT near Bus Stops (Pinerolo) | GeoPandas | Hot Cache Avg. | pinerolo_restaurants.geoparquet and pinerolo_bus_stops.geoparquet | Execution Time: ~0.0054s<br>Runs: 99 | Achieves numerical consistency at minimal performance cost on small data. |
| 3.1. Top 10 Largest Areas (Milan) | DuckDB Spatial | Cold Start | milan_buildings.geoparquet | Execution Time: ~0.199s | Found 10 results. The operation is now CPU-bound on a larger dataset. |
| 3.1. Top 10 Largest Areas (Milan) | DuckDB Spatial | Hot Cache Avg. | milan_buildings.geoparquet | Execution Time: ~0.188s<br>Runs: 99 | Minimal speedup, as the ST_Area calculation is the dominant cost. |
| 3.1. Top 10 Largest Areas (Milan) | PostGIS | Cold Start | milan_buildings table | Execution Time: ~0.214s | Found 10 results. Performance is nearly identical to DuckDB. |
| 3.1. Top 10 Largest Areas (Milan) | PostGIS | Hot Cache Avg. | milan_buildings table | Execution Time: ~0.196s<br>Runs: 99 | Similar CPU-bound behavior. |
| 3.1. Top 10 Largest Areas (Milan) | GeoPandas | Cold Start | milan_buildings.geoparquet | Execution Time: ~0.835s | Found 10 results. Significantly slower than the database approaches on a medium dataset. |
| 3.1. Top 10 Largest Areas (Milan) | GeoPandas | Hot Cache Avg. | milan_buildings.geoparquet | Execution Time: ~0.869s<br>Runs: 99 | The performance gap widens on larger data. |
| 3.2. Total Buffered Area (Milan) | DuckDB Spatial | Cold Start | milan_buildings.geoparquet | Execution Time: ~2.10s | Result ~111.4M m². Numerically correct and consistent with the other tools. |
| 3.2. Total Buffered Area (Milan) | DuckDB Spatial | Hot Cache Avg. | milan_buildings.geoparquet | Execution Time: ~2.08s<br>Runs: 99 | Faster than GeoPandas, slightly slower than PostGIS. |
| 3.2. Total Buffered Area (Milan) | PostGIS | Cold Start | milan_buildings table | Execution Time: ~1.37s | Result ~111.5M m². The fastest technology for this heavy computation. |
| 3.2. Total Buffered Area (Milan) | PostGIS | Hot Cache Avg. | milan_buildings table | Execution Time: ~1.34s<br>Runs: 99 | Consistently CPU-bound behavior. |
| 3.2. Total Buffered Area (Milan) | GeoPandas | Cold Start | milan_buildings.geoparquet | Execution Time: ~2.99s | Result ~111.5M m². Slower than the databases, but confirms the PostGIS result. |
| 3.2. Total Buffered Area (Milan) | GeoPandas | Hot Cache Avg. | milan_buildings.geoparquet | Execution Time: ~2.93s<br>Runs: 99 | Consistently CPU-bound. |
| 3.3. Restaurants NOT near Bus Stops (Milan) | DuckDB Spatial | Cold Start | milan_restaurants.geoparquet and milan_bus_stops.geoparquet | Execution Time: ~0.087s | Found 2135 results. Numerically consistent. |
| 3.3. Restaurants NOT near Bus Stops (Milan) | DuckDB Spatial | Hot Cache Avg. | milan_restaurants.geoparquet and milan_bus_stops.geoparquet | Execution Time: ~0.077s<br>Runs: 99 | Significantly faster than PostGIS for this query type. |
| 3.3. Restaurants NOT near Bus Stops (Milan) | PostGIS | Cold Start | milan_restaurants and milan_bus_stops tables | Execution Time: ~10.53s | Found 2135 results. Very slow on this operation. |
| 3.3. Restaurants NOT near Bus Stops (Milan) | PostGIS | Hot Cache Avg. | milan_restaurants and milan_bus_stops tables | Execution Time: ~10.49s<br>Runs: 99 | Confirms poor performance on this type of spatial anti-join. |
| 3.3. Restaurants NOT near Bus Stops (Milan) | GeoPandas | Cold Start | milan_restaurants.geoparquet and milan_bus_stops.geoparquet | Execution Time: ~0.049s | Found 2136 results. Numerically consistent with PostGIS/DuckDB. |
| 3.3. Restaurants NOT near Bus Stops (Milan) | GeoPandas | Hot Cache Avg. | milan_restaurants.geoparquet and milan_bus_stops.geoparquet | Execution Time: ~0.048s<br>Runs: 99 | Still ~220x faster than PostGIS. The performance cost of the accurate method is small compared to the gain in reliability. |
| 3.1. Top 10 Largest Areas (Rome) | DuckDB Spatial | Cold Start | rome_buildings.geoparquet | Execution Time: ~0.52s | Found 10 results. Performance scales predictably. |
| 3.1. Top 10 Largest Areas (Rome) | DuckDB Spatial | Hot Cache Avg. | rome_buildings.geoparquet | Execution Time: ~0.51s<br>Runs: 99 | Consistently CPU-bound on the largest dataset. |
| 3.1. Top 10 Largest Areas (Rome) | PostGIS | Cold Start | rome_buildings table | Execution Time: ~0.40s | PostGIS becomes faster than DuckDB on the largest dataset for this query. |
| 3.1. Top 10 Largest Areas (Rome) | PostGIS | Hot Cache Avg. | rome_buildings table | Execution Time: ~0.38s<br>Runs: 99 | Demonstrates better scaling than DuckDB for this operation. |
| 3.1. Top 10 Largest Areas (Rome) | GeoPandas | Cold Start | rome_buildings.geoparquet | Execution Time: ~2.52s | The performance gap between GeoPandas and the databases is largest on this dataset. |
| 3.1. Top 10 Largest Areas (Rome) | GeoPandas | Hot Cache Avg. | rome_buildings.geoparquet | Execution Time: ~2.63s<br>Runs: 99 | Confirms that for pure computation on large in-memory data, the database engines are more optimized. |
| 3.2. Total Buffered Area (Rome) | DuckDB Spatial | Cold Start | rome_buildings.geoparquet | Execution Time: ~6.48s | Result ~298.9M m². Numerically correct and consistent. |
| 3.2. Total Buffered Area (Rome) | DuckDB Spatial | Hot Cache Avg. | rome_buildings.geoparquet | Execution Time: ~6.39s<br>Runs: 99 | Slower than PostGIS, but faster than GeoPandas on the largest dataset. |
| 3.2. Total Buffered Area (Rome) | PostGIS | Cold Start | rome_buildings table | Execution Time: ~2.80s | Result ~299.2M m². PostGIS is over 2x faster than DuckDB on this heavy computation. |
| 3.2. Total Buffered Area (Rome) | PostGIS | Hot Cache Avg. | rome_buildings table | Execution Time: ~2.78s<br>Runs: 99 | Confirms superior performance for complex, single-table aggregations on large data. |
| 3.2. Total Buffered Area (Rome) | GeoPandas | Cold Start | rome_buildings.geoparquet | Execution Time: ~9.38s | Result ~299.2M m². Numerically consistent with PostGIS, but the slowest technology. |
| 3.2. Total Buffered Area (Rome) | GeoPandas | Hot Cache Avg. | rome_buildings.geoparquet | Execution Time: ~9.14s<br>Runs: 99 | The performance gap widens significantly on the largest dataset. |
| 3.3. Restaurants NOT near Bus Stops (Rome) | DuckDB Spatial | Cold Start | rome_restaurants.geoparquet and rome_bus_stops.geoparquet | Execution Time: ~0.090s | Found 2298 results. Numerically consistent. |
| 3.3. Restaurants NOT near Bus Stops (Rome) | DuckDB Spatial | Hot Cache Avg. | rome_restaurants.geoparquet and rome_bus_stops.geoparquet | Execution Time: ~0.081s<br>Runs: 99 | Dramatically faster than PostGIS for this large join. |
| 3.3. Restaurants NOT near Bus Stops (Rome) | PostGIS | Cold Start | rome_restaurants and rome_bus_stops tables | Execution Time: ~40.1s | Found 2298 results. Extremely slow on the largest dataset. |
| 3.3. Restaurants NOT near Bus Stops (Rome) | PostGIS | Hot Cache Avg. | rome_restaurants and rome_bus_stops tables | Execution Time: ~40.3s<br>Runs: 99 | The join is significantly slower than DuckDB on the largest dataset. |
| 3.3. Restaurants NOT near Bus Stops (Rome) | GeoPandas | Cold Start | rome_restaurants.geoparquet and rome_bus_stops.geoparquet | Execution Time: ~0.133s | Found 2299 results. Numerically correct; the buffer+sjoin method is more computationally expensive. |
| 3.3. Restaurants NOT near Bus Stops (Rome) | GeoPandas | Hot Cache Avg. | rome_restaurants.geoparquet and rome_bus_stops.geoparquet | Execution Time: ~0.130s<br>Runs: 99 | Highlights the trade-off: slower than the old method, but now correct and still ~310x faster than PostGIS. |
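
To make the "buffer + sjoin" method referenced above concrete, a minimal GeoPandas sketch; the 200 m threshold and the EPSG:32632 metric CRS are assumptions, not the benchmark's exact parameters.

```python
import geopandas as gpd

# Query 3.3 as a spatial anti-join: restaurants NOT near any bus stop.
restaurants = gpd.read_parquet("milan_restaurants.geoparquet").to_crs(32632)
bus_stops = gpd.read_parquet("milan_bus_stops.geoparquet").to_crs(32632)

# Buffer every stop (assumed 200 m), then join restaurants to buffers.
buffers = gpd.GeoDataFrame(geometry=bus_stops.buffer(200), crs=32632)
near = gpd.sjoin(restaurants, buffers, predicate="intersects")

# Keep only the restaurants whose index never matched a buffer.
not_near = restaurants.loc[~restaurants.index.isin(near.index)]
print(len(not_near))
```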

Use Case 4: Complex Spatial Join

Goal: To test performance on computationally intensive join operations on large datasets.
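
For query 4.1 below, a minimal GeoPandas sketch of the point-in-polygon join and count; the file names match the datasets in the table, and `index_right` is the column standard `sjoin` output adds for the right-hand index.

```python
import geopandas as gpd

# Query 4.1: count restaurants falling within each neighborhood polygon.
neighborhoods = gpd.read_parquet("rome_neighborhoods.geoparquet")
restaurants = gpd.read_parquet("rome_restaurants.geoparquet")

joined = gpd.sjoin(restaurants, neighborhoods, predicate="within")
counts = joined.groupby("index_right").size()

# Attach the counts back to the polygons (0 where nothing matched).
neighborhoods["restaurant_count"] = counts.reindex(
    neighborhoods.index, fill_value=0
)
print(neighborhoods["restaurant_count"].sum())
```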

| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
| --- | --- | --- | --- | --- | --- |
| 4.1. Restaurants per Neighborhood (Pinerolo) | DuckDB Spatial | Cold Start | pinerolo_neighborhoods.geoparquet and pinerolo_restaurants.geoparquet | Execution Time: ~0.033s<br>Output: 0.13 MB | Found 13 results, numerically consistent with the other tools. |
| 4.1. Restaurants per Neighborhood (Pinerolo) | DuckDB Spatial | Hot Cache Avg. | pinerolo_neighborhoods.geoparquet and pinerolo_restaurants.geoparquet | Execution Time: ~0.004s<br>Runs: 99 | Extremely fast on hot runs. |
| 4.1. Restaurants per Neighborhood (Pinerolo) | PostGIS | Cold Start | pinerolo_neighborhoods and pinerolo_restaurants tables | Execution Time: ~0.041s | Found 13 results, numerically consistent with the others. |
| 4.1. Restaurants per Neighborhood (Pinerolo) | PostGIS | Hot Cache Avg. | pinerolo_neighborhoods and pinerolo_restaurants tables | Execution Time: ~0.015s<br>Runs: 99 | Good speedup from database caching. |
| 4.1. Restaurants per Neighborhood (Pinerolo) | GeoPandas | Cold Start | pinerolo_neighborhoods.geoparquet and pinerolo_restaurants.geoparquet | Execution Time: ~0.006s<br>Output: 0.12 MB | Found 13 results. The fastest technology for this task on a small dataset. |
| 4.1. Restaurants per Neighborhood (Pinerolo) | GeoPandas | Hot Cache Avg. | pinerolo_neighborhoods.geoparquet and pinerolo_restaurants.geoparquet | Execution Time: ~0.005s<br>Runs: 99 | Confirms its status as the top performer for this operation. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Pinerolo) | DuckDB Spatial | Cold Start | pinerolo_hospitals.geoparquet, pinerolo_residential_streets.geoparquet and pinerolo_trees.geoparquet | Execution Time: ~0.178s | Result: 0 trees, ~5.0k meters. Numerically correct and consistent with PostGIS/GeoPandas. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Pinerolo) | DuckDB Spatial | Hot Cache Avg. | pinerolo_hospitals.geoparquet, pinerolo_residential_streets.geoparquet and pinerolo_trees.geoparquet | Execution Time: ~0.118s<br>Runs: 99 | Slower than GeoPandas, but shows competitive performance. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Pinerolo) | PostGIS | Cold Start | pinerolo_hospitals, pinerolo_residential_streets and pinerolo_trees tables | Execution Time: ~0.099s | Result: 0 trees, ~5.0k meters. This numerical result is the baseline. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Pinerolo) | PostGIS | Hot Cache Avg. | pinerolo_hospitals, pinerolo_residential_streets and pinerolo_trees tables | Execution Time: ~0.079s<br>Runs: 99 | Faster than DuckDB on the small dataset, but well behind GeoPandas for this query type. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Pinerolo) | GeoPandas | Cold Start | pinerolo_hospitals.geoparquet, pinerolo_residential_streets.geoparquet and pinerolo_trees.geoparquet | Execution Time: ~0.015s | Result: 0 trees, ~5.0k meters. Numerically consistent with PostGIS. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Pinerolo) | GeoPandas | Hot Cache Avg. | pinerolo_hospitals.geoparquet, pinerolo_residential_streets.geoparquet and pinerolo_trees.geoparquet | Execution Time: ~0.008s<br>Runs: 99 | Significantly faster than the databases thanks to sjoin_nearest. |
| 4.3. Area Not Covered by Parks (Pinerolo) | DuckDB Spatial | Cold Start | pinerolo_parks.geoparquet and osmnx boundary | Execution Time: ~0.033s | Result 49.97 km². Numerically correct; with the CRS fix, the result is now perfectly aligned. |
| 4.3. Area Not Covered by Parks (Pinerolo) | DuckDB Spatial | Hot Cache Avg. | pinerolo_parks.geoparquet and osmnx boundary | Execution Time: ~0.025s<br>Runs: 99 | Performance is competitive and the task is CPU-bound. |
| 4.3. Area Not Covered by Parks (Pinerolo) | PostGIS | Cold Start | pinerolo_parks table and osmnx boundary | Execution Time: ~0.021s | Result 49.97 km². Numerically correct; PostGIS handles the data correctly. |
| 4.3. Area Not Covered by Parks (Pinerolo) | PostGIS | Hot Cache Avg. | pinerolo_parks table and osmnx boundary | Execution Time: ~0.006s<br>Runs: 99 | Very fast and numerically robust. ~3.5x speedup from caching. |
| 4.3. Area Not Covered by Parks (Pinerolo) | GeoPandas | Cold Start | pinerolo_parks.geoparquet and osmnx boundary | Execution Time: ~0.005s | Result 49.97 km². Numerically correct, confirming the PostGIS/GEOS baseline. |
| 4.3. Area Not Covered by Parks (Pinerolo) | GeoPandas | Hot Cache Avg. | pinerolo_parks.geoparquet and osmnx boundary | Execution Time: ~0.005s<br>Runs: 99 | The fastest of the three for this task and numerically robust. |
| 4.1. Restaurants per Neighborhood (Milan) | DuckDB Spatial | Cold Start | milan_neighborhoods.geoparquet and milan_restaurants.geoparquet | Execution Time: ~0.119s<br>Output: 0.29 MB | Found 34 results, consistent with the other tools. |
| 4.1. Restaurants per Neighborhood (Milan) | DuckDB Spatial | Hot Cache Avg. | milan_neighborhoods.geoparquet and milan_restaurants.geoparquet | Execution Time: ~0.097s<br>Runs: 99 | Faster than PostGIS on this medium-scale join, but well behind GeoPandas. |
| 4.1. Restaurants per Neighborhood (Milan) | PostGIS | Cold Start | milan_neighborhoods and milan_restaurants tables | Execution Time: ~0.220s | Found 34 results. Slower than DuckDB on the cold run. |
| 4.1. Restaurants per Neighborhood (Milan) | PostGIS | Hot Cache Avg. | milan_neighborhoods and milan_restaurants tables | Execution Time: ~0.201s<br>Runs: 99 | Slower than DuckDB, and significantly slower than GeoPandas. |
| 4.1. Restaurants per Neighborhood (Milan) | GeoPandas | Cold Start | milan_neighborhoods.geoparquet and milan_restaurants.geoparquet | Execution Time: ~0.010s<br>Output: 0.28 MB | Found 34 results. Extraordinarily fast, showing the power of in-memory operations. |
| 4.1. Restaurants per Neighborhood (Milan) | GeoPandas | Hot Cache Avg. | milan_neighborhoods.geoparquet and milan_restaurants.geoparquet | Execution Time: ~0.009s<br>Runs: 99 | Over 20x faster than PostGIS. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Milan) | DuckDB Spatial | Cold Start | milan_hospitals.geoparquet, milan_residential_streets.geoparquet and milan_trees.geoparquet | Execution Time: ~0.193s | Result: 234 trees, ~25.0k meters. Numerically correct and consistent with PostGIS/GeoPandas. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Milan) | DuckDB Spatial | Hot Cache Avg. | milan_hospitals.geoparquet, milan_residential_streets.geoparquet and milan_trees.geoparquet | Execution Time: ~0.155s<br>Runs: 99 | Significantly faster than PostGIS, but much slower than GeoPandas. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Milan) | PostGIS | Cold Start | milan_hospitals, milan_residential_streets and milan_trees tables | Execution Time: ~34.16s | Result: 234 trees, ~25.0k meters. The numerical baseline. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Milan) | PostGIS | Hot Cache Avg. | milan_hospitals, milan_residential_streets and milan_trees tables | Execution Time: ~33.11s<br>Runs: 99 | The slowest technology by a large margin for this query type. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Milan) | GeoPandas | Cold Start | milan_hospitals.geoparquet, milan_residential_streets.geoparquet and milan_trees.geoparquet | Execution Time: ~0.079s | Result: 234 trees, ~25.1k meters. Numerically consistent with PostGIS. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Milan) | GeoPandas | Hot Cache Avg. | milan_hospitals.geoparquet, milan_residential_streets.geoparquet and milan_trees.geoparquet | Execution Time: ~0.082s<br>Runs: 99 | Significantly faster than the databases; sjoin_nearest is a clear winner. |
| 4.3. Area Not Covered by Parks (Milan) | DuckDB Spatial | Cold Start | milan_parks.geoparquet and osmnx boundary | Execution Time: ~0.209s | Result 162.98 km². Numerically correct. A massive improvement in performance and correctness. |
| 4.3. Area Not Covered by Parks (Milan) | DuckDB Spatial | Hot Cache Avg. | milan_parks.geoparquet and osmnx boundary | Execution Time: ~0.198s<br>Runs: 99 | Performance is now highly competitive with PostGIS. The task is CPU-bound. |
| 4.3. Area Not Covered by Parks (Milan) | PostGIS | Cold Start | milan_parks table and osmnx boundary | Execution Time: ~0.142s | Result 162.98 km². Numerically correct. PostGIS is the fastest for this operation. |
| 4.3. Area Not Covered by Parks (Milan) | PostGIS | Hot Cache Avg. | milan_parks table and osmnx boundary | Execution Time: ~0.138s<br>Runs: 99 | Shows extreme efficiency and numerical robustness for complex geometry operations. |
| 4.3. Area Not Covered by Parks (Milan) | GeoPandas | Cold Start | milan_parks.geoparquet and osmnx boundary | Execution Time: ~0.430s | Result 162.98 km². Numerically correct. Slower than PostGIS, but confirms the robustness of the GEOS engine. |
| 4.3. Area Not Covered by Parks (Milan) | GeoPandas | Hot Cache Avg. | milan_parks.geoparquet and osmnx boundary | Execution Time: ~0.423s<br>Runs: 99 | Confirms robustness, but is slower than the databases for this task. |
| 4.1. Restaurants per Neighborhood (Rome) | DuckDB Spatial | Cold Start | rome_neighborhoods.geoparquet and rome_restaurants.geoparquet | Execution Time: ~4.97s<br>Output: 2.23 MB | Found 52 results, consistent with the other tools. |
| 4.1. Restaurants per Neighborhood (Rome) | DuckDB Spatial | Hot Cache Avg. | rome_neighborhoods.geoparquet and rome_restaurants.geoparquet | Execution Time: ~4.36s<br>Runs: 99 | Faster than PostGIS on the largest dataset. |
| 4.1. Restaurants per Neighborhood (Rome) | PostGIS | Cold Start | rome_neighborhoods and rome_restaurants tables | Execution Time: ~6.68s | Found 52 results. Slower than DuckDB on the largest dataset. |
| 4.1. Restaurants per Neighborhood (Rome) | PostGIS | Hot Cache Avg. | rome_neighborhoods and rome_restaurants tables | Execution Time: ~5.32s<br>Runs: 99 | Caching provides a noticeable benefit. |
| 4.1. Restaurants per Neighborhood (Rome) | GeoPandas | Cold Start | rome_neighborhoods.geoparquet and rome_restaurants.geoparquet | Execution Time: ~0.034s<br>Output: 2.23 MB | Found 52 results. The performance gap is enormous on the largest dataset. |
| 4.1. Restaurants per Neighborhood (Rome) | GeoPandas | Hot Cache Avg. | rome_neighborhoods.geoparquet and rome_restaurants.geoparquet | Execution Time: ~0.031s<br>Runs: 99 | GeoPandas is ~140x faster than DuckDB for this point-in-polygon join. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Rome) | DuckDB Spatial | Cold Start | rome_hospitals.geoparquet, rome_residential_streets.geoparquet and rome_trees.geoparquet | Execution Time: ~0.288s | Result: 441 trees, ~103.7k meters. Numerically correct and consistent. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Rome) | DuckDB Spatial | Hot Cache Avg. | rome_hospitals.geoparquet, rome_residential_streets.geoparquet and rome_trees.geoparquet | Execution Time: ~0.258s<br>Runs: 99 | A massive performance improvement: now faster than GeoPandas on the cold run and competitive on hot runs. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Rome) | PostGIS | Cold Start | rome_hospitals, rome_residential_streets and rome_trees tables | Execution Time: ~185.74s | Result: 441 trees, ~103.1k meters. The numerical baseline. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Rome) | PostGIS | Hot Cache Avg. | rome_hospitals, rome_residential_streets and rome_trees tables | Execution Time: ~186.80s<br>Runs: 99 | The performance gap with the other tools is enormous (~1000x). |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Rome) | GeoPandas | Cold Start | rome_hospitals.geoparquet, rome_residential_streets.geoparquet and rome_trees.geoparquet | Execution Time: ~0.189s | Result: 441 trees, ~103.3k meters. Numerically consistent with PostGIS. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Rome) | GeoPandas | Hot Cache Avg. | rome_hospitals.geoparquet, rome_residential_streets.geoparquet and rome_trees.geoparquet | Execution Time: ~0.186s<br>Runs: 99 | GeoPandas is ~1000x faster than PostGIS. |
| 4.3. Area Not Covered by Parks (Rome) | DuckDB Spatial | Cold Start | rome_parks.geoparquet and osmnx boundary | Execution Time: ~0.346s | Result 1,235 km². Numerically correct. A massive improvement. |
| 4.3. Area Not Covered by Parks (Rome) | DuckDB Spatial | Hot Cache Avg. | rome_parks.geoparquet and osmnx boundary | Execution Time: ~0.335s<br>Runs: 99 | Performance is now competitive with the other tools. |
| 4.3. Area Not Covered by Parks (Rome) | PostGIS | Cold Start | rome_parks table and osmnx boundary | Execution Time: ~0.248s | Result 1,235 km². Numerically correct. PostGIS is the fastest and most robust. |
| 4.3. Area Not Covered by Parks (Rome) | PostGIS | Hot Cache Avg. | rome_parks table and osmnx boundary | Execution Time: ~0.236s<br>Runs: 99 | Demonstrates extreme efficiency on large, complex geometric operations. |
| 4.3. Area Not Covered by Parks (Rome) | GeoPandas | Cold Start | rome_parks.geoparquet and osmnx boundary | Execution Time: ~0.782s | Result 1,235 km². Numerically correct. Slower than PostGIS, but robust. |
| 4.3. Area Not Covered by Parks (Rome) | GeoPandas | Hot Cache Avg. | rome_parks.geoparquet and osmnx boundary | Execution Time: ~0.784s<br>Runs: 99 | The GEOS-based tools are both numerically correct and performant for this task. |
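
The sjoin_nearest approach credited above for query 4.2 can be sketched as follows; the 500 m and 50 m distance thresholds and the EPSG:32632 metric CRS are assumptions about the benchmark's parameters.

```python
import geopandas as gpd

hospitals = gpd.read_parquet("milan_hospitals.geoparquet").to_crs(32632)
streets = gpd.read_parquet(
    "milan_residential_streets.geoparquet"
).to_crs(32632)
trees = gpd.read_parquet("milan_trees.geoparquet").to_crs(32632)

# Residential streets whose nearest hospital lies within 500 m (assumed).
matches = gpd.sjoin_nearest(streets, hospitals, max_distance=500)
near_streets = streets.loc[matches.index.unique()]
total_length_m = near_streets.length.sum()

# Trees whose nearest qualifying street lies within 50 m (assumed).
tree_matches = gpd.sjoin_nearest(trees, near_streets, max_distance=50)
print(tree_matches.index.nunique(), round(total_length_m))
```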

Appendix: Benchmarks on NON-OSM Data Formats

This section will track the performance of the core technologies on standard geospatial data formats, without the complexity of OSM-specific parsing libraries.

Pure Vector Data (Shapefile)

Goal: To measure ingestion and filtering performance on a standard Shapefile dataset, analyzing the impact of system caches.

Use Case 1: Ingestion

| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
| --- | --- | --- | --- | --- | --- |
| Read Shapefile using ST_Read | DuckDB Spatial | Cold Start | Com01012025_WGS84.shp | Execution Time: ~0.44s | Found 7896 features. Very fast initial read from disk. |
| Read Shapefile using ST_Read | DuckDB Spatial | Hot Cache Avg. | Com01012025_WGS84.shp | Execution Time: ~0.33s<br>Runs: 99 | The OS file cache provides a noticeable, but smaller (~1.3x), speedup. |
| Load Shapefile into database | PostGIS + shp2pgsql | N/A | Com01012025_WGS84.shp | Execution Time: ~21s | Represents the one-time cost of manual ingestion via the CLI. Significantly slower due to database overhead. |
| Read Shapefile into GeoDataFrame | GeoPandas | Cold Start | Com01012025_WGS84.shp | Execution Time: ~0.83s | Found 7896 features. Performance reflects reading the file from disk. |
| Read Shapefile into GeoDataFrame | GeoPandas | Hot Cache Avg. | Com01012025_WGS84.shp | Execution Time: ~0.34s<br>Runs: 99 | The ~2.4x speedup is due to the OS file cache. |

Use Case 2: Filtering

| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
| --- | --- | --- | --- | --- | --- |
| Filter table by attribute | DuckDB Spatial | Cold Start | Com01012025_WGS84.shp | Execution Time: ~0.020s<br>Output: 12.57 MB | Filtered to 1180 features. Very fast in-memory query. |
| Filter table by attribute | DuckDB Spatial | Hot Cache Avg. | Com01012025_WGS84.shp | Execution Time: ~0.019s<br>Runs: 99 | CPU-bound, with minimal difference between cold and hot runs. |
| Filter table by attribute | PostGIS | Cold Start | comuni_istat table | Execution Time: ~0.22s | Found 1180 features. Represents a query where data is read from disk into the database cache. |
| Filter table by attribute | PostGIS | Hot Cache Avg. | comuni_istat table | Execution Time: ~0.075s<br>Runs: 99 | The ~2.9x speedup is due to PostgreSQL's internal caching mechanism. |
| Filter GeoDataFrame by attribute | GeoPandas | Cold Start | Com01012025_WGS84.shp | Execution Time: ~0.0010s<br>Output: 12.58 MB | Filtered to 1180 features. The in-memory operation is extremely fast. |
| Filter GeoDataFrame by attribute | GeoPandas | Hot Cache Avg. | Com01012025_WGS84.shp | Execution Time: ~0.0009s<br>Runs: 99 | Negligible difference from the cold start, confirming the operation is CPU-bound. |
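
The GeoPandas filter can be sketched in two lines; the COD_REG column and region code 1 (Piedmont) are assumptions about the ISTAT schema.

```python
import geopandas as gpd

# Load the national municipalities layer, then filter by attribute.
comuni = gpd.read_file("Com01012025_WGS84.shp")
piedmont = comuni[comuni["COD_REG"] == 1]  # assumed Piedmont region code
print(len(piedmont))  # expected: 1180 features
```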

Raster Data (GeoTIFF)

Goal: To measure ingestion and filtering performance on a large-scale raster dataset (GHS-POP), analyzing the impact of system caches and comparing file-based vs database approaches.

Use Case 1: Ingestion

| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
| --- | --- | --- | --- | --- | --- |
| Open GeoTIFF file | DuckDB Spatial | N/A | GHS_POP_ITALY_100m.tif | Execution Time: N/A | The technology does not support native raster operations. |
| Load raster into database | PostGIS + raster2pgsql | N/A | GHS_POP_ITALY_100m.tif | Execution Time: ~370s (~6.2 min) | One-time cost of manual ingestion via the CLI. Includes table creation, tiling, and indexing of the entire Italy raster. |
| Open GeoTIFF file | Python (rioxarray) | Cold Start | GHS_POP_ITALY_100m.tif | Execution Time: ~0.15s | Measures the time to read file metadata from disk. |
| Open GeoTIFF file | Python (rioxarray) | Hot Cache Avg. | GHS_POP_ITALY_100m.tif | Execution Time: ~0.03s<br>Runs: 99 | The ~5.7x speedup is due to OS file caching of the file headers. |
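
The rioxarray "open" step only touches file headers, which is why it is so fast; a minimal sketch:

```python
import rioxarray

# Lazily open the merged GHS-POP raster; only metadata is read here.
pop = rioxarray.open_rasterio("GHS_POP_ITALY_100m.tif")
print(pop.rio.crs, pop.shape)
```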

Use Case 2: Filtering

| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
| --- | --- | --- | --- | --- | --- |
| Clip raster to Milan | DuckDB Spatial | N/A | GHS_POP_ITALY_100m.tif | Execution Time: N/A | The technology does not support native raster operations. |
| Clip raster to Milan | PostGIS | Cold Start | ghs_population table | Execution Time: ~8.12s | Found 1,256,044 people. The query is computationally expensive. |
| Clip raster to Milan | PostGIS | Hot Cache Avg. | ghs_population table | Execution Time: ~7.40s<br>Runs: 99 | The minimal speedup confirms the operation is CPU-bound (raster processing) rather than I/O-bound. |
| Clip raster to Milan | Python (rioxarray) | Cold Start | GHS_POP_ITALY_100m.tif | Execution Time: ~7.57s<br>Output: 0.08 MB | Found 1,256,044 people. Performance is highly competitive with PostGIS. |
| Clip raster to Milan | Python (rioxarray) | Hot Cache Avg. | GHS_POP_ITALY_100m.tif | Execution Time: ~7.91s<br>Runs: 99 | Nearly identical to the cold start, definitively proving the operation is CPU-bound. |
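
A hedged sketch of the Milan clip, fetching the boundary with osmnx and opening the raster with `masked=True` so nodata cells drop out of the population sum:

```python
import osmnx
import rioxarray

# Open with masked=True so nodata cells become NaN and are ignored by sums.
pop = rioxarray.open_rasterio("GHS_POP_ITALY_100m.tif", masked=True)

# Boundary polygon, reprojected to the raster's CRS before clipping.
milan = osmnx.geocode_to_gdf("Milan, Italy").to_crs(pop.rio.crs)

clipped = pop.rio.clip(milan.geometry, milan.crs)
print(float(clipped.sum()))  # expected ≈ 1,256,044 people
```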

(Note: GHS_POP_ITALY_100m.tif was generated by merging, via the GDAL CLI, the four GHS-POP tiles downloaded from Copernicus that together cover all of Italy.)


Use Case 5: Vector-Raster Analysis

Goal: To evaluate the performance of different technologies for calculating raster statistics based on vector zones.

| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
| --- | --- | --- | --- | --- | --- |
| Calculate Population per Municipality (Piedmont) | PostGIS | Cold Start | comuni_istat_clean and ghs_population tables | Execution Time: ~2.50s<br>GPKG Loading Time: ~17.87s | Calculated a total population of 4,222,758 across 1180 municipalities. The result is numerically identical to the Python stack, although an additional script was needed to convert the vector data to a GPKG file and reload it into the database, which accounts for the extra loading time. |
| Calculate Population per Municipality (Piedmont) | PostGIS | Hot Cache Avg. | comuni_istat_clean and ghs_population tables | Execution Time: ~2.43s<br>Runs: 99 | The operation is CPU-bound. PostGIS is ~5.5x faster than the Python stack for this task. |
| Calculate Population per Municipality (Piedmont) | Python (rasterstats) | Cold Start | Com01012025_WGS84.shp and GHS_POP_ITALY_100m.tif | Execution Time: ~13.97s | Calculated a total population of 4,222,758 across 1180 municipalities. The result is numerically identical to PostGIS. |
| Calculate Population per Municipality (Piedmont) | Python (rasterstats) | Hot Cache Avg. | Com01012025_WGS84.shp and GHS_POP_ITALY_100m.tif | Execution Time: ~13.45s<br>Runs: 99 | The operation is also CPU-bound. The high-level libraries introduce a significant performance overhead compared to the database. |

(Note: DuckDB still does not support native raster operations.)
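
Finally, a hedged sketch of the Use Case 5 zonal statistics with rasterstats; the COD_REG == 1 Piedmont filter is an assumption about the ISTAT schema, and the vector layer is reprojected to the raster's CRS before the statistics are computed.

```python
import geopandas as gpd
import rasterio
from rasterstats import zonal_stats

# Restrict the national layer to Piedmont (assumed region code 1).
comuni = gpd.read_file("Com01012025_WGS84.shp")
piedmont = comuni[comuni["COD_REG"] == 1]

# Align the vector CRS with the raster before computing zonal sums.
with rasterio.open("GHS_POP_ITALY_100m.tif") as src:
    piedmont = piedmont.to_crs(src.crs)

stats = zonal_stats(piedmont, "GHS_POP_ITALY_100m.tif", stats=["sum"])
total = sum(s["sum"] or 0 for s in stats)
print(round(total))  # expected ≈ 4,222,758
```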