Tables
This document tracks the performance and characteristics of different geospatial software stacks for the use cases defined in the project.
Goal: To measure the one-time cost of preparing OSM data (.pbf) for analysis, either by loading it into a database or by creating a smaller subset.
| Operation Performed | Technology | Test Dataset | Key Metrics | Notes / Qualitative Observations |
|---|---|---|---|---|
| Load entire .pbf into a database | PostGIS + osm2pgsql | italy-latest.osm.pbf | • Import Time: ~1840s (~31 min) • Disk Space: 19 GB | Represents the high upfront cost to create a fully indexed, queryable database. Ideal for many subsequent, fast queries. |
| Extract a regional .pbf from a national .pbf | osmium-tool | italy-latest.osm.pbf | • Execution Time: ~21s | Represents the cost of creating a smaller, more manageable file for file-based workflows. A pre-processing step for tools like PyOsmium. |
(Note: For file-based tools like QuackOSM, the "ingestion" and "filtering" steps happen simultaneously; their performance is measured in Use Case 2.)
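For reference, the two ingestion paths above can be reproduced from Python roughly as follows. This is a minimal sketch: the database name, file paths, and flag choices are illustrative assumptions, not the exact commands used for the benchmark.

```python
# Illustrative sketch of the two one-time ingestion paths (DB name, paths and flags are assumptions).
import subprocess

# Path A: load the full PBF into PostGIS with osm2pgsql (slow, but yields an indexed database).
subprocess.run(
    ["osm2pgsql", "--create", "--slim", "--database", "osm_italy", "italy-latest.osm.pbf"],
    check=True,
)

# Path B: cut a regional extract with osmium-tool for file-based workflows (e.g. PyOsmium).
subprocess.run(
    ["osmium", "extract", "--polygon", "lombardy.geojson",
     "italy-latest.osm.pbf", "--output", "lombardy-latest.osm.pbf"],
    check=True,
)
```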
Goal: To measure the efficiency of extracting a specific subset of data (buildings in Milan) from a larger source, analyzing the impact of system caches.
| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
|---|---|---|---|---|---|
| Extract buildings from .pbf and save to .geoparquet | DuckDB + QuackOSM | Cold Start | italy-latest.osm.pbf | • Execution Time: ~1.34s • Output: 6.97 MB | Performance is I/O-bound. This time reflects a true first-time run, limited by reading the large PBF from disk. |
| Extract buildings from .pbf and save to .geoparquet | DuckDB + QuackOSM | Hot Cache Avg. | italy-latest.osm.pbf | • Execution Time: ~0.22s • Runs: 99 | Pure in-memory performance. The ~6x speedup is due to the operating system's file cache, proving the bottleneck is disk I/O, not CPU. |
| Extract buildings from database | PostGIS | Cold Start | planet_osm_polygon table | • Execution Time: ~1.10s | Represents a first-time query, limited by reading data pages and indexes from disk into PostgreSQL's internal cache. |
| Extract buildings from database | PostGIS | Hot Cache Avg. | planet_osm_polygon table | • Execution Time: ~0.40s • Runs: 99 | Pure database query performance. The ~2.75x speedup is due to the database's internal caching mechanism. |
| Read .pbf, build GeoDataFrame and filter by area | PyOsmium + GeoPandas | Cold Start | lombardy-latest.osm.pbf | • Execution Time: ~358.9s • Output: 7.61 MB | Performance is heavily CPU-bound. This time reflects the high computational cost of the two-pass parsing and geometry construction in Python. |
| Read .pbf, build GeoDataFrame and filter by area | PyOsmium + GeoPandas | Hot Cache Avg. | lombardy-latest.osm.pbf | • Execution Time: ~360.1s • Runs: 49 | Nearly identical to the cold start, confirming the bottleneck is CPU processing, not disk I/O. The OS cache provides minimal benefit. |
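For context, the DuckDB + QuackOSM extraction above corresponds roughly to the sketch below. It assumes QuackOSM's convert_pbf_to_parquet helper (the exact function name can vary between QuackOSM releases) and an osmnx-derived boundary; paths and filters are illustrative, not the exact benchmark code.

```python
# Minimal sketch of the file-based extraction, assuming QuackOSM's convert_pbf_to_parquet API.
import osmnx as ox
import quackosm as qosm

milan = ox.geocode_to_gdf("Milan, Italy")            # boundary polygon used as spatial filter
output_path = qosm.convert_pbf_to_parquet(
    "italy-latest.osm.pbf",
    tags_filter={"building": True},                  # keep only building features
    geometry_filter=milan.geometry.iloc[0],          # filter applied while parsing the PBF
)
print(output_path)                                   # resulting GeoParquet file
```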
Goal: To evaluate performance on calculations that operate on geometries without requiring joins, scaling across different data sizes.
| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
|---|---|---|---|---|---|
| 3.1. Top 10 Largest Areas (Pinerolo) | DuckDB Spatial | Cold Start | pinerolo_buildings.geoparquet | • Execution Time: ~0.032s | Found 10 results. Minimal time difference from hot run. |
| 3.1. Top 10 Largest Areas (Pinerolo) | DuckDB Spatial | Hot Cache Avg. | pinerolo_buildings.geoparquet | • Execution Time: ~0.017s • Runs: 99 | The ~1.9x speedup on the small dataset suggests a file metadata caching benefit. |
| 3.1. Top 10 Largest Areas (Pinerolo) | PostGIS | Cold Start | pinerolo_buildings table | • Execution Time: ~0.044s | Found 10 results. Performance comparable to other technologies. |
| 3.1. Top 10 Largest Areas (Pinerolo) | PostGIS | Hot Cache Avg. | pinerolo_buildings table | • Execution Time: ~0.017s • Runs: 99 | Significant speedup from database caching on the small dataset. |
| 3.1. Top 10 Largest Areas (Pinerolo) | GeoPandas | Cold Start | pinerolo_buildings.geoparquet | • Execution Time: ~0.071s | Found 10 results. Slower on cold start due to library overhead. |
| 3.1. Top 10 Largest Areas (Pinerolo) | GeoPandas | Hot Cache Avg. | pinerolo_buildings.geoparquet | • Execution Time: ~0.053s • Runs: 99 | The slowest of the three on this task, but still very fast. |
| 3.2. Total Buffered Area (Pinerolo) | DuckDB Spatial | Cold Start | pinerolo_buildings.geoparquet | • Execution Time: ~0.122s | Result ~7.05M m². With the CRS fix, the result is numerically consistent with PostGIS/GeoPandas. |
| 3.2. Total Buffered Area (Pinerolo) | DuckDB Spatial | Hot Cache Avg. | pinerolo_buildings.geoparquet | • Execution Time: ~0.120s • Runs: 99 | Performance is consistently CPU-bound. |
| 3.2. Total Buffered Area (Pinerolo) | PostGIS | Cold Start | pinerolo_buildings table | • Execution Time: ~0.144s | Result ~7.05M m². Performance is CPU-bound. |
| 3.2. Total Buffered Area (Pinerolo) | PostGIS | Hot Cache Avg. | pinerolo_buildings table | • Execution Time: ~0.144s • Runs: 99 | The result is consistent with GeoPandas, suggesting it is the correct baseline. |
| 3.2. Total Buffered Area (Pinerolo) | GeoPandas | Cold Start | pinerolo_buildings.geoparquet | • Execution Time: ~0.174s | Result ~7.05M m². Performance is CPU-bound. |
| 3.2. Total Buffered Area (Pinerolo) | GeoPandas | Hot Cache Avg. | pinerolo_buildings.geoparquet | • Execution Time: ~0.176s • Runs: 99 | The result confirms the PostGIS calculation. |
| 3.3. Restaurants NOT near Bus Stops (Pinerolo) | DuckDB Spatial | Cold Start | pinerolo_restaurants.geoparquet and pinerolo_bus_stops.geoparquet | • Execution Time: ~0.070s | Found 15 results. Numerically consistent with other tools. |
| 3.3. Restaurants NOT near Bus Stops (Pinerolo) | DuckDB Spatial | Hot Cache Avg. | pinerolo_restaurants.geoparquet and pinerolo_bus_stops.geoparquet | • Execution Time: ~0.062s • Runs: 99 | Performance is highly competitive. |
| 3.3. Restaurants NOT near Bus Stops (Pinerolo) | PostGIS | Cold Start | pinerolo_restaurants and pinerolo_bus_stops tables | • Execution Time: ~0.003s | Found 15 results. Extremely fast due to database optimizations. |
| 3.3. Restaurants NOT near Bus Stops (Pinerolo) | PostGIS | Hot Cache Avg. | pinerolo_restaurants and pinerolo_bus_stops tables | • Execution Time: ~0.0008s • Runs: 99 | The fastest technology for this small-scale join. |
| 3.3. Restaurants NOT near Bus Stops (Pinerolo) | GeoPandas | Cold Start | pinerolo_restaurants.geoparquet and pinerolo_bus_stops.geoparquet | • Execution Time: ~0.0045s | Found 15 results. Result now matches PostGIS and DuckDB, using the more accurate buffer+sjoin method. |
| 3.3. Restaurants NOT near Bus Stops (Pinerolo) | GeoPandas | Hot Cache Avg. | pinerolo_restaurants.geoparquet and pinerolo_bus_stops.geoparquet | • Execution Time: ~0.0054s • Runs: 99 | Achieves numerical consistency with a minimal performance cost on small data. |
| 3.1. Top 10 Largest Areas (Milan) | DuckDB Spatial | Cold Start | milan_buildings.geoparquet | • Execution Time: ~0.199s | Found 10 results. Operation is now CPU-bound on a larger dataset. |
| 3.1. Top 10 Largest Areas (Milan) | DuckDB Spatial | Hot Cache Avg. | milan_buildings.geoparquet | • Execution Time: ~0.188s • Runs: 99 | Minimal speedup, as the ST_Area calculation is the dominant cost. |
| 3.1. Top 10 Largest Areas (Milan) | PostGIS | Cold Start | milan_buildings table | • Execution Time: ~0.214s | Found 10 results. Performance is nearly identical to DuckDB. |
| 3.1. Top 10 Largest Areas (Milan) | PostGIS | Hot Cache Avg. | milan_buildings table | • Execution Time: ~0.196s • Runs: 99 | Similar CPU-bound behavior. |
| 3.1. Top 10 Largest Areas (Milan) | GeoPandas | Cold Start | milan_buildings.geoparquet | • Execution Time: ~0.835s | Found 10 results. Significantly slower than database approaches on a medium dataset. |
| 3.1. Top 10 Largest Areas (Milan) | GeoPandas | Hot Cache Avg. | milan_buildings.geoparquet | • Execution Time: ~0.869s • Runs: 99 | The performance gap widens on larger data. |
| 3.2. Total Buffered Area (Milan) | DuckDB Spatial | Cold Start | milan_buildings.geoparquet | • Execution Time: ~2.10s | Result ~111.4M m². Numerically Correct and consistent with other tools. |
| 3.2. Total Buffered Area (Milan) | DuckDB Spatial | Hot Cache Avg. | milan_buildings.geoparquet | • Execution Time: ~2.08s • Runs: 99 | Performance is faster than GeoPandas, slightly slower than PostGIS. |
| 3.2. Total Buffered Area (Milan) | PostGIS | Cold Start | milan_buildings table | • Execution Time: ~1.37s | Result ~111.5M m². Fastest technology for this heavy computation. |
| 3.2. Total Buffered Area (Milan) | PostGIS | Hot Cache Avg. | milan_buildings table | • Execution Time: ~1.34s • Runs: 99 | Consistently CPU-bound behavior. |
| 3.2. Total Buffered Area (Milan) | GeoPandas | Cold Start | milan_buildings.geoparquet | • Execution Time: ~2.99s | Result ~111.5M m². Slower than the databases, but confirms the PostGIS result. |
| 3.2. Total Buffered Area (Milan) | GeoPandas | Hot Cache Avg. | milan_buildings.geoparquet | • Execution Time: ~2.93s • Runs: 99 | Consistently CPU-bound. |
| 3.3. Restaurants NOT near Bus Stops (Milan) | DuckDB Spatial | Cold Start | milan_restaurants.geoparquet and milan_bus_stops.geoparquet | • Execution Time: ~0.087s | Found 2135 results. Numerically consistent. |
| 3.3. Restaurants NOT near Bus Stops (Milan) | DuckDB Spatial | Hot Cache Avg. | milan_restaurants.geoparquet and milan_bus_stops.geoparquet | • Execution Time: ~0.077s • Runs: 99 | Significantly faster than PostGIS for this query type. |
| 3.3. Restaurants NOT near Bus Stops (Milan) | PostGIS | Cold Start | milan_restaurants and milan_bus_stops tables | • Execution Time: ~10.53s | Found 2135 results. Very slow on this operation. |
| 3.3. Restaurants NOT near Bus Stops (Milan) | PostGIS | Hot Cache Avg. | milan_restaurants and milan_bus_stops tables | • Execution Time: ~10.49s • Runs: 99 | Confirms poor performance on this type of spatial anti-join. |
| 3.3. Restaurants NOT near Bus Stops (Milan) | GeoPandas | Cold Start | milan_restaurants.geoparquet and milan_bus_stops.geoparquet | • Execution Time: ~0.049s | Found 2136 results. Numerically consistent with PostGIS/DuckDB. |
| 3.3. Restaurants NOT near Bus Stops (Milan) | GeoPandas | Hot Cache Avg. | milan_restaurants.geoparquet and milan_bus_stops.geoparquet | • Execution Time: ~0.048s • Runs: 99 | Still ~220x faster than PostGIS. The performance cost for accuracy is small compared to the gain in reliability. |
| 3.1. Top 10 Largest Areas (Rome) | DuckDB Spatial | Cold Start | rome_buildings.geoparquet | • Execution Time: ~0.52s | Found 10 results. Performance scales predictably. |
| 3.1. Top 10 Largest Areas (Rome) | DuckDB Spatial | Hot Cache Avg. | rome_buildings.geoparquet | • Execution Time: ~0.51s • Runs: 99 | Consistently CPU-bound on the largest dataset. |
| 3.1. Top 10 Largest Areas (Rome) | PostGIS | Cold Start | rome_buildings table | • Execution Time: ~0.40s | PostGIS becomes faster than DuckDB on the largest dataset for this query. |
| 3.1. Top 10 Largest Areas (Rome) | PostGIS | Hot Cache Avg. | rome_buildings table | • Execution Time: ~0.38s • Runs: 99 | Demonstrates better scaling than DuckDB for this operation. |
| 3.1. Top 10 Largest Areas (Rome) | GeoPandas | Cold Start | rome_buildings.geoparquet | • Execution Time: ~2.52s | The performance gap between GeoPandas and the databases is largest on this dataset. |
| 3.1. Top 10 Largest Areas (Rome) | GeoPandas | Hot Cache Avg. | rome_buildings.geoparquet | • Execution Time: ~2.63s • Runs: 99 | Confirms that for pure computation on large in-memory data, database engines are more optimized. |
| 3.2. Total Buffered Area (Rome) | DuckDB Spatial | Cold Start | rome_buildings.geoparquet | • Execution Time: ~6.48s | Result ~298.9M m². Numerically Correct and consistent. |
| 3.2. Total Buffered Area (Rome) | DuckDB Spatial | Hot Cache Avg. | rome_buildings.geoparquet | • Execution Time: ~6.39s • Runs: 99 | Slower than PostGIS, but faster than GeoPandas on the largest dataset. |
| 3.2. Total Buffered Area (Rome) | PostGIS | Cold Start | rome_buildings table | • Execution Time: ~2.80s | Result ~299.2M m². PostGIS is over 2x faster than DuckDB on this heavy computation. |
| 3.2. Total Buffered Area (Rome) | PostGIS | Hot Cache Avg. | rome_buildings table | • Execution Time: ~2.78s • Runs: 99 | Confirms superior performance for complex, single-table aggregations on large data. |
| 3.2. Total Buffered Area (Rome) | GeoPandas | Cold Start | rome_buildings.geoparquet | • Execution Time: ~9.38s | Result ~299.2M m². Numerically consistent with PostGIS but the slowest technology. |
| 3.2. Total Buffered Area (Rome) | GeoPandas | Hot Cache Avg. | rome_buildings.geoparquet | • Execution Time: ~9.14s • Runs: 99 | The performance gap widens significantly on the largest dataset. |
| 3.3. Restaurants NOT near Bus Stops (Rome) | DuckDB Spatial | Cold Start | rome_restaurants.geoparquet and rome_bus_stops.geoparquet | • Execution Time: ~0.090s | Found 2298 results. Numerically consistent. |
| 3.3. Restaurants NOT near Bus Stops (Rome) | DuckDB Spatial | Hot Cache Avg. | rome_restaurants.geoparquet and rome_bus_stops.geoparquet | • Execution Time: ~0.081s • Runs: 99 | Dramatically faster than PostGIS for this large join. |
| 3.3. Restaurants NOT near Bus Stops (Rome) | PostGIS | Cold Start | rome_restaurants and rome_bus_stops tables | • Execution Time: ~40.1s | Found 2298 results. Extremely slow on the largest dataset. |
| 3.3. Restaurants NOT near Bus Stops (Rome) | PostGIS | Hot Cache Avg. | rome_restaurants and rome_bus_stops tables | • Execution Time: ~40.3s • Runs: 99 | Performance of the join is significantly slower than DuckDB on the largest dataset. |
| 3.3. Restaurants NOT near Bus Stops (Rome) | GeoPandas | Cold Start | rome_restaurants.geoparquet and rome_bus_stops.geoparquet | • Execution Time: ~0.133s | Found 2299 results. Numerically Correct. The buffer+sjoin method is more computationally expensive. |
| 3.3. Restaurants NOT near Bus Stops (Rome) | GeoPandas | Hot Cache Avg. | rome_restaurants.geoparquet and rome_bus_stops.geoparquet | • Execution Time: ~0.130s • Runs: 99 | Highlights the trade-off: slower than the old method, but now correct and still ~310x faster than PostGIS. |
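The three Use Case 3 operations map to fairly compact GeoPandas code. The sketch below is illustrative only: file names, the metric CRS (UTM zone 32N) and the buffer/search distances are assumptions, not necessarily the exact parameters used in the benchmark.

```python
# Sketch of the Use Case 3 operations in GeoPandas (paths, CRS and distances are assumptions).
import geopandas as gpd

crs = 32632  # UTM zone 32N, a metric CRS for northern Italy (assumption)
buildings = gpd.read_parquet("milan_buildings.geoparquet").to_crs(epsg=crs)
restaurants = gpd.read_parquet("milan_restaurants.geoparquet").to_crs(epsg=crs)
bus_stops = gpd.read_parquet("milan_bus_stops.geoparquet").to_crs(epsg=crs)

# 3.1 Top 10 largest building areas.
top10 = buildings.assign(area_m2=buildings.geometry.area).nlargest(10, "area_m2")

# 3.2 Total area of the union of buffers around buildings (10 m buffer assumed).
buffered_union = buildings.geometry.buffer(10).union_all()  # .unary_union on older GeoPandas
total_buffered_m2 = buffered_union.area

# 3.3 Restaurants NOT near bus stops, via the "buffer + sjoin" anti-join:
# join restaurants against buffered bus stops, then keep the rows with no match.
stop_buffers = gpd.GeoDataFrame(geometry=bus_stops.geometry.buffer(100))
joined = gpd.sjoin(restaurants, stop_buffers, how="left", predicate="intersects")
far_restaurants = joined[joined["index_right"].isna()]

print(len(top10), total_buffered_m2, len(far_restaurants))
```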
Goal: To test performance on computationally intensive join operations on large datasets.
| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
|---|---|---|---|---|---|
| 4.1. Restaurants per Neighborhood (Pinerolo) | DuckDB Spatial | Cold Start | pinerolo_neighborhoods.geoparquet and pinerolo_restaurants.geoparquet | • Execution Time: ~0.033s • Output: 0.13 MB | Found 13 results, numerically consistent with other tools. |
| 4.1. Restaurants per Neighborhood (Pinerolo) | DuckDB Spatial | Hot Cache Avg. | pinerolo_neighborhoods.geoparquet and pinerolo_restaurants.geoparquet | • Execution Time: ~0.004s • Runs: 99 | Extremely fast on hot runs. |
| 4.1. Restaurants per Neighborhood (Pinerolo) | PostGIS | Cold Start | pinerolo_neighborhoods and pinerolo_restaurants tables | • Execution Time: ~0.041s | Found 13 results, numerically consistent with the others. |
| 4.1. Restaurants per Neighborhood (Pinerolo) | PostGIS | Hot Cache Avg. | pinerolo_neighborhoods and pinerolo_restaurants tables | • Execution Time: ~0.015s • Runs: 99 | Good speedup from database caching. |
| 4.1. Restaurants per Neighborhood (Pinerolo) | GeoPandas | Cold Start | pinerolo_neighborhoods.geoparquet and pinerolo_restaurants.geoparquet | • Execution Time: ~0.006s • Output: 0.12 MB | Found 13 results. The fastest technology for this task on a small dataset. |
| 4.1. Restaurants per Neighborhood (Pinerolo) | GeoPandas | Hot Cache Avg. | pinerolo_neighborhoods.geoparquet and pinerolo_restaurants.geoparquet | • Execution Time: ~0.005s • Runs: 99 | Confirms its status as the top performer for this operation. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Pinerolo) | DuckDB Spatial | Cold Start | pinerolo_hospitals.geoparquet, pinerolo_residential_streets.geoparquet and pinerolo_trees.geoparquet | • Execution Time: ~0.178s | Result: 0 trees, ~5.0k meters. Numerically Correct and consistent with PostGIS/GeoPandas. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Pinerolo) | DuckDB Spatial | Hot Cache Avg. | pinerolo_hospitals.geoparquet, pinerolo_residential_streets.geoparquet and pinerolo_trees.geoparquet | • Execution Time: ~0.118s • Runs: 99 | Slower than GeoPandas, but shows competitive performance. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Pinerolo) | PostGIS | Cold Start | pinerolo_hospitals, pinerolo_residential_streets and pinerolo_trees tables | • Execution Time: ~0.099s | Result: 0 trees, ~5.0k meters. Numerical result is the baseline. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Pinerolo) | PostGIS | Hot Cache Avg. | pinerolo_hospitals, pinerolo_residential_streets and pinerolo_trees tables | • Execution Time: ~0.079s • Runs: 99 | Slower than GeoPandas, but faster than DuckDB, on the small dataset for this query type. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Pinerolo) | GeoPandas | Cold Start | pinerolo_hospitals.geoparquet, pinerolo_residential_streets.geoparquet and pinerolo_trees.geoparquet | • Execution Time: ~0.015s | Result: 0 trees, ~5.0k meters. Numerically consistent with PostGIS. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Pinerolo) | GeoPandas | Hot Cache Avg. | pinerolo_hospitals.geoparquet, pinerolo_residential_streets.geoparquet and pinerolo_trees.geoparquet | • Execution Time: ~0.008s • Runs: 99 | Significantly faster than the databases due to sjoin_nearest. |
| 4.3. Area Not Covered by Parks (Pinerolo) | DuckDB Spatial | Cold Start | pinerolo_parks.geoparquet and osmnx boundary | • Execution Time: ~0.033s | Result 49.97 km². Numerically Correct. With the CRS fix, the result is now perfectly aligned. |
| 4.3. Area Not Covered by Parks (Pinerolo) | DuckDB Spatial | Hot Cache Avg. | pinerolo_parks.geoparquet and osmnx boundary | • Execution Time: ~0.025s • Runs: 99 | Performance is competitive and the task is CPU-bound. |
| 4.3. Area Not Covered by Parks (Pinerolo) | PostGIS | Cold Start | pinerolo_parks table and osmnx boundary | • Execution Time: ~0.021s | Result 49.97 km². Numerically Correct. PostGIS handles the data correctly. |
| 4.3. Area Not Covered by Parks (Pinerolo) | PostGIS | Hot Cache Avg. | pinerolo_parks table and osmnx boundary | • Execution Time: ~0.006s • Runs: 99 | Very fast and numerically robust. ~3.5x speedup from caching. |
| 4.3. Area Not Covered by Parks (Pinerolo) | GeoPandas | Cold Start | pinerolo_parks.geoparquet and osmnx boundary | • Execution Time: ~0.005s | Result 49.97 km². Numerically Correct, confirming the PostGIS/GEOS baseline. |
| 4.3. Area Not Covered by Parks (Pinerolo) | GeoPandas | Hot Cache Avg. | pinerolo_parks.geoparquet and osmnx boundary | • Execution Time: ~0.005s • Runs: 99 | The fastest of the three for this task and numerically robust. |
| 4.1. Restaurants per Neighborhood (Milan) | DuckDB Spatial | Cold Start | milan_neighborhoods.geoparquet and milan_restaurants.geoparquet | • Execution Time: ~0.119s • Output: 0.29 MB | Found 34 results, consistent with other tools. |
| 4.1. Restaurants per Neighborhood (Milan) | DuckDB Spatial | Hot Cache Avg. | milan_neighborhoods.geoparquet and milan_restaurants.geoparquet | • Execution Time: ~0.097s • Runs: 99 | Faster than PostGIS, but slower than GeoPandas, on this medium-scale join. |
| 4.1. Restaurants per Neighborhood (Milan) | PostGIS | Cold Start | milan_neighborhoods and milan_restaurants tables | • Execution Time: ~0.220s | Found 34 results. Slower than DuckDB on the cold run. |
| 4.1. Restaurants per Neighborhood (Milan) | PostGIS | Hot Cache Avg. | milan_neighborhoods and milan_restaurants tables | • Execution Time: ~0.201s • Runs: 99 | Slower than DuckDB and significantly slower than GeoPandas. |
| 4.1. Restaurants per Neighborhood (Milan) | GeoPandas | Cold Start | milan_neighborhoods.geoparquet and milan_restaurants.geoparquet | • Execution Time: ~0.010s • Output: 0.28 MB | Found 34 results. Extraordinarily fast, showing the power of in-memory operations. |
| 4.1. Restaurants per Neighborhood (Milan) | GeoPandas | Hot Cache Avg. | milan_neighborhoods.geoparquet and milan_restaurants.geoparquet | • Execution Time: ~0.009s • Runs: 99 | Over 20x faster than PostGIS. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Milan) | DuckDB Spatial | Cold Start | milan_hospitals.geoparquet, milan_residential_streets.geoparquet and milan_trees.geoparquet | • Execution Time: ~0.193s | Result: 234 trees, ~25.0k meters. Numerically Correct and consistent with PostGIS/GeoPandas. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Milan) | DuckDB Spatial | Hot Cache Avg. | milan_hospitals.geoparquet, milan_residential_streets.geoparquet and milan_trees.geoparquet | • Execution Time: ~0.155s • Runs: 99 | Significantly faster than PostGIS, but much slower than GeoPandas. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Milan) | PostGIS | Cold Start | milan_hospitals, milan_residential_streets and milan_trees tables | • Execution Time: ~34.16s | Result: 234 trees, ~25.0k meters. The numerical baseline. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Milan) | PostGIS | Hot Cache Avg. | milan_hospitals, milan_residential_streets and milan_trees tables | • Execution Time: ~33.11s • Runs: 99 | The slowest technology by a large margin for this query type. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Milan) | GeoPandas | Cold Start | milan_hospitals.geoparquet, milan_residential_streets.geoparquet and milan_trees.geoparquet | • Execution Time: ~0.079s | Result: 234 trees, ~25.1k meters. Numerically consistent with PostGIS. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Milan) | GeoPandas | Hot Cache Avg. | milan_hospitals.geoparquet, milan_residential_streets.geoparquet and milan_trees.geoparquet | • Execution Time: ~0.082s • Runs: 99 | Significantly faster than the databases. sjoin_nearest is a clear winner. |
| 4.3. Area Not Covered by Parks (Milan) | DuckDB Spatial | Cold Start | milan_parks.geoparquet and osmnx boundary | • Execution Time: ~0.209s | Result 162.98 km². Numerically Correct. A massive improvement in performance and correctness. |
| 4.3. Area Not Covered by Parks (Milan) | DuckDB Spatial | Hot Cache Avg. | milan_parks.geoparquet and osmnx boundary | • Execution Time: ~0.198s • Runs: 99 | Performance is now highly competitive with PostGIS. The task is CPU-bound. |
| 4.3. Area Not Covered by Parks (Milan) | PostGIS | Cold Start | milan_parks table and osmnx boundary | • Execution Time: ~0.142s | Result 162.98 km². Numerically Correct. PostGIS is the fastest for this operation. |
| 4.3. Area Not Covered by Parks (Milan) | PostGIS | Hot Cache Avg. | milan_parks table and osmnx boundary | • Execution Time: ~0.138s • Runs: 99 | Shows extreme efficiency and numerical robustness for complex geometry operations. |
| 4.3. Area Not Covered by Parks (Milan) | GeoPandas | Cold Start | milan_parks.geoparquet and osmnx boundary | • Execution Time: ~0.430s | Result 162.98 km². Numerically Correct. Slower than PostGIS but confirms the robustness of the GEOS engine. |
| 4.3. Area Not Covered by Parks (Milan) | GeoPandas | Hot Cache Avg. | milan_parks.geoparquet and osmnx boundary | • Execution Time: ~0.423s • Runs: 99 | Confirms robustness but is slower than the databases for this task. |
| 4.1. Restaurants per Neighborhood (Rome) | DuckDB Spatial | Cold Start | rome_neighborhoods.geoparquet and rome_restaurants.geoparquet | • Execution Time: ~4.97s • Output: 2.23 MB | Found 52 results, consistent with other tools. |
| 4.1. Restaurants per Neighborhood (Rome) | DuckDB Spatial | Hot Cache Avg. | rome_neighborhoods.geoparquet and rome_restaurants.geoparquet | • Execution Time: ~4.36s • Runs: 99 | Faster than PostGIS on the largest dataset. |
| 4.1. Restaurants per Neighborhood (Rome) | PostGIS | Cold Start | rome_neighborhoods and rome_restaurants tables | • Execution Time: ~6.68s | Found 52 results. Slower than DuckDB on the largest dataset. |
| 4.1. Restaurants per Neighborhood (Rome) | PostGIS | Hot Cache Avg. | rome_neighborhoods and rome_restaurants tables | • Execution Time: ~5.32s • Runs: 99 | Caching provides a noticeable benefit. |
| 4.1. Restaurants per Neighborhood (Rome) | GeoPandas | Cold Start | rome_neighborhoods.geoparquet and rome_restaurants.geoparquet | • Execution Time: ~0.034s • Output: 2.23 MB | Found 52 results. The performance gap is enormous on the largest dataset. |
| 4.1. Restaurants per Neighborhood (Rome) | GeoPandas | Hot Cache Avg. | rome_neighborhoods.geoparquet and rome_restaurants.geoparquet | • Execution Time: ~0.031s • Runs: 99 | GeoPandas is ~140x faster than DuckDB for this point-in-polygon join. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Rome) | DuckDB Spatial | Cold Start | rome_hospitals.geoparquet, rome_residential_streets.geoparquet and rome_trees.geoparquet | • Execution Time: ~0.288s | Result: 441 trees, ~103.7k meters. Numerically Correct and consistent. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Rome) | DuckDB Spatial | Hot Cache Avg. | rome_hospitals.geoparquet, rome_residential_streets.geoparquet and rome_trees.geoparquet | • Execution Time: ~0.258s • Runs: 99 | Massive performance improvement; now competitive with GeoPandas on both cold and hot runs. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Rome) | PostGIS | Cold Start | rome_hospitals, rome_residential_streets and rome_trees tables | • Execution Time: ~185.74s | Result: 441 trees, ~103.1k meters. The numerical baseline. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Rome) | PostGIS | Hot Cache Avg. | rome_hospitals, rome_residential_streets and rome_trees tables | • Execution Time: ~186.80s • Runs: 99 | The performance gap with the other tools is enormous (~1000x). |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Rome) | GeoPandas | Cold Start | rome_hospitals.geoparquet, rome_residential_streets.geoparquet and rome_trees.geoparquet | • Execution Time: ~0.189s | Result: 441 trees, ~103.3k meters. Numerically consistent with PostGIS. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Rome) | GeoPandas | Hot Cache Avg. | rome_hospitals.geoparquet, rome_residential_streets.geoparquet and rome_trees.geoparquet | • Execution Time: ~0.186s • Runs: 99 | GeoPandas is ~1000x faster than PostGIS. |
| 4.3. Area Not Covered by Parks (Rome) | DuckDB Spatial | Cold Start | rome_parks.geoparquet and osmnx boundary | • Execution Time: ~0.346s | Result 1,235 km². Numerically Correct. Massive improvement. |
| 4.3. Area Not Covered by Parks (Rome) | DuckDB Spatial | Hot Cache Avg. | rome_parks.geoparquet and osmnx boundary | • Execution Time: ~0.335s • Runs: 99 | Performance is now competitive with the other tools. |
| 4.3. Area Not Covered by Parks (Rome) | PostGIS | Cold Start | rome_parks table and osmnx boundary | • Execution Time: ~0.248s | Result 1,235 km². Numerically Correct. PostGIS is the fastest and most robust. |
| 4.3. Area Not Covered by Parks (Rome) | PostGIS | Hot Cache Avg. | rome_parks table and osmnx boundary | • Execution Time: ~0.236s • Runs: 99 | Demonstrates extreme efficiency on large, complex geometric operations. |
| 4.3. Area Not Covered by Parks (Rome) | GeoPandas | Cold Start | rome_parks.geoparquet and osmnx boundary | • Execution Time: ~0.782s | Result 1,235 km². Numerically Correct. Slower than PostGIS, but robust. |
| 4.3. Area Not Covered by Parks (Rome) | GeoPandas | Hot Cache Avg. | rome_parks.geoparquet and osmnx boundary | • Execution Time: ~0.784s • Runs: 99 | The GEOS-based tools are both numerically correct and performant for this task. |
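As a reference point for the joins above, the GeoPandas versions are sketched below. File names, the CRS (UTM zone 33N for Rome) and the 500 m search distance are assumptions for illustration; the exact benchmark parameters may differ.

```python
# Sketch of the Use Case 4 joins in GeoPandas (paths, CRS and distances are assumptions).
import geopandas as gpd
import osmnx as ox

crs = 32633  # UTM zone 33N, a metric CRS for the Rome area (assumption)
neighborhoods = gpd.read_parquet("rome_neighborhoods.geoparquet").to_crs(epsg=crs)
restaurants = gpd.read_parquet("rome_restaurants.geoparquet").to_crs(epsg=crs)
streets = gpd.read_parquet("rome_residential_streets.geoparquet").to_crs(epsg=crs)
hospitals = gpd.read_parquet("rome_hospitals.geoparquet").to_crs(epsg=crs)
parks = gpd.read_parquet("rome_parks.geoparquet").to_crs(epsg=crs)

# 4.1 Restaurants per neighborhood: point-in-polygon join, then a count per polygon.
pip = gpd.sjoin(restaurants, neighborhoods, how="inner", predicate="within")
restaurants_per_neighborhood = pip.groupby("index_right").size()

# 4.2 Residential roads near hospitals: sjoin_nearest with a maximum distance,
# then sum the road lengths (the tree count follows the same pattern).
near = gpd.sjoin_nearest(streets, hospitals, max_distance=500, distance_col="dist_m")
total_length_m = near.geometry.length.sum()

# 4.3 Area not covered by parks: municipal boundary minus the union of park polygons.
boundary = ox.geocode_to_gdf("Rome, Italy").to_crs(epsg=crs)
uncovered_km2 = boundary.geometry.iloc[0].difference(parks.geometry.union_all()).area / 1e6

print(restaurants_per_neighborhood.head(), total_length_m, uncovered_km2)
```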
This section tracks the performance of the core technologies on standard geospatial data formats, without the complexity of OSM-specific parsing libraries.
Goal: To measure ingestion and filtering performance on a standard Shapefile dataset, analyzing the impact of system caches.
| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
|---|---|---|---|---|---|
| Read Shapefile using ST_Read | DuckDB Spatial | Cold Start | Com01012025_WGS84.shp | • Execution Time: ~0.44s | Found 7896 features. Very fast initial read from disk. |
| Read Shapefile using ST_Read | DuckDB Spatial | Hot Cache Avg. | Com01012025_WGS84.shp | • Execution Time: ~0.33s • Runs: 99 | The OS file cache provides a noticeable, but smaller (~1.3x), speedup. |
| Load Shapefile into database | PostGIS + shp2pgsql | One-time | Com01012025_WGS84.shp | • Execution Time: ~21s | Represents the one-time cost of manual ingestion via CLI. Significantly slower due to database overhead. |
| Read Shapefile into GeoDataFrame | GeoPandas | Cold Start | Com01012025_WGS84.shp | • Execution Time: ~0.83s | Found 7896 features. Performance reflects reading the file from disk. |
| Read Shapefile into GeoDataFrame | GeoPandas | Hot Cache Avg. | Com01012025_WGS84.shp | • Execution Time: ~0.34s • Runs: 99 | The ~2.4x speedup is due to the OS file cache. |
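The DuckDB Spatial read path above boils down to a single ST_Read call; a minimal sketch (the local file path is an assumption) looks like this:

```python
# Minimal sketch of reading the ISTAT Shapefile with DuckDB Spatial.
import duckdb

con = duckdb.connect()
con.execute("INSTALL spatial;")
con.execute("LOAD spatial;")
count = con.execute(
    "SELECT count(*) FROM ST_Read('Com01012025_WGS84.shp')"
).fetchone()[0]
print(count)  # number of features read from the Shapefile
```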
| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
|---|---|---|---|---|---|
| Filter table by attribute | DuckDB Spatial | Cold Start | Com01012025_WGS84.shp | • Execution Time: ~0.020s • Output: 12.57 MB | Filtered to 1180 features. Very fast in-memory query. |
| Filter table by attribute | DuckDB Spatial | Hot Cache Avg. | Com01012025_WGS84.shp | • Execution Time: ~0.019s • Runs: 99 | CPU-bound, with minimal difference between cold and hot runs. |
| Filter table by attribute | PostGIS | Cold Start | comuni_istat table | • Execution Time: ~0.22s | Found 1180 features. Represents a query where data is read from disk into the database cache. |
| Filter table by attribute | PostGIS | Hot Cache Avg. | comuni_istat table | • Execution Time: ~0.075s • Runs: 99 | The ~2.9x speedup is due to PostgreSQL's internal caching mechanism. |
| Filter GeoDataFrame by attribute | GeoPandas | Cold Start | Com01012025_WGS84.shp | • Execution Time: ~0.0010s • Output: 12.58 MB | Filtered to 1180 features. In-memory operation is extremely fast. |
| Filter GeoDataFrame by attribute | GeoPandas | Hot Cache Avg. | Com01012025_WGS84.shp | • Execution Time: ~0.0009s • Runs: 99 | Negligible difference from cold start, confirming the operation is CPU-bound. |
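The GeoPandas attribute filter is a plain in-memory boolean mask. The sketch below assumes the ISTAT layer exposes a COD_REG column with 1 = Piedmont; the column name and value are assumptions for illustration.

```python
# Sketch of the attribute filter in GeoPandas (COD_REG column and its value are assumptions).
import geopandas as gpd

comuni = gpd.read_file("Com01012025_WGS84.shp")
piedmont = comuni[comuni["COD_REG"] == 1]                  # in-memory boolean filter
piedmont.to_parquet("piedmont_municipalities.geoparquet")  # persist the subset as GeoParquet
print(len(piedmont))
```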
Goal: To measure ingestion and filtering performance on a large-scale raster dataset (GHS-POP), analyzing the impact of system caches and comparing file-based vs database approaches.
| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
|---|---|---|---|---|---|
| Open GeoTIFF file | DuckDB Spatial | N/A | GHS_POP_ITALY_100m.tif | • Execution Time: N/A | Technology does not support native raster operations. |
| Load raster into database | PostGIS + raster2pgsql | One-time | GHS_POP_ITALY_100m.tif | • Execution Time: ~370s (~6.2 min) | One-time cost for manual ingestion via CLI. Includes table creation, tiling, and indexing of the entire Italy raster. |
| Open GeoTIFF file | Python (rioxarray) | Cold Start | GHS_POP_ITALY_100m.tif | • Execution Time: ~0.15s | Measures the time to read file metadata from disk. |
| Open GeoTIFF file | Python (rioxarray) | Hot Cache Avg. | GHS_POP_ITALY_100m.tif | • Execution Time: ~0.03s • Runs: 99 | The ~5.7x speedup is due to OS caching of the file headers. |
| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
|---|---|---|---|---|---|
| Clip raster to Milan | DuckDB Spatial | N/A | GHS_POP_ITALY_100m.tif | • Execution Time: N/A | Technology does not support native raster operations. |
| Clip raster to Milan | PostGIS | Cold Start | ghs_population table | • Execution Time: ~8.12s | Found 1,256,044 people. The query is computationally expensive. |
| Clip raster to Milan | PostGIS | Hot Cache Avg. | ghs_population table | • Execution Time: ~7.40s • Runs: 99 | The minimal speedup confirms the operation is CPU-bound (raster processing) rather than I/O-bound. |
| Clip raster to Milan | Python (rioxarray) | Cold Start | GHS_POP_ITALY_100m.tif | • Execution Time: ~7.57s • Output: 0.08 MB | Found 1,256,044 people. Performance is highly competitive with PostGIS. |
| Clip raster to Milan | Python (rioxarray) | Hot Cache Avg. | GHS_POP_ITALY_100m.tif | • Execution Time: ~7.91s • Runs: 99 | Performance is nearly identical to the cold start, confirming the operation is CPU-bound. |
(Note: GHS_POP_ITALY_100m.tif was generated by merging, via the GDAL CLI, the four GHS-POP tiles downloaded from Copernicus that together cover all of Italy.)
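The file-based clip corresponds roughly to the rioxarray sketch below; the osmnx boundary lookup, the masked=True nodata handling and the final sum are assumptions about the workflow, not the exact benchmark script.

```python
# Sketch of clipping the GHS-POP raster to Milan and summing population with rioxarray.
import osmnx as ox
import rioxarray

milan = ox.geocode_to_gdf("Milan, Italy")                              # vector boundary
pop = rioxarray.open_rasterio("GHS_POP_ITALY_100m.tif", masked=True)   # nodata -> NaN
clipped = pop.rio.clip(milan.geometry, crs=milan.crs)                  # clip to the boundary
print(round(float(clipped.sum())))                                     # total population inside Milan
```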
Goal: To evaluate the performance of different technologies for calculating raster statistics based on vector zones.
| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
|---|---|---|---|---|---|
| Calculate Population per Municipality (Piedmont) | PostGIS | Cold Start | comuni_istat_clean and ghs_population tables | • Execution Time: ~2.50s • GPKG Loading Time: ~17.87s | Calculated a total population of 4,222,758 across 1180 municipalities. Result is numerically identical to the Python stack, although an additional script was needed to convert the data to a GPKG file and load it back into the database, which accounts for the extra loading time. |
| Calculate Population per Municipality (Piedmont) | PostGIS | Hot Cache Avg. | comuni_istat_clean and ghs_population tables | • Execution Time: ~2.43s • Runs: 99 | The operation is CPU-bound. PostGIS is ~5.5x faster than the Python stack for this task. |
| Calculate Population per Municipality (Piedmont) | Python (rasterstats) | Cold Start | Com01012025_WGS84.shp and GHS_POP_ITALY_100m.tif | • Execution Time: ~13.97s | Calculated a total population of 4,222,758 across 1180 municipalities. Result is numerically identical to PostGIS. |
| Calculate Population per Municipality (Piedmont) | Python (rasterstats) | Hot Cache Avg. | Com01012025_WGS84.shp and GHS_POP_ITALY_100m.tif | • Execution Time: ~13.45s • Runs: 99 | The operation is also CPU-bound. The high-level libraries introduce a significant performance overhead compared to the database. |
(Note: DuckDB still does not support native raster operations.)
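For reference, the Python (rasterstats) path is sketched below. It assumes rasterstats accepts a GeoDataFrame through its geo interface, that the merged raster is in the GHS-POP native Mollweide CRS, and that Piedmont municipalities can be selected via a COD_REG column; all of these are assumptions, not a record of the benchmark script.

```python
# Sketch of the zonal-statistics path with rasterstats (CRS, column name and paths are assumptions).
import geopandas as gpd
from rasterstats import zonal_stats

zones = gpd.read_file("Com01012025_WGS84.shp")
zones = zones[zones["COD_REG"] == 1]          # keep Piedmont municipalities (assumed column/value)
zones = zones.to_crs("ESRI:54009")            # reproject to the raster's assumed Mollweide CRS
stats = zonal_stats(zones, "GHS_POP_ITALY_100m.tif", stats=["sum"])
total_population = sum(s["sum"] or 0 for s in stats)
print(round(total_population))
```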