Thank you @jiayuasu -- I think this looks great!
Here is a preview of what it looks like

If anyone else is interested, here is what the navigation looks like
cc @kylebarron in case you are interested in this content as well
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2010YOUY01
left a comment
Thank you, this is a great read! I left a few suggestions for you to consider.
It's been pointed out to me that the coverage matrix doesn't cover statistics/geometry bounding, without which predicate pushdown doesn't work: every rowgroup with the column needs scanning.
Maybe a "what next?" paragraph
Geospatial support in Parquet is still ongoing; as of February 2026 column statistics collection is incomplete, which means that scanning some types may require reading all the data. Furthermore, the query engines themselves need to adopt the new format extensions.
What you do get now is the ability to save geospatial data in Parquet files, with support in those query engines increasing over time.
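To make the pushdown point concrete, here is a minimal sketch of bounding-box row group skipping. It is not taken from any Parquet library: `BBox` and `row_groups_to_scan` are hypothetical names, the boxes are treated as plain planar rectangles, and wraparound boxes (xmin greater than xmax) and geography edge interpolation are deliberately not handled. A row group with no statistics is represented as `None`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BBox:
    """Hypothetical planar bounding box (no antimeridian wraparound)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "BBox") -> bool:
        # Two axis-aligned boxes overlap when they overlap on both axes.
        return (
            self.xmin <= other.xmax
            and other.xmin <= self.xmax
            and self.ymin <= other.ymax
            and other.ymin <= self.ymax
        )


def row_groups_to_scan(stats: List[Optional[BBox]], query: BBox) -> List[int]:
    """Return indices of row groups that cannot be skipped for `query`.

    A row group whose statistics are missing (None) must always be scanned,
    because nothing proves it lies outside the query window.
    """
    return [
        i for i, bbox in enumerate(stats)
        if bbox is None or bbox.intersects(query)
    ]


# Illustrative, made-up statistics for three row groups.
example_stats = [BBox(0, 0, 10, 10), None, BBox(100, 100, 110, 110)]
example_query = BBox(5, 5, 6, 6)
selected = row_groups_to_scan(example_stats, example_query)
```

In this example only the third row group can be skipped. The second has no statistics, so it is scanned regardless, which is the "every rowgroup with the column needs scanning" situation when no bounding boxes are written at all.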
Maybe a more accurate summary is that the column statistics collection is not yet fully integrated into all engines. FWIW the Rust Parquet implementation does handle such statistics (thanks to @kylebarron and @paleolimbot as I recall) -- https://docs.rs/parquet/latest/parquet/format/struct.GeospatialStatistics.html, and I think SedonaDB has already integrated it into their query engine as well. Perhaps we can add a line to the https://parquet.apache.org/docs/file-format/implementationstatus/ page for these (doing so seems to have the effect of pressuring additional ecosystem adoption) |
Reflecting on the discussion about incomplete statistics support: I checked a few implementations, and while writing statistics for geometries seems to be there in general, I haven't found a single implementation of geography with any edge interpolation algorithm. The Rust implementation seems to handle the stats for points (where edge interpolation is not needed) and allows users to inject their own implementation.
I agree in the case of geometry, but I think it would make things clearer to mention that for geography this is incomplete, at least in common open source libraries. The blog post mentions "Spatial statistics" as a core feature and generally mentions geometry and geography side by side, so the reader may assume that statistics support is widely available for both logical types. This also affects the choice of type: if bounding boxes are not yet available for geography and per-file skipping is critical, then the user should try to build their workload on geometry. I don't know the status of the statistics implementation for geography, but I haven't seen PRs about this, so my assumption is that it may take significant time to have at least spherical interpolation widely available in Parquet libraries (or extension libraries). I would be happy to be proven wrong :) Btw the blog was a great read!
@csringhofer I think @alamb's suggestion about updating the implementation status page might be a tactic, where
(this'd be so much easier if the flat-earthers were right, though then GPS wouldn't work so measuring locations would be a PITA) |
Thank you @csringhofer and @steveloughran -- I tried to capture the suggestions on how to improve the status page in a ticket:
I would personally think this would make the page more confusing as
I think a separate blog describing the current state of implementation as of a certain date would be quite valuable for others evaluating potential solutions for their projects |
Unless there are any objections, I'll plan to update the date and merge this PR (and publish the blog) tomorrow. |
I think it's accurate to say that writing statistics for non-point Geography columns has not been implemented yet; however, I don't think that is inconsistent with the message we are collectively trying to put out with this post (an overview of spatial types in Parquet and celebration of the significant progress we were able to make over the last year). |
There seems to be some issue with the parquet site's style sheets / jquery stuff: #159 I'll try and find some time to look at this over the next day or two |
Thanks, having a more detailed status page would be a great help for people who try to get an overview / looking for a reference implementation.
I see the point - probably there are many things where the implementations could be improved besides geography statistics, and it is not in the scope of the article to go into these. |
julienledem
left a comment
Thank you for all your contributions, this is going to be a great post!
The https://parquet.apache.org/ site is still kind of broken (at least for me). I think we should fix the site before we publish this post and draw more people's attention there. I have a proposed fix here that I would appreciate if someone could help review.
3. **Engine interoperability**
Because the spatial meaning is encoded as a Parquet logical type, engines do not need out of band conventions to interpret the column. A reader that understands Parquet geospatial types can immediately treat the column as a spatial object.
4. **Coordinate Reference System (CRS) information**
CRS information is stored at the file metadata (i.e., type definition) using authoritative identifiers or structured definitions such as EPSG codes or PROJJSON strings.
Nit "CRS information is stored in the file metadata"
Thanks to @vinooganesh and @emkornfield I think we are good to publish this blog now 🎉 I updated the date to Feb 13 and will merge this PR once the CI passes |
The blog is now live: https://parquet.apache.org/blog/2026/02/13/native-geospatial-types-in-apache-parquet/ 🎉 |
I took @alamb 's template and created this PR. I hope this is ok.
The idea of this blog post is inspired by this issue and the initial draft is in this google doc.
Looking forward to having this blog post on Parquet website!