
Spatial indexing robustness #418

@alhom

The dev branch has a feature that caches a spatial index of CellIDs per rank, for fast creation of the fileindex mapping. When the index cache exists, this enables fast point queries (on the order of 0.01 s), even for large runs like FID.
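To make the per-rank idea concrete, here is a minimal pure-Python sketch, assuming each rank's cells can be summarised by one axis-aligned bounding box. `RankBox`, `candidate_ranks`, `partial_fileindex` and the `cells_by_rank` layout are hypothetical names, not Analysator's API, and a linear scan stands in for the rtree query so the example runs without extra packages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RankBox:
    """Bounding box of the cells written by one MPI rank (hypothetical layout)."""
    rank: int
    lo: tuple  # (xmin, ymin, zmin)
    hi: tuple  # (xmax, ymax, zmax)

    def contains(self, point):
        # Inclusive on both faces, so a point on a shared face matches both ranks.
        return all(l <= p <= h for l, p, h in zip(self.lo, point, self.hi))


def candidate_ranks(boxes, point):
    """Ranks whose box contains `point`.

    A linear scan stands in for the R-tree intersection query here; an R-tree
    returns the same ranks while typically pruning most boxes.
    """
    return [box.rank for box in boxes if box.contains(point)]


def partial_fileindex(boxes, cells_by_rank, point):
    """CellID -> file index mapping restricted to the candidate ranks.

    `cells_by_rank[rank]` is assumed to list (cellid, fileindex) pairs for
    that rank; ranks that cannot contain the point are never read.
    """
    mapping = {}
    for rank in candidate_ranks(boxes, point):
        for cellid, fileindex in cells_by_rank[rank]:
            mapping[cellid] = fileindex
    return mapping
```

The saving comes from never building the mapping for ranks whose region cannot contain the query point, which is what makes single-point queries cheap on large runs.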

Cache construction for a given file is driven by this function, and the cache is found automatically when a VlsvReader is later created for the same file.
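The "build once, find it again on the next open" behaviour can be sketched with the standard library alone. This is an illustration under assumptions: `load_or_build_index`, the `spatial_index.pkl` file name and the use of pickle are hypothetical stand-ins for the real code path and for rtree's own on-disk format.

```python
import os
import pickle
import tempfile


def load_or_build_index(cache_folder, build, name="spatial_index.pkl"):
    """Return the cached index from `cache_folder`, building and saving it if absent.

    pickle is a stand-in for the R-tree library's own file format.
    """
    path = os.path.join(cache_folder, name)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)

    index = build()
    os.makedirs(cache_folder, exist_ok=True)
    # Write to a temporary file in the same folder, then rename it into place,
    # so a concurrent reader never sees a half-written cache file.
    fd, tmp_path = tempfile.mkstemp(dir=cache_folder, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(index, f)
        os.replace(tmp_path, path)  # atomic rename on POSIX (same filesystem)
    except BaseException:
        os.remove(tmp_path)
        raise
    return index
```

The write-then-rename step only guarantees that the cache file is either absent or complete; it does nothing about a library that modifies the cache after opening it.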

However, the rtree package (and the underlying libspatialindex), which handles the R-tree construction and file caching, is a bit stupid in that it will access and potentially modify the cache files. The index objects are not thread-safe, but this is a rather extensive take on not being thread-safe.
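Within a single process, one common mitigation for a non-thread-safe object is to serialise every call through a lock. The sketch below assumes nothing about rtree beyond "has methods"; `LockedIndex` is a hypothetical wrapper, and it cannot help when several processes open the same cache files, which is the harder part of this issue.

```python
import threading
import types


class LockedIndex:
    """Serialise every method call on a wrapped, non-thread-safe index object.

    Only threads inside one process are protected; separate processes that
    open the same cache files are unaffected by this lock.
    """

    def __init__(self, index):
        self._index = index
        self._lock = threading.Lock()

    def __getattr__(self, name):
        # Called only for names not found on the wrapper itself.
        attr = getattr(self._index, name)
        if not callable(attr):
            return attr  # plain attributes are returned without locking

        def locked(*args, **kwargs):
            with self._lock:
                result = attr(*args, **kwargs)
                # Consume lazy generators while still holding the lock, so
                # iteration does not run unprotected in the caller's thread.
                if isinstance(result, types.GeneratorType):
                    result = list(result)
                return result

        return locked
```

Materialising generators costs some memory for large result sets, but a lazily evaluated query would otherwise touch the index after the lock has been released.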

The cache files for each VLSV file are these:

```python
self.__rtree_idxfile = os.path.join(self.get_cache_folder(), "rtree.idx")
```
The Python standard library has temporary file handlers (the `tempfile` module), but since the Rtree library isn't really passing around file objects, I imagine the NamedTemporaryFile function would be needed for both cache files. This is a bit risky (manual cleanup may be needed after crashes), but if it works fast enough it could be a stopgap measure for this feature.
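One reading of the NamedTemporaryFile stopgap is: copy the shared cache files to private temporary files, point the index library at the copies, and delete them afterwards. The sketch below assumes that reading; `private_cache_copies`, `remove_stale_copies` and the `vlsv_rtree_` prefix are hypothetical, and the prefix exists only so that leftovers from crashed runs can be found for the manual cleanup mentioned above.

```python
import contextlib
import os
import shutil
import tempfile
import time

# Hypothetical prefix so leftovers from crashed runs are easy to find.
PREFIX = "vlsv_rtree_"


@contextlib.contextmanager
def private_cache_copies(cache_files, tmpdir=None):
    """Yield paths to NamedTemporaryFile copies of `cache_files`.

    The index library is pointed at the copies, so the shared cache is never
    modified. Copies are deleted on exit, also when an exception propagates;
    a hard kill (SIGKILL, node failure) still leaves them behind.
    """
    copies = []
    try:
        for path in cache_files:
            suffix = os.path.splitext(path)[1]  # keep ".idx" etc.
            with tempfile.NamedTemporaryFile(
                prefix=PREFIX, suffix=suffix, dir=tmpdir, delete=False
            ) as tmp:
                copies.append(tmp.name)  # record first so a failed copy is still removed
                with open(path, "rb") as src:
                    shutil.copyfileobj(src, tmp)
        yield copies
    finally:
        for copy in copies:
            with contextlib.suppress(FileNotFoundError):
                os.remove(copy)


def remove_stale_copies(max_age_s=24 * 3600, tmpdir=None):
    """Manual cleanup: delete prefixed copies older than `max_age_s` seconds."""
    folder = tmpdir or tempfile.gettempdir()
    now = time.time()
    removed = []
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        if name.startswith(PREFIX) and now - os.path.getmtime(path) > max_age_s:
            with contextlib.suppress(FileNotFoundError):
                os.remove(path)
                removed.append(path)
    return removed
```

NamedTemporaryFile gives each copy a random name. If the index library derives all of its file names from one shared basename rather than taking individual paths, the copies would instead need a common stem, for example inside a `tempfile.TemporaryDirectory` with the files named as in the cache. `max_age_s` should exceed the longest expected analysis run so that live copies are not removed.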

More stable solutions would be preferred, for example adapting a pure C++ library to both Analysator and Vlasiator: https://github.com/nushoin/RTree
