Skip to content

Commit a7cdd15

Browse files
authored
Delete catalog fixes (#557)
**Related Issue(s):** - None **Description:** - Implemented "Smart Unlink" logic in delete_catalog: when cascade=False (default), collections are unlinked from the catalog and become root-level orphans if they have no other parents, rather than being deleted. When cascade=True, collections are deleted entirely. This prevents accidental data loss and supports poly-hierarchy scenarios where collections belong to multiple catalogs. - Fixed delete_catalog to use reverse lookup query on parent_ids field instead of fragile link parsing. This ensures all collections are found and updated correctly, preventing ghost relationships where collections remain tagged with deleted catalogs, especially in large catalogs or pagination scenarios. **PR Checklist:** - [x] Code is formatted and linted (run `pre-commit run --all-files`) - [x] Tests pass (run `make test`) - [x] Documentation has been updated to reflect changes, if applicable - [x] Changes are added to the changelog
1 parent 352890e commit a7cdd15

File tree

4 files changed

+167
-77
lines changed

4 files changed

+167
-77
lines changed

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,8 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
1919

2020
- Fix unawaited coroutine in `stac_fastapi.core.core`. [#551](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/551)
2121
- Parse `ES_TIMEOUT` environment variable as an integer. [#556](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/556)
22+
- Implemented "Smart Unlink" logic in delete_catalog: when cascade=False (default), collections are unlinked from the catalog and become root-level orphans if they have no other parents, rather than being deleted. When cascade=True, collections are deleted entirely. This prevents accidental data loss and supports poly-hierarchy scenarios where collections belong to multiple catalogs. [#557](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/557)
23+
- Fixed delete_catalog to use reverse lookup query on parent_ids field instead of fragile link parsing. This ensures all collections are found and updated correctly, preventing ghost relationships where collections remain tagged with deleted catalogs, especially in large catalogs or pagination scenarios. [#557](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/557)
2224

2325
### Removed
2426

README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -28,7 +28,7 @@ The following organizations have contributed time and/or funding to support the
2828

2929
## Latest News
3030

31-
- **12/09/2025:** **Feature Merge: Federated Catalogs.** The [`Catalogs Endpoint`](https://github.com/Healy-Hyperspatial/stac-api-extensions-catalogs-endpoint) extension is now in main! This enables a registry of catalogs and supports **poly-hierarchy** (collections belonging to multiple catalogs simultaneously). Enable it via `ENABLE_CATALOGS_EXTENSION`. _Coming next: Support for nested sub-catalogs._
31+
- **12/09/2025:** Feature Merge: **Federated Catalogs**. The [`Catalogs Endpoint`](https://github.com/Healy-Hyperspatial/stac-api-extensions-catalogs-endpoint) extension is now in main! This enables a registry of catalogs and supports **poly-hierarchy** (collections belonging to multiple catalogs simultaneously). Enable it via `ENABLE_CATALOGS_EXTENSION`. _Coming next: Support for nested sub-catalogs._
3232
- **11/07/2025:** 🌍 The SFEOS STAC Viewer is now available at: https://healy-hyperspatial.github.io/sfeos-web. Use this site to examine your data and test your STAC API!
3333
- **10/24/2025:** Added `previous_token` pagination using Redis for efficient navigation. This feature allows users to navigate backwards through large result sets by storing pagination state in Redis. To use this feature, ensure Redis is configured (see [Redis for navigation](#redis-for-navigation)) and set `REDIS_ENABLE=true` in your environment.
3434
- **10/23/2025:** The `EXCLUDED_FROM_QUERYABLES` environment variable was added to exclude fields from the `queryables` endpoint. See [docs](#excluding-fields-from-queryables).

stac_fastapi/core/stac_fastapi/core/extensions/catalogs.py

Lines changed: 54 additions & 76 deletions
Original file line numberDiff line numberDiff line change
@@ -314,71 +314,42 @@ async def delete_catalog(
314314
HTTPException: If the catalog is not found.
315315
"""
316316
try:
317-
# Get the catalog to verify it exists and get its collections
318-
db_catalog = await self.client.database.find_catalog(catalog_id)
319-
catalog = self.client.catalog_serializer.db_to_stac(db_catalog, request)
317+
# Get the catalog to verify it exists
318+
await self.client.database.find_catalog(catalog_id)
320319

321-
# Extract collection IDs from catalog links
322-
collection_ids = []
323-
if hasattr(catalog, "links") and catalog.links:
324-
for link in catalog.links:
325-
rel = (
326-
link.get("rel")
327-
if hasattr(link, "get")
328-
else getattr(link, "rel", None)
329-
)
330-
if rel == "child":
331-
href = (
332-
link.get("href", "")
333-
if hasattr(link, "get")
334-
else getattr(link, "href", "")
335-
)
336-
if href and "/collections/" in href:
337-
# Extract collection ID from href
338-
collection_id = href.split("/collections/")[-1].split("/")[
339-
0
340-
]
341-
if collection_id:
342-
collection_ids.append(collection_id)
343-
344-
if cascade:
345-
# Delete each collection
346-
for coll_id in collection_ids:
347-
try:
348-
await self.client.database.delete_collection(coll_id)
320+
# Use reverse lookup query to find all collections with this catalog in parent_ids.
321+
# This is more reliable than parsing links, as it captures all collections
322+
# regardless of pagination or link truncation.
323+
query_body = {"query": {"term": {"parent_ids": catalog_id}}}
324+
search_result = await self.client.database.client.search(
325+
index=COLLECTIONS_INDEX, body=query_body, size=10000
326+
)
327+
children = [hit["_source"] for hit in search_result["hits"]["hits"]]
328+
329+
# Process each child collection
330+
for child in children:
331+
child_id = child.get("id")
332+
try:
333+
if cascade:
334+
# DANGER ZONE: User explicitly requested cascade delete.
335+
# Delete the collection entirely, regardless of other parents.
336+
await self.client.database.delete_collection(child_id)
349337
logger.info(
350-
f"Deleted collection {coll_id} as part of cascade delete for catalog {catalog_id}"
351-
)
352-
except Exception as e:
353-
error_msg = str(e)
354-
if "not found" in error_msg.lower():
355-
logger.debug(
356-
f"Collection {coll_id} not found, skipping (may have been deleted elsewhere)"
357-
)
358-
else:
359-
logger.warning(
360-
f"Failed to delete collection {coll_id}: {e}"
361-
)
362-
else:
363-
# Remove catalog from each collection's parent_ids and links (orphan them)
364-
for coll_id in collection_ids:
365-
try:
366-
# Get the collection from database to access parent_ids
367-
collection_db = await self.client.database.find_collection(
368-
coll_id
338+
f"Deleted collection {child_id} as part of cascade delete for catalog {catalog_id}"
369339
)
370-
371-
# Remove catalog_id from parent_ids
372-
parent_ids = collection_db.get("parent_ids", [])
340+
else:
341+
# SAFE ZONE: Smart Unlink - Remove only this catalog from parent_ids.
342+
# The collection survives and becomes a root-level collection if it has no other parents.
343+
parent_ids = child.get("parent_ids", [])
373344
if catalog_id in parent_ids:
374345
parent_ids.remove(catalog_id)
375-
collection_db["parent_ids"] = parent_ids
346+
child["parent_ids"] = parent_ids
376347

377348
# Also remove the catalog link from the collection's links
378-
if "links" in collection_db:
379-
collection_db["links"] = [
349+
if "links" in child:
350+
child["links"] = [
380351
link
381-
for link in collection_db.get("links", [])
352+
for link in child.get("links", [])
382353
if not (
383354
link.get("rel") == "catalog"
384355
and catalog_id in link.get("href", "")
@@ -387,30 +358,37 @@ async def delete_catalog(
387358

388359
# Update the collection in the database
389360
await self.client.database.update_collection(
390-
collection_id=coll_id,
391-
collection=collection_db,
392-
refresh=True,
393-
)
394-
logger.info(
395-
f"Removed catalog {catalog_id} from collection {coll_id} parent_ids and links"
361+
collection_id=child_id,
362+
collection=child,
363+
refresh=False,
396364
)
365+
366+
# Log the result
367+
if len(parent_ids) == 0:
368+
logger.info(
369+
f"Collection {child_id} is now a root-level orphan (no parent catalogs)"
370+
)
371+
else:
372+
logger.info(
373+
f"Removed catalog {catalog_id} from collection {child_id}; still belongs to {len(parent_ids)} other catalog(s)"
374+
)
397375
else:
398376
logger.debug(
399-
f"Catalog {catalog_id} not in parent_ids for collection {coll_id}"
400-
)
401-
except Exception as e:
402-
error_msg = str(e)
403-
if "not found" in error_msg.lower():
404-
logger.debug(
405-
f"Collection {coll_id} not found, skipping (may have been deleted elsewhere)"
406-
)
407-
else:
408-
logger.warning(
409-
f"Failed to remove catalog {catalog_id} from collection {coll_id}: {e}"
377+
f"Catalog {catalog_id} not in parent_ids for collection {child_id}"
410378
)
379+
except Exception as e:
380+
error_msg = str(e)
381+
if "not found" in error_msg.lower():
382+
logger.debug(
383+
f"Collection {child_id} not found, skipping (may have been deleted elsewhere)"
384+
)
385+
else:
386+
logger.warning(
387+
f"Failed to process collection {child_id} during catalog deletion: {e}"
388+
)
411389

412-
# Delete the catalog
413-
await self.client.database.delete_catalog(catalog_id)
390+
# Delete the catalog itself
391+
await self.client.database.delete_catalog(catalog_id, refresh=True)
414392
logger.info(f"Deleted catalog {catalog_id}")
415393

416394
except Exception as e:

stac_fastapi/tests/extensions/test_catalogs.py

Lines changed: 110 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -992,6 +992,116 @@ async def test_catalog_links_contain_all_collections(
992992
), f"Collection {collection_id} missing from catalog links. Found links: {child_hrefs}"
993993

994994

995+
@pytest.mark.asyncio
996+
async def test_delete_catalog_no_cascade_orphans_collections(
997+
catalogs_app_client, load_test_data
998+
):
999+
"""Test that deleting a catalog without cascade makes collections root-level orphans."""
1000+
# Create a catalog
1001+
test_catalog = load_test_data("test_catalog.json")
1002+
catalog_id = f"test-catalog-{uuid.uuid4()}"
1003+
test_catalog["id"] = catalog_id
1004+
test_catalog["links"] = [
1005+
link for link in test_catalog.get("links", []) if link.get("rel") != "child"
1006+
]
1007+
1008+
catalog_resp = await catalogs_app_client.post("/catalogs", json=test_catalog)
1009+
assert catalog_resp.status_code == 201
1010+
1011+
# Create a collection in the catalog
1012+
test_collection = load_test_data("test_collection.json")
1013+
collection_id = f"test-collection-{uuid.uuid4()}"
1014+
test_collection["id"] = collection_id
1015+
1016+
create_resp = await catalogs_app_client.post(
1017+
f"/catalogs/{catalog_id}/collections", json=test_collection
1018+
)
1019+
assert create_resp.status_code == 201
1020+
1021+
# Delete the catalog without cascade (default behavior)
1022+
delete_resp = await catalogs_app_client.delete(f"/catalogs/{catalog_id}")
1023+
assert delete_resp.status_code == 204
1024+
1025+
# Verify the collection still exists
1026+
get_resp = await catalogs_app_client.get(f"/collections/{collection_id}")
1027+
assert get_resp.status_code == 200
1028+
1029+
# Verify the collection is now a root-level orphan (accessible from /collections)
1030+
collections_resp = await catalogs_app_client.get("/collections")
1031+
assert collections_resp.status_code == 200
1032+
collections_data = collections_resp.json()
1033+
collection_ids = [col["id"] for col in collections_data.get("collections", [])]
1034+
assert (
1035+
collection_id in collection_ids
1036+
), "Orphaned collection should appear in root /collections endpoint"
1037+
1038+
# Verify the collection no longer has a catalog link to the deleted catalog
1039+
collection_data = get_resp.json()
1040+
collection_links = collection_data.get("links", [])
1041+
catalog_link = None
1042+
for link in collection_links:
1043+
if link.get("rel") == "catalog" and catalog_id in link.get("href", ""):
1044+
catalog_link = link
1045+
break
1046+
1047+
assert (
1048+
catalog_link is None
1049+
), "Orphaned collection should not have link to deleted catalog"
1050+
1051+
1052+
@pytest.mark.asyncio
1053+
async def test_delete_catalog_no_cascade_multi_parent_collection(
1054+
catalogs_app_client, load_test_data
1055+
):
1056+
"""Test that deleting a catalog without cascade preserves collections with other parents."""
1057+
# Create two catalogs
1058+
catalog_ids = []
1059+
for i in range(2):
1060+
test_catalog = load_test_data("test_catalog.json")
1061+
catalog_id = f"test-catalog-{uuid.uuid4()}-{i}"
1062+
test_catalog["id"] = catalog_id
1063+
1064+
catalog_resp = await catalogs_app_client.post("/catalogs", json=test_catalog)
1065+
assert catalog_resp.status_code == 201
1066+
catalog_ids.append(catalog_id)
1067+
1068+
# Create a collection in the first catalog
1069+
test_collection = load_test_data("test_collection.json")
1070+
collection_id = f"test-collection-{uuid.uuid4()}"
1071+
test_collection["id"] = collection_id
1072+
1073+
create_resp = await catalogs_app_client.post(
1074+
f"/catalogs/{catalog_ids[0]}/collections", json=test_collection
1075+
)
1076+
assert create_resp.status_code == 201
1077+
1078+
# Add the collection to the second catalog
1079+
add_resp = await catalogs_app_client.post(
1080+
f"/catalogs/{catalog_ids[1]}/collections", json=test_collection
1081+
)
1082+
assert add_resp.status_code == 201
1083+
1084+
# Delete the first catalog without cascade
1085+
delete_resp = await catalogs_app_client.delete(f"/catalogs/{catalog_ids[0]}")
1086+
assert delete_resp.status_code == 204
1087+
1088+
# Verify the collection still exists
1089+
get_resp = await catalogs_app_client.get(f"/collections/{collection_id}")
1090+
assert get_resp.status_code == 200
1091+
1092+
# Verify the collection is still accessible from the second catalog
1093+
get_from_catalog_resp = await catalogs_app_client.get(
1094+
f"/catalogs/{catalog_ids[1]}/collections/{collection_id}"
1095+
)
1096+
assert get_from_catalog_resp.status_code == 200
1097+
1098+
# Verify the collection is NOT accessible from the deleted catalog
1099+
get_from_deleted_resp = await catalogs_app_client.get(
1100+
f"/catalogs/{catalog_ids[0]}/collections/{collection_id}"
1101+
)
1102+
assert get_from_deleted_resp.status_code == 404
1103+
1104+
9951105
@pytest.mark.asyncio
9961106
async def test_parent_ids_not_exposed_to_client(catalogs_app_client, load_test_data):
9971107
"""Test that parent_ids field is not exposed in API responses."""

0 commit comments

Comments
 (0)