
Remove Neo4j ID set filtering from curation indexers #1563

Merged
oblodgett merged 1 commit into stage from remove-neo4j-id-filters
Apr 10, 2026


@oblodgett
Member

Summary

  • Remove legacy Neo4j ID cross-reference filtering from all curation-sourced indexers
  • The curation API is now the authoritative data source; the Neo4j ID intersection was filtering out valid curation entities that hadn't been loaded into Neo4j
  • Existing obsolete/internal entity checks remain as the data quality filter
  • BaseService no longer has any Neo4j dependencies (removed AlleleRepository, GeneRepository, VariantRepository usage)
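For reviewers, the behavioural difference can be sketched as follows. This is an illustrative model only, not the code in this PR: `Entity`, `passesQualityCheck`, `indexedBefore`, and `indexedAfter` are hypothetical names, the `TEST:` identifiers are made up, and the real DTOs and services are assumed to be more involved.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class CurationFilterSketch {

    // Hypothetical stand-in for a curation entity; the real DTOs carry more fields.
    record Entity(String curie, boolean obsolete, boolean internal) {}

    // The filter that remains: drop obsolete and internal entities only.
    static boolean passesQualityCheck(Entity e) {
        return !e.obsolete() && !e.internal();
    }

    // Assumed shape of the removed gate: quality check AND membership in the Neo4j ID set.
    static List<String> indexedBefore(List<Entity> entities, Set<String> neo4jIds) {
        return entities.stream()
            .filter(CurationFilterSketch::passesQualityCheck)
            .filter(e -> neo4jIds.contains(e.curie()))
            .map(Entity::curie)
            .collect(Collectors.toList());
    }

    // After this change: the curation data alone decides what gets indexed.
    static List<String> indexedAfter(List<Entity> entities) {
        return entities.stream()
            .filter(CurationFilterSketch::passesQualityCheck)
            .map(Entity::curie)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Entity> entities = List.of(
            new Entity("TEST:0001", false, false),  // valid, also in Neo4j
            new Entity("TEST:0002", false, false),  // valid, never loaded into Neo4j
            new Entity("TEST:0003", true, false),   // obsolete
            new Entity("TEST:0004", false, true));  // internal
        Set<String> neo4jIds = Set.of("TEST:0001", "TEST:0003");

        System.out.println(indexedBefore(entities, neo4jIds)); // [TEST:0001]
        System.out.println(indexedAfter(entities));            // [TEST:0001, TEST:0002]
    }
}
```

In this model `TEST:0002` is the kind of entity the old gate dropped: valid in curation but absent from Neo4j.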

Indexers updated

  • GeneToGeneOrthologyIndexer — removed objectGene Neo4j filter
  • GeneMolecularInteractionService — removed interacting genes Neo4j filter
  • GeneGeneticInteractionService — removed genes + perturbating alleles Neo4j filter
  • GenePhenotypeAnnotationService — removed subject gene Neo4j filter
  • AllelePhenotypeAnnotationService — removed subject allele Neo4j filter
  • AGMPhenotypeAnnotationService — removed subject AGM Neo4j filter
  • SiteMapAccessionCurationIndexer — removed retainAll() intersection with Neo4j gene/allele ID sets
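To illustrate why the `retainAll()` intersection was lossy, here is a minimal sketch using plain `java.util` sets. The method name `intersectWithNeo4j` and the `TEST:` identifiers are assumptions for illustration and are not taken from `SiteMapAccessionCurationIndexer`.

```java
import java.util.Set;
import java.util.TreeSet;

class RetainAllSketch {

    // Assumed shape of the removed step: keep only curation IDs that Neo4j also knows.
    static Set<String> intersectWithNeo4j(Set<String> curationIds, Set<String> neo4jIds) {
        Set<String> kept = new TreeSet<>(curationIds); // copy so the caller's set is not mutated
        kept.retainAll(neo4jIds);                      // in-place set intersection
        return kept;
    }

    public static void main(String[] args) {
        Set<String> curationIds = Set.of("TEST:0001", "TEST:0002", "TEST:0003");
        Set<String> neo4jIds = Set.of("TEST:0001", "TEST:0003", "TEST:0009");

        // TEST:0002 exists in curation but not in Neo4j, so the intersection drops it.
        System.out.println(intersectWithNeo4j(curationIds, neo4jIds)); // [TEST:0001, TEST:0003]
    }
}
```

Without the intersection, all three curation IDs in this example would reach the sitemap.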

Test plan

  • Deploy to stage and run a full indexer pass
  • Verify indexed document counts are >= previous counts (more entities expected since Neo4j filter no longer drops valid curation entities)
  • Spot-check orthology, interaction, phenotype, and sitemap results in the stage UI
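The count comparison in the second step could be scripted along these lines. This is a sketch under assumptions: the index names and counts are invented, and how the counts are collected from the stage indexes is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class IndexCountCheck {

    // Returns, per index, how many documents were lost relative to the previous run.
    // An empty result means every index has at least as many documents as before.
    static Map<String, Long> regressions(Map<String, Long> previous, Map<String, Long> current) {
        Map<String, Long> shortfall = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : previous.entrySet()) {
            long now = current.getOrDefault(e.getKey(), 0L); // a missing index counts as zero
            if (now < e.getValue()) {
                shortfall.put(e.getKey(), e.getValue() - now);
            }
        }
        return shortfall;
    }

    public static void main(String[] args) {
        // Hypothetical index names and counts, for illustration only.
        Map<String, Long> previous = Map.of("orthology", 1000L, "phenotype", 500L);
        Map<String, Long> current = Map.of("orthology", 1040L, "phenotype", 500L);

        Map<String, Long> lost = regressions(previous, current);
        System.out.println(lost.isEmpty() ? "counts OK" : "regressions: " + lost);
    }
}
```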

@oblodgett requested a review from a team as a code owner April 10, 2026 09:41
The curation API is now the authoritative data source. The Neo4j ID
cross-reference filters were a legacy gate from when Neo4j was the
source of truth. Existing obsolete/internal checks remain as the
data quality filter.

Removed Neo4j filtering from:
- GeneToGeneOrthologyIndexer (objectGene filter)
- GeneMolecularInteractionService (interacting genes filter)
- GeneGeneticInteractionService (genes + alleles filter)
- GenePhenotypeAnnotationService (subject gene filter)
- AllelePhenotypeAnnotationService (subject allele filter)
- AGMPhenotypeAnnotationService (subject AGM filter)
- SiteMapAccessionCurationIndexer (retainAll intersection)
- BaseService (all Neo4j repository usage and ID set methods)
- BaseInteractionService (hasInteractingGenesInNeo method)
@oblodgett force-pushed the remove-neo4j-id-filters branch from 0f14562 to fa60120 April 10, 2026 09:42
@claude

claude bot commented Apr 10, 2026

Code Review

The changes look correct. This is a clean removal of the Neo4j ID cross-reference filtering from all curation indexers, making the curation API the sole authority for entity validity.

Verified:

  • No remaining callers of the removed methods (getAllNeoGeneIDs, getAllNeoAlleleIDs, getAllNeoModelIDs, getAllNeoVariantIDs, isValidNeoEntity, hasInteractingGenesInNeo, hasPerturbatingAllelesInNeo) anywhere in the codebase.
  • BaseService still has subclasses (BaseInteractionService, BaseDiseaseAnnotationService) that use its remaining functionality (readFromCache, writeToCache, hasNoExcludedEntities), so it's not dead code.
  • The obsolete/internal entity checks are correctly preserved as the remaining data quality filter.
  • No new logic introduced — purely subtractive change, which limits the risk surface.

No issues found.

@oblodgett merged commit c2a4e1d into stage Apr 10, 2026
5 checks passed
@oblodgett deleted the remove-neo4j-id-filters branch April 10, 2026 11:28

2 participants