feat(indexing): implemented a two-stage Kafka processing architecture with event aggregation by psmagin · Pull Request #890 · folio-org/mod-search

psmagin · 2026-01-31T15:53:23Z

Purpose

Address ordering and deduplication issues in instance indexing by implementing a two-stage Kafka processing architecture. This PR introduces a re-emission pattern where instance-level events are published to Kafka with instanceId as the key, ensuring all changes for the same instance are ordered within a single partition. The consumer then aggregates these events before querying the database, reducing database load and preventing race conditions.

Problem Statement

Previously, when multiple updates affected the same instance (e.g., holding changes, item updates), each update triggered an immediate database query and OpenSearch indexing operation. This caused:

- Ordering issues: Updates could be processed out of order
- Database load: Multiple redundant queries for the same instance
- Race conditions: Concurrent updates could overwrite each other
- Inefficient indexing: Multiple bulk requests for related changes

Solution Approach

Implemented a two-stage Kafka processing architecture with event aggregation:

Stage 1: Event Re-emission

- handleInstanceEvents receives resource events (instance, holding, item, bound-with)
- Extracts instanceId from each event
- Publishes a new IndexInstanceEvent to Kafka with instanceId as the key
- Result: All events for the same instance go to the same Kafka partition, maintaining order
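
The re-emission step can be illustrated with a minimal, dependency-free sketch. It is not the code from this PR: `ResourceEvent`, `KeyedEvent`, the payload field names (`id`, `instanceId`), and the `diku` tenant are assumptions made for illustration, and `partitionFor` is a simple stand-in for Kafka's key-based partitioning. It only shows why keying by instanceId keeps related events together.

```java
import java.util.List;
import java.util.Map;

class ReEmissionSketch {

  // Hypothetical, simplified stand-in for an incoming inventory resource event.
  record ResourceEvent(String resourceType, String tenant, Map<String, String> payload) {}

  // Hypothetical shape of the re-emitted record: the key drives partitioning.
  record KeyedEvent(String key, String tenant) {}

  // Assumption: instance events carry their own "id", while holding, item and
  // bound-with events reference the parent instance through "instanceId".
  static String extractInstanceId(ResourceEvent event) {
    String field = "instance".equals(event.resourceType()) ? "id" : "instanceId";
    return event.payload().get(field);
  }

  static KeyedEvent reEmit(ResourceEvent event) {
    return new KeyedEvent(extractInstanceId(event), event.tenant());
  }

  // Stand-in for key-based partitioning: a deterministic hash of the key modulo
  // the partition count. Kafka's default partitioner uses a different hash
  // (murmur2 over the serialized key), but the property shown is the same:
  // equal keys always map to the same partition.
  static int partitionFor(String key, int partitionCount) {
    return Math.floorMod(key.hashCode(), partitionCount);
  }

  public static void main(String[] args) {
    List<ResourceEvent> events = List.of(
        new ResourceEvent("instance", "diku", Map.of("id", "inst-1")),
        new ResourceEvent("holding", "diku", Map.of("id", "hold-7", "instanceId", "inst-1")),
        new ResourceEvent("item", "diku", Map.of("id", "item-9", "instanceId", "inst-1")));

    // All three events resolve to key "inst-1" and therefore to one partition.
    for (ResourceEvent event : events) {
      KeyedEvent out = reEmit(event);
      System.out.println(event.resourceType() + " -> key=" + out.key()
          + ", partition=" + partitionFor(out.key(), 6));
    }
  }
}
```

Because ordering in Kafka is guaranteed only within a partition, routing every change for an instance through one key is what gives the second stage a consistent view of that instance's event sequence.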

Stage 2: Event Aggregation & Indexing

- handleIndexInstanceEvents consumes batched events
- Groups consecutive events by instanceId within each batch
- Collapses multiple updates for the same instance into a single work item
- Performs one SQL aggregation per distinct instanceId (instance + holding + item join)
- Sends the aggregated documents to OpenSearch in a single bulk indexing request
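
A rough sketch of the aggregation step follows. It is a simplification under stated assumptions, not the PR's implementation: it collapses the whole batch to one work item per distinct (tenant, instanceId) pair, which may be broader than the "consecutive events" grouping described above, and `fetchDocument` and `bulkIndex` are hypothetical hooks standing in for the SQL aggregation and the OpenSearch bulk request.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

class BatchAggregationSketch {

  // Mirrors the IndexInstanceEvent(tenant, instanceId) value named in the PR.
  record IndexInstanceEvent(String tenant, String instanceId) {}

  // Collapse the batch to one work item per distinct (tenant, instanceId),
  // keeping first-seen order. Records compare by value, so a LinkedHashSet
  // is enough to drop repeats.
  static List<IndexInstanceEvent> collapse(List<IndexInstanceEvent> batch) {
    return new ArrayList<>(new LinkedHashSet<>(batch));
  }

  // fetchDocument and bulkIndex are hypothetical hooks: one fetch per work
  // item, then at most one bulk call for the whole batch.
  static int process(List<IndexInstanceEvent> batch,
                     Function<IndexInstanceEvent, String> fetchDocument,
                     Consumer<List<String>> bulkIndex) {
    List<String> documents = new ArrayList<>();
    for (IndexInstanceEvent workItem : collapse(batch)) {
      documents.add(fetchDocument.apply(workItem));
    }
    if (!documents.isEmpty()) {
      bulkIndex.accept(documents);
    }
    return documents.size();
  }

  public static void main(String[] args) {
    List<IndexInstanceEvent> batch = List.of(
        new IndexInstanceEvent("diku", "inst-1"),
        new IndexInstanceEvent("diku", "inst-1"),
        new IndexInstanceEvent("diku", "inst-2"),
        new IndexInstanceEvent("diku", "inst-1"));

    int indexed = process(
        batch,
        event -> "{\"id\":\"" + event.instanceId() + "\"}",
        docs -> System.out.println("bulk request with " + docs.size() + " documents"));

    System.out.println("fetched and indexed " + indexed + " of " + batch.size() + " events");
  }
}
```

Collapsing is safe here because each work item triggers a fresh read of the instance's current state, so intermediate events for the same instance carry no information the final read would miss.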

Architecture Changes

New Classes

ProducerRecordBuilder
- Responsibility: Build Kafka producer records with tenant header management
- Benefit: Centralized header manipulation logic
InstanceEventMapper
- Responsibility: Map consumer records to producer records with consortium tenant resolution
- Benefit: Encapsulates event mapping and instanceId extraction logic
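
To make the two responsibilities concrete, here is a small hedged sketch of tenant resolution and header rewriting. The class and method names are hypothetical and do not reflect the actual ProducerRecordBuilder or InstanceEventMapper API. Headers are modeled as a plain string map rather than Kafka's byte-array headers, and the member-to-central-tenant lookup is an assumption about how consortium resolution might work.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TenantHeaderSketch {

  // Okapi's tenant header name, used here as the header to overwrite.
  static final String TENANT_HEADER = "x-okapi-tenant";

  // Assumption: a consortium member tenant resolves to its central tenant;
  // any tenant without a mapping resolves to itself.
  static String resolveTargetTenant(String sourceTenant, Map<String, String> centralTenantByMember) {
    return centralTenantByMember.getOrDefault(sourceTenant, sourceTenant);
  }

  // Copy the incoming headers and overwrite only the tenant header, so the
  // remaining headers are forwarded unchanged and the input is not mutated.
  static Map<String, String> withTenant(Map<String, String> incomingHeaders, String targetTenant) {
    Map<String, String> outgoing = new LinkedHashMap<>(incomingHeaders);
    outgoing.put(TENANT_HEADER, targetTenant);
    return outgoing;
  }

  public static void main(String[] args) {
    // Example tenant names and header values are made up for illustration.
    Map<String, String> centralTenantByMember = Map.of("college", "consortium");
    Map<String, String> incoming = Map.of(
        TENANT_HEADER, "college",
        "x-okapi-url", "http://okapi:9130");

    String target = resolveTargetTenant("college", centralTenantByMember);
    System.out.println(withTenant(incoming, target));
  }
}
```

Keeping the header rewrite in one place means every re-emitted record gets the same treatment, which is the benefit the PR attributes to centralizing this logic.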

Refactored Classes

KafkaMessageListener
- Simplified handleInstanceEvents from 30+ lines to 3 lines
- Clear separation: re-emission vs. indexing logic
ResourceService
- Consolidated duplicate methods (indexInstancesById, indexInstancesByIdNew)
- Single indexInstanceEvents method for cleaner API
InstanceFetchService
- Removed duplicate fetching methods
- Shared fetchInstancesFromRepository

Changes Checklist

Related Issues

MSEARCH-1157

Technical Details

Event Flow

Inventory Update (holding/item/instance)
    ↓
handleInstanceEvents (Stage 1)
    ↓
Extract instanceId + Determine target tenant
    ↓
Publish IndexInstanceEvent to Kafka (key: instanceId)
    ↓
Kafka Partition (all events for same instanceId in order)
    ↓
handleIndexInstanceEvents (Stage 2)
    ↓
Group by instanceId + Aggregate consecutive events
    ↓
Fetch instance data (one SQL query per instanceId)
    ↓
Bulk index to OpenSearch (single request per batch)

Kafka Message Format

Topic: {tenant}.search.index.instance
Key: instanceId (ensures partitioning)
Value: IndexInstanceEvent(tenant, instanceId)
Headers: Tenant information, Okapi headers
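
The message format above could be modeled roughly as follows. This is an illustrative sketch only: `OutgoingMessage`, `topicFor`, and `toMessage` are hypothetical names, headers are simplified to a string map, and the topic pattern is taken literally from the description.

```java
import java.util.Map;

class MessageFormatSketch {

  // Value payload as described in the PR: IndexInstanceEvent(tenant, instanceId).
  record IndexInstanceEvent(String tenant, String instanceId) {}

  // Hypothetical container for the four parts of the message listed above.
  record OutgoingMessage(String topic, String key, IndexInstanceEvent value,
                         Map<String, String> headers) {}

  // Topic pattern from the description: {tenant}.search.index.instance
  static String topicFor(String tenant) {
    return tenant + ".search.index.instance";
  }

  static OutgoingMessage toMessage(String tenant, String instanceId, Map<String, String> headers) {
    return new OutgoingMessage(
        topicFor(tenant),
        instanceId,                                  // key: ensures partitioning by instance
        new IndexInstanceEvent(tenant, instanceId),
        Map.copyOf(headers));                        // defensive, immutable copy
  }

  public static void main(String[] args) {
    // Tenant name and header value are example data.
    OutgoingMessage message = toMessage("diku", "inst-1", Map.of("x-okapi-tenant", "diku"));
    System.out.println(message);
  }
}
```

The value deliberately carries only identifiers: the consumer re-reads the instance from the database, so the message does not need to include the changed data itself.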

sonarqubecloud · 2026-02-03T09:51:00Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
95.8% Coverage on New Code
0.0% Duplication on New Code

psmagin added 3 commits January 31, 2026 17:03

inital implementation

fcfd104

refactor

ddc8ad3

tests

89416e3

psmagin requested a review from a team as a code owner January 31, 2026 15:53

psmagin requested review from SvitlanaKovalova1 and viacheslavkol January 31, 2026 15:53

psmagin added 2 commits January 31, 2026 17:58

update NEWS.md

e40059b

update tests

526aed8

psmagin marked this pull request as draft January 31, 2026 16:57

psmagin added 3 commits January 31, 2026 20:43

add logging

68f1224

fix context

ada97b2

add opensearch logging

850c186

psmagin changed the title ~~MSEARCH-1157: Refactor instance indexing to follow SOLID and DRY principles~~ feat(indexing): implemented a two-stage Kafka processing architecture with event aggregation Feb 2, 2026

psmagin marked this pull request as ready for review February 2, 2026 18:15

psmagin requested a review from vgema February 2, 2026 18:16

SvitlanaKovalova1 approved these changes Feb 2, 2026

psmagin force-pushed the MSEARCH-1157 branch from 0f5fe59 to 58910f0 February 3, 2026 08:08

fix data move events

ce58a6a

psmagin force-pushed the MSEARCH-1157 branch from 58910f0 to ce58a6a February 3, 2026 08:18

refactor

e3c96f4

psmagin self-assigned this Feb 3, 2026

fix sonar

0f96c78

vgema approved these changes Feb 3, 2026

psmagin merged commit b3322f3 into master Feb 3, 2026
16 checks passed

psmagin deleted the MSEARCH-1157 branch February 3, 2026 11:04
