PHOENIX-7524: Fix IndexOutOfBoundsException in queries with OFFSET when rows are exhausted #2307

TheNamesRai · 2025-10-22T16:31:07Z

PHOENIX-7524: Fix IndexOutOfBoundsException in queries with OFFSET when rows are exhausted

Design doc - link

The failure occurs when a query with OFFSET exhausts the available rows, that is, in any of these cases:

- OFFSET exceeds the total number of rows in the table, OR
- OFFSET exceeds the rows returned by the WHERE clause (even when WHERE matches some rows), OR
- the WHERE clause filters out all rows, OR
- the table/region is empty

In these cases the server-side scanner encounters an empty tuple after exhausting all available rows and still calls getOffsetKvWithLastScannedRowKey(), which internally calls tuple.getKey(). Accessing the key of an empty tuple throws an IndexOutOfBoundsException in MultiKeyValueTuple.getKey().

Solution - Added a check in NonAggregateRegionScannerFactory.java to detect empty tuples before calling getOffsetKvWithLastScannedRowKey()
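As a rough illustration of the guard described above, here is a minimal, self-contained sketch. It does not use Phoenix or HBase classes: the tuple is modeled as a plain list of cells, and the names `EmptyTupleGuardSketch`, `keyOf`, and `lastScannedKeyOrFallback` are hypothetical stand-ins, not the code in this PR.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified model only: a "tuple" is a list of cells, and its key is read from the first cell.
class EmptyTupleGuardSketch {

    // Mimics the failure mode: reading the key of an empty tuple throws IndexOutOfBoundsException.
    static byte[] keyOf(List<byte[]> cells) {
        return cells.get(0);
    }

    // Guarded variant: check for an empty tuple first and fall back to a key
    // derived elsewhere (in the PR, from the scan or region boundaries).
    static byte[] lastScannedKeyOrFallback(List<byte[]> cells, byte[] fallbackKey) {
        if (cells.isEmpty()) {
            return fallbackKey;
        }
        return keyOf(cells);
    }

    public static void main(String[] args) {
        byte[] fallback = {0x01};
        List<byte[]> exhausted = Collections.emptyList();
        // Rows exhausted: the guard returns the fallback key instead of throwing.
        System.out.println(Arrays.toString(lastScannedKeyOrFallback(exhausted, fallback)));
        // Rows available: the key of the tuple is returned as before.
        List<byte[]> row = Collections.singletonList(new byte[] {0x05});
        System.out.println(Arrays.toString(lastScannedKeyOrFallback(row, fallback)));
    }
}
```

The point of the sketch is only the ordering: the emptiness check happens before any key access, so the exhausted-rows path never reaches the call that throws.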

Modified Files:

NonAggregateRegionScannerFactory.java
- Added empty tuple check before accessing tuple.getKey()
- Implemented fallback logic to derive appropriate row key from scan boundaries

Test Files:

QueryWithOffsetIT.java
- Added 5 comprehensive integration tests covering various scenarios where OFFSET exceeds available rows
CDCQueryIT.java
- Added CDC-specific test for OFFSET exceeding available rows


virajjasani · 2025-10-23T04:58:19Z

Some test failures seem relevant to new test? https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-2307/1/testReport/junit/org.apache.phoenix.end2end/CDCQueryIT/testCDCQueryWithOffsetExceedingRows_forView_false__encodingScheme_NON_ENCODED_QUALIFIERS__multitenant_true__tableSaltBuckets_2__withSchemaName_true__caseSensitiveNames_false_/

TheNamesRai · 2025-10-23T10:17:22Z

@virajjasani, updated the test in CDCQueryIT.

TheNamesRai · 2025-10-28T09:28:47Z

phoenix-core-server/src/main/java/org/apache/phoenix/util/ServerUtil.java

+   * @param region The region being scanned
+   * @return A valid row key derived from scan or region boundaries
+   */
+  public static byte[] deriveRowKeyFromScanOrRegionBoundaries(Scan scan, Region region) {


The naming convention for this method follows that of the getScanStartRowKeyFromScanOrRegionBoundaries method in ServerUtil.java.

virajjasani · 2025-11-26

Can we use getScanStartRowKeyFromScanOrRegionBoundaries() instead of this new method?

TheNamesRai · 2025-11-28

@virajjasani,
No, we cannot use getScanStartRowKeyFromScanOrRegionBoundaries(), because the two methods serve different purposes:

- getScanStartRowKeyFromScanOrRegionBoundaries() returns only the start row key of the scan.
- Here we need getLargestPossibleRowKeyInRange(), which analyzes the entire range. That is what deriveRowKeyFromScanOrRegionBoundaries() implements: it derives a representative row key within the scan range for OFFSET queries.

palashc

LGTM +1, thank you!

virajjasani · 2025-11-26T16:42:16Z

phoenix-core-server/src/main/java/org/apache/phoenix/util/ServerUtil.java

+      } else if (scan.includeStopRow()) {
+        rowKey = endKey;


I wonder whether this generates the correct row key: if the scan start row key is not inclusive, don't we need to find the shortest possible next row key?
I think we should already have this logic elsewhere. @TheNamesRai, could you please check once?

TheNamesRai · 2025-11-28

@virajjasani,
The existing code uses the same pattern as I used: usecase1, usecase2, ...

But you are right, we can try the next key after the start key before jumping to the end key here. Would something like this be better? What do you think?

if (rowKey == null) {
  if (scan.includeStartRow()) {
    rowKey = startKey;
  } else {
    byte[] nextStartKey = ByteUtil.nextKey(startKey);
    if (nextStartKey != null) {
      rowKey = nextStartKey;
    } else if (scan.includeStopRow()) {
      rowKey = endKey;
    } else {
      rowKey = HConstants.EMPTY_END_ROW;
    }
  }
}
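To make the proposed fallback order easier to reason about, here is a small self-contained model of it. This is only a sketch under assumptions: the scan is reduced to two byte arrays and two booleans, `OffsetRowKeySketch` and `chooseRowKey` are hypothetical names, and the local `nextKey` is a stand-in that increments the key as an unsigned big-endian number and returns null on overflow or for an empty key; it is not the Phoenix ByteUtil implementation itself.

```java
import java.util.Arrays;

// Simplified model of the proposed branch order; no HBase or Phoenix types are used.
class OffsetRowKeySketch {

    // Stand-in for HConstants.EMPTY_END_ROW (an empty byte array).
    static final byte[] EMPTY_END_ROW = new byte[0];

    // Local stand-in for a "next key" helper: increments the key as an unsigned
    // big-endian number; returns null if the key is empty or every byte is 0xFF.
    static byte[] nextKey(byte[] key) {
        byte[] next = Arrays.copyOf(key, key.length);
        for (int i = next.length - 1; i >= 0; i--) {
            if (next[i] != (byte) 0xFF) {
                next[i]++;
                return next;
            }
            next[i] = 0; // carry into the next more significant byte
        }
        return null;
    }

    // Same order as the snippet above: inclusive start key, else the next key
    // after the start key, else the inclusive stop key, else the empty end row.
    static byte[] chooseRowKey(byte[] startKey, boolean includeStartRow,
                               byte[] endKey, boolean includeStopRow) {
        if (includeStartRow) {
            return startKey;
        }
        byte[] nextStartKey = nextKey(startKey);
        if (nextStartKey != null) {
            return nextStartKey;
        }
        if (includeStopRow) {
            return endKey;
        }
        return EMPTY_END_ROW;
    }

    public static void main(String[] args) {
        byte[] start = {0x01, (byte) 0xFF};
        byte[] end = {0x09};
        // Exclusive start key: the carry turns {0x01, 0xFF} into {0x02, 0x00}.
        System.out.println(Arrays.toString(chooseRowKey(start, false, end, true)));
    }
}
```

Note that the end key is only reached when no next key can be computed from the start key, which in this model means the start key is empty or consists entirely of 0xFF bytes.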

PHOENIX-7524: Fix IndexOutOfBoundsException in queries with OFFSET when rows are exhausted

2a7ba60

virajjasani self-requested a review October 22, 2025 16:51

spotless checks

109f1d0

virajjasani requested a review from palashc October 23, 2025 04:53

update CDCQueryIT for multitenant table and with schema table

28de02e

Make a util method - deriveRowKeyFromScanOrRegionBoundaries

d2aef56

TheNamesRai commented Oct 28, 2025

palashc approved these changes Nov 4, 2025

virajjasani reviewed Nov 26, 2025
