HIVE-29464: Rethink MapWork.aliasToPartnInfo - add getDistinctTableDescs() for callers that only need TableDesc objects by hemanthumashankar0511 · Pull Request #6344 · apache/hive

hemanthumashankar0511 · 2026-03-02T15:36:41Z

What changes were proposed in this pull request?

Added a new method getDistinctTableDescs() in MapWork that returns the unique TableDesc objects used by the map task, and updated configureJobConf to use it.

Before this change, the deduplication logic was sitting inside configureJobConf:

Set<String> processedTables = new HashSet<>();
for (PartitionDesc partition : aliasToPartnInfo.values()) {
    TableDesc tableDesc = partition.getTableDesc();
    if (tableDesc != null && !processedTables.contains(tableDesc.getTableName())) {
        processedTables.add(tableDesc.getTableName());
        PlanUtils.configureJobConf(tableDesc, job);
    }
}

After this change, that logic lives in getDistinctTableDescs() and configureJobConf just calls it cleanly:

for (TableDesc tableDesc : getDistinctTableDescs()) {
    PlanUtils.configureJobConf(tableDesc, job);
}
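For reviewers who want to see the shape of the extracted method, here is a minimal sketch. It is not the code from this PR: `SketchTableDesc`, `SketchPartitionDesc` and `SketchMapWork` are hypothetical stand-ins for Hive's `TableDesc`, `PartitionDesc` and `MapWork`, and the return type and first-seen ordering are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for Hive's TableDesc (illustration only).
final class SketchTableDesc {
  private final String tableName;
  SketchTableDesc(String tableName) { this.tableName = tableName; }
  String getTableName() { return tableName; }
}

// Hypothetical stand-in for Hive's PartitionDesc (illustration only).
final class SketchPartitionDesc {
  private final SketchTableDesc tableDesc;
  SketchPartitionDesc(SketchTableDesc tableDesc) { this.tableDesc = tableDesc; }
  SketchTableDesc getTableDesc() { return tableDesc; }
}

// Hypothetical stand-in for MapWork, reduced to the one field discussed here.
final class SketchMapWork {
  private final Map<String, SketchPartitionDesc> aliasToPartnInfo = new LinkedHashMap<>();

  void putPartitionDesc(String alias, SketchPartitionDesc desc) {
    aliasToPartnInfo.put(alias, desc);
  }

  // Returns one table descriptor per distinct table name, in first-seen order.
  List<SketchTableDesc> getDistinctTableDescs() {
    Set<String> seen = new HashSet<>();
    List<SketchTableDesc> result = new ArrayList<>();
    for (SketchPartitionDesc partition : aliasToPartnInfo.values()) {
      SketchTableDesc tableDesc = partition.getTableDesc();
      // Set.add returns false when the name was already recorded.
      if (tableDesc != null && seen.add(tableDesc.getTableName())) {
        result.add(tableDesc);
      }
    }
    return Collections.unmodifiableList(result);
  }
}
```

The sketch keeps the same null check and name-based key as the original loop, so a caller that iterates the result should configure exactly the tables the old code configured.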

Why are the changes needed?

Callers like KafkaDagCredentialSupplier that only care about tables are currently forced to loop through all partitions in aliasToPartnInfo just to get the TableDesc objects. A table can have thousands of partitions but only one TableDesc, so everyone ends up writing the same boilerplate deduplication loop.

This method gives callers a clean way to get unique tables directly from MapWork without reinventing the wheel every time.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

I tested this locally by attaching a debugger to the test run and checking two scenarios:

Self-join — I wanted to make sure deduplication wouldn't accidentally skip anything:

SELECT * FROM test t1 JOIN test t2 USING(a);

Confirmed that both aliases point to the exact same TableDesc instance in memory, so the table only gets configured once as expected.

Cross-database join — I wanted to make sure tables with the same name from different databases don't collide:

SELECT * FROM db1.test_cross t1 JOIN db2.test_cross t2 USING(a);

Confirmed that getTableName() returns fully qualified names like db1.test_cross and db2.test_cross as distinct strings, so both tables get configured correctly.
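To illustrate why the qualified name matters as the deduplication key, the following sketch compares keying on the names as reported above with keying on a bare table name. `TableNameKeys` and its `unqualified` helper are hypothetical and exist only for this comparison; they are not part of Hive.

```java
import java.util.HashSet;
import java.util.List;

// Hypothetical helper, used only to compare deduplication keys.
final class TableNameKeys {
  private TableNameKeys() {}

  // Number of distinct keys among the given names.
  static int countDistinct(List<String> names) {
    return new HashSet<>(names).size();
  }

  // Drops everything up to the last '.', simulating an unqualified table name.
  static String unqualified(String name) {
    int dot = name.lastIndexOf('.');
    return dot < 0 ? name : name.substring(dot + 1);
  }
}
```

With the qualified names `db1.test_cross` and `db2.test_cross`, `countDistinct` sees two keys. After passing both through `unqualified`, it sees one, which is the collision the cross-database test was meant to rule out.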

abstractdog · 2026-03-03T08:40:54Z

nice work so far, thanks @hemanthumashankar0511 for taking care of this!
I believe one of the main concerns of this ticket was the getAliasToPartnInfo() method: it simply returns a mutable collection, which leaves us unsure where that collection is actually modified. An ideal solution removes the method altogether, or at least limits its usage. Can you please take care of that as well?

abstractdog · 2026-03-04T10:25:01Z

ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java

+    aliasToPartnInfo.put(alias, partitionDesc);
+  }
+
+  public void putAllPartitionDescs(Map<String, PartitionDesc> partitionDescs) {


Fortunately, we don't need this method: it is not used at all.

sonarqubecloud · 2026-03-04T10:39:19Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.3% Duplication on New Code

See analysis details on SonarQube Cloud

abstractdog · 2026-03-04T11:07:19Z

ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java

-   */
-  public Map<String, PartitionDesc> getAliasToPartnInfo() {
-    return aliasToPartnInfo;
+  public Collection<PartitionDesc> getPartitionDescs() {


The collection returned by this method is mainly used for iterating. Is it possible to return an Iterator instead of copying the whole collection? Unfortunately, copying can be costly, and we can never know how heavily this method is used now or will be used in the future.
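As a rough sketch of the options being weighed here, the class below contrasts a defensive copy with a read-only view and an iterator over that view. `PartitionHolder` and its method names are hypothetical; only `Collections.unmodifiableCollection` and the standard collection types are real JDK APIs.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder standing in for the relevant part of MapWork.
final class PartitionHolder<V> {
  private final Map<String, V> aliasToPartnInfo = new LinkedHashMap<>();

  void put(String alias, V desc) {
    aliasToPartnInfo.put(alias, desc);
  }

  // Option A: defensive copy. Safe, but allocates and copies on every call.
  Collection<V> copyOfPartitionDescs() {
    return new ArrayList<>(aliasToPartnInfo.values());
  }

  // Option B: read-only view. No copying; later changes to the map show through.
  Collection<V> partitionDescsView() {
    return Collections.unmodifiableCollection(aliasToPartnInfo.values());
  }

  // Option C: iterator over the read-only view; remove() is rejected.
  Iterator<V> partitionDescIterator() {
    return partitionDescsView().iterator();
  }
}
```

Options B and C avoid the copy, at the cost that iterating while the underlying map is being modified fails fast with a ConcurrentModificationException, so they fit best when callers only read.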

abstractdog · 2026-03-04T11:09:27Z

ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java

+      return;
+    }
+    if (aliasToPartnInfo == null) {
+      aliasToPartnInfo = new LinkedHashMap<>();


can we rely on the current instance, like:

private Map<String, PartitionDesc> aliasToPartnInfo = new LinkedHashMap<String, PartitionDesc>();

this ensures that we have an instance and don't need the extra null checks
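A minimal sketch of what the accessors could look like once the field is initialized eagerly. `AliasRegistry` is a hypothetical stand-in for MapWork, and the sketch assumes that no setter or deserialization path ever assigns null to the field, which would need to be verified in the real class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the alias-related part of MapWork.
final class AliasRegistry<V> {
  // Assumption: created once and never reassigned, so it is never null.
  private final Map<String, V> aliasToPartnInfo = new LinkedHashMap<>();

  void putPartitionDesc(String alias, V partitionDesc) {
    aliasToPartnInfo.put(alias, partitionDesc);
  }

  void removeAlias(String alias) {
    aliasToPartnInfo.remove(alias);
  }

  boolean hasPartitionDesc(String alias) {
    return aliasToPartnInfo.containsKey(alias);
  }

  int getPartitionCount() {
    return aliasToPartnInfo.size();
  }

  // Returns null only when the alias is unknown, never because the map is missing.
  V getPartitionDesc(String alias) {
    return aliasToPartnInfo.get(alias);
  }
}
```

Under that assumption, each accessor reduces to a single map call and the null checks flagged in the comments below become unnecessary.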

abstractdog · 2026-03-04T11:09:55Z

ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java

+  }
+
+  public void removeAlias(String alias) {
+    if (aliasToPartnInfo != null) {


maybe remove null-check

abstractdog · 2026-03-04T11:10:02Z

ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java

+  }
+
+  public void putPartitionDesc(String alias, PartitionDesc partitionDesc) {
+    if (aliasToPartnInfo == null) {


maybe remove null-check

abstractdog · 2026-03-04T11:10:08Z

ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java

+  }
+
+  public boolean hasPartitionDesc(String alias) {
+    return aliasToPartnInfo != null && aliasToPartnInfo.containsKey(alias);


maybe remove null-check

abstractdog · 2026-03-04T11:10:11Z

ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java

+  }
+
+  public int getPartitionCount() {
+    return aliasToPartnInfo == null ? 0 : aliasToPartnInfo.size();


maybe remove null-check

abstractdog · 2026-03-04T11:10:16Z

ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java

-      LinkedHashMap<String, PartitionDesc> aliasToPartnInfo) {
-    this.aliasToPartnInfo = aliasToPartnInfo;
+  public PartitionDesc getPartitionDesc(String alias) {
+    return aliasToPartnInfo == null ? null : aliasToPartnInfo.get(alias);


maybe remove null-check

HIVE-29464: Introduce getDistinctTableDescs() in MapWork to retrieve unique TableDesc objects without iterating partitions

84625c5

asf-ci-hive added tests pending tests passed and removed tests pending labels Mar 2, 2026

hemanthumashankar0511 marked this pull request as ready for review March 3, 2026 05:19

asf-ci-hive added tests pending and removed tests passed labels Mar 4, 2026

hemanthumashankar0511 force-pushed the mapwork-get-tabledescs branch from 48b3e36 to 717d4b4 Compare March 4, 2026 07:09

asf-ci-hive added tests failed tests pending and removed tests pending tests failed labels Mar 4, 2026

hemanthumashankar0511 marked this pull request as draft March 4, 2026 09:02

asf-ci-hive added tests failed and removed tests pending labels Mar 4, 2026

Refactor MapWork partition access to purpose-built APIs and remove raw aliasToPartnInfo exposure

f97641b

hemanthumashankar0511 force-pushed the mapwork-get-tabledescs branch from 717d4b4 to f97641b Compare March 4, 2026 09:12

asf-ci-hive added tests pending and removed tests failed labels Mar 4, 2026

abstractdog reviewed Mar 4, 2026

asf-ci-hive removed the tests pending label Mar 4, 2026

asf-ci-hive added the tests passed label Mar 4, 2026

hemanthumashankar0511 wants to merge 2 commits into apache:master from hemanthumashankar0511:mapwork-get-tabledescs
