Conversation

@jasonosullivan34 commented Dec 5, 2025

What changes were proposed in this pull request?

HDDS-14084. updating metadata / data size check logs pipeline node selection to info

Please describe your PR in detail:
Updating the log level for the metadata and data size checks to info. These checks are done as part of node selection when adding nodes to a pipeline.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-14084

How was this patch tested?

Workflow run on a forked Git repo.

@jasonosullivan34 jasonosullivan34 changed the title HDDS-14084. updating metadata / data size check logs pipeline node selection to info HDDS-14084. updating metadata / data size check logs for pipeline node selection to info Dec 5, 2025
@sodonnel (Contributor) left a comment
I think this change makes sense. One concern is that the logs could be emitted too frequently, but for that to happen there must be a lot of nodes on the cluster that are out of space on all disks. That points to a wider cluster issue that needs to be resolved via balancing, adding capacity, or removing data. With the logs at debug level, the problem is hidden unless someone thinks to enable debug on this class, and system-wide debug is very noisy!

@adoroszlai (Contributor)
One concern is that the logs are emitted too frequently

This method is called for several reasons; one of them is collecting metrics, which happens every 30 seconds according to #9418 (though I guess it depends on Prometheus and other settings).

My other concern is that this provides little information: only the datanode ID/address and the requested space. Please see AvailableSpaceFilter for a better approach: while checking each volume, it keeps track of the ones that are full, which are then printed by toString().
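The pattern described here can be sketched roughly as follows. This is a minimal illustration, not the actual Ozone AvailableSpaceFilter class; the class name, method names, and fields are all assumptions:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the pattern described above: while filtering volumes, remember
// the ones that were too full, then summarize them in toString() so a single
// log line carries the detail. All names here are illustrative, not the real
// Ozone AvailableSpaceFilter API.
public class SpaceFilterSketch {
  private final long requiredBytes;
  private final List<String> fullVolumes = new ArrayList<>();
  private int volumesChecked;

  public SpaceFilterSketch(long requiredBytes) {
    this.requiredBytes = requiredBytes;
  }

  /** Returns true if the volume can fit the request; records full volumes. */
  public boolean hasSpace(String volumePath, long availableBytes) {
    volumesChecked++;
    if (availableBytes >= requiredBytes) {
      return true;
    }
    fullVolumes.add(volumePath + " (available=" + availableBytes + ")");
    return false;
  }

  @Override
  public String toString() {
    return "checked " + volumesChecked + " volume(s) for " + requiredBytes
        + " bytes, full volumes: " + fullVolumes;
  }
}
```

The point of the design is that the caller pays for one summary log line per node rather than one line per volume check.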

@jasonosullivan34 jasonosullivan34 marked this pull request as draft December 8, 2025 16:28
@adoroszlai (Contributor) left a comment

Thanks @jasonosullivan34 for updating the patch.

if (!(node instanceof DatanodeInfo)) {
node = nodeManager.getDatanodeInfo(node);
}


nit: avoid whitespace-only change

Suggested change

@@ -0,0 +1,90 @@
package org.apache.hadoop.hdds.scm;

A license header is needed in all files. Example:

/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
* <p>
* http://www.apache.org/licenses/LICENSE-2.0
* <p>
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

Comment on lines 1387 to 1392
.filter(dn -> !hasEnoughSpace(dn, minRatisVolumeSizeBytes, containerSize, conf)
&& !hasEnoughCommittedVolumeSpace(dn, blockSize))
.count();
.filter((dn) -> {
SCMDatanodeCapacityInfo info = checkSpace(dn, minRatisVolumeSizeBytes, containerSize, conf);
return !info.hasEnoughSpace() && !hasEnoughCommittedVolumeSpace(dn, blockSize);
}).count();

Let's keep the original code in this file since it doesn't need the SCMDatanodeCapacityInfo.

@adoroszlai adoroszlai changed the title HDDS-14084. updating metadata / data size check logs for pipeline node selection to info HDDS-14084. Log details of ineligible nodes in SCMCommonPlacementPolicy#filterNodesWithSpace Dec 14, 2025
@adoroszlai (Contributor)

Thanks @jasonosullivan34 for updating the patch. Can you please check test timeout in org.apache.hadoop.ozone.container.placement.TestContainerPlacement?

github-actions bot commented Jan 6, 2026

This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days.

@github-actions github-actions bot added the stale label Jan 6, 2026
@siddhantsangwan (Contributor) left a comment

Thanks @jasonosullivan34 for taking this up.

Do we really want to include volume-level information here? Of course more information is better, but even with 20 full datanodes and 20 volumes per datanode, that'll be a lot of logging. Excessive logging costs write performance as well.

We need to think of a different way to surface the fact that many datanodes are full, with volume-level information. @priyeshkaratha is the capacity distribution project that you're working on doing something in this area?

I think this change is helpful, but we should restrict it to datanode stats (capacity, used, available, reserved, committed). Curious to hear what others think.

public String getInsufficientSpaceMessage() {
if (hasEnoughSpace()) {
return String.format("Datanode %s has sufficient space (data: %d bytes required, metadata: %d bytes required)",
datanodeDetails.getUuidString(), dataVolumeInfo.requiredSpace, metaVolumeInfo.requiredSpace);

Use the datanode IP address or hostname instead of the UUID. It makes debugging easier.

@Override
public String toString() {
return "SCMDatanodeCapacityInfo{" +
"datanode=" + datanodeDetails.getUuidString() +

Same here: use the IP address or hostname.

}

public void addFullDataVolume(StorageReportProto report, long usableSpace) {
this.dataVolumeInfo.addFullVolume(new FullVolume(report.getStorageUuid(), usableSpace));

Better to use the volume path instead of the storage UUID.

Comment on lines +94 to +102
if (!hasEnoughDataSpace()) {
return String.format("Datanode %s has no volumes with enough space to allocate %d bytes for data." +
" data=%s, metadata=%s", datanodeDetails.getUuidString(), dataVolumeInfo.requiredSpace, dataVolumeInfo,
metaVolumeInfo);
} else {
return String.format("Datanode %s has no volumes with enough space to allocate %d bytes for metadata." +
" data=%s, metadata=%s", datanodeDetails.getUuidString(), metaVolumeInfo.requiredSpace, dataVolumeInfo,
metaVolumeInfo);
}

As I said above, it's probably better to include only datanode-level total space, used, available, reserved, committed, etc., and exclude volumes.

@siddhantsangwan (Contributor)

I think this change makes sense. One concern is that the logs are emitted too frequently, but for that to happen, there must be a lot of nodes on the cluster that are out of space on all disks. That points to a wider cluster issue that needs to be resolved via balancing, adding capacity, or removing data. With the logs at debug level, the problem is hidden unless someone thinks to enable debug on this class, and system-wide debug is very noisy!

Good point. But do you think volume stats should be logged here as well, or is total DN capacity, used space, etc. good enough? @sodonnel

@siddhantsangwan (Contributor)

My other concern is that this provides little information, only datanode ID/address and requested space. Please see AvailableSpaceFilter for a better approach: upon checking each volume, it keeps track of ones that are full, which are then printed by toString().

@adoroszlai yes, the AvailableSpaceFilter approach is pretty good; I'm just wondering if the logs will be excessive if the volumes are printed as well. Let's see if @priyeshkaratha is trying to make this observable in some other way.

@siddhantsangwan (Contributor)

@jasonosullivan34 let's discuss here and reach consensus on the approach so you won't have to make unnecessary changes.

@adoroszlai (Contributor)

only datanode ID/address and requested space

wondering if the logs will be excessive if the volumes are printed as well.

I guess it doesn't need to print each volume (or even just the full volumes) in SCM. A datanode-level summary may be a good middle ground.
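The datanode-level middle ground being discussed could look something like this rough sketch. Every name here is illustrative, not the real Ozone DatanodeInfo or node-report API:

```java
// Sketch of a datanode-level summary: one log line per rejected node with
// aggregate stats only, no per-volume detail. All names are assumptions.
public class NodeCapacitySummary {
  private final String address;   // IP or hostname, per the review feedback
  private final long capacity;
  private final long used;
  private final long remaining;
  private final long committed;
  private final long requested;

  public NodeCapacitySummary(String address, long capacity, long used,
      long remaining, long committed, long requested) {
    this.address = address;
    this.capacity = capacity;
    this.used = used;
    this.remaining = remaining;
    this.committed = committed;
    this.requested = requested;
  }

  @Override
  public String toString() {
    return String.format(
        "Datanode %s cannot allocate %d bytes: capacity=%d, used=%d,"
            + " remaining=%d, committed=%d",
        address, requested, capacity, used, remaining, committed);
  }
}
```

Logging one such summary per ineligible node keeps the output bounded by node count rather than node count times volume count.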

@github-actions github-actions bot removed the stale label Jan 7, 2026