[SPARK-55793][CORE] Add multiple log directories support to SHS by sarutak · Pull Request #54575 · apache/spark

sarutak · 2026-03-02T16:32:38Z

What changes were proposed in this pull request?

This PR proposes to add multiple log directories support to SHS, allowing it to monitor event logs from multiple directories simultaneously.

This PR extends spark.history.fs.logDirectory to accept a comma-separated list of directories (e.g., hdfs:///logs/prod,s3a://bucket/logs/staging). Directories can be on the same or different filesystems. Also, a new optional config spark.history.fs.logDirectory.names is added which allows users to assign display names to directories by position (e.g., Production,Staging). Empty entries fall back to the full path. Duplicate display names are rejected at startup.
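The positional naming rules above can be sketched as follows. This is a minimal illustration, not the code in this PR: `LogDirNames`, `LogSource`, and `resolve` are hypothetical names, and ignoring surplus names beyond the number of directories is an assumption the description does not cover.

```scala
// Hypothetical helper, not the PR's actual code: pairs each configured
// directory with a display name chosen by position.
object LogDirNames {
  final case class LogSource(path: String, displayName: String)

  def resolve(logDirectory: String, names: Option[String]): Seq[LogSource] = {
    val dirs = logDirectory.split(",").map(_.trim).filter(_.nonEmpty).toSeq
    // A limit of -1 keeps trailing empty entries such as "Production,".
    val rawNames = names.map(_.split(",", -1).map(_.trim).toSeq).getOrElse(Seq.empty)

    val sources = dirs.zipWithIndex.map { case (dir, i) =>
      // A missing or empty name falls back to the full path.
      val name = rawNames.lift(i).filter(_.nonEmpty).getOrElse(dir)
      LogSource(dir, name)
    }

    // Reject duplicate display names up front, mirroring the startup check.
    val duplicates = sources.groupBy(_.displayName).collect {
      case (name, group) if group.size > 1 => name
    }
    val duplicateList = duplicates.mkString(", ")
    require(duplicates.isEmpty, "Duplicate log directory display names: " + duplicateList)
    sources
  }
}
```

For example, `LogDirNames.resolve("hdfs:///logs/prod,s3a://bucket/logs/staging", Some("Production,"))` names the first directory `Production` and leaves the second with its full path as the display name.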

Behavior of existing spark.history.fs.* settings with multiple directories:

All existing settings apply globally — there are no per-directory configurations.

| Setting | Behavior |
| --- | --- |
| `update.interval` | One scan cycle covers all directories sequentially |
| `cleaner.interval` | One cleaner cycle operates on the unified listing across all directories |
| `cleaner.maxAge` | Applied to each log entry regardless of which directory it belongs to |
| `cleaner.maxNum` | Total count across all directories; oldest entries are removed first regardless of directory |
| `numReplayThreads` | Thread pool is shared across all directories |
| `numCompactThreads` | Thread pool is shared across all directories |
| `eventLog.rolling.maxFilesToRetain` | Applied per-directory independently |
| `update.batchSize` | Applied per-directory independently |

Regarding UI changes, a "Log Source" column is added to the History UI table showing the display name (or full path) for each application, with a tooltip showing the full path.

Users can filter applications by their log directory using the "Filter by Log Source" dropdown.

The Event log directory section in the History UI collapses into a <details>/<summary> element when multiple directories are configured.

Why are the changes needed?

Some organizations run multiple clusters and have a corresponding log directory for each cluster. If SHS supports multiple log directories, it can serve as a single endpoint for viewing event logs, which helps such organizations.

Does this PR introduce any user-facing change?

Yes, but it will not affect existing users.

How was this patch tested?

Manually confirmed the Web UI, as shown in the screenshots above, and added new tests.

Was this patch authored or co-authored using generative AI tooling?

Kiro CLI / Opus 4.6

dongjoon-hyun

Thank you, @sarutak .

I understand prod and staging use cases.

What happens when there is a conflict among the log directories? For example, suppose a user wants to abuse this as a kind of multi-tier log management like the following, copying from shorterm to longterm. Of course, the sync operation is non-atomic.

hdfs://spark-events/shorterm
hdfs://spark-events/longterm

What are the semantics of the ordering in the config value, especially given SPARK-52914?

#51604

dongjoon-hyun · 2026-03-02T21:35:45Z

Could you fix the CI failures?

[info] *** 24 TESTS FAILED ***
[error] Failed: Total 4431, Failed 24, Errors 0, Passed 4407, Ignored 28, Canceled 6
[error] Failed tests:
[error] 	org.apache.spark.deploy.history.RocksDBBackendHistoryServerSuite
[error] 	org.apache.spark.deploy.history.LevelDBBackendHistoryServerSuite
[error] (core / Test / test) sbt.TestsFailedException: Tests unsuccessful

sarutak · 2026-03-02T23:27:55Z

@dongjoon-hyun Thank you for your interest.

What happens when there is a conflict among the log directories? For example, suppose a user wants to abuse this as a kind of multi-tier log management like the following, copying from shorterm to longterm. Of course, the sync operation is non-atomic.

hdfs://spark-events/shorterm
hdfs://spark-events/longterm

Each event log file is tracked by its full path as the key in LogInfo. So if the same application's event log exists in both directories, they are treated as separate entries.
I didn't anticipate this kind of usage, but during a non-atomic copy, the incomplete log file in the destination directory may fail to parse or may temporarily show incomplete information. However, on the next scan cycle, shouldReloadLog, invoked through checkForLogs, detects the file size change and re-parses the file, so the entry self-corrects once the copy completes.
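A simplified model of that behavior is sketched below. It is not the FsHistoryProvider code: `TrackedLog` and `needsReparse` are hypothetical stand-ins for LogInfo and shouldReloadLog, and the sketch assumes that any difference in file size triggers a re-parse.

```scala
// Hypothetical model of a listing keyed by full path; the real LogInfo
// carries more fields than shown here.
object ReloadSketch {
  final case class TrackedLog(fullPath: String, fileSize: Long)

  // Returns true when the file is unknown to the listing or its size
  // differs from the size recorded on the previous scan.
  def needsReparse(
      listing: Map[String, TrackedLog],
      fullPath: String,
      currentSize: Long): Boolean =
    listing.get(fullPath) match {
      case Some(tracked) => tracked.fileSize != currentSize
      case None => true // same application under another directory is a new entry
    }
}
```

Because the key is the full path, a copy of the same application's log under hdfs://spark-events/longterm is tracked separately from the original under hdfs://spark-events/shorterm, and each is re-parsed only when its own size changes.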

What are the semantics of the ordering in the config value, especially given SPARK-52914?

The ordering of directories in the config value has no semantic meaning. All directories are scanned equally in each polling cycle (checkForLogs iterates over all logDirs), and the order does not affect priority.

On-demand loading operates per log file within checkForLogsInDir, which is called independently for each directory. There is no cross-directory interaction, so I believe multiple directories support and on-demand loading are orthogonal and work together without issues.

dongjoon-hyun · 2026-03-02T23:30:10Z

Thank you. This is a nice feature. I'll try to test more seriously.

core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala

dongjoon-hyun

We need a clearer definition of how this interacts with the existing spark.history.fs.* configuration. At first glance:

Do you want to have per-directory configurations in the future?
For now, spark.history.fs.update.interval is supposed to be applied for one scan for all directories?
spark.history.fs.cleaner.interval is also supposed to be applied for one scan for all directories?
When spark.history.fs.cleaner.maxNum is applied,
- This PR will consider the total number of files for all directories, right?
- Which directory will be selected as a victim for the tie?

Since this introduces some ambiguity, could you revise the PR title and provide a corresponding documentation update together in this PR?

sarutak · 2026-03-04T15:19:05Z

@dongjoon-hyun Thank you for your feedback.

Do you want to have per-directory configurations in the future?

I considered that per-directory configurations (e.g. spark.history.fs.cleaner.*) might be helpful, but such configurations are not supported, at least in this PR. I'd like to start with simple global settings and improve based on user feedback.

For now, spark.history.fs.update.interval is supposed to be applied for one scan for all directories?

Yes.

spark.history.fs.cleaner.interval is also supposed to be applied for one scan for all directories?

Yes.

When spark.history.fs.cleaner.maxNum is applied,
This PR will consider the total number of files for all directories, right?
Which directory will be selected as a victim for the tie?

Yes, the property is applied to the total number of log entries across all directories. As the updated documentation says, when the limit is exceeded, the oldest completed attempts are deleted first regardless of which directory they belong to.
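The selection rule can be sketched roughly as follows. This is an illustration only, not the cleaner's implementation: `Attempt` and `selectForDeletion` are hypothetical names, and using a last-updated timestamp as the measure of age is an assumption. Ties fall back to input order here because `sortBy` is stable; the thread does not specify a tie-breaking rule.

```scala
// Hypothetical sketch of applying cleaner.maxNum to one unified listing.
object MaxNumSketch {
  final case class Attempt(logPath: String, lastUpdated: Long, completed: Boolean)

  // Picks the attempts to delete once the total exceeds maxNum: the oldest
  // completed attempts go first, with no regard to their directory.
  def selectForDeletion(attempts: Seq[Attempt], maxNum: Int): Seq[Attempt] = {
    val excess = attempts.size - maxNum
    if (excess <= 0) Seq.empty
    else attempts.filter(_.completed).sortBy(_.lastUpdated).take(excess)
  }
}
```

With four entries spread over two directories and a limit of two, the two oldest completed attempts are selected even if they live in different directories, and incomplete attempts are never selected.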

Since this introduces some ambiguity, could you revise the PR title and provide a corresponding documentation update together in this PR?

Updated. (You said to revise the PR title, but I assumed that was a typo for the PR description, so I've updated only the description.)

dongjoon-hyun

Thank you for updating.

dongjoon-hyun

+1, I support @sarutak's proposal and this PR's approach. Thank you.

cc @mridulm , @yaooqinn , @LuciferYang , too.

yaooqinn · 2026-03-04T17:29:09Z

core/src/main/resources/org/apache/spark/ui/static/historypage-template.html

 -->

 <script id="history-summary-template" type="text/html">
+<div class="row" style="margin-bottom: 10px;">


Inline styles should be replaced with Bootstrap 5 utility classes (per SPARK-55775, which was recently merged):

Suggested change

<div class="row" style="margin-bottom: 10px;">

<div class="row mb-2">

Similarly below:

style="margin-right: 10px;" → class="me-2"

style="display: inline-block; width: auto;" → class="d-inline-block w-auto"

yaooqinn · 2026-03-04T17:29:09Z

core/src/main/resources/org/apache/spark/ui/static/historypage-template.html

          App Name
        </span>
      </th>
+      <th>


This uses the old Bootstrap 4 attributes (data-toggle, data-placement). After the BS5 migration (SPARK-55753), these should be:

Suggested change

<th>

<span data-bs-toggle="tooltip" title="Log directory where this application's event log is stored.">

Note: data-bs-placement="top" is not needed since BS5 defaults to top (SPARK-55778).

yaooqinn · 2026-03-04T17:29:10Z

core/src/main/resources/org/apache/spark/ui/static/historypage-template.html

@@ -16,6 +16,14 @@
 -->

 <script id="history-summary-template" type="text/html">


The table class should include table-hover for row highlight on mouseover, consistent with SPARK-55784 which adds table-hover to all Spark UI tables.

yaooqinn · 2026-03-04T17:29:10Z

core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala

+                    {dirs.map(d => <li>{d}</li>)}
+                  </ul>
+                </details>
+              </li>


For the collapsible log directories section, please make sure to follow the BS5 Collapse API pattern established in SPARK-55773 (data-bs-toggle="collapse", data-bs-target, aria-expanded, etc.) rather than custom JS toggle logic.

sarutak · 2026-03-04T17:51:05Z

Thanks @yaooqinn for your feedback. I've updated.

Add a feature to SHS to support multiple log directories

24ffc63

dongjoon-hyun reviewed Mar 2, 2026

dongjoon-hyun changed the title ~~[SPARK-55793][WEBUI] Add multiple log directories support to SHS~~ [SPARK-55793][CORE] Add multiple log directories support to SHS Mar 2, 2026

Fix for CI

321b2d8