[SPARK-55793][CORE] Add multiple log directories support to SHS#54575
[SPARK-55793][CORE] Add multiple log directories support to SHS#54575sarutak wants to merge 6 commits intoapache:masterfrom
Conversation
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Thank you, @sarutak .
I understand prod and staging use cases.
What happens when there exists a conflict among the log directories? For example, a user want to abuse this as a kind of multi-tier log managements like the following and copy from shorterm to longterm? Of course, the sync operation is non-atomic.
- hdfs://spark-events/shorterm
- hdfs://spark-events/longterm
What is the semantic on the ordering in the config value? Especially, when we have SPARK-52914 ?
|
Could you fix the CI failures? |
|
@dongjoon-hyun Thank you for your interest.
Each event log file is tracked by its full path as the key in
The ordering of directories in the config value has no semantic. All directories are scanned equally in each polling cycle ( On-demand loading operates per log file within |
|
Thank you. This is a nice feature. I'll try to test more seriously. |
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
Outdated
Show resolved
Hide resolved
dongjoon-hyun
left a comment
There was a problem hiding this comment.
We need more clear definition between this and the existing spark.history.fs.* configuration. At the first glance,
- Do you want to have per-directory configurations in the future?
- For now,
spark.history.fs.update.intervalis supposed to be applied for one scan for all directories? spark.history.fs.cleaner.intervalis also supposed to be applied for one scan for all directories?- When
spark.history.fs.cleaner.maxNumis applied,- This PR will consider the total number of files for all directories, right?
- Which directory will be selected as a victim for the tie?
Since this introduces lots of ambiguity a little, could you revise the PR title and provide a corresponding documentation update, docs, together in this PR?
|
@dongjoon-hyun Thank you for your feedback.
I considered it might be helpful to have per-directory configurations (e.g.
Yes.
Yes.
Yes, the property is applied to the total number of log entries across all directories. As the updated document says, when the limit is exceeded, the oldest completed attempts are deleted first regardless of which directory they belong to.
Updated (You said |
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Thank you for updating.
dongjoon-hyun
left a comment
There was a problem hiding this comment.
+1, I supported @sarutak 's proposal and this PR's approach. Thank you.
cc @mridulm , @yaooqinn , @LuciferYang , too.
| --> | ||
|
|
||
| <script id="history-summary-template" type="text/html"> | ||
| <div class="row" style="margin-bottom: 10px;"> |
There was a problem hiding this comment.
Inline styles should use Bootstrap 5 utility classes (per SPARK-55775 which was recently merged):
| <div class="row" style="margin-bottom: 10px;"> | |
| <div class="row mb-2"> |
Similarly below:
style="margin-right: 10px;"→class="me-2"style="display: inline-block; width: auto;"→class="d-inline-block w-auto"
| App Name | ||
| </span> | ||
| </th> | ||
| <th> |
There was a problem hiding this comment.
This uses the old Bootstrap 4 attributes (data-toggle, data-placement). After the BS5 migration (SPARK-55753), these should be:
| <th> | |
| <span data-bs-toggle="tooltip" title="Log directory where this application's event log is stored."> |
Note: data-bs-placement="top" is not needed since BS5 defaults to top (SPARK-55778).
| @@ -16,6 +16,14 @@ | |||
| --> | |||
|
|
|||
| <script id="history-summary-template" type="text/html"> | |||
There was a problem hiding this comment.
The table class should include table-hover for row highlight on mouseover, consistent with SPARK-55784 which adds table-hover to all Spark UI tables.
| {dirs.map(d => <li>{d}</li>)} | ||
| </ul> | ||
| </details> | ||
| </li> |
There was a problem hiding this comment.
For the collapsible log directories section, please make sure to follow the BS5 Collapse API pattern established in SPARK-55773 (data-bs-toggle="collapse", data-bs-target, aria-expanded, etc.) rather than custom JS toggle logic.
|
Thanks @yaooqinn for your feedback. I've updated. |
What changes were proposed in this pull request?
This PR proposes to add multiple log directories support to SHS, allowing it to monitor event logs from multiple directories simultaneously.
This PR extends
spark.history.fs.logDirectoryto accept a comma-separated list of directories (e.g.,hdfs:///logs/prod,s3a://bucket/logs/staging). Directories can be on the same or different filesystems. Also, a new optional configspark.history.fs.logDirectory.namesis added which allows users to assign display names to directories by position (e.g.,Production,Staging). Empty entries fall back to the full path. Duplicate display names are rejected at startup.Behavior of existing
spark.history.fs.*settings with multiple directories:All existing settings apply globally — there are no per-directory configurations.
update.intervalcleaner.intervalcleaner.maxAgecleaner.maxNumnumReplayThreadsnumCompactThreadseventLog.rolling.maxFilesToRetainupdate.batchSizeRegarding UI changes, a "Log Source" column is added to the History UI table showing the display name (or full path) for each application, with a tooltip showing the full path.
Regarding UI changes, A "Log Source" column is added to the History UI table showing the display name (or full path) for each application, with a tooltip showing the full path.

Users can filter applications by their log directory using

Filter by Log Sourcedropdown.The


Event log directorysection in the History UI collapses into a<details>/<summary>element when multiple directories are configured.Why are the changes needed?
Some organization run multiple clusters and have corresponding log directory for each cluster. So if SHS supports multiple log directories, it can be used as a single end point to view event logs, which helps such organizations.
Does this PR introduce any user-facing change?
Yes but will not affect existing users.
How was this patch tested?
Manually confirmed WebUI as screenshots above and added new tests.
Was this patch authored or co-authored using generative AI tooling?
Kiro CLI / Opus 4.6