Conversation

@worryg0d
Contributor

No description provided.

@worryg0d worryg0d marked this pull request as ready for review October 23, 2025 07:35
@worryg0d worryg0d marked this pull request as draft October 23, 2025 10:50
@worryg0d worryg0d marked this pull request as ready for review October 24, 2025 07:26
public AbstractOperationRequest() {
    // for picocli
    if (concurrentConnections == null)
        concurrentConnections = getDefaultConcurrentConnections();

Collaborator

@smiklosovic, Nov 7, 2025

We should validate this: what if I pass 0 or a negative number, or a number bigger than the number of CPUs I have? We should check that the value falls within an allowed range. There is a "validate" method; maybe you could move this validation there?

Contributor Author

Good catch. I totally missed this, assuming that users would know what they are doing. But esop could be called by a script or some other automation tool, so it is worth validating the value.
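
For illustration, the range check discussed here could look roughly like the sketch below. `ParallelismValidator` and its methods are hypothetical names, not code from this PR, and using the CPU count as the upper bound is only an assumption taken from the question above; the real bound may need to differ.

```java
// Hypothetical helper, not part of esop: rejects values outside [1, maxAllowed].
final class ParallelismValidator {

    private ParallelismValidator() {
    }

    // Returns the requested value unchanged when it is in range, otherwise throws.
    static int validate(final int requested, final int maxAllowed) {
        if (requested < 1 || requested > maxAllowed) {
            throw new IllegalArgumentException(String.format(
                "concurrent connections must be between 1 and %d, got %d",
                maxAllowed, requested));
        }
        return requested;
    }

    // Assumption: the number of available processors is used as the upper bound.
    static int validate(final int requested) {
        return validate(requested, Runtime.getRuntime().availableProcessors());
    }
}
```

Throwing `IllegalArgumentException` keeps the sketch independent of any CLI library; an existing "validate" method could call it and translate the exception as needed.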

  modules.add(new UploadingModule());
  modules.add(new DownloadingModule());
- modules.add(new HashModule(hashSpec));
+ modules.add(new HashModule(hashSpec, request.concurrentConnections));

Collaborator

Can't you put concurrent connections into the constructor of HashSpec? Just pass it to hashSpec so you do not need to adapt the code by passing request everywhere.
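
A rough sketch of what this suggestion could mean, using simplified stand-ins rather than the real classes: `HashSpecSketch` and `HashModuleSketch` are hypothetical, and the actual `HashSpec` and `HashModule` in esop have more fields and different wiring.

```java
// Hypothetical stand-in for HashSpec: carries the parallelism next to the algorithm.
final class HashSpecSketch {
    final String algorithm;
    final int parallelism;

    HashSpecSketch(final String algorithm, final int parallelism) {
        this.algorithm = algorithm;
        this.parallelism = parallelism;
    }
}

// Hypothetical stand-in for HashModule: the constructor signature stays unchanged
// because the thread count is read from the spec instead of an extra parameter.
final class HashModuleSketch {
    private final HashSpecSketch hashSpec;

    HashModuleSketch(final HashSpecSketch hashSpec) {
        this.hashSpec = hashSpec;
    }

    int parallelHashingThreads() {
        return hashSpec.parallelism;
    }
}
```

With this shape, callers that already build a spec would not need to pass the request object around just to reach the thread count.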

Contributor Author

Not sure it simplifies things: we use concurrentConnections (--cc) in different places, and in some of them there may be no HashSpec. This needs deeper investigation.

Collaborator

no luck with this?

- public HashModule(final HashSpec hashSpec) {
+ public HashModule(final HashSpec hashSpec, final int parallelHashingThreads) {
      this.hashSpec = hashSpec;
+     this.parallelHashingThreads = parallelHashingThreads;

Collaborator

You would get this from hashSpec, so you would not need to change this constructor.

Collaborator

no luck with simplifying this?

@JsonProperty("retry")
public RetrySpec retry = new RetrySpec();

@Option(names = {"--cc", "--concurrent-connections"},

Collaborator

I think we should add a "--parallelism" name here. Do not remove the existing names, because that would break things for people who already use them.

Contributor Author

Good point, I was also thinking about that but got drowned by other things 😅

Contributor Author

Added --parallelism option

Collaborator

@worryg0d can you also rename the variable itself to parallelism?

logger.error(ex.getMessage());
corruptedFiles.add(entry.localFile.toString());
}
entriesToVerify.add(entry);

Collaborator

While this copies the logic that was there before, I am wondering why we continue to verify other files after we have already encountered a verification failure.

If I have 20 10GiB files to verify and I fail on the first one, then I am still verifying the rest (19) completely unnecessarily. It is not as if I would continue with the restoration after verifying the rest anyway.

So we should probably just fail and cancel the rest of the tasks early? Further along in the execution flow we check whether corruptedFiles is empty and throw based on that, but we do not even log the corrupted files themselves. (We do for the importing phase but not for the hardlinking phase; the two are not aligned for now.)

Contributor Author

Ha, that's a valid case. Agreed, we don't need to continue verification since we will fail anyway.

Contributor Author

After further investigation: to interrupt hash computations early, we need to check whether the current thread is interrupted each time we read data from the disk and, if so, throw InterruptedException.
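
A minimal sketch of that interruption check, assuming SHA-256 and a 64 KiB read buffer. `InterruptibleHasher` is a hypothetical name; the hashing code in this PR is structured differently.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of esop: hashes a file and stops early on interrupt.
final class InterruptibleHasher {

    private InterruptibleHasher() {
    }

    static byte[] sha256(final Path file)
            throws IOException, InterruptedException, NoSuchAlgorithmException {
        final MessageDigest digest = MessageDigest.getInstance("SHA-256");
        final byte[] buffer = new byte[64 * 1024];
        try (InputStream in = Files.newInputStream(file)) {
            while (true) {
                // Check (and clear) the interrupt flag before every read from disk.
                if (Thread.interrupted()) {
                    throw new InterruptedException("hashing of " + file + " was interrupted");
                }
                final int read = in.read(buffer);
                if (read == -1) {
                    break;
                }
                digest.update(buffer, 0, read);
            }
        }
        return digest.digest();
    }
}
```

The check costs one flag read per buffer, so a cancelled task stops after at most one more chunk instead of hashing the rest of a multi-gigabyte file.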

return forkJoinPool.submit(() -> manifestEntries.parallelStream().forEach(entry -> entry.hash = hash(entry)));
}

public ForkJoinTask<?> verifyAll(final List<ManifestEntry> manifestEntries, OnFailure onFailure) {

Collaborator

@smiklosovic, Nov 7, 2025

Can we fail fast here? Just cancel the verification of the rest as soon as we see that a task has failed. I think there is some API in CompletableFuture for achieving that.
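
One possible shape for this, sketched with a plain `ExecutorService` rather than `CompletableFuture`, because `CompletableFuture.cancel` does not interrupt a task that is already running. `FailFastVerifier` and its `verifyAll` signature are hypothetical and do not match the `verifyAll` in this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of esop: stops all verification on the first failure.
final class FailFastVerifier {

    private FailFastVerifier() {
    }

    static void verifyAll(final List<Callable<Void>> tasks, final int parallelism)
            throws Exception {
        final ExecutorService executor = Executors.newFixedThreadPool(parallelism);
        final CompletionService<Void> completions = new ExecutorCompletionService<>(executor);
        final List<Future<Void>> futures = new ArrayList<>();
        try {
            for (final Callable<Void> task : tasks) {
                futures.add(completions.submit(task));
            }
            for (int i = 0; i < tasks.size(); i++) {
                try {
                    // take() returns tasks in completion order, so a failure surfaces early.
                    completions.take().get();
                } catch (final ExecutionException e) {
                    // Cancel queued tasks and interrupt the ones already running.
                    for (final Future<Void> future : futures) {
                        future.cancel(true);
                    }
                    final Throwable cause = e.getCause();
                    throw cause instanceof Exception ? (Exception) cause : e;
                }
            }
        } finally {
            // Interrupts any remaining workers; a no-op for work that already finished.
            executor.shutdownNow();
        }
    }
}
```

Interrupting a running task only helps if the task itself reacts to interruption, for example by checking the thread's interrupt flag between reads.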

Contributor Author

Answered above

1. value range validation for concurrentConnections option
2. introduced --parallelism alternative for concurrentConnections

3 participants