
Conversation


@zarna1parekh zarna1parekh commented Jul 30, 2025

Summary

Add sanity checks before upload and after download of chunks:

  1. Check the Lucene directory for corrupted data.
  2. Check the number of files, and the size of each file, both locally and on S3.

Requirements

@zarna1parekh zarna1parekh force-pushed the zparekh/debug_cache_nodes branch from 0a046bb to 6ef31f7 Compare August 12, 2025 18:17
@zarna1parekh zarna1parekh force-pushed the zparekh/debug_cache_nodes branch from 6ef31f7 to ac66b41 Compare August 12, 2025 18:49
@zarna1parekh zarna1parekh changed the title Adding log statement + removing eviction on exception for debugging p… Adding additional checks during upload and download of chunk Aug 13, 2025
Comment on lines +201 to +219
    assert prefix != null && !prefix.isEmpty();

    ListObjectsV2Request listRequest = builder().bucket(bucketName).prefix(prefix).build();
    ListObjectsV2Publisher asyncPaginatedListResponse =
        s3AsyncClient.listObjectsV2Paginator(listRequest);

    Map<String, Long> filesListWithSize = new HashMap<>();
    try {
      asyncPaginatedListResponse
          .subscribe(
              listResponse ->
                  listResponse
                      .contents()
                      .forEach(s3Object -> filesListWithSize.put(s3Object.key(), s3Object.size())))
          .get();
    } catch (InterruptedException | ExecutionException e) {
      throw new RuntimeException(e);
    }
    return filesListWithSize;


you could extract a helper method that would be used by both listFiles... methods that takes a prefix and a Consumer. Then this could look like the following.

Also, the block passed to subscribe could be called in multiple threads, so this should use a storage class that is safe wrt concurrent modifications.

Suggested change
-    assert prefix != null && !prefix.isEmpty();
-    ListObjectsV2Request listRequest = builder().bucket(bucketName).prefix(prefix).build();
-    ListObjectsV2Publisher asyncPaginatedListResponse =
-        s3AsyncClient.listObjectsV2Paginator(listRequest);
-    Map<String, Long> filesListWithSize = new HashMap<>();
-    try {
-      asyncPaginatedListResponse
-          .subscribe(
-              listResponse ->
-                  listResponse
-                      .contents()
-                      .forEach(s3Object -> filesListWithSize.put(s3Object.key(), s3Object.size())))
-          .get();
-    } catch (InterruptedException | ExecutionException e) {
-      throw new RuntimeException(e);
-    }
-    return filesListWithSize;
+    Map<String, Long> filesWithSize = new ConcurrentHashMap<>();
+    listFilesAndDo(prefix, s3Object -> filesWithSize.put(s3Object.key(), s3Object.size()));
+    return filesWithSize;
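To make the concurrency point concrete, here is a minimal, self-contained sketch of the Consumer-based helper pattern the suggestion describes. The `listFilesAndDo` helper and the `Entry` record are illustrative stand-ins, not the SDK paginator; the point is that the accumulator handed to the callback is a `ConcurrentHashMap`, since the real subscriber may be invoked from multiple threads.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

public class ListHelperSketch {
  // Stand-in for one listed S3 object: just a key and a size.
  record Entry(String key, long size) {}

  // Hypothetical helper shared by both listFiles... methods: walks every
  // page and hands each entry to the supplied Consumer.
  static void listFilesAndDo(List<List<Entry>> pages, Consumer<Entry> action) {
    // The real SDK may invoke this callback from multiple threads,
    // which is why the caller's accumulator must be concurrency-safe.
    pages.forEach(page -> page.forEach(action));
  }

  static Map<String, Long> listFilesWithSize(List<List<Entry>> pages) {
    Map<String, Long> filesWithSize = new ConcurrentHashMap<>();
    listFilesAndDo(pages, e -> filesWithSize.put(e.key(), e.size()));
    return filesWithSize;
  }

  public static void main(String[] args) {
    List<List<Entry>> pages =
        List.of(
            List.of(new Entry("chunk1/_0.si", 128L)),
            List.of(new Entry("chunk1/_0.cfs", 4096L)));
    Map<String, Long> files = listFilesWithSize(pages);
    System.out.println(files.size());               // 2
    System.out.println(files.get("chunk1/_0.cfs")); // 4096
  }
}
```

With this shape, each `listFiles...` variant only supplies the Consumer; the pagination and subscription plumbing lives in one place.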

            .collect(
                Collectors.toMap(
                    path ->
                        dataDirectory.relativize(path).toString().replace(File.separator, "/"),


Hm. I think the replace isn't necessary since Files.list() only returns files in the current directory. Although, maybe you should use Path#getFileName().toString() here, which would align with calling Paths.get(s3Path).getFileName().toString() on the s3 entries below.
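A quick sketch of why `getFileName()` makes the two sides line up: applied to a local path and to an S3-style key, it yields the same last segment, so the map keys compare directly. The paths below are illustrative, not from the PR.

```java
import java.nio.file.Paths;

public class FileNameSketch {
  public static void main(String[] args) {
    // Last segment of a local file path.
    String local = Paths.get("/tmp/chunk-data/_0.si").getFileName().toString();
    // Last segment of an S3-style key, parsed the same way.
    String s3Key = Paths.get("abc123/_0.si").getFileName().toString();
    System.out.println(local);               // _0.si
    System.out.println(s3Key);               // _0.si
    System.out.println(local.equals(s3Key)); // true
  }
}
```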

    // validate the size of the uploaded files
    for (String fileName : filesToUpload) {
      String s3Path = String.format("%s/%s", chunkInfo.chunkId, fileName);
      long sizeOfFile = Files.size(Path.of(dirPath + "/" + fileName));


maybe use File.separator here instead of "/"?
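One way to sidestep separator handling entirely, sketched below with illustrative values: let `Path.of(dir, name)` join the segments, which uses the platform separator automatically (the hard-coded `"/"` is still correct for the S3 key, which is not a filesystem path).

```java
import java.nio.file.Path;

public class LocalPathSketch {
  public static void main(String[] args) {
    String dirPath = "data";   // illustrative local directory
    String fileName = "_0.si"; // illustrative file name

    // Path.of joins segments with the platform's File.separator,
    // so no manual "/" concatenation is needed for local paths.
    Path local = Path.of(dirPath, fileName);
    System.out.println(local.getFileName()); // _0.si
  }
}
```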

Comment on lines 294 to 295
"Mismatch for file %s in S3 and local directory of size %s for chunk %s",
s3Path, sizeOfFile, chunkInfo.chunkId));


It would be good to include the s3 file size here as well.
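For instance, a message carrying both sizes could look like the sketch below; the variable names mirror the snippet above, but `s3Size` and the values are illustrative.

```java
public class MismatchMessageSketch {
  public static void main(String[] args) {
    String s3Path = "abc123/_0.si"; // illustrative key
    long sizeOfFile = 128L;         // local size
    long s3Size = 120L;             // size reported by S3 (hypothetical variable)
    String chunkId = "abc123";

    // Including both sizes makes the direction of the mismatch obvious.
    String msg =
        String.format(
            "Mismatch for file %s: local size %d, S3 size %d for chunk %s",
            s3Path, sizeOfFile, s3Size, chunkId);
    System.out.println(msg);
  }
}
```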

Comment on lines +225 to +234
    String chunkId = UUID.randomUUID().toString();

    assertThat(blobStore.listFiles(chunkId).size()).isEqualTo(0);

    Path directoryUpload = Files.createTempDirectory("");
    Path foo = Files.createTempFile(directoryUpload, "", "");
    try (FileWriter fileWriter = new FileWriter(foo.toFile())) {
      fileWriter.write("Example test 1");
    }
    Path bar = Files.createTempFile(directoryUpload, "", "");


If you used a non-random chunkId and file names, you could have the assertion use more literals and it would be easier to follow.

Also, could you have one of the files have a different number of characters in it so it would be clear that they are different?
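A sketch of that setup with fixed names and contents of different lengths, so the later size assertions can use literals (the file names and second string here are illustrative, not from the PR):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DeterministicTestDataSketch {
  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("chunk-test");
    // Fixed names instead of random temp-file names.
    Path foo = dir.resolve("foo");
    Path bar = dir.resolve("bar");
    // Contents of different lengths, so the two files are distinguishable by size.
    Files.writeString(foo, "Example test 1");    // 14 bytes
    Files.writeString(bar, "Example test two!"); // 17 bytes
    System.out.println(Files.size(foo)); // 14
    System.out.println(Files.size(bar)); // 17
  }
}
```

Assertions against such a directory can then be literal, e.g. `isEqualTo(14)`, instead of comparing two opaque random files.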



4 participants