-
Couldn't load subscription status.
- Fork 1
Adding additional checks during upload and download of chunk #39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: airbnb-main
Are you sure you want to change the base?
Conversation
astra/src/main/java/com/slack/astra/chunk/ReadOnlyChunkImpl.java
Outdated
Show resolved
Hide resolved
0a046bb to
6ef31f7
Compare
6ef31f7 to
ac66b41
Compare
| assert prefix != null && !prefix.isEmpty(); | ||
|
|
||
| ListObjectsV2Request listRequest = builder().bucket(bucketName).prefix(prefix).build(); | ||
| ListObjectsV2Publisher asyncPaginatedListResponse = | ||
| s3AsyncClient.listObjectsV2Paginator(listRequest); | ||
|
|
||
| Map<String, Long> filesListWithSize = new HashMap<>(); | ||
| try { | ||
| asyncPaginatedListResponse | ||
| .subscribe( | ||
| listResponse -> | ||
| listResponse | ||
| .contents() | ||
| .forEach(s3Object -> filesListWithSize.put(s3Object.key(), s3Object.size()))) | ||
| .get(); | ||
| } catch (InterruptedException | ExecutionException e) { | ||
| throw new RuntimeException(e); | ||
| } | ||
| return filesListWithSize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could extract a helper method that would be used by both listFiles... methods that takes a prefix and a Consumer. Then this could look like the following.
Also, the block passed to subscribe could be called in multiple threads, so this should use a storage class that is safe wrt concurrent modifications.
| assert prefix != null && !prefix.isEmpty(); | |
| ListObjectsV2Request listRequest = builder().bucket(bucketName).prefix(prefix).build(); | |
| ListObjectsV2Publisher asyncPaginatedListResponse = | |
| s3AsyncClient.listObjectsV2Paginator(listRequest); | |
| Map<String, Long> filesListWithSize = new HashMap<>(); | |
| try { | |
| asyncPaginatedListResponse | |
| .subscribe( | |
| listResponse -> | |
| listResponse | |
| .contents() | |
| .forEach(s3Object -> filesListWithSize.put(s3Object.key(), s3Object.size()))) | |
| .get(); | |
| } catch (InterruptedException | ExecutionException e) { | |
| throw new RuntimeException(e); | |
| } | |
| return filesListWithSize; | |
| Map<String, Long> filesWithSize = new ConcurrentHashMap<>(); | |
| listFilesAndDo(prefix, s3Object -> filesListWithSize.put(s3Object.key(), s3Object.size())); | |
| return filesWithSize; |
astra/src/main/java/com/slack/astra/chunk/ChunkValidationUtils.java
Outdated
Show resolved
Hide resolved
| .collect( | ||
| Collectors.toMap( | ||
| path -> | ||
| dataDirectory.relativize(path).toString().replace(File.separator, "/"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. I think the replace isn't necessary since Files.list() only returns files in the current directory. Although, maybe you should use Path#getFileName().toString() here, which would align with calling Paths.get(s3Path).getFileName().toString() on the s3 entries below.
astra/src/main/java/com/slack/astra/chunk/ReadOnlyChunkImpl.java
Outdated
Show resolved
Hide resolved
astra/src/main/java/com/slack/astra/chunk/ReadOnlyChunkImpl.java
Outdated
Show resolved
Hide resolved
astra/src/main/java/com/slack/astra/chunk/ReadOnlyChunkImpl.java
Outdated
Show resolved
Hide resolved
| // validate the size of the uploaded files | ||
| for (String fileName : filesToUpload) { | ||
| String s3Path = String.format("%s/%s", chunkInfo.chunkId, fileName); | ||
| long sizeOfFile = Files.size(Path.of(dirPath + "/" + fileName)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe use File.separator here instead of "/"?
| "Mismatch for file %s in S3 and local directory of size %s for chunk %s", | ||
| s3Path, sizeOfFile, chunkInfo.chunkId)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to include the s3 file size here as well.
| String chunkId = UUID.randomUUID().toString(); | ||
|
|
||
| assertThat(blobStore.listFiles(chunkId).size()).isEqualTo(0); | ||
|
|
||
| Path directoryUpload = Files.createTempDirectory(""); | ||
| Path foo = Files.createTempFile(directoryUpload, "", ""); | ||
| try (FileWriter fileWriter = new FileWriter(foo.toFile())) { | ||
| fileWriter.write("Example test 1"); | ||
| } | ||
| Path bar = Files.createTempFile(directoryUpload, "", ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you used a non-random chunkId and file names, you could have the assertion use more literals and it would be easier to follow.
Also, could you have one of the files have a different number of characters in it so it would be clear that they are different?
Summary
Adding sanity checks before upload and post download of chunks.
Requirements