Open
Description
A user has pointed out that they get much better performance downloading the same file from HuggingFace than they do with data.source.coop. Diving into this, we can see that HuggingFace responds with a 302 redirect to a CloudFront server. HuggingFace can do this because they are not pursuing interoperability with S3 clients (at least, I don't believe that they advertise their API as S3-compatible). While redirects are technically supported by S3 [1], some clients, like boto3, don't follow them [2]. NVIDIA uses a patch [3] to get around this with their S3-compatible API.
However, CloudFront would bring benefits, particularly for users who are located far away from us-west-2. As such, we should consider:
- Setting up one or more CloudFront distributions to serve our data. The strategy for handling different cloud providers and different buckets is to be determined.
- Serving redirects from our S3-compliant API when the client supports them (perhaps inferred from the User-Agent header).
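As a rough illustration of the second point, the redirect decision could be sketched as below. This is only a sketch under stated assumptions: the CDN hostname `cdn.example.org`, the allowlist of User-Agent prefixes, and the function names are all hypothetical, and the real proxy may be structured quite differently. The default is conservative: any client that is not on the allowlist (boto3, for example) is served directly rather than redirected.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import quote

# Hypothetical CDN hostname; no such distribution exists yet.
CDN_HOST = "cdn.example.org"

# Assumed allowlist of User-Agent prefixes for clients believed to follow
# redirects by default. The real list would need to be verified per client.
REDIRECT_CAPABLE_PREFIXES = ("Wget/", "python-requests/", "Mozilla/")


def supports_redirect(user_agent: Optional[str]) -> bool:
    """Return True only when the User-Agent matches the allowlist."""
    if not user_agent:
        return False
    return user_agent.startswith(REDIRECT_CAPABLE_PREFIXES)


def plan_response(object_key: str, user_agent: Optional[str]) -> Tuple[int, Dict[str, str]]:
    """Choose between a 302 to the CDN and serving the object directly (200)."""
    if supports_redirect(user_agent):
        # quote() keeps "/" intact, so the key's path structure is preserved.
        location = f"https://{CDN_HOST}/{quote(object_key)}"
        return 302, {"Location": location}
    # Unknown or S3-SDK clients get the object from the existing code path.
    return 200, {}
```

An allowlist (rather than a denylist of known-bad clients) means a new or unrecognised S3 client keeps working exactly as it does today, at the cost of not benefiting from the CDN until it is added.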
logs
▶ wget https://data.source.coop/csaybar/3dclouds/pretraining/__TACOCAT__
--2025-11-10 09:02:40-- https://data.source.coop/csaybar/3dclouds/pretraining/__TACOCAT__
Resolving data.source.coop (data.source.coop)... 54.188.53.230, 54.186.232.91, 34.214.239.109
Connecting to data.source.coop (data.source.coop)|54.188.53.230|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 144434880 (138M) [binary/octet-stream]
Saving to: ‘__TACOCAT__’
__TACOCAT__ 100%[======================================================>] 137.74M 18.2MB/s in 7.6s
2025-11-10 09:02:50 (18.2 MB/s) - ‘__TACOCAT__’ saved [144434880/144434880]
▶ dig -x 18.172.185.108 +short
server-18-172-185-108.yvr52.r.cloudfront.net.
▶ wget https://huggingface.co/csaybar/playground/resolve/main/__TACOCAT__
--2025-11-10 09:03:01-- https://huggingface.co/csaybar/playground/resolve/main/__TACOCAT__
Resolving huggingface.co (huggingface.co)... 18.64.67.102, 18.64.67.107, 18.64.67.39, ...
Connecting to huggingface.co (huggingface.co)|18.64.67.102|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cas-bridge.xethub.hf.co/xet-bridge-us/68cecf4a2a5dac94776cd712/847778c085fffa81923c520f7c1eaff1eaec3f2215fb097da8c6e2da50952636?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=cas%2F20251110%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20251110T170302Z&X-Amz-Expires=3600&X-Amz-Signature=dca5b5fd1a661a116aaa0703097915c8eabc4a3394bb5fd4bb87226dc59cb3f6&X-Amz-SignedHeaders=host&X-Xet-Cas-Uid=public&response-content-disposition=inline%3B+filename*%3DUTF-8%27%27__TACOCAT__%3B+filename%3D%22__TACOCAT__%22%3B&x-id=GetObject&Expires=1762797782&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc2Mjc5Nzc4Mn19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2FzLWJyaWRnZS54ZXRodWIuaGYuY28veGV0LWJyaWRnZS11cy82OGNlY2Y0YTJhNWRhYzk0Nzc2Y2Q3MTIvODQ3Nzc4YzA4NWZmZmE4MTkyM2M1MjBmN2MxZWFmZjFlYWVjM2YyMjE1ZmIwOTdkYThjNmUyZGE1MDk1MjYzNioifV19&Signature=Dc%7ELcACz0t0UC4AHBaH22Cr8AbVmUsKjl4tvltnbPeixwWkN-KiEVueFOcF6Hma91yWYJ-fzoZ85Nxca0-n1W0X8QvQdWT7rWBoTQUDc6RaKpB5M-D6e38iT9CfFVgHsJCQtP0xRMPXkUFXl5vyw9oQFC3t-Od1UdBSaOZTcXDJg5Kb7xc9Hs7XhqkDWzgkPqLu7egu1LTm3Z1rmJBEoxlaEoihDE7KZSLjQMofTmB8FNVe9uSSvgH87HCKmf4Nk6dOwGVQCfCYym9EwyeTFijyNozXmM3NpyqOc4jGbJb408JgHuRfWTCXEm1RFk0zB83-2%7EiKvmDZ1s0htAJezcw__&Key-Pair-Id=K2L8F4GPSG1IFC [following]
--2025-11-10 09:03:02-- https://cas-bridge.xethub.hf.co/xet-bridge-us/68cecf4a2a5dac94776cd712/847778c085fffa81923c520f7c1eaff1eaec3f2215fb097da8c6e2da50952636?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=cas%2F20251110%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20251110T170302Z&X-Amz-Expires=3600&X-Amz-Signature=dca5b5fd1a661a116aaa0703097915c8eabc4a3394bb5fd4bb87226dc59cb3f6&X-Amz-SignedHeaders=host&X-Xet-Cas-Uid=public&response-content-disposition=inline%3B+filename*%3DUTF-8%27%27__TACOCAT__%3B+filename%3D%22__TACOCAT__%22%3B&x-id=GetObject&Expires=1762797782&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc2Mjc5Nzc4Mn19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2FzLWJyaWRnZS54ZXRodWIuaGYuY28veGV0LWJyaWRnZS11cy82OGNlY2Y0YTJhNWRhYzk0Nzc2Y2Q3MTIvODQ3Nzc4YzA4NWZmZmE4MTkyM2M1MjBmN2MxZWFmZjFlYWVjM2YyMjE1ZmIwOTdkYThjNmUyZGE1MDk1MjYzNioifV19&Signature=Dc%7ELcACz0t0UC4AHBaH22Cr8AbVmUsKjl4tvltnbPeixwWkN-KiEVueFOcF6Hma91yWYJ-fzoZ85Nxca0-n1W0X8QvQdWT7rWBoTQUDc6RaKpB5M-D6e38iT9CfFVgHsJCQtP0xRMPXkUFXl5vyw9oQFC3t-Od1UdBSaOZTcXDJg5Kb7xc9Hs7XhqkDWzgkPqLu7egu1LTm3Z1rmJBEoxlaEoihDE7KZSLjQMofTmB8FNVe9uSSvgH87HCKmf4Nk6dOwGVQCfCYym9EwyeTFijyNozXmM3NpyqOc4jGbJb408JgHuRfWTCXEm1RFk0zB83-2%7EiKvmDZ1s0htAJezcw__&Key-Pair-Id=K2L8F4GPSG1IFC
Resolving cas-bridge.xethub.hf.co (cas-bridge.xethub.hf.co)... 18.172.185.108, 18.172.185.16, 18.172.185.127, ...
Connecting to cas-bridge.xethub.hf.co (cas-bridge.xethub.hf.co)|18.172.185.108|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 144434880 (138M)
Saving to: ‘__TACOCAT__.1’
__TACOCAT__.1 100%[======================================================>] 137.74M 23.8MB/s in 6.5s
2025-11-10 09:03:08 (21.3 MB/s) - ‘__TACOCAT__.1’ saved [144434880/144434880]
▶ wget http://us-west-2.opendata.source.coop.s3.us-west-2.amazonaws.com/csaybar/3dclouds/pretraining/__TACOCAT__
--2025-11-10 09:10:09-- http://us-west-2.opendata.source.coop.s3.us-west-2.amazonaws.com/csaybar/3dclouds/pretraining/__TACOCAT__
Resolving us-west-2.opendata.source.coop.s3.us-west-2.amazonaws.com (us-west-2.opendata.source.coop.s3.us-west-2.amazonaws.com)... 3.5.78.113, 3.5.82.187, 3.5.82.65, ...
Connecting to us-west-2.opendata.source.coop.s3.us-west-2.amazonaws.com (us-west-2.opendata.source.coop.s3.us-west-2.amazonaws.com)|3.5.78.113|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 144434880 (138M) [binary/octet-stream]
Saving to: ‘__TACOCAT__.2’
__TACOCAT__.2 100%[======================================================>] 137.74M 19.0MB/s in 7.3s
2025-11-10 09:10:16 (18.9 MB/s) - ‘__TACOCAT__.2’ saved [144434880/144434880]
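For a quick side-by-side, the average throughput that wget reports at the end of each download above can be compared directly. The snippet below is just arithmetic on those three logged figures (a single run each, from one location); the dictionary keys are labels chosen here for readability.

```python
# Average throughput reported by wget in the logs above, in MB/s.
reported_mb_per_s = {
    "data.source.coop": 18.2,
    "huggingface.co (via CloudFront)": 21.3,
    "S3 bucket directly": 18.9,
}

baseline = reported_mb_per_s["data.source.coop"]

# Throughput of each endpoint relative to data.source.coop.
relative = {name: rate / baseline for name, rate in reported_mb_per_s.items()}

for name, ratio in relative.items():
    print(f"{name}: {reported_mb_per_s[name]} MB/s ({ratio:.2f}x)")
```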
Footnotes