feat!: rearchitect proxy with Cloudflare Workers-compatible API#109
Draft
…er support non-s3 backends
Replaced raw format! string interpolation of upload_id and part_number into query strings with url::form_urlencoded::Serializer::append_pair(), which properly percent-encodes special characters (&, =, etc.) in both UploadPart and CompleteMultipartUpload/AbortMultipartUpload URL construction.
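The hazard this fix addresses is easy to see with a raw `format!`: an `upload_id` containing `&` or `=` silently adds or overrides query parameters. The sketch below uses a small std-only encoder that follows the `application/x-www-form-urlencoded` rules. It is a stand-in for illustration, not the `url` crate's `Serializer::append_pair()`, and the `upload_part_query` helper and its parameter order are hypothetical.

```rust
/// Std-only stand-in for form-urlencoded serialization (NOT the `url` crate):
/// ASCII alphanumerics and `*-._` pass through, space becomes '+',
/// and every other byte becomes %XX.
fn form_encode(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for byte in value.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'*' | b'-' | b'.' | b'_' => {
                out.push(byte as char)
            }
            b' ' => out.push('+'),
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Hypothetical helper: builds an UploadPart query string with every
/// key and value encoded, so user-controlled IDs cannot inject parameters.
fn upload_part_query(upload_id: &str, part_number: u32) -> String {
    let pairs = [
        ("partNumber", part_number.to_string()),
        ("uploadId", upload_id.to_string()),
    ];
    pairs
        .iter()
        .map(|(k, v)| format!("{}={}", form_encode(k), form_encode(v)))
        .collect::<Vec<_>>()
        .join("&")
}
```

With the naive `format!("partNumber={}&uploadId={}", ...)`, an ID of `abc&x=1` would yield `uploadId=abc&x=1`, which a server parses as two separate parameters; the encoded form keeps it as one value.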
Added validate_path_segment() that rejects values containing /, \, \0, .., ., or empty strings. Called before every format!("/path/{}", user_input) interpolation in get_bucket, get_role, get_credential, and get_temporary_credential.
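A minimal sketch of the validation described above. The PR does not show the real signature, so the `Result<&str, String>` return type is an assumption, as is the reading that `.` and `..` are rejected only as whole segments (so `v1.2` stays valid). The `bucket_path` caller and its `/buckets/{}` route are hypothetical stand-ins for the `format!("/path/{}", user_input)` call sites.

```rust
/// Rejects empty segments, `.`/`..`, and any segment containing '/', '\\', or NUL.
/// Assumed signature; the PR's actual return type is not shown.
fn validate_path_segment(segment: &str) -> Result<&str, String> {
    if segment.is_empty() {
        return Err("path segment must not be empty".to_string());
    }
    if segment == "." || segment == ".." {
        return Err(format!("path segment {:?} is a relative reference", segment));
    }
    if let Some(bad) = segment.chars().find(|c| matches!(*c, '/' | '\\' | '\0')) {
        return Err(format!("path segment contains forbidden character {:?}", bad));
    }
    Ok(segment)
}

/// Hypothetical call site: validate before interpolating into a path.
fn bucket_path(bucket: &str) -> Result<String, String> {
    let bucket = validate_path_segment(bucket)?;
    Ok(format!("/buckets/{}", bucket))
}
```

Validating at each interpolation point, rather than once at request entry, keeps every path construction safe even when new callers are added later.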
Changed all four tracing::debug! calls in dispatch_operation from url = %fwd.url (which logged the full presigned URL including auth signatures in query params) to path = fwd.url.path() (which logs only the URL path — bucket and key, no credentials). The multipart backend_url log on line 418 was left as-is since that URL doesn't contain presigned auth params.
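SigV4 presigned URLs carry their credentials in the query string (`X-Amz-Credential`, `X-Amz-Signature`, and related parameters), which is why logging only the path is safe. The function below is a std-only approximation of what `Url::path()` returns, written for illustration; `loggable_path` is a hypothetical name and assumes an absolute `scheme://host/path?query` URL.

```rust
/// Std-only approximation of `Url::path()`: drops the scheme, authority,
/// query string, and fragment. Assumes an absolute URL.
fn loggable_path(url: &str) -> &str {
    // Cut at the query or fragment; presigned auth params live in the query.
    let end = url.find(|c| c == '?' || c == '#').unwrap_or(url.len());
    let without_query = &url[..end];
    // Skip past "scheme://" so the host's dots and colons are not mistaken for the path.
    let after_scheme = match without_query.find("://") {
        Some(i) => i + 3,
        None => 0,
    };
    // The path starts at the first '/' after the authority; default to "/".
    match without_query[after_scheme..].find('/') {
        Some(i) => &without_query[after_scheme + i..],
        None => "/",
    }
}
```

In the proxy itself the equivalent field is simply `path = fwd.url.path()` in the `tracing::debug!` call; the sketch only makes explicit which parts of the URL are discarded.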
CLI now lives at https://github.com/source-cooperative/source-coop-cli
## Summary
- Adds a comprehensive VitePress documentation site in `docs/` covering authentication, configuration, deployment, architecture, and extension points
- Organized into a user-facing guide (accessing data) and admin-facing sections (deploying/configuring the proxy)
- Styled to match the docs.source.coop visual identity: IBM Plex Sans body, Cascadia Mono headings, warm off-white/teal-gray color scheme

## Test plan
- [ ] `cd docs && pnpm install && pnpm docs:dev`: site builds and serves locally
- [ ] Navigate all sidebar links: no broken links
- [ ] Mermaid diagrams render correctly
- [ ] Light and dark themes both render properly
- [ ] Code examples are syntactically valid

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…koff Reject non-HTTPS OIDC issuer URLs per the OIDC spec to prevent MITM attacks. Cache failed JWKS fetches for 30s to avoid hammering broken endpoints on repeated STS requests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
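A minimal sketch of both behaviors, assuming a per-issuer cache keyed by the issuer string. `check_issuer`, `FailedFetchCache`, and its methods are hypothetical names. Timestamps are passed in as milliseconds rather than read from `std::time::Instant`, because `Instant::now()` is not available on `wasm32-unknown-unknown`; on Workers a caller could supply `js_sys::Date::now()` instead.

```rust
use std::collections::HashMap;

/// OIDC issuer identifiers must use the https scheme; anything else is rejected.
fn check_issuer(issuer: &str) -> Result<(), String> {
    if issuer.starts_with("https://") && issuer.len() > "https://".len() {
        Ok(())
    } else {
        Err(format!("issuer {:?} must use https", issuer))
    }
}

/// Hypothetical negative cache: remembers when a JWKS fetch for an issuer
/// failed and suppresses retries until `ttl_ms` has elapsed.
struct FailedFetchCache {
    ttl_ms: u64,
    failed_at_ms: HashMap<String, u64>,
}

impl FailedFetchCache {
    fn new(ttl_ms: u64) -> Self {
        Self { ttl_ms, failed_at_ms: HashMap::new() }
    }

    /// True if a failure for `issuer` was recorded less than `ttl_ms` before `now_ms`.
    fn is_suppressed(&self, issuer: &str, now_ms: u64) -> bool {
        match self.failed_at_ms.get(issuer) {
            Some(&failed_at) => now_ms.saturating_sub(failed_at) < self.ttl_ms,
            None => false,
        }
    }

    fn record_failure(&mut self, issuer: &str, now_ms: u64) {
        self.failed_at_ms.insert(issuer.to_string(), now_ms);
    }

    /// Called after a successful fetch so later failures start a fresh window.
    fn clear(&mut self, issuer: &str) {
        self.failed_at_ms.remove(issuer);
    }
}
```

A 30-second window would be `FailedFetchCache::new(30_000)`; an STS request that hits a suppressed issuer can fail fast instead of re-fetching the broken JWKS endpoint.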
Move canonical request details to debug level and stop logging expected/provided signatures entirely. Add access key and token context to sealed token unsealing failures for easier debugging. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Important
This is not yet ready. I'm creating this PR to document the idea and its progress.
Note
This is a continuation of #108
What I'm changing
This PR performs a complete rebuild of the Source Data Proxy into a system of modular crates that can be composed to build proxies for various runtimes. This allows us to build for the Cloudflare Workers runtime by compiling to WASM.
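One way to picture crates composed per runtime is a core that is generic over a body type each runtime chooses, which also matches the stream constraint described under "How I did it". This is a hypothetical sketch, not the PR's actual crate API: `Runtime`, `Forwarded`, `forward`, `NativeRuntime`, `WorkersLikeRuntime`, and `OpaqueStream` are all invented for illustration, and `OpaqueStream` merely stands in for a wrapper around a JS `ReadableStream`.

```rust
/// Each runtime crate picks its own body type.
trait Runtime {
    type Body;
}

/// A request ready to send to the backend; the core never inspects `body`.
struct Forwarded<B> {
    url: String,
    body: B,
}

/// Core logic: rewrites routing metadata and moves the body through untouched,
/// with no buffering or conversion between stream types.
fn forward<R: Runtime>(backend_base: &str, key: &str, body: R::Body) -> Forwarded<R::Body> {
    Forwarded {
        url: format!("{}/{}", backend_base.trim_end_matches('/'), key),
        body,
    }
}

/// A native runtime; `Vec<u8>` stands in for a real streaming body type.
struct NativeRuntime;
impl Runtime for NativeRuntime {
    type Body = Vec<u8>;
}

/// Stand-in for a handle to a JS stream that Rust code should not read.
struct OpaqueStream(u32);
struct WorkersLikeRuntime;
impl Runtime for WorkersLikeRuntime {
    type Body = OpaqueStream;
}
```

Because `forward` only moves `R::Body`, a WASM build can pass a native JS stream straight from the incoming request to the backend fetch without converting it into Rust bytes.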
Overall Architecture
I recommend that the proxy run in a multi-environment fashion, wherein clients can select the proxy that best fits their needs:
- `data.source.coop`: a Workers-based deployment for general use by users in non-cloud or cross-cloud environments.
- `{region}.data.source.coop`: cloud-provider-specific deployments of the data proxy that support in-region access (avoiding egress fees and promoting high throughput). These can be built for and deployed to traditional runtime environments, such as AWS ECS Fargate (as we do today).

Why Workers?
I think that Cloudflare Workers is the ideal runtime environment for most Source Data access for the following reasons:
Performance Comparisons
Using AWS CloudShell, I downloaded a single 73 MB file (`cholmes/admin-boundaries/countries.parquet`) from `us-west-2` (Oregon), `ap-south-1` (Mumbai), and `eu-west-3` (Paris) and compared the results [5]. Across regions, the Cloudflare proxy significantly outperforms the Legacy Source proxy, primarily by reducing DNS, connection, and TLS handshake latency (often by 3–6×), which lowers TTFB by roughly 20–40% and improves throughput by up to ~25%. In-region (`us-west-2`), Direct S3 remains marginally fastest, but Cloudflare adds only modest overhead and still materially outperforms Legacy. In cross-region scenarios (Paris and Mumbai), Cloudflare eliminates most of the connection and TLS penalties seen with Legacy and, in some cases, matches or slightly exceeds Direct S3 total transfer performance due to optimized edge termination and backbone routing. Overall, Cloudflare removes the bulk of proxy-induced latency while delivering more consistent global performance than Legacy and, at distance, even than direct S3 access.

How I did it
Note
Almost all of this codebase was written by Claude Code via Opus 4.6.
The key challenge when working with Cloudflare Workers is staying within the CPU time limit. This challenge is particularly apparent when dealing with large streams of data. Because Cloudflare Workers runs on the V8 engine, we must compile our system to WASM. Cloudflare Workers exposes request and response bodies as native JS `ReadableStream`s, which appear in Rust as a `web_sys::ReadableStream`. It's critical that these streams NOT be transformed into a `ByteStream`, as doing so exhausts the CPU time limit for any body larger than ~70 MB. As such, the system was written to let each runtime environment define its own stream format, and those streams are passed between incoming requests and the backend file server (and vice versa) without conversion.

How to test it
The system is currently deployed to https://s3-proxy-rs.alukach.workers.dev with a subset of data. For experimentation, try accessing either the `cholmes` or `harvard-lil` buckets.

TODO
Related Issues
closes #1
Footnotes
1. https://developers.cloudflare.com/workers/reference/how-workers-works/#isolates
2. https://blog.cloudflare.com/eliminating-cold-starts-with-cloudflare-workers/
3. https://blog.cloudflare.com/backbone2024/
4. https://developers.cloudflare.com/workers/platform/pricing/#workers
5. https://gist.github.com/alukach/416f5f588d0305034801369932e0ce40
6. https://developers.cloudflare.com/workers/platform/infrastructure-as-code/#terraform