Note: Reviews paused — this branch is under active development, so CodeRabbit has automatically paused this review to avoid overwhelming you with review comments due to an influx of new commits.
📝 Walkthrough

Adds Cloudflare R2-backed centralized file storage: new R2 service, upload/tree/download/zip controllers and routes, access-grant model and APIs, DB migrations, dev tooling and test UI, environment and deployment updates, and an archiver dependency.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Client as Client
    participant Server as Server (Express)
    participant DB as Database (Postgres)
    participant R2 as Cloudflare R2
    participant Arch as Archiver
    Client->>Server: POST /v1/uploadCentralized (files + auth)
    Server->>DB: verify node ownership / share/grant access
    DB-->>Server: access allowed
    Server->>R2: PUT object(s)
    R2-->>Server: OK
    Server-->>Client: 200 {uploaded files meta}
    Client->>Server: GET /v1/centralizedTree/:nodeUuid
    Server->>R2: List objects (prefix)
    R2-->>Server: object list
    Server->>Server: build hierarchical tree
    Server-->>Client: 200 {tree}
    Client->>Server: GET /v1/downloadCentralized/:nodeUuid/zip
    Server->>DB: verify access
    DB-->>Server: allowed
    Server->>R2: stream objects
    R2-->>Server: streams
    Server->>Arch: append streams to ZIP
    Arch-->>Server: zip stream -> response
    Server-->>Client: streaming ZIP
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Add private data storage via Cloudflare R2, completely parallel to the existing IPFS pipeline. Stakeholders can upload, browse, share, and download research data through a self-contained test UI.

Key changes:

- R2 service (`r2.ts`): S3-compatible client for upload, download, list, and delete operations using `@aws-sdk/client-s3`
- Upload endpoint (`POST /v1/data/uploadCentralized`): multipart file upload to R2 with SHA-256 content hashing stored as object metadata
- Tree endpoint (`GET /v1/data/centralizedTree/:uuid`): public file listing built from R2 object keys, no auth required
- Download endpoint (`GET /v1/data/downloadCentralized/:uuid/*`): gated single-file download — requires node ownership or valid shareId
- Zip download (`GET /v1/data/downloadCentralized/:uuid/zip`): streams all node files as a zip archive using archiver, same access controls
- Share endpoint reused from existing `POST /v1/nodes/share/:uuid`
- Magic link auth in test UI: uses existing `POST /v1/auth/magic`, with devCode auto-fill in non-production for easier local testing
- `draftCreate` made IPFS-resilient: node creation succeeds even when IPFS/sync-server are unavailable — only the DB record is required for R2 operations
- Dev script (`scripts/dev-local.sh`): starts Postgres, Redis, and IPFS via Docker; overrides IPFS URLs to localhost for native execution
- URL-encodes file paths with spaces in download links and share URLs
- Removed dead `buildTree()` function from `centralizedTree.ts`
- Removed leftover debugger comment from `magic.ts`
- Added `CENTRALIZED_DATA.md` developer documentation

No changes to existing IPFS flows. All R2 endpoints verified: upload, tree, download, zip, share link, and access control (403s).
R2_ACCOUNT_ID, R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, and
R2_BUCKET_NAME must be added to Vault at secrets/desci-server/{dev,prod}/app
for the centralized data endpoints to work in deployed environments.
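The note above implies the server should fail fast when these variables are missing. A minimal startup guard could look like the sketch below; the helper name is hypothetical, and only the four variable names come from the note:

```typescript
// Sketch of a fail-fast startup check for the R2 configuration.
// Only the variable names come from the deployment note above.
const REQUIRED_R2_VARS = [
  'R2_ACCOUNT_ID',
  'R2_ACCESS_KEY_ID',
  'R2_SECRET_ACCESS_KEY',
  'R2_BUCKET_NAME',
] as const;

// Return the names of required variables that are unset or blank.
function missingR2Vars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_R2_VARS.filter((name) => !env[name] || env[name]!.trim() === '');
}
```

Calling `missingR2Vars(process.env)` at boot lets the server either throw or disable the R2 routes before any request reaches them.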
Force-pushed from 3460219 to 8ed77ad.
Actionable comments posted: 8
🧹 Nitpick comments (5)
desci-server/src/controllers/auth/magic.ts (1)
55-58: Ordering inconsistency with `auth.ts`.

This query uses `orderBy: { createdAt: 'desc' }`, while `auth.ts` (lines 52, 58) uses `orderBy: { id: 'desc' }` for the same `MagicLink` model. Both typically return the same result since `id` is auto-increment and `createdAt` defaults to `now()`, but this inconsistency could cause subtle bugs if records are ever bulk-inserted or have explicit timestamps. Consider aligning with the existing pattern in `auth.ts`:

♻️ Suggested alignment

```diff
 const latest = await prismaClient.magicLink.findFirst({
   where: { email: { equals: cleanEmail, mode: 'insensitive' } },
-  orderBy: { createdAt: 'desc' },
+  orderBy: { id: 'desc' },
 });
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@desci-server/src/controllers/auth/magic.ts` around lines 55 - 58, The query in prismaClient.magicLink.findFirst within magic.ts uses orderBy: { createdAt: 'desc' } which is inconsistent with auth.ts (which uses orderBy: { id: 'desc' }); change the orderBy to { id: 'desc' } to match the existing pattern and avoid subtle differences from bulk inserts or manual timestamps—update the prismaClient.magicLink.findFirst call that assigns latest to use id descending so the same record selection logic is used across auth.ts and magic.ts.

desci-server/src/services/r2.ts (1)
33-48: Consider documenting stream upload limitations.

`uploadToR2` accepts `Readable` streams, but the current implementation doesn't specify `ContentLength`. While AWS SDK v3 handles this for many cases, very large stream uploads may require multipart upload handling. The current usage in `uploadCentralized.ts` reads files into buffers first, which is safe but could be documented.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@desci-server/src/services/r2.ts` around lines 33 - 48, The uploadToR2 function accepts Readable streams but does not set ContentLength or handle multipart uploads, which can fail for very large streams; update the code/comments for uploadToR2 and PutObjectCommand to document this limitation and recommend one of two fixes: (1) require callers to provide ContentLength and handle single-part uploads, or (2) buffer streams to memory (as done in uploadCentralized.ts) or implement multipart upload logic for large objects; reference uploadToR2 and PutObjectCommand in the comment so callers know to either pass a Buffer/known-length stream or use a multipart strategy.

desci-server/src/controllers/data/downloadCentralizedZip.ts (1)
28-54: Consider extracting shared access control logic.

The access control pattern (lines 28-54) is duplicated from `downloadCentralized.ts`. Consider extracting to a shared utility or middleware to reduce duplication and ensure consistent behavior.

Example shared utility

```typescript
// In a shared utility file
export async function checkCentralizedDataAccess(
  normalizedUuid: string,
  shareId: string | undefined,
  userId: number | undefined
): Promise<boolean> {
  if (shareId) {
    const privateShare = await prisma.privateShare.findFirst({
      where: { shareId, nodeUUID: normalizedUuid },
    });
    if (privateShare) return true;
  }
  if (userId) {
    const node = await prisma.node.findFirst({
      where: { uuid: normalizedUuid, ownerId: userId },
    });
    if (node) return true;
  }
  return false;
}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@desci-server/src/controllers/data/downloadCentralizedZip.ts` around lines 28 - 54, The access-check block duplicated from downloadCentralized.ts should be extracted into a shared utility (e.g., checkCentralizedDataAccess) or middleware and then used from downloadCentralizedZip; implement a function that takes normalizedUuid, shareId and user?.id, performs the two checks (privateShare by shareId AND nodeUUID match, and node by uuid+ownerId) and returns boolean, replace the inline logic in downloadCentralizedZip with a call to that utility, remove the ensureUuidEndsWithDot/duplicate checks from the controller and ensure the utility performs the normalized nodeUUID comparison consistently so both endpoints share identical behavior.

desci-server/src/controllers/data/downloadCentralized.ts (1)
34-43: Consider combining shareId and nodeUUID in a single query.

The current pattern queries by `shareId` alone, then validates `nodeUUID` separately. While secure, a single query with both conditions (as done in `desci-server/src/controllers/nodes/thumbnails.ts:50-56`) would be more efficient:

```typescript
const privateShare = await prisma.privateShare.findFirst({
  where: { shareId, nodeUUID: normalizedUuid },
});
if (privateShare) hasAccess = true;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@desci-server/src/controllers/data/downloadCentralized.ts` around lines 34 - 43, Combine the two-step check into a single DB query: call prisma.privateShare.findFirst with where: { shareId, nodeUUID: normalizedUuid } (instead of querying by shareId only and then validating nodeUUID with ensureUuidEndsWithDot), then set hasAccess = true if privateShare is returned; remove the separate privateShare.node and ensureUuidEndsWithDot(...) conditional. This uses the existing variables shareId, normalizedUuid, privateShare and hasAccess and the prisma.privateShare.findFirst call.

desci-server/src/controllers/data/uploadCentralized.ts (1)
52-70: Memory concern: entire file loaded into buffer for large uploads.

Line 58 reads the entire file into memory before uploading. For large research datasets, this could cause memory pressure. Consider streaming uploads directly from disk for files above a size threshold.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@desci-server/src/controllers/data/uploadCentralized.ts` around lines 52 - 70, The current loop in uploadCentralized.ts loads each file fully via fs.promises.readFile (file.path) and then passes fileBuffer to uploadToR2, which will OOM for large files; change to stream large files instead: for files over a size threshold (e.g., file.size > X) use fs.createReadStream(file.path) and a streaming upload path from uploadToR2 (or the R2 client) while computing the SHA-256 by piping the stream through a crypto.createHash instance (or using a Transform that updates the hash) so you never hold the full file in memory; keep small files using the current readFile path for compatibility, and still unlink file.path after the upload completes; reference uploadToR2, the for (const file of files) loop, nodeUuid/contextPath, and the read/unlink of file.path when locating where to implement this change.
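The hash-while-streaming idea in the prompt above can be sketched with Node's standard library alone. This is an illustrative helper, not code from the PR; wiring it into `uploadToR2` (e.g. via a `PassThrough` tee so the upload also receives the data) is left out:

```typescript
import { createHash } from 'node:crypto';
import { Readable } from 'node:stream';

// Incrementally hash a stream without buffering it: each chunk updates
// the digest and is then released, so memory stays bounded regardless
// of file size.
async function sha256OfStream(stream: Readable): Promise<string> {
  const hash = createHash('sha256');
  for await (const chunk of stream) {
    hash.update(chunk);
  }
  return hash.digest('hex');
}
```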
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.env.example:
- Around line 80-84: The committed R2 credentials must be removed from
.env.example: replace the real-looking values for R2_ACCOUNT_ID,
R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, and R2_BUCKET_NAME with clearly fake or
blank placeholder values (e.g., EMPTY or YOUR_R2_ACCOUNT_ID) so the file
contains no real secrets; update the lines for the variables R2_ACCOUNT_ID,
R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, and R2_BUCKET_NAME accordingly and
ensure no other secret-like strings remain in the file.
In `@desci-server/scripts/dev-local.sh`:
- Around line 15-29: The script currently only starts the single "ipfs" service
but advertises PUBLIC_IPFS_RESOLVER and GUEST_IPFS_NODE_URL endpoints, so update
the docker compose invocation to also bring up ipfs_public and ipfs_guest (i.e.,
change the compose up command in the block that runs in "$NODES_DIR" to include
ipfs_public and ipfs_guest), and add readiness checks similar to the existing
"until docker exec ipfs ipfs id" loops for the containers ipfs_public and
ipfs_guest (or equivalent HTTP/health checks against the advertised ports 5002
and 5005) so those advertised endpoints are actually up before the script
reports "IPFS ready."
- Around line 40-55: The script currently hardcodes the Sentry profiler filename
to darwin-arm64; update the logic that sets NODE_ABI, SENTRY_LIB and TARGET so
the TARGET filename is built from the host tuple (process.platform and
process.arch as well as process.versions.modules) instead of a fixed
"darwin-arm64" token: run node to emit platform and arch (similar to how
NODE_ABI is derived), compose a platform-arch string and use it when
constructing TARGET and when searching/copying LATEST, and keep the existing
fallback behavior (checking if TARGET exists, finding the highest ABI with the
pattern sentry_cpu_profiler-<platform>-<arch>-*.node, copying LATEST to TARGET,
and emitting the same log messages).
In `@desci-server/src/controllers/data/downloadCentralized.ts`:
- Around line 67-68: The Content-Disposition uses the raw fileName which can
allow header injection; in the handler that sets headers (the res.setHeader
calls around Content-Type/Content-Disposition) sanitize the fileName before use
by stripping or replacing control characters (CR/LF), quotes, backslashes and
other header-breaking characters, fall back to a safe default name if the result
is empty, and when possible set both a plain filename and an RFC5987 encoded
filename* (e.g., filename*=UTF-8''<encoded>) to preserve non-ASCII characters;
update the code referencing res.setHeader('Content-Disposition', `attachment;
filename="${fileName}"`) to use the sanitized and encoded value instead.
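The sanitization described above can be a small pure helper; the function name is hypothetical, and `encodeURIComponent` is only an approximation of RFC 5987 encoding (characters like `'` and `*` would still need escaping for strict compliance):

```typescript
// Build a safe Content-Disposition value: strip control characters,
// quotes and backslashes, fall back to a default name if nothing
// survives, and add an RFC 5987-style filename* parameter so
// non-ASCII names are preserved.
function contentDisposition(fileName: string): string {
  const safe = fileName.replace(/[\x00-\x1f\x7f"\\]/g, '').trim() || 'download';
  const encoded = encodeURIComponent(safe); // close to, not exactly, RFC 5987
  return `attachment; filename="${safe}"; filename*=UTF-8''${encoded}`;
}
```

The controller would then call `res.setHeader('Content-Disposition', contentDisposition(fileName))` instead of interpolating the raw name.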
In `@desci-server/src/controllers/data/uploadCentralized.ts`:
- Around line 54-55: Sanitize and validate the incoming contextPath before
building relativePath/r2Key: reject or normalize any input containing path
traversal patterns (e.g., "../" or leading "/" ), strip or replace path
separators, enforce an allowlist of safe characters and a max length, and fall
back to an empty string if validation fails; then construct relativePath as
`${contextPath}/${file.originalname}` (or just file.originalname) using the
sanitized value so r2Key (`${nodeUuid}/${relativePath}`) cannot be manipulated
to escape the intended bucket namespace. Ensure the checks target the variables
contextPath, relativePath and r2Key construction and also validate
file.originalname for unsafe characters.
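The validation described above can be sketched as a pure helper; the function name, character allowlist, and length limit are illustrative assumptions, not code from the PR:

```typescript
const MAX_CONTEXT_PATH = 512; // illustrative limit

// Validate a user-supplied context path: strip leading/trailing
// slashes, cap the length, and require every segment to be non-empty,
// not '.' or '..', and drawn from a conservative character allowlist.
// Returns '' when validation fails, per the fallback described above.
function sanitizeContextPath(input: string): string {
  const trimmed = input.replace(/^\/+/, '').replace(/\/+$/, '');
  if (trimmed.length === 0 || trimmed.length > MAX_CONTEXT_PATH) return '';
  const segments = trimmed.split('/');
  const ok = segments.every(
    (s) => s.length > 0 && s !== '.' && s !== '..' && /^[A-Za-z0-9._ -]+$/.test(s)
  );
  return ok ? segments.join('/') : '';
}
```

With this in place, `r2Key` is built only from the sanitized value, so `../` sequences can never escape the `${nodeUuid}/` prefix.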
In `@desci-server/src/routes/v1/auth.ts`:
- Around line 52-55: The route handler for router.get('/google/client-id')
checks process.env.GOOGLE_CLIENT_ID_AUTH but deployments expose
GOOGLE_CLIENT_ID; update the handler to read process.env.GOOGLE_CLIENT_ID
instead (and keep the same error message/context), i.e., replace references to
GOOGLE_CLIENT_ID_AUTH with GOOGLE_CLIENT_ID in that handler so /google/client-id
returns the configured client ID in deployed environments.
In `@desci-server/src/server.ts`:
- Around line 141-144: The middleware that skips helmet for paths starting with
'/test/' should only be active in non-production; update the this.app.use
middleware and any other similar block (the one that checks
req.path.startsWith('/test/')) to first check the environment (e.g.,
process.env.NODE_ENV !== 'production' or an existing isProduction/isDev flag)
and only mount the /test UI and short-circuit helmet when running in
non-production; ensure helmet() is applied normally in production for all paths
and that the /test route itself is only registered when not in production.
In `@desci-server/src/services/r2.ts`:
- Around line 50-62: getStreamFromR2 may cast response.Body to Readable even
when it's undefined; update getStreamFromR2 to defensively check that r2Client
exists (r2Client) and that response.Body is non-null before casting: if
response.Body is undefined throw a descriptive error or return a controlled
failure (include key and bucket in message), and ensure metadata
(response.Metadata) is still returned; reference GetObjectCommand, response.Body
and the getStreamFromR2 function when locating and updating the code.
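The defensive check the prompt asks for reduces to a small guard. The response shape below mirrors what `GetObjectCommand` resolves to, but the helper name and exact error wording are assumptions:

```typescript
// Throw a descriptive error instead of blindly casting an absent Body.
function requireBody<T>(
  response: { Body?: T; Metadata?: Record<string, string> },
  bucket: string,
  key: string
): { body: T; metadata?: Record<string, string> } {
  if (response.Body == null) {
    throw new Error(`R2 returned no body for object in bucket "${bucket}" (key: ${key})`);
  }
  return { body: response.Body, metadata: response.Metadata };
}
```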
---
Nitpick comments:
In `@desci-server/src/controllers/auth/magic.ts`:
- Around line 55-58: The query in prismaClient.magicLink.findFirst within
magic.ts uses orderBy: { createdAt: 'desc' } which is inconsistent with auth.ts
(which uses orderBy: { id: 'desc' }); change the orderBy to { id: 'desc' } to
match the existing pattern and avoid subtle differences from bulk inserts or
manual timestamps—update the prismaClient.magicLink.findFirst call that assigns
latest to use id descending so the same record selection logic is used across
auth.ts and magic.ts.
In `@desci-server/src/controllers/data/downloadCentralized.ts`:
- Around line 34-43: Combine the two-step check into a single DB query: call
prisma.privateShare.findFirst with where: { shareId, nodeUUID: normalizedUuid }
(instead of querying by shareId only and then validating nodeUUID with
ensureUuidEndsWithDot), then set hasAccess = true if privateShare is returned;
remove the separate privateShare.node and ensureUuidEndsWithDot(...)
conditional. This uses the existing variables shareId, normalizedUuid,
privateShare and hasAccess and the prisma.privateShare.findFirst call.
In `@desci-server/src/controllers/data/downloadCentralizedZip.ts`:
- Around line 28-54: The access-check block duplicated from
downloadCentralized.ts should be extracted into a shared utility (e.g.,
checkCentralizedDataAccess) or middleware and then used from
downloadCentralizedZip; implement a function that takes normalizedUuid, shareId
and user?.id, performs the two checks (privateShare by shareId AND nodeUUID
match, and node by uuid+ownerId) and returns boolean, replace the inline logic
in downloadCentralizedZip with a call to that utility, remove the
ensureUuidEndsWithDot/duplicate checks from the controller and ensure the
utility performs the normalized nodeUUID comparison consistently so both
endpoints share identical behavior.
In `@desci-server/src/controllers/data/uploadCentralized.ts`:
- Around line 52-70: The current loop in uploadCentralized.ts loads each file
fully via fs.promises.readFile (file.path) and then passes fileBuffer to
uploadToR2, which will OOM for large files; change to stream large files
instead: for files over a size threshold (e.g., file.size > X) use
fs.createReadStream(file.path) and a streaming upload path from uploadToR2 (or
the R2 client) while computing the SHA-256 by piping the stream through a
crypto.createHash instance (or using a Transform that updates the hash) so you
never hold the full file in memory; keep small files using the current readFile
path for compatibility, and still unlink file.path after the upload completes;
reference uploadToR2, the for (const file of files) loop, nodeUuid/contextPath,
and the read/unlink of file.path when locating where to implement this change.
In `@desci-server/src/services/r2.ts`:
- Around line 33-48: The uploadToR2 function accepts Readable streams but does
not set ContentLength or handle multipart uploads, which can fail for very large
streams; update the code/comments for uploadToR2 and PutObjectCommand to
document this limitation and recommend one of two fixes: (1) require callers to
provide ContentLength and handle single-part uploads, or (2) buffer streams to
memory (as done in uploadCentralized.ts) or implement multipart upload logic for
large objects; reference uploadToR2 and PutObjectCommand in the comment so
callers know to either pass a Buffer/known-length stream or use a multipart
strategy.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e7775363-d5d2-4389-be89-06fb648846fb
⛔ Files ignored due to path filters (1)
desci-server/yarn.lock is excluded by `!**/yarn.lock`, `!**/*.lock`
📒 Files selected for processing (17)
- .env.example
- desci-server/kubernetes/deployment_dev.yaml
- desci-server/kubernetes/deployment_prod.yaml
- desci-server/package.json
- desci-server/scripts/dev-local.sh
- desci-server/src/controllers/auth/magic.ts
- desci-server/src/controllers/data/centralizedTree.ts
- desci-server/src/controllers/data/downloadCentralized.ts
- desci-server/src/controllers/data/downloadCentralizedZip.ts
- desci-server/src/controllers/data/uploadCentralized.ts
- desci-server/src/controllers/nodes/draftCreate.ts
- desci-server/src/routes/v1/auth.ts
- desci-server/src/routes/v1/data.ts
- desci-server/src/server.ts
- desci-server/src/services/r2.ts
- desci-server/src/test/centralizedData.html
- docker-compose.dev.yml
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
desci-server/src/server.ts (1)
190-192: ⚠️ Potential issue | 🔴 Critical

Infinite recursion in `httpServer` getter.

The getter references itself (`return this.httpServer`) instead of `this.server`, causing stack overflow if accessed.

🐛 Proposed fix

```diff
 get httpServer() {
-  return this.httpServer;
+  return this.server;
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@desci-server/src/server.ts` around lines 190 - 192, The httpServer getter currently returns itself causing infinite recursion; change the getter in the class that defines httpServer so it returns the underlying server instance (this.server) instead of this.httpServer, i.e., update the httpServer getter implementation to return this.server and ensure any backing field named server exists and is used consistently.
🧹 Nitpick comments (1)
desci-server/scripts/dev-local.sh (1)
87-89: TypeScript compilation error handling may be misleading.

The pipeline `npx tsc 2>&1 | grep -c "error TS"` will output `0` when there are no TypeScript errors, displaying "(0 type errors, non-blocking)" even when compilation succeeds cleanly. Consider showing a cleaner message for successful compilation.

💡 Suggested improvement

```diff
-npx tsc 2>&1 | grep -c "error TS" | xargs -I{} echo "  TypeScript compiled ({} type errors, non-blocking)"
+TS_OUTPUT=$(npx tsc 2>&1)
+TS_ERRORS=$(echo "$TS_OUTPUT" | grep -c "error TS" || true)
+if [ "$TS_ERRORS" -gt 0 ]; then
+  echo "  TypeScript compiled with $TS_ERRORS type errors (non-blocking)"
+else
+  echo "  TypeScript compiled successfully"
+fi
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@desci-server/scripts/dev-local.sh` around lines 87 - 89, The current pipeline using `npx tsc 2>&1 | grep -c "error TS"` always prints "(0 type errors, non-blocking)" even on a clean build; change it to capture the grep count into a variable (run `npx tsc` and pipe to `grep -c "error TS"`), then branch on that variable: if zero print a clear "TypeScript compiled successfully" message, otherwise print the existing "(N type errors, non-blocking)" message; keep the final `exec node ./dist/index.js` behavior unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@desci-server/CENTRALIZED_DATA.md`:
- Around line 119-125: The Deployed URLs table in CENTRALIZED_DATA.md exposes a
production Test UI URL which conflicts with the requirement to gate the /test
route from production; update the documentation to either remove the Prod row
value (`https://nodes-api.desci.com/test/centralizedData.html`) or add a
clarifying note next to the Prod entry (or under the "Deployed URLs" heading)
stating that the /test route is disabled/gated in production and the listed URL
is intentionally non-functional in Prod, ensuring the table accurately reflects
the runtime gating behavior.
In `@desci-server/src/controllers/nodes/draftCreate.ts`:
- Around line 104-122: The current draft creation can write an empty string into
manifestDocumentId which breaks downstream code; update draftCreate.ts so you
only call prisma.node.update({ data: { manifestDocumentId } }) when
result.documentId is a non-empty string (i.e., truthy and length > 0) and
otherwise leave manifestDocumentId undefined/null on the node; keep the existing
logging paths (Automerge document created / creation returned empty result /
creation failed) but ensure you do not persist an empty string. Also audit
callers like getLatestManifestFromNode, getDraftManifest, getNodeManifest, and
dispatchAction to expect possible undefined/null manifestDocumentId (add guards
where they assume a DocumentId) so downstream casts do not receive an empty
string.
In `@desci-server/src/routes/v1/auth.ts`:
- Around line 50-51: The file registers the same route twice:
router.get('/orcid/validate', validateOrcid) appears on consecutive lines;
remove the duplicate registration so the route is only registered once (keep a
single router.get('/orcid/validate', validateOrcid) and delete the redundant
line), referencing the validateOrcid handler to ensure the route still points to
the correct function.
In `@desci-server/src/services/r2.ts`:
- Around line 79-100: The current implementation in the do/while loop
accumulates every object into the in-memory results array (results) while
calling r2Client.send(new ListObjectsV2Command(...)), which can OOM for large
prefixes; change the API of the enclosing service function to stream or return a
paginated response instead of materializing all entries: stop pushing every obj
into results, and instead emit or return each page (the response.Contents) to
the caller along with response.NextContinuationToken/IsTruncated so callers can
request further pages; update the function signature to return either an async
iterator/stream of pages or a {items:..., continuationToken:...} page object and
adjust callers to iterate/paginate rather than expecting a full list.
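The page-at-a-time API the prompt describes can be expressed as a generic async generator. The `fetchPage` callback is where `r2Client.send(new ListObjectsV2Command(...))` would go; the names and types here are illustrative, not from the PR:

```typescript
// One page of results plus the token needed to fetch the next page.
type Page<T> = { items: T[]; nextToken?: string };

// Yield pages as they arrive instead of accumulating every object in
// memory; callers can stop early without listing the whole prefix.
async function* paginate<T>(
  fetchPage: (token?: string) => Promise<Page<T>>
): AsyncGenerator<T[], void, void> {
  let token: string | undefined;
  do {
    const page = await fetchPage(token);
    yield page.items;
    token = page.nextToken;
  } while (token);
}
```

A caller then writes `for await (const page of paginate(...))`, mirroring the `NextContinuationToken`/`IsTruncated` loop without materializing the full list.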
---
Outside diff comments:
In `@desci-server/src/server.ts`:
- Around line 190-192: The httpServer getter currently returns itself causing
infinite recursion; change the getter in the class that defines httpServer so it
returns the underlying server instance (this.server) instead of this.httpServer,
i.e., update the httpServer getter implementation to return this.server and
ensure any backing field named server exists and is used consistently.
---
Nitpick comments:
In `@desci-server/scripts/dev-local.sh`:
- Around line 87-89: The current pipeline using `npx tsc 2>&1 | grep -c "error
TS"` always prints "(0 type errors, non-blocking)" even on a clean build; change
it to capture the grep count into a variable (run `npx tsc` and pipe to `grep -c
"error TS"`), then branch on that variable: if zero print a clear "TypeScript
compiled successfully" message, otherwise print the existing "(N type errors,
non-blocking)" message; keep the final `exec node ./dist/index.js` behavior
unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6794dad7-1a10-4db4-9130-e767f86a3b1f
⛔ Files ignored due to path filters (1)
desci-server/yarn.lock is excluded by `!**/yarn.lock`, `!**/*.lock`
📒 Files selected for processing (18)
- .env.example
- desci-server/CENTRALIZED_DATA.md
- desci-server/kubernetes/deployment_dev.yaml
- desci-server/kubernetes/deployment_prod.yaml
- desci-server/package.json
- desci-server/scripts/dev-local.sh
- desci-server/src/controllers/auth/magic.ts
- desci-server/src/controllers/data/centralizedTree.ts
- desci-server/src/controllers/data/downloadCentralized.ts
- desci-server/src/controllers/data/downloadCentralizedZip.ts
- desci-server/src/controllers/data/uploadCentralized.ts
- desci-server/src/controllers/nodes/draftCreate.ts
- desci-server/src/routes/v1/auth.ts
- desci-server/src/routes/v1/data.ts
- desci-server/src/server.ts
- desci-server/src/services/r2.ts
- desci-server/src/test/centralizedData.html
- docker-compose.dev.yml
🚧 Files skipped from review as they are similar to previous changes (9)
- desci-server/src/controllers/data/downloadCentralized.ts
- desci-server/src/controllers/data/uploadCentralized.ts
- desci-server/kubernetes/deployment_dev.yaml
- desci-server/src/controllers/data/downloadCentralizedZip.ts
- desci-server/package.json
- desci-server/kubernetes/deployment_prod.yaml
- docker-compose.dev.yml
- desci-server/src/test/centralizedData.html
- desci-server/src/controllers/auth/magic.ts
Actionable comments posted: 3
♻️ Duplicate comments (1)
desci-server/src/services/r2.ts (1)
51-52: ⚠️ Potential issue | 🟠 Major

Avoid logging raw R2 object keys for private data.
The R2 keys are derived from user-provided filenames/paths. Logging them directly exposes private dataset structure in application logs. Consider logging a redacted form, a hash, or omitting the key entirely.
Also applies to line 79.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@desci-server/src/services/r2.ts` around lines 51 - 52, The code logs the raw R2 object key via logger.info({ key }, 'Uploaded to R2') which can leak private filenames; update the logging to avoid printing the raw "key" variable by either omitting it or replacing it with a redacted form (e.g., a hash or truncated mask) before calling logger.info; locate the logger.info calls in this file (the instances that pass the "key" variable) and replace them with a call that logs a safe identifier (e.g., compute and log a SHA256/hex of key or log only the first/last N chars or a static redaction token) while retaining context in the message.
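One way to keep log lines correlatable without exposing the raw key is the hash-based redaction the comment suggests. The helper below is a sketch, not code from the PR:

```typescript
import { createHash } from 'node:crypto';

// Replace a private R2 key with a short, stable digest so log lines
// for the same object can still be correlated without revealing the
// dataset's file structure.
function redactKey(key: string): string {
  return `key#${createHash('sha256').update(key).digest('hex').slice(0, 12)}`;
}

// e.g. logger.info({ key: redactKey(key) }, 'Uploaded to R2');
```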
🧹 Nitpick comments (6)
desci-server/CENTRALIZED_DATA.md (1)
34-37: Consider adding language identifiers to fenced code blocks.

Several code blocks (lines 34, 41, 57, 62, 82, 93) lack language specifiers. Adding identifiers like `http`, `bash`, or `text` would improve syntax highlighting and satisfy markdown lint rules.

Example fix for endpoint blocks

````diff
-```
+```http
 POST /v1/auth/magic  { email } → sends verification code
````

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@desci-server/CENTRALIZED_DATA.md` around lines 34 - 37, The fenced code blocks showing HTTP endpoint examples such as "POST /v1/auth/magic { email } → sends verification code" and similar blocks around the file should include language identifiers for syntax highlighting and markdown-lint compliance; update those backtick-fenced blocks (e.g., the endpoint examples at lines showing POST /v1/auth/magic and the other blocks referenced) to use an appropriate language tag like ```http or ```text so the blocks render correctly and pass lint rules.

desci-server/src/controllers/data/downloadCentralized.ts (1)
57-65: Add error handling for stream pipe failures.

If the R2 stream encounters an error mid-transfer (network issue, client disconnect), the error is not caught, which could leave the response in an inconsistent state.
Suggested improvement
```diff
-      stream.pipe(res);
+      stream.on('error', (err) => {
+        logger.error({ err }, 'R2 stream error during download');
+        if (!res.headersSent) {
+          res.status(500).json({ ok: false, message: 'Download failed' });
+        } else {
+          res.destroy();
+        }
+      });
+      stream.pipe(res);
     } catch (err: any) {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@desci-server/src/controllers/data/downloadCentralized.ts` around lines 57 - 65, The current download handler pipes the R2 Readable (stream) to the Express response via stream.pipe(res) but doesn't handle stream or response errors/aborts, so mid-transfer failures are uncaught; update the code around stream.pipe(res) and the catch block (referencing stream.pipe(res), the surrounding try/catch and logger.error) to attach listeners: listen for 'error' on the R2 stream to log the error (using logger.error) and respond/cleanup (destroy/unpipe the stream and end the response if not finished), listen for 'close'/'aborted' on res to destroy the stream to stop work, and ensure any listeners are removed after finish to avoid leaks; also convert any uncaught stream errors to a 500 JSON response when possible and include the error details in logs.

desci-server/prisma/schema.prisma (1)
584-601: NodeDataGrant model is well-designed.

The soft-revoke pattern (`revokedAt` instead of deletion) is a good choice for audit trails. The composite unique constraint on `[nodeUUID, granteeId]` correctly enforces one grant per user per node.

Consider adding an index on `grantedById` if you anticipate queries like "list all grants I've given" becoming common:

```diff
+  @@index([grantedById])
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@desci-server/prisma/schema.prisma` around lines 584 - 601, The schema's NodeDataGrant model is missing an index on grantedById which will slow queries that list grants by the grantor; update the NodeDataGrant model (the block defining NodeDataGrant with fields id, createdAt, updatedAt, nodeUUID, granteeId, grantedById, etc.) to add an @@index([grantedById]) directive alongside the existing @@unique and @@index entries so queries like "list all grants I've given" are efficient.
desci-server/src/controllers/data/uploadCentralized.ts (2)
60-66: Unsafe filename rejection silently skips files. When a file has an unsafe name, it's skipped with a warning log but the upload continues. The response will show fewer files uploaded than provided. Consider returning an error for the specific file in the response so clients know which files were rejected.
🔧 Suggested improvement
```diff
+  const rejected: { name: string; reason: string }[] = [];
+
   try {
     for (const file of files) {
       if (UNSAFE_PATH_PATTERN.test(file.originalname) || file.originalname.startsWith('/')) {
         logger.warn({ fileName: file.originalname }, 'Rejected unsafe filename');
+        rejected.push({ name: file.originalname, reason: 'unsafe filename' });
         await fs.promises.unlink(file.path).catch(() => {});
         continue;
       }
       // ... rest of loop
     }
-  return res.status(200).json({ ok: true, files: uploaded });
+  return res.status(200).json({ ok: true, files: uploaded, rejected });
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@desci-server/src/controllers/data/uploadCentralized.ts` around lines 60 - 66, The loop that checks UNSAFE_PATH_PATTERN currently logs and unlinks unsafe files but silently skips them; update the upload handling in the same function (the for (const file of files) loop that calls UNSAFE_PATH_PATTERN.test and logger.warn) to record each rejected file (e.g., push { fileName: file.originalname, error: 'unsafe filename' } into a rejectedFiles array or add an error entry into the per-file results array) before unlinking, and ensure the final response includes that rejectedFiles/errors list alongside successfully uploaded files so clients can see which files were rejected and why.
73-90: Streaming logic doesn't provide memory benefits for large files.

The streaming branch for files >50MB still accumulates all chunks in memory (`chunks.push(chunk)`) before calling `Buffer.concat`. This negates the memory-efficiency benefit of streaming.

For true streaming uploads, consider piping directly to R2 with a pass-through hash stream, or accepting the current implementation if the R2 SDK requires a complete buffer.
🔧 If R2 SDK supports stream uploads
```diff
       if (file.size > STREAM_THRESHOLD) {
         const hashStream = crypto.createHash('sha256');
-        const chunks: Buffer[] = [];
-        const collector = new Transform({
-          transform(chunk, _encoding, cb) {
-            hashStream.update(chunk);
-            chunks.push(chunk);
-            cb(null, chunk);
-          },
-        });
         const readStream = fs.createReadStream(file.path);
-        await pipeline(readStream, collector);
+        // Compute hash first
+        await pipeline(readStream, hashStream);
         hash = hashStream.digest('hex');
-        const fileBuffer = Buffer.concat(chunks);
+        // Then read file for upload (or use stream if R2 SDK supports it)
+        const fileBuffer = await fs.promises.readFile(file.path);
         await uploadToR2(r2Key, fileBuffer, {
```
🤖 Prompt for AI Agents
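The hash-then-upload idea can be sketched independently of the controller, using only Node's standard library (nothing here is code from the PR):

```typescript
import { createHash } from 'node:crypto';
import { Readable } from 'node:stream';

// Hash a stream incrementally: only one chunk is in memory at a time,
// so memory use stays flat regardless of file size.
async function sha256OfStream(source: Readable): Promise<string> {
  const hash = createHash('sha256');
  for await (const chunk of source) {
    hash.update(chunk);
  }
  return hash.digest('hex');
}
```

A second pass over the file (or a `PassThrough` teed into the uploader) then supplies the bytes for upload, which is the two-pass trade-off the diff accepts.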
Verify each finding against the current code and only fix it if needed. In `@desci-server/src/controllers/data/uploadCentralized.ts` around lines 73 - 90, The streaming branch for files > STREAM_THRESHOLD is defeating memory savings by collecting chunks into the chunks array and calling Buffer.concat; modify the logic so the Transform "collector" (or a PassThrough) pipes directly into uploadToR2 without accumulating buffers while still updating the crypto.createHash('sha256') as data flows, compute hashStream.digest('hex') after the stream ends, and pass the resulting stream (or PassThrough) to uploadToR2 (r2Key, stream, { 'content-hash': hash, 'mime-type': ... }); if the uploadToR2 API requires a full Buffer, keep the current fallback but document/branch clearly so large-file streaming uses the stream path.
desci-server/src/controllers/nodes/dataGrants.ts (1)
131-155: `listMyGrants` may return grants for deleted nodes. Consider filtering out grants where the associated node is deleted:
🔧 Suggested filter
```diff
   const grants = await prisma.nodeDataGrant.findMany({
-    where: { granteeId: user.id, revokedAt: null },
+    where: { granteeId: user.id, revokedAt: null, node: { isDeleted: false } },
     include: {
       node: { select: { uuid: true, title: true } },
       grantedBy: { select: { email: true, name: true } },
     },
     orderBy: { createdAt: 'desc' },
   });
```
🤖 Prompt for AI Agents
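The defensive-mapping half of the suggestion can be shown with a plain function; the `Grant` shape below is a simplified stand-in for the Prisma result, not the generated client type:

```typescript
// Simplified stand-in for the Prisma query result.
type Grant = {
  node: { uuid: string; title: string } | null;
  grantedBy: { email: string };
};

// Keep only grants whose node relation is present; a node deleted
// out-of-band yields null and is dropped instead of crashing the map.
function visibleGrants(grants: Grant[]): { uuid: string; title: string; grantedBy: string }[] {
  return grants.flatMap((g) =>
    g.node ? [{ uuid: g.node.uuid, title: g.node.title, grantedBy: g.grantedBy.email }] : [],
  );
}
```

The relation filter in the query and this mapping guard are complementary: the filter excludes soft-deleted nodes, and the guard covers any row that still arrives without its relation.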
Verify each finding against the current code and only fix it if needed. In `@desci-server/src/controllers/nodes/dataGrants.ts` around lines 131 - 155, The current listMyGrants returns grants even if the related node was soft-deleted; update the prisma query inside listMyGrants to only fetch grants whose node is not deleted by adding a relation filter (e.g., include node: { deletedAt: null } in the where clause) and defensively handle the mapping (ensure g.node exists before using g.node.uuid/title) so deleted or missing nodes are excluded from the response.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@desci-server/src/controllers/auth/magic.ts`:
- Around line 54-63: The current logic fetches a development token into devCode
unconditionally in non-production, which can be returned even if the send failed
(ok is false); modify the block around prismaClient.magicLink.findFirst so that
you only assign or include devCode when ok is true (i.e., after the send
succeeded). Concretely: after determining ok, if process.env.NODE_ENV !==
'production' and ok is true, call prismaClient.magicLink.findFirst({ where: {
email: { equals: cleanEmail, mode: 'insensitive' } }, orderBy: { id: 'desc' } })
and set devCode = latest?.token, then return res.send({ ok: !!ok, ...(devCode ?
{ devCode } : {}) }) as before so stale tokens are not exposed when ok is false.
In `@desci-server/src/controllers/nodes/dataGrants.ts`:
- Around line 71-73: The prisma.node.findFirst lookup in revokeDataGrant and
listDataGrants is missing the same isDeleted: false condition used in
createDataGrant, which allows operating on deleted nodes; update the where
clause in the node lookup inside revokeDataGrant and listDataGrants to include
isDeleted: false (matching the filter in createDataGrant) so both
revokeDataGrant and listDataGrants ignore deleted nodes when resolving the node
by normalizedUuid and owner.id.
In `@desci-server/src/test/centralizedData.html`:
- Around line 1166-1170: testTreeDeniedNoAuth currently treats 200 and 403 as
success so it never fails; update the test so that it only returns success when
access is denied (i.e. resp.status === 403) instead of allowing resp.ok, and if
you also need to verify published nodes return 200 create a separate test (e.g.,
testTreeAllowedWithAuth or testTreePublishedReturns200) that asserts resp.ok for
the published-node scenario; locate the change in the async function
testTreeDeniedNoAuth and adjust the return condition accordingly.
---
Duplicate comments:
In `@desci-server/src/services/r2.ts`:
- Around line 51-52: The code logs the raw R2 object key via logger.info({ key
}, 'Uploaded to R2') which can leak private filenames; update the logging to
avoid printing the raw "key" variable by either omitting it or replacing it with
a redacted form (e.g., a hash or truncated mask) before calling logger.info;
locate the logger.info calls in this file (the instances that pass the "key"
variable) and replace them with a call that logs a safe identifier (e.g.,
compute and log a SHA256/hex of key or log only the first/last N chars or a
static redaction token) while retaining context in the message.
---
Nitpick comments:
In `@desci-server/CENTRALIZED_DATA.md`:
- Around line 34-37: The fenced code blocks showing HTTP endpoint examples such
as "POST /v1/auth/magic { email } → sends verification code" and
similar blocks around the file should include language identifiers for syntax
highlighting and markdown-lint compliance; update those backtick-fenced blocks
(e.g., the endpoint examples at lines showing POST /v1/auth/magic and the other
blocks referenced) to use an appropriate language tag like ```http or ```text so
the blocks render correctly and pass lint rules.
In `@desci-server/prisma/schema.prisma`:
- Around line 584-601: The schema's NodeDataGrant model is missing an index on
grantedById which will slow queries that list grants by the grantor; update the
NodeDataGrant model (the block defining NodeDataGrant with fields id, createdAt,
updatedAt, nodeUUID, granteeId, grantedById, etc.) to add an
@@index([grantedById]) directive alongside the existing @@unique and @@index
entries so queries like "list all grants I've given" are efficient.
In `@desci-server/src/controllers/data/downloadCentralized.ts`:
- Around line 57-65: The current download handler pipes the R2 Readable (stream)
to the Express response via stream.pipe(res) but doesn't handle stream or
response errors/aborts, so mid-transfer failures are uncaught; update the code
around stream.pipe(res) and the catch block (referencing stream.pipe(res), the
surrounding try/catch and logger.error) to attach listeners: listen for 'error'
on the R2 stream to log the error (using logger.error) and respond/cleanup
(destroy/unpipe the stream and end the response if not finished), listen for
'close'/'aborted' on res to destroy the stream to stop work, and ensure any
listeners are removed after finish to avoid leaks; also convert any uncaught
stream errors to a 500 JSON response when possible and include the error details
in logs.
In `@desci-server/src/controllers/data/uploadCentralized.ts`:
- Around line 60-66: The loop that checks UNSAFE_PATH_PATTERN currently logs and
unlinks unsafe files but silently skips them; update the upload handling in the
same function (the for (const file of files) loop that calls
UNSAFE_PATH_PATTERN.test and logger.warn) to record each rejected file (e.g.,
push { fileName: file.originalname, error: 'unsafe filename' } into a
rejectedFiles array or add an error entry into the per-file results array)
before unlinking, and ensure the final response includes that
rejectedFiles/errors list alongside successfully uploaded files so clients can
see which files were rejected and why.
- Around line 73-90: The streaming branch for files > STREAM_THRESHOLD is
defeating memory savings by collecting chunks into the chunks array and calling
Buffer.concat; modify the logic so the Transform "collector" (or a PassThrough)
pipes directly into uploadToR2 without accumulating buffers while still updating
the crypto.createHash('sha256') as data flows, compute hashStream.digest('hex')
after the stream ends, and pass the resulting stream (or PassThrough) to
uploadToR2 (r2Key, stream, { 'content-hash': hash, 'mime-type': ... }); if the
uploadToR2 API requires a full Buffer, keep the current fallback but
document/branch clearly so large-file streaming uses the stream path.
In `@desci-server/src/controllers/nodes/dataGrants.ts`:
- Around line 131-155: The current listMyGrants returns grants even if the
related node was soft-deleted; update the prisma query inside listMyGrants to
only fetch grants whose node is not deleted by adding a relation filter (e.g.,
include node: { deletedAt: null } in the where clause) and defensively handle
the mapping (ensure g.node exists before using g.node.uuid/title) so deleted or
missing nodes are excluded from the response.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 39e419e5-03c8-4fa6-a7f2-fea6495d5514
📒 Files selected for processing (19)
- .env.example
- desci-server/CENTRALIZED_DATA.md
- desci-server/prisma/migrations/20260317105928_add_node_data_grant/migration.sql
- desci-server/prisma/schema.prisma
- desci-server/scripts/dev-local.sh
- desci-server/src/controllers/auth/magic.ts
- desci-server/src/controllers/data/centralizedTree.ts
- desci-server/src/controllers/data/downloadCentralized.ts
- desci-server/src/controllers/data/downloadCentralizedZip.ts
- desci-server/src/controllers/data/uploadCentralized.ts
- desci-server/src/controllers/nodes/dataGrants.ts
- desci-server/src/controllers/nodes/draftCreate.ts
- desci-server/src/routes/v1/auth.ts
- desci-server/src/routes/v1/data.ts
- desci-server/src/routes/v1/nodes.ts
- desci-server/src/server.ts
- desci-server/src/services/centralizedDataAccess.ts
- desci-server/src/services/r2.ts
- desci-server/src/test/centralizedData.html
🚧 Files skipped from review as they are similar to previous changes (3)
- desci-server/src/routes/v1/auth.ts
- desci-server/scripts/dev-local.sh
- desci-server/src/controllers/data/centralizedTree.ts
…s + test UI overhaul

Add user-level access grants for centralized data so workflows can authenticate as users who have been explicitly granted access by the node owner, solving the problem of shareId exposure in FDOs.

New features:
- NodeDataGrant Prisma model with soft-revoke (revokedAt)
- Shared access check helper (centralizedDataAccess.ts) replacing duplicated auth logic in download endpoints
- Grant management CRUD endpoints (POST/DELETE/GET /v1/nodes/grant)
- Tree endpoint gated for unpublished nodes
- Test UI: "Shared With Me" panel, grant management UI, curl reference

Security fixes:
- Path traversal validation on upload contextPath and filenames
- Content-Disposition header injection sanitization (RFC5987)
- Helmet + test UI gated from production (NODE_ENV check)
- R2 getStreamFromR2 null body check
- httpServer getter infinite recursion fix
- Real R2 credentials removed from .env.example

Bug fixes:
- Duplicate /orcid/validate route removed
- Empty manifestDocumentId guard in draftCreate
- GOOGLE_CLIENT_ID_AUTH → GOOGLE_CLIENT_ID
- magic.ts orderBy consistency (id desc)

Improvements:
- listR2Objects converted to async generator (listR2ObjectsPages)
- Streaming upload for files >50MB
- dev-local.sh: platform-aware Sentry binary, ipfs_public/ipfs_guest services, improved TypeScript error reporting
- magic.ts: only fetch devCode when send succeeded (ok is true)
- dataGrants: add isDeleted:false to revokeDataGrant/listDataGrants, filter deleted nodes from listMyGrants
- downloadCentralized: add stream error + response close handlers
- uploadCentralized: report rejected files in response, fix streaming branch to not accumulate chunks in memory (two-pass: hash then stream)
- r2.ts: redact file keys in log output (show only prefix)
- schema: add @@index([grantedById]) to NodeDataGrant
- testTreeDeniedNoAuth: only pass on 403 (not 200)
- CENTRALIZED_DATA.md: add language tags to fenced code blocks
4c3dd6e to 3ce6d79 (Compare)
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@desci-server/src/controllers/data/downloadCentralized.ts`:
- Around line 31-38: Move the await checkCentralizedDataAccess(normalizedUuid,
shareId, user) call into the existing try/catch inside the downloadCentralized
controller so any thrown errors are caught by the existing catch handler;
specifically, call and await checkCentralizedDataAccess within the try block,
preserve the hasAccess check and the res.status(403).json({ ok: false, message:
... }) early-return behavior, and rely on the current catch block to handle
unexpected exceptions from checkCentralizedDataAccess.
In `@desci-server/src/controllers/data/uploadCentralized.ts`:
- Around line 34-38: The validation currently assumes req.body.contextPath is a
string and calls contextPath.startsWith(...) which will throw for non-strings;
update the check around contextPath (used in this file and the variable
contextPath and the UNSAFE_PATH_PATTERN) to first ensure typeof
req.body.contextPath === 'string' (or coerce safely) before assigning/using
contextPath, and return the same 400 validation response for non-string values;
adjust the if condition that uses contextPath.startsWith('/') and
UNSAFE_PATH_PATTERN.test(contextPath) to only run when contextPath is a string
to avoid a 500 error.
- Around line 12-13: The multer setup (localUpload and uploadMiddleware) has no
limits and can exhaust tmp storage; update the multer configuration to include a
limits object (e.g., limits: { fileSize: <maxBytes>, files: <maxFiles> }) or use
multer(...).array('files', <maxFiles>) to cap per-request file count and
per-file size, keeping preservePath and dest as-is; adjust <maxBytes>/<maxFiles>
to values appropriate for your app and ensure uploadMiddleware still references
the same symbol (uploadMiddleware) after the change.
- Line 36: Replace the hex escape sequence in the UNSAFE_PATH_PATTERN regex with
a unicode escape to satisfy Biome linting: update the constant
UNSAFE_PATH_PATTERN (currently /(\.\.|\\|[\x00-\x1f])/) so the control-character
range uses unicode escapes (e.g., [\u0000-\u001f]) while keeping the same
overall pattern and behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9340fbc9-9f42-4073-8dd1-092fbca1fc48
📒 Files selected for processing (9)
- desci-server/CENTRALIZED_DATA.md
- desci-server/prisma/migrations/20260317112936_add_granted_by_index/migration.sql
- desci-server/prisma/schema.prisma
- desci-server/src/controllers/auth/magic.ts
- desci-server/src/controllers/data/downloadCentralized.ts
- desci-server/src/controllers/data/uploadCentralized.ts
- desci-server/src/controllers/nodes/dataGrants.ts
- desci-server/src/services/r2.ts
- desci-server/src/test/centralizedData.html
🚧 Files skipped from review as they are similar to previous changes (4)
- desci-server/src/controllers/auth/magic.ts
- desci-server/src/test/centralizedData.html
- desci-server/src/controllers/nodes/dataGrants.ts
- desci-server/CENTRALIZED_DATA.md
```typescript
const localUpload = multer({ dest: os.tmpdir(), preservePath: true });
const uploadMiddleware = localUpload.array('files');
```
🧩 Analysis chain
🏁 Script executed:
fd -t f "uploadCentralized.ts" --exec wc -l {}
Repository: desci-labs/nodes
Length of output: 120
🏁 Script executed:
fd -t f "uploadCentralized.ts" --exec cat -n {}
Repository: desci-labs/nodes
Length of output: 5286
🏁 Script executed:
rg "CENTRALIZED_UPLOAD_MAX" -t ts -t js
Repository: desci-labs/nodes
Length of output: 42
🏁 Script executed:
rg "process\.env\." desci-server/src/controllers/data/uploadCentralized.ts -A 2 -B 2
Repository: desci-labs/nodes
Length of output: 42
Add upload limits to prevent tmp-dir exhaustion.
multer({ dest: os.tmpdir(), preservePath: true }) currently has no limits on file count or file size. A single request can write unlimited data to the temporary directory before files are uploaded to R2, risking disk exhaustion and denial of service.
Suggested fix
```diff
-const localUpload = multer({ dest: os.tmpdir(), preservePath: true });
+const localUpload = multer({
+  dest: os.tmpdir(),
+  preservePath: true,
+  limits: {
+    files: Number(process.env.CENTRALIZED_UPLOAD_MAX_FILES ?? 100),
+    fileSize: Number(process.env.CENTRALIZED_UPLOAD_MAX_FILE_SIZE ?? 1024 * 1024 * 1024), // 1 GiB default
+  },
+});
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```typescript
const localUpload = multer({
  dest: os.tmpdir(),
  preservePath: true,
  limits: {
    files: Number(process.env.CENTRALIZED_UPLOAD_MAX_FILES ?? 100),
    fileSize: Number(process.env.CENTRALIZED_UPLOAD_MAX_FILE_SIZE ?? 1024 * 1024 * 1024), // 1 GiB default
  },
});
const uploadMiddleware = localUpload.array('files');
```
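The env-var defaults in the suggestion boil down to a small parsing rule; `parseByteLimit` is a hypothetical helper sketched here, not part of the PR:

```typescript
// Parse an env var into a positive numeric limit, falling back to a
// default when the value is unset, empty, non-numeric, or non-positive.
function parseByteLimit(raw: string | undefined, fallback: number): number {
  const n = Number(raw);
  return Number.isFinite(n) && n > 0 ? n : fallback;
}
```

Note that `Number('')` is `0` and `Number(undefined)` is `NaN`; both fall back here, which is slightly stricter than a plain `?? default`, since `??` would let an empty string through as a zero-byte limit.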
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@desci-server/src/controllers/data/uploadCentralized.ts` around lines 12 - 13,
The multer setup (localUpload and uploadMiddleware) has no limits and can
exhaust tmp storage; update the multer configuration to include a limits object
(e.g., limits: { fileSize: <maxBytes>, files: <maxFiles> }) or use
multer(...).array('files', <maxFiles>) to cap per-request file count and
per-file size, keeping preservePath and dest as-is; adjust <maxBytes>/<maxFiles>
to values appropriate for your app and ensure uploadMiddleware still references
the same symbol (uploadMiddleware) after the change.
```typescript
const contextPath = req.body.contextPath || '';

const UNSAFE_PATH_PATTERN = /(\.\.|\\|[\x00-\x1f])/;
if (contextPath && (contextPath.startsWith('/') || UNSAFE_PATH_PATTERN.test(contextPath))) {
  return res.status(400).send({ ok: false, message: 'Invalid contextPath: must not contain path traversal sequences, backslashes, or control characters' });
```
Guard contextPath type before string operations.
If req.body.contextPath is non-string (e.g., array/object), contextPath.startsWith(...) throws and returns a 500 instead of a 400 validation error.
Suggested fix
```diff
-  const contextPath = req.body.contextPath || '';
+  const rawContextPath = req.body.contextPath;
+  if (rawContextPath != null && typeof rawContextPath !== 'string') {
+    return res.status(400).send({ ok: false, message: 'Invalid contextPath: must be a string' });
+  }
+  const contextPath = rawContextPath ?? '';
```
📝 Committable suggestion
```typescript
const rawContextPath = req.body.contextPath;
if (rawContextPath != null && typeof rawContextPath !== 'string') {
  return res.status(400).send({ ok: false, message: 'Invalid contextPath: must be a string' });
}
const contextPath = rawContextPath ?? '';

const UNSAFE_PATH_PATTERN = /(\.\.|\\|[\x00-\x1f])/;
if (contextPath && (contextPath.startsWith('/') || UNSAFE_PATH_PATTERN.test(contextPath))) {
  return res.status(400).send({ ok: false, message: 'Invalid contextPath: must not contain path traversal sequences, backslashes, or control characters' });
```
🧰 Tools
🪛 Biome (2.4.7)
[error] 36-36: Unexpected control character in a regular expression.
(lint/suspicious/noControlCharactersInRegex)
[error] 36-36: Unexpected control character in a regular expression.
(lint/suspicious/noControlCharactersInRegex)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@desci-server/src/controllers/data/uploadCentralized.ts` around lines 34 - 38,
The validation currently assumes req.body.contextPath is a string and calls
contextPath.startsWith(...) which will throw for non-strings; update the check
around contextPath (used in this file and the variable contextPath and the
UNSAFE_PATH_PATTERN) to first ensure typeof req.body.contextPath === 'string'
(or coerce safely) before assigning/using contextPath, and return the same 400
validation response for non-string values; adjust the if condition that uses
contextPath.startsWith('/') and UNSAFE_PATH_PATTERN.test(contextPath) to only
run when contextPath is a string to avoid a 500 error.
```typescript
const node = (req as any).node;
const contextPath = req.body.contextPath || '';

const UNSAFE_PATH_PATTERN = /(\.\.|\\|[\x00-\x1f])/;
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, let's find and examine the file
find . -name "uploadCentralized.ts" -type f
Repository: desci-labs/nodes
Length of output: 116
🏁 Script executed:
#!/bin/bash
# Check the file content around line 36
cat -n desci-server/src/controllers/data/uploadCentralized.ts | sed -n '30,45p'
Repository: desci-labs/nodes
Length of output: 850
🏁 Script executed:
#!/bin/bash
# Look for Biome configuration in the repository
find . -name "biome.json" -o -name "biome.jsonc" -o -name ".biomerc*" | head -10
Repository: desci-labs/nodes
Length of output: 72
🏁 Script executed:
#!/bin/bash
# Check the Biome configuration
cat biome.json
Repository: desci-labs/nodes
Length of output: 1834
🏁 Script executed:
#!/bin/bash
# Search for any lint rules related to control characters or regex in Biome config
cat biome.json | jq . 2>/dev/null || cat biome.json
Repository: desci-labs/nodes
Length of output: 1834
🏁 Script executed:
#!/bin/bash
# Let's verify that the regex patterns are functionally equivalent
# Testing if \x00-\x1f and \u0000-\u001f produce the same character ranges
python3 << 'EOF'
import re
# Create regex patterns using hex and unicode escapes
hex_pattern = r'[\x00-\x1f]'
unicode_pattern = r'[\u0000-\u001f]'
# Create actual compiled regex objects
# Note: In Python, we need to use raw strings carefully
hex_regex = re.compile('[\x00-\x1f]')
unicode_regex = re.compile('[\u0000-\u001f]')
# Test characters in the range
test_chars = [
'\x00', # null
'\x0a', # newline
'\x1f', # unit separator (last control char)
'\x20', # space (first non-control)
'a' # regular char
]
print("Testing character matching:")
print(f"{'Char':<10} {'Hex Match':<15} {'Unicode Match':<15}")
print("-" * 40)
all_match = True
for char in test_chars:
hex_match = bool(hex_regex.search(char))
unicode_match = bool(unicode_regex.search(char))
match_status = "✓" if hex_match == unicode_match else "✗"
print(f"{repr(char):<10} {str(hex_match):<15} {str(unicode_match):<15} {match_status}")
if hex_match != unicode_match:
all_match = False
print("\nBoth patterns are functionally equivalent:" if all_match else "\nPatterns differ!")
EOF
Repository: desci-labs/nodes
Length of output: 440
Use unicode escapes instead of hex escapes in regex to comply with Biome linting.
The regex pattern with hex escape sequences [\x00-\x1f] triggers Biome's control character lint rule. Replace with unicode escapes [\u0000-\u001f] for better readability and compliance—both patterns are functionally identical.
Suggested fix
```diff
-const UNSAFE_PATH_PATTERN = /(\.\.|\\|[\x00-\x1f])/;
+const UNSAFE_PATH_PATTERN = /(\.\.|\\|[\u0000-\u001f])/;
```
📝 Committable suggestion
```typescript
const UNSAFE_PATH_PATTERN = /(\.\.|\\|[\u0000-\u001f])/;
```
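A quick sanity check of the suggested pattern's behavior; the test strings are illustrative, not from the PR:

```typescript
// Same semantics as the hex-escape spelling: flags parent-directory
// segments, backslashes, and C0 control characters.
const UNSAFE_PATH_PATTERN = /(\.\.|\\|[\u0000-\u001f])/;

const samples = ['data/run1.csv', '../etc/passwd', 'a\\b', 'line\nbreak'];
const verdicts = samples.map((s) => UNSAFE_PATH_PATTERN.test(s));
// verdicts: [false, true, true, true]
```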
🧰 Tools
🪛 Biome (2.4.7)
[error] 36-36: Unexpected control character in a regular expression.
(lint/suspicious/noControlCharactersInRegex)
[error] 36-36: Unexpected control character in a regular expression.
(lint/suspicious/noControlCharactersInRegex)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@desci-server/src/controllers/data/uploadCentralized.ts` at line 36, Replace
the hex escape sequence in the UNSAFE_PATH_PATTERN regex with a unicode escape
to satisfy Biome linting: update the constant UNSAFE_PATH_PATTERN (currently
/(\.\.|\\|[\x00-\x1f])/) so the control-character range uses unicode escapes
(e.g., [\u0000-\u001f]) while keeping the same overall pattern and behavior.
Actionable comments posted: 9
♻️ Duplicate comments (4)
desci-server/src/controllers/data/uploadCentralized.ts (3)
34-38: ⚠️ Potential issue | 🟠 Major

Validate `contextPath` is a string before using string methods. A non-string multipart value here still falls through to `startsWith()`/`.test()`, which turns a bad request into a 500 instead of a 400.

Minimal guard:

```diff
-  const contextPath = req.body.contextPath || '';
+  const rawContextPath = req.body.contextPath;
+  if (rawContextPath != null && typeof rawContextPath !== 'string') {
+    return res.status(400).send({ ok: false, message: 'Invalid contextPath: must be a string' });
+  }
+  const contextPath = rawContextPath ?? '';
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@desci-server/src/controllers/data/uploadCentralized.ts` around lines 34 - 38, The handler currently assumes req.body.contextPath is a string and calls contextPath.startsWith and UNSAFE_PATH_PATTERN.test, which throws on non-strings; update the validation around the contextPath variable in uploadCentralized.ts to first check typeof contextPath === 'string' (or explicitly coerce only after that check) and return a 400 error for non-string values before running contextPath.startsWith('/') or UNSAFE_PATH_PATTERN.test(contextPath), referencing the existing contextPath variable and UNSAFE_PATH_PATTERN to locate where to add the guard.
12-13: ⚠️ Potential issue | 🟠 Major

Cap the temp upload size/count before Multer writes to disk. This middleware still accepts an unbounded number of files and per-file size, so a single request can exhaust `os.tmpdir()` before the controller gets a chance to reject it.

Add Multer limits:

```diff
-const localUpload = multer({ dest: os.tmpdir(), preservePath: true });
+const localUpload = multer({
+  dest: os.tmpdir(),
+  preservePath: true,
+  limits: {
+    files: Number(process.env.CENTRALIZED_UPLOAD_MAX_FILES ?? 100),
+    fileSize: Number(process.env.CENTRALIZED_UPLOAD_MAX_FILE_SIZE ?? 1024 * 1024 * 1024),
+  },
+});
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@desci-server/src/controllers/data/uploadCentralized.ts` around lines 12 - 13, The multer configuration (localUpload / uploadMiddleware) currently allows an unbounded number of files and per-file size which can exhaust os.tmpdir(); update the localUpload initialization to include a sensible limits object (e.g., limits: { files: <maxFiles>, fileSize: <maxBytesPerFile> } or use configured constants) so Multer enforces the cap before writing to disk, and keep uploadMiddleware = localUpload.array('files') unchanged; ensure you choose appropriate maxFiles and maxBytesPerFile values (or read them from config/env) and handle the resulting Multer errors in the controller.
36-36: ⚠️ Potential issue | 🟠 Major

Biome will reject `UNSAFE_PATH_PATTERN` as written. Line 36 still uses control-character hex escapes that Biome flags as `noControlCharactersInRegex`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@desci-server/src/controllers/data/uploadCentralized.ts` at line 36, UNSAFE_PATH_PATTERN uses hex control-character escapes ([\x00-\x1f]) which Biome flags; replace that character class with a Unicode property escape (e.g. use \p{Cc} to match control characters) and enable the Unicode flag on the regex (change /.../ to /.../u) so the pattern becomes equivalent but Biome-safe; update the constant UNSAFE_PATH_PATTERN accordingly and keep the existing checks for ".." and backslash intact.
desci-server/src/controllers/data/downloadCentralized.ts (1)
31-38: ⚠️ Potential issue | 🟠 Major

Wrap the access check in the existing error boundary. `checkCentralizedDataAccess()` still runs before the `try`, so a thrown auth/data-grant lookup skips the controller's catch path and bubbles out as an unhandled failure.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@desci-server/src/controllers/data/downloadCentralized.ts` around lines 31 - 38, The call to checkCentralizedDataAccess (which returns hasAccess) is executed outside the controller's try block so any exceptions it throws bypass the controller's catch; move the checkCentralizedDataAccess invocation (using normalizedUuid, shareId, user) inside the existing try block or wrap it in the same try so thrown errors are caught by the controller's catch, and retain the existing behavior that sends res.status(403).json(...) when hasAccess is false; ensure the catch then handles and logs unexpected errors consistently with the rest of the controller.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@desci-server/scripts/dev-local.sh`:
- Line 2: The script's error handling is masking failures because pipelines with
tail can return a zero exit status; update the shell options to enable pipefail
so upstream command failures (e.g., npx prisma generate and npx prisma migrate
deploy) cause the script to exit. Replace the top-level "set -e" with a form
that enables pipefail (e.g., add "-o pipefail" or use "set -euo pipefail") and
also update the later "set" invocation around the pipelines (the lines near the
prisma commands/tail usage) to include pipefail so the pipeline exit status
reflects failures.
In `@desci-server/src/controllers/data/centralizedTree.ts`:
- Around line 91-108: The Prisma query prisma.node.findFirst(...) and the access
check call checkCentralizedDataAccess(...) should be executed inside the
existing try block so any thrown errors are caught and handled by the structured
500 response; move the retrieval of node (using normalizedUuid) and the
subsequent isPublished logic (including reading req.query.shareId and invoking
checkCentralizedDataAccess) into the try scope and ensure that on unexpected
exceptions you return the same res.status(500).json(...) path as other try/catch
branches.
- Around line 114-124: The path filter currently uses
startsWith(`${normalizedUuid}/${pathFilter}`) which matches partial names (e.g.,
"foo" matches "foobar.txt"); change the filtering to anchor to a directory
boundary by normalizing the filterPrefix to include a trailing slash (e.g.,
`${normalizedUuid}/${pathFilter}/`) when treating pathFilter as a directory, or
add an exact-file branch that checks equality against the full key when
pathFilter refers to a file; update the computation of filterPrefix,
filteredObjects, and treePrefix/normalizedTreePrefix (the variables and
functions involved are pathFilter, normalizedUuid, filteredObjects, treePrefix,
normalizedTreePrefix, and buildTreeSimple) so buildTreeSimple receives only keys
rooted at the correct directory or the single exact file.
In `@desci-server/src/controllers/data/downloadCentralized.ts`:
- Around line 45-47: The regex in the sanitizedFileName computation (the two
.replace calls on fileName) contains control-character hex escapes that trigger
Biome's noControlCharactersInRegex rule; either add a Biome ignore directive
immediately above the offending line (e.g., an inline biome-ignore for
noControlCharactersInRegex) or refactor the sanitization to avoid explicit
control-character ranges (for example, use an explicit allowlist of safe
printable characters or Unicode property classes) so the replace calls no longer
contain \x00-\x1f or similar ranges; update the code around
sanitizedFileName/fileName accordingly.
In `@desci-server/src/controllers/data/downloadCentralizedZip.ts`:
- Around line 29-36: Move the await checkCentralizedDataAccess(normalizedUuid,
shareId, user) call inside the existing try block so any exceptions from
auth/data-grant lookup are caught by the controller's catch; specifically, wrap
the invocation of checkCentralizedDataAccess (and the subsequent hasAccess check
that sends the 403 via res.status(403).json(...)) inside the try in
downloadCentralizedZip, preserving the existing 403 response when hasAccess is
false and ensuring thrown errors flow to the catch handler.
- Around line 68-74: The code currently appends untrusted relativePath into
archive (see variables relativePath and archive.append inside the loop that
iterates objects and calls getStreamFromR2), so sanitize and validate the entry
name before appending: convert backslashes to forward slashes, run
path.posix.normalize on the result, strip any leading slashes, remove control
characters (e.g. non-printable chars via a regex), then reject or skip the entry
if normalize produces empty string or contains any ".." path segment (or starts
with "..") to prevent traversal; only after these checks use the sanitized name
when calling archive.append(stream, { name: safeName }). Ensure this logic lives
just before archive.append in downloadCentralizedZip.ts.
- Around line 48-50: The regexes use control-character hex escapes that violate
Biome's noControlCharactersInRegex rule; update the sanitize logic (the
sanitizedZipName assignment that reads zipName...) to use Unicode escapes
instead, e.g. replace /[\x00-\x1f\x7f"\\]/g with /[\u0000-\u001F\u007F"\\]/g and
/[^\x20-\x7e]/g with /[^\u0020-\u007E]/g, and make the same change in
downloadCentralized.ts so the file lints cleanly under the recommended Biome
rules.
In `@desci-server/src/controllers/nodes/draftCreate.ts`:
- Around line 39-52: The current try/catch wraps both makeManifest and
addBufferToIpfs so manifest construction errors are misclassified as IPFS
outages; split the logic so makeManifest(...) is called first and allowed to
throw (so invalid input/regressions surface) and then call addBufferToIpfs(...)
inside its own try/catch. Concretely: call makeManifest({ title, researchFields,
defaultLicense: defaultLicense || '', ipfsNode: getNodeToUse(isGuest) }) and
assign researchObject from manifestResult; then in a separate try block call
addBufferToIpfs(manifestResult.manifest, '', getNodeToUse(isGuest)) and on
failure catch only IPFS errors (log via logger.warn with err) and leave hash/uri
unset/default. Ensure references to makeManifest, manifestResult,
addBufferToIpfs, researchObject, hash, uri and getNodeToUse(isGuest) are updated
accordingly.
In `@desci-server/src/routes/v1/data.ts`:
- Line 32: The current middleware order runs Multer (centralizedUploadHandler)
before authorization (ensureNodeAccess), causing orphan temp files on rejection;
either move ensureNodeAccess to run before centralizedUploadHandler in the
router registration (so Multer never writes unless authorized), or add cleanup
logic inside ensureNodeAccess to delete any uploaded temp files (inspect
req.file/req.files or the Multer temp paths and unlink them) before returning
the missing-uuid or unauthorized errors referenced in authorisation.ts (lines
referenced around the missing uuid check and unauthorized path). Ensure
uploadCentralized remains the final handler and that cleanup uses the same temp
file locations Multer writes to.
---
Duplicate comments:
In `@desci-server/src/controllers/data/downloadCentralized.ts`:
- Around line 31-38: The call to checkCentralizedDataAccess (which returns
hasAccess) is executed outside the controller's try block so any exceptions it
throws bypass the controller's catch; move the checkCentralizedDataAccess
invocation (using normalizedUuid, shareId, user) inside the existing try block
or wrap it in the same try so thrown errors are caught by the controller's
catch, and retain the existing behavior that sends res.status(403).json(...)
when hasAccess is false; ensure the catch then handles and logs unexpected
errors consistently with the rest of the controller.
In `@desci-server/src/controllers/data/uploadCentralized.ts`:
- Around line 34-38: The handler currently assumes req.body.contextPath is a
string and calls contextPath.startsWith and UNSAFE_PATH_PATTERN.test, which
throws on non-strings; update the validation around the contextPath variable in
uploadCentralized.ts to first check typeof contextPath === 'string' (or
explicitly coerce only after that check) and return a 400 error for non-string
values before running contextPath.startsWith('/') or
UNSAFE_PATH_PATTERN.test(contextPath), referencing the existing contextPath
variable and UNSAFE_PATH_PATTERN to locate where to add the guard.
- Around line 12-13: The multer configuration (localUpload / uploadMiddleware)
currently allows an unbounded number of files and per-file size which can
exhaust os.tmpdir(); update the localUpload initialization to include a sensible
limits object (e.g., limits: { files: <maxFiles>, fileSize: <maxBytesPerFile> }
or use configured constants) so Multer enforces the cap before writing to disk,
and keep uploadMiddleware = localUpload.array('files') unchanged; ensure you
choose appropriate maxFiles and maxBytesPerFile values (or read them from
config/env) and handle the resulting Multer errors in the controller.
- Line 36: UNSAFE_PATH_PATTERN uses hex control-character escapes ([\x00-\x1f])
which Biome flags; replace that character class with a Unicode property escape
(e.g. use \p{Cc} to match control characters) and enable the Unicode flag on the
regex (change /.../ to /.../u) so the pattern becomes equivalent but Biome-safe;
update the constant UNSAFE_PATH_PATTERN accordingly and keep the existing checks
for ".." and backslash intact.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b0a7db1a-117b-435b-866f-42d0b8722f4e
📒 Files selected for processing (20)
- .env.example
- desci-server/CENTRALIZED_DATA.md
- desci-server/prisma/migrations/20260317105928_add_node_data_grant/migration.sql
- desci-server/prisma/migrations/20260317112936_add_granted_by_index/migration.sql
- desci-server/prisma/schema.prisma
- desci-server/scripts/dev-local.sh
- desci-server/src/controllers/auth/magic.ts
- desci-server/src/controllers/data/centralizedTree.ts
- desci-server/src/controllers/data/downloadCentralized.ts
- desci-server/src/controllers/data/downloadCentralizedZip.ts
- desci-server/src/controllers/data/uploadCentralized.ts
- desci-server/src/controllers/nodes/dataGrants.ts
- desci-server/src/controllers/nodes/draftCreate.ts
- desci-server/src/routes/v1/auth.ts
- desci-server/src/routes/v1/data.ts
- desci-server/src/routes/v1/nodes.ts
- desci-server/src/server.ts
- desci-server/src/services/centralizedDataAccess.ts
- desci-server/src/services/r2.ts
- desci-server/src/test/centralizedData.html
🚧 Files skipped from review as they are similar to previous changes (10)
- desci-server/src/services/centralizedDataAccess.ts
- desci-server/CENTRALIZED_DATA.md
- desci-server/prisma/migrations/20260317105928_add_node_data_grant/migration.sql
- desci-server/src/routes/v1/nodes.ts
- desci-server/src/controllers/auth/magic.ts
- desci-server/src/controllers/nodes/dataGrants.ts
- desci-server/src/test/centralizedData.html
- desci-server/src/services/r2.ts
- desci-server/src/routes/v1/auth.ts
- desci-server/prisma/migrations/20260317112936_add_granted_by_index/migration.sql
```
@@ -0,0 +1,103 @@
#!/bin/bash
set -e
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
set -e
echo "Without pipefail:"
bash -lc 'set -e; false | tail -1; echo "pipeline survived"' || true
echo
echo "With pipefail:"
bash -lc 'set -e -o pipefail; false | tail -1; echo "unreachable"' || echo "pipeline stopped as expected"
```
Repository: desci-labs/nodes
Length of output: 355
🏁 Script executed:
```shell
cat -n desci-server/scripts/dev-local.sh | head -80
```
Repository: desci-labs/nodes
Length of output: 3382
Prisma failures are currently masked by the tail pipelines.
set -e alone does not stop the script if npx prisma generate or npx prisma migrate deploy fails. When a command fails but is piped to another command, only the final command's exit code matters—tail will exit 0 even if Prisma fails, allowing the script to continue with a broken setup.
Add pipefail to the set options (lines 2 and 67-69):
Make the pipelines fail correctly
```diff
-#!/bin/bash
-set -e
+#!/bin/bash
+set -euo pipefail
```

Without pipefail, a failing Prisma command masked by tail will allow the script to proceed. With pipefail, the script will correctly exit when Prisma fails.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```
#!/bin/bash
set -euo pipefail
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@desci-server/scripts/dev-local.sh` at line 2, The script's error handling is
masking failures because pipelines with tail can return a zero exit status;
update the shell options to enable pipefail so upstream command failures (e.g.,
npx prisma generate and npx prisma migrate deploy) cause the script to exit.
Replace the top-level "set -e" with a form that enables pipefail (e.g., add "-o
pipefail" or use "set -euo pipefail") and also update the later "set" invocation
around the pipelines (the lines near the prisma commands/tail usage) to include
pipefail so the pipeline exit status reflects failures.
```typescript
const node = await prisma.node.findFirst({
  where: { uuid: normalizedUuid },
  include: { versions: { where: { transactionId: { not: null } }, take: 1 } },
});

if (!node) {
  return res.status(404).json({ ok: false, message: 'Node not found' });
}

const isPublished = node.versions.length > 0;
if (!isPublished) {
  const shareId = req.query.shareId as string | undefined;
  const user = (req as any).user;
  const { hasAccess } = await checkCentralizedDataAccess(normalizedUuid, shareId, user);
  if (!hasAccess) {
    return res.status(403).json({ ok: false, message: 'Access denied. This node is unpublished — provide a valid shareId, authenticate as owner, or request a data grant.' });
  }
}
```
Move the Prisma lookup and access check under the try.
prisma.node.findFirst() and checkCentralizedDataAccess() can both throw here. Right now those failures skip the catch block entirely and never hit the structured 500 response.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@desci-server/src/controllers/data/centralizedTree.ts` around lines 91 - 108,
The Prisma query prisma.node.findFirst(...) and the access check call
checkCentralizedDataAccess(...) should be executed inside the existing try block
so any thrown errors are caught and handled by the structured 500 response; move
the retrieval of node (using normalizedUuid) and the subsequent isPublished
logic (including reading req.query.shareId and invoking
checkCentralizedDataAccess) into the try scope and ensure that on unexpected
exceptions you return the same res.status(500).json(...) path as other try/catch
branches.
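The shape of the fix can be sketched generically. Everything below is illustrative: `treeHandler`, `lookupNode`, and `checkAccess` are stand-ins for the real controller, `prisma.node.findFirst`, and `checkCentralizedDataAccess`, showing only that both awaits sit inside the same `try` as the structured responses.

```typescript
type Result = { status: number; body: { ok: boolean; message?: string } };

// Generic sketch: the lookup and the access check both run inside the try,
// so a throw from either one reaches the structured 500 branch.
async function treeHandler(
  lookupNode: () => Promise<{ published: boolean } | null>,
  checkAccess: () => Promise<boolean>,
): Promise<Result> {
  try {
    const node = await lookupNode();
    if (!node) return { status: 404, body: { ok: false, message: 'Node not found' } };
    if (!node.published && !(await checkAccess())) {
      return { status: 403, body: { ok: false, message: 'Access denied' } };
    }
    return { status: 200, body: { ok: true } };
  } catch {
    // Database or access-check failures now land here instead of escaping the handler.
    return { status: 500, body: { ok: false, message: 'Internal error' } };
  }
}
```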
```typescript
const pathFilter = req.query.path as string | undefined;
let filteredObjects = objects;
if (pathFilter) {
  const filterPrefix = `${normalizedUuid}/${pathFilter}`;
  filteredObjects = objects.filter((obj) => obj.key.startsWith(filterPrefix));
}

const treePrefix = pathFilter ? `${normalizedUuid}/${pathFilter}` : prefix;
// Ensure treePrefix ends with /
const normalizedTreePrefix = treePrefix.endsWith('/') ? treePrefix : treePrefix + '/';
const tree = buildTreeSimple(filteredObjects, normalizedTreePrefix);
```
Anchor path filters to a directory boundary.
Because the filter uses `startsWith` on `${normalizedUuid}/${pathFilter}`, a request for `path=foo` also pulls in siblings like `foobar.txt`, and `buildTreeSimple()` then slices them against `foo/`, returning mangled paths. Normalize the filter to `.../foo/` or add a separate exact-file branch.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@desci-server/src/controllers/data/centralizedTree.ts` around lines 114 - 124,
The path filter currently uses startsWith(`${normalizedUuid}/${pathFilter}`)
which matches partial names (e.g., "foo" matches "foobar.txt"); change the
filtering to anchor to a directory boundary by normalizing the filterPrefix to
include a trailing slash (e.g., `${normalizedUuid}/${pathFilter}/`) when
treating pathFilter as a directory, or add an exact-file branch that checks
equality against the full key when pathFilter refers to a file; update the
computation of filterPrefix, filteredObjects, and
treePrefix/normalizedTreePrefix (the variables and functions involved are
pathFilter, normalizedUuid, filteredObjects, treePrefix, normalizedTreePrefix,
and buildTreeSimple) so buildTreeSimple receives only keys rooted at the correct
directory or the single exact file.
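The directory-boundary rule can be sketched as a pure function. Names here are illustrative, not the controller's actual code: a key survives only when it equals the filtered path exactly (a file) or sits under it as a directory.

```typescript
// Illustrative sketch: "foo" can no longer match "foobar.txt" because keys
// must either equal the base exactly or continue with a '/' separator.
function filterKeys(keys: string[], uuidPrefix: string, pathFilter: string): string[] {
  const base = `${uuidPrefix}/${pathFilter.replace(/\/+$/, '')}`;
  return keys.filter((k) => k === base || k.startsWith(`${base}/`));
}
```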
```typescript
const sanitizedFileName = fileName
  .replace(/[\x00-\x1f\x7f"\\]/g, '_')
  .replace(/[^\x20-\x7e]/g, '_')
```
🧩 Analysis chain
🏁 Script executed:
```shell
find . -name "downloadCentralized.ts" -type f
```
Repository: desci-labs/nodes
Length of output: 118
🏁 Script executed:
```shell
cat -n desci-server/src/controllers/data/downloadCentralized.ts | head -60
```
Repository: desci-labs/nodes
Length of output: 2828
🌐 Web query:
Biome noControlCharactersInRegex rule regex control characters
💡 Result:
Biome’s lint/suspicious/noControlCharactersInRegex rule flags control characters (ASCII 0–31, i.e. U+0000–U+001F) inside regex literals because they’re usually accidental/unreadable in JS regexes. (biomejs.dev)
What it disallows
In a regex literal like `/.../`, it reports:
- `\x00`…`\x1F`
- `\u0000`…`\u001F`
- `\u{0}`…`\u{1F}` (with the `u` flag)
- Any raw unescaped character in the range U+0000–U+001F (biomejs.dev)
What it allows
Common “control escapes” such as \t and \n are explicitly allowed. (biomejs.dev)
Example
Invalid (flagged):
```js
var pattern = /\x0C/;
```
Valid:
```js
var pattern = /\n/;
```
(biomejs.dev)
Config (biome.json):
```json
{
  "linter": {
    "rules": {
      "suspicious": {
        "noControlCharactersInRegex": "error"
      }
    }
  }
}
```
([biomejs.dev](https://biomejs.dev/linter/rules/no-control-characters-in-regex/))
Citations:
- https://biomejs.dev/linter/rules/no-control-characters-in-regex/
---
🏁 Script executed:
```shell
find . -name "biome.json" -o -name "biome.jsonc" -o -name ".biomeignore" | head -20
```
Repository: desci-labs/nodes
Length of output: 72
🏁 Script executed:
```shell
rg "noControlCharactersInRegex" --type json --type jsonc
```
Repository: desci-labs/nodes
Length of output: 89
🏁 Script executed:
```shell
# Let's see what characters the regex is actually trying to filter
# \x00-\x1f = control characters (0-31)
# \x7f = DEL character (127)
# So the first regex removes control chars + DEL, then second removes non-ASCII printable
# The real question: what's the intended behavior and proper fix?
# Let's see if there are other similar patterns in the codebase
rg "\\\\x[0-9a-f]" --type ts --type js
```
Repository: desci-labs/nodes
Length of output: 1532
🏁 Script executed:
```shell
cat biome.json
```
Repository: desci-labs/nodes
Length of output: 1834
🏁 Script executed:
```shell
# Check if the rule is disabled or configured in any way
cat biome.json | jq '.linter.rules' 2>/dev/null || cat biome.json
```
Repository: desci-labs/nodes
Length of output: 1273
Remove control character hex escapes from regex to satisfy Biome linting.
Line 46 uses hex escapes \x00-\x1f which trigger Biome's noControlCharactersInRegex rule. However, converting to unicode escapes (e.g., \u0000-\u001F) won't resolve this—Biome also flags those ranges. Instead, add a biome-ignore directive above the line or refactor the regex to avoid character ranges that include control characters.
Current code:
```typescript
const sanitizedFileName = fileName
  .replace(/[\x00-\x1f\x7f"\\]/g, '_')
  .replace(/[^\x20-\x7e]/g, '_')
```
🧰 Tools
🪛 Biome (2.4.7)
[error] 46-46: Unexpected control character in a regular expression.
(lint/suspicious/noControlCharactersInRegex)
[error] 46-46: Unexpected control character in a regular expression.
(lint/suspicious/noControlCharactersInRegex)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@desci-server/src/controllers/data/downloadCentralized.ts` around lines 45 -
47, The regex in the sanitizedFileName computation (the two .replace calls on
fileName) contains control-character hex escapes that trigger Biome's
noControlCharactersInRegex rule; either add a Biome ignore directive immediately
above the offending line (e.g., an inline biome-ignore for
noControlCharactersInRegex) or refactor the sanitization to avoid explicit
control-character ranges (for example, use an explicit allowlist of safe
printable characters or Unicode property classes) so the replace calls no longer
contain \x00-\x1f or similar ranges; update the code around
sanitizedFileName/fileName accordingly.
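One allowlist-style alternative the prompt mentions can be sketched like this. It is an illustration under the assumption that only conservative ASCII names are wanted; the helper name is hypothetical, not the project's code.

```typescript
// Replace anything outside a small allowlist with '_'. Control characters,
// quotes, backslashes, and non-ASCII all fall through to '_', so the pattern
// never needs a \x00-\x1f style range for Biome to flag.
const sanitizeFileName = (name: string): string =>
  name.replace(/[^A-Za-z0-9._ ()-]/g, '_');
```

The trade-off versus the two-step replace is that a single allowlist is easier to audit but slightly more aggressive (it also replaces characters like `#` or `&`).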
```typescript
const { hasAccess } = await checkCentralizedDataAccess(normalizedUuid, shareId, user);

if (!hasAccess) {
  res.status(403).json({ ok: false, message: 'Access denied. Provide a valid shareId, authenticate as the node owner, or request a data grant.' });
  return;
}

try {
```
Wrap the access check in the zip handler's try.
checkCentralizedDataAccess() is awaited before the error boundary, so a thrown auth/data-grant lookup bypasses the controller's catch path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@desci-server/src/controllers/data/downloadCentralizedZip.ts` around lines 29
- 36, Move the await checkCentralizedDataAccess(normalizedUuid, shareId, user)
call inside the existing try block so any exceptions from auth/data-grant lookup
are caught by the controller's catch; specifically, wrap the invocation of
checkCentralizedDataAccess (and the subsequent hasAccess check that sends the
403 via res.status(403).json(...)) inside the try in downloadCentralizedZip,
preserving the existing 403 response when hasAccess is false and ensuring thrown
errors flow to the catch handler.
```typescript
const sanitizedZipName = zipName
  .replace(/[\x00-\x1f\x7f"\\]/g, '_')
  .replace(/[^\x20-\x7e]/g, '_')
```
🧩 Analysis chain
🏁 Script executed:
```shell
# First, let's examine the file and the specific lines mentioned
head -60 desci-server/src/controllers/data/downloadCentralizedZip.ts | tail -20
```
Repository: desci-labs/nodes
Length of output: 927
🏁 Script executed:
```shell
# Get more context around lines 48-50
sed -n '40,60p' desci-server/src/controllers/data/downloadCentralizedZip.ts
```
Repository: desci-labs/nodes
Length of output: 959
🏁 Script executed:
```shell
# Check if Biome is configured in the project and what rules are enabled
find . -name "biome.json*" -o -name ".biomerc*" | head -5
```
Repository: desci-labs/nodes
Length of output: 72
🏁 Script executed:
```shell
# Search for other centralized-data handlers mentioned
fd "centralized" desci-server/src/controllers/data/ -type f
```
Repository: desci-labs/nodes
Length of output: 230
🏁 Script executed:
```shell
# Look for similar regex patterns in centralized-data handlers
rg "noControlCharacters" --type ts --type js
```
Repository: desci-labs/nodes
Length of output: 42
🏁 Script executed:
```shell
# Check biome.json configuration
cat biome.json
```
Repository: desci-labs/nodes
Length of output: 1834
🏁 Script executed:
```shell
# Find other handlers in the data directory
ls -la desci-server/src/controllers/data/
```
Repository: desci-labs/nodes
Length of output: 1134
🏁 Script executed:
```shell
# Search for similar regex patterns with control characters in other files
rg "\\\\x00-\\\\x1f" desci-server/src/controllers/
```
Repository: desci-labs/nodes
Length of output: 372
🏁 Script executed:
```shell
# Look for noControlCharacters rule mentions in biome config or docs
rg "noControlCharacters" .
```
Repository: desci-labs/nodes
Length of output: 42
🏁 Script executed:
```shell
# Let's check if there are any linter errors or configuration that might indicate the issue
# First, check the entire downloadCentralizedZip.ts file
cat desci-server/src/controllers/data/downloadCentralizedZip.ts
```
Repository: desci-labs/nodes
Length of output: 3199
🏁 Script executed:
```shell
# Check downloadCentralized.ts to see the comparison
cat desci-server/src/controllers/data/downloadCentralized.ts
```
Repository: desci-labs/nodes
Length of output: 2998
🏁 Script executed:
```shell
# Search for any comments or issues about regex/control characters in the repo
rg -i "control.?char|biome.*regex|regex.*biome" desci-server/
```
Repository: desci-labs/nodes
Length of output: 396
🌐 Web query:
Biome noControlCharactersInRegex rule regex security
💡 Result:
Biome’s lint/suspicious/noControlCharactersInRegex rule flags control characters (ASCII 0–31 / U+0000–U+001F) in regular-expression literals, because they’re hidden/unusual and are usually a mistake or accidental copy/paste. It disallows: \x00–\x1F, \u0000–\u001F, \u{0}–\u{1F} (with u), and raw unescaped control chars in the regex. It allows common “control escapes” like \t and \n. [1]
Security relevance: control characters are invisible and can obfuscate a pattern, making reviews/audits miss what the regex really matches (and potentially enabling filter/validation bypasses). This is why both Biome and ESLint treat them as “probably a mistake” and recommend the rule by default. [1][2]
How to configure (Biome):
```json
{
  "linter": {
    "rules": {
      "suspicious": {
        "noControlCharactersInRegex": "error"
      }
    }
  }
}
```
[1]
Equivalent ESLint rule: no-control-regex. [1][2]
Sources
- Biome rule docs (`noControlCharactersInRegex`): https://biomejs.dev/linter/rules/no-control-characters-in-regex/ [1]
- ESLint rule docs (`no-control-regex`): https://eslint.org/docs/latest/rules/no-control-regex [2]
Biome will fail on the noControlCharactersInRegex rule at lines 49–50.
The regex patterns /[\x00-\x1f\x7f"\\]/g and /[^\x20-\x7e]/g contain control character escapes that Biome's recommended linting rules disallow. Since biome.json enables "recommended": true, this file will not lint cleanly without fixing these patterns. The same issue exists in downloadCentralized.ts.
🧰 Tools
🪛 Biome (2.4.7)
[error] 49-49: Unexpected control character in a regular expression.
(lint/suspicious/noControlCharactersInRegex)
[error] 49-49: Unexpected control character in a regular expression.
(lint/suspicious/noControlCharactersInRegex)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@desci-server/src/controllers/data/downloadCentralizedZip.ts` around lines 48
- 50, The regexes use control-character hex escapes that violate Biome's
noControlCharactersInRegex rule; update the sanitize logic (the sanitizedZipName
assignment that reads zipName...) to use Unicode escapes instead, e.g. replace
/[\x00-\x1f\x7f"\\]/g with /[\u0000-\u001F\u007F"\\]/g and /[^\x20-\x7e]/g with
/[^\u0020-\u007E]/g, and make the same change in downloadCentralized.ts so the
file lints cleanly under the recommended Biome rules.
```typescript
for (const obj of objects) {
  const relativePath = obj.key.slice(prefix.length);
  if (!relativePath) continue;

  try {
    const { stream } = await getStreamFromR2(obj.key);
    archive.append(stream, { name: relativePath });
```
Don't trust stored R2 keys as ZIP entry names.
`relativePath` is copied straight into the archive. If the bucket already contains a key with `../`, a leading slash, backslashes, or control characters, this endpoint will generate a traversal entry inside the ZIP.
Guard the archive entry name before appending
```diff
 for (const obj of objects) {
   const relativePath = obj.key.slice(prefix.length);
   if (!relativePath) continue;
+  if (
+    relativePath.startsWith('/') ||
+    relativePath.includes('\\') ||
+    /(^|\/)\.\.(\/|$)|[\u0000-\u001f\u007f]/.test(relativePath)
+  ) {
+    logger.warn({ key: obj.key }, 'Skipping unsafe zip entry name');
+    continue;
+  }
   try {
     const { stream } = await getStreamFromR2(obj.key);
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```typescript
for (const obj of objects) {
  const relativePath = obj.key.slice(prefix.length);
  if (!relativePath) continue;
  if (
    relativePath.startsWith('/') ||
    relativePath.includes('\\') ||
    /(^|\/)\.\.(\/|$)|[\u0000-\u001f\u007f]/.test(relativePath)
  ) {
    logger.warn({ key: obj.key }, 'Skipping unsafe zip entry name');
    continue;
  }
  try {
    const { stream } = await getStreamFromR2(obj.key);
    archive.append(stream, { name: relativePath });
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@desci-server/src/controllers/data/downloadCentralizedZip.ts` around lines 68
- 74, The code currently appends untrusted relativePath into archive (see
variables relativePath and archive.append inside the loop that iterates objects
and calls getStreamFromR2), so sanitize and validate the entry name before
appending: convert backslashes to forward slashes, run path.posix.normalize on
the result, strip any leading slashes, remove control characters (e.g.
non-printable chars via a regex), then reject or skip the entry if normalize
produces empty string or contains any ".." path segment (or starts with "..") to
prevent traversal; only after these checks use the sanitized name when calling
archive.append(stream, { name: safeName }). Ensure this logic lives just before
archive.append in downloadCentralizedZip.ts.
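The steps listed above (slash conversion, `path.posix.normalize`, stripping leading slashes and control characters, rejecting `..` segments) can be sketched as a small helper. The function name `safeZipEntryName` is hypothetical; only `node:path` is assumed.

```typescript
import path from 'node:path';

// Hypothetical guard mirroring the review's steps: normalize separators,
// strip leading slashes and control characters, and reject traversal
// segments. Returns null when the entry should be skipped.
function safeZipEntryName(relativePath: string): string | null {
  const normalized = path.posix
    .normalize(relativePath.replace(/\\/g, '/'))
    .replace(/^\/+/, '')
    .replace(/\p{Cc}/gu, '');
  if (!normalized || normalized === '.' || normalized.split('/').includes('..')) return null;
  return normalized;
}
```

A caller would then skip the object when the helper returns `null` and otherwise pass the sanitized name to `archive.append(stream, { name: safeName })`.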
```typescript
try {
  const manifestResult = await makeManifest({
    title,
    researchFields,
    defaultLicense: defaultLicense || '',
    ipfsNode: getNodeToUse(isGuest),
  });
  researchObject = manifestResult.researchObject;
  const ipfsResult = await addBufferToIpfs(manifestResult.manifest, '', getNodeToUse(isGuest));
  hash = ipfsResult.cid;
  uri = `${hash}`;
} catch (ipfsErr) {
  logger.warn({ err: ipfsErr }, 'IPFS unavailable, creating node without manifest');
}
```
Separate manifest building from the IPFS-availability fallback.
This catch wraps both makeManifest() and addBufferToIpfs(). If manifest construction throws because of invalid input or a local regression, the request still creates the node and logs it as an IPFS outage. Only the external IPFS write should be fail-soft here.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@desci-server/src/controllers/nodes/draftCreate.ts` around lines 39 - 52, The
current try/catch wraps both makeManifest and addBufferToIpfs so manifest
construction errors are misclassified as IPFS outages; split the logic so
makeManifest(...) is called first and allowed to throw (so invalid
input/regressions surface) and then call addBufferToIpfs(...) inside its own
try/catch. Concretely: call makeManifest({ title, researchFields,
defaultLicense: defaultLicense || '', ipfsNode: getNodeToUse(isGuest) }) and
assign researchObject from manifestResult; then in a separate try block call
addBufferToIpfs(manifestResult.manifest, '', getNodeToUse(isGuest)) and on
failure catch only IPFS errors (log via logger.warn with err) and leave hash/uri
unset/default. Ensure references to makeManifest, manifestResult,
addBufferToIpfs, researchObject, hash, uri and getNodeToUse(isGuest) are updated
accordingly.
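The restructuring the prompt asks for can be sketched as follows (a minimal sketch with stubbed helpers passed in as parameters; the real `makeManifest`/`addBufferToIpfs` signatures in `draftCreate.ts` differ):

```typescript
// Sketch: only the external IPFS write is fail-soft; manifest-construction
// errors propagate to the caller. The parameter shapes stand in for the
// real makeManifest/addBufferToIpfs and are assumptions, not the repo's API.
type ManifestResult = { researchObject: { title: string }; manifest: string };

async function createDraftManifest(
  makeManifest: () => Promise<ManifestResult>,
  addBufferToIpfs: (manifest: string) => Promise<{ cid: string }>,
): Promise<{ researchObject: { title: string }; hash?: string; uri?: string }> {
  // 1. Let manifest errors throw: an invalid-input bug should fail the
  //    request instead of being logged as an IPFS outage.
  const manifestResult = await makeManifest();

  let hash: string | undefined;
  let uri: string | undefined;
  try {
    // 2. Only the IPFS write gets the fail-soft catch.
    const ipfsResult = await addBufferToIpfs(manifestResult.manifest);
    hash = ipfsResult.cid;
    uri = `${hash}`;
  } catch (ipfsErr) {
    console.warn('IPFS unavailable, creating node without manifest', ipfsErr);
  }
  return { researchObject: manifestResult.researchObject, hash, uri };
}
```

With this shape, a failing `addBufferToIpfs` still yields a node record (with `hash`/`uri` unset), while a failing `makeManifest` surfaces to the request handler.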
```typescript
router.post('/google/import', [ensureUser, ensureNodeAccess], googleImport);

// Centralized data (R2) routes
router.post('/uploadCentralized', [ensureUser, centralizedUploadHandler, ensureNodeAccess], uploadCentralized);
```
🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Check if centralizedUploadHandler has any cleanup logic on error
ast-grep --pattern $'centralizedUploadHandler'
# Look for Multer configuration and temp file handling
rg -n -C5 'multer' --type=ts -g '!node_modules'
```

Repository: desci-labs/nodes
Length of output: 17246

🏁 Script executed:

```bash
cat -n desci-server/src/controllers/data/uploadCentralized.ts
```

Repository: desci-labs/nodes
Length of output: 5286

🏁 Script executed:

```bash
rg -n 'ensureNodeAccess' --type=ts -A 10 | head -100
```

Repository: desci-labs/nodes
Length of output: 7692

🏁 Script executed:

```bash
rg -n 'fs.unlink|fs.rm|cleanup|unlinkSync' desci-server/src/controllers/data/uploadCentralized.ts
```

Repository: desci-labs/nodes
Length of output: 42

🏁 Script executed:

```bash
sed -n '51,120p' desci-server/src/middleware/authorisation.ts
```

Repository: desci-labs/nodes
Length of output: 2245
Multer runs before authorization: orphan temp files on rejection.
The middleware order `[ensureUser, centralizedUploadHandler, ensureNodeAccess]` means Multer writes uploaded files to disk before `ensureNodeAccess` verifies that the user owns the node. If `ensureNodeAccess` rejects the request (missing uuid on line 67, or unauthorized on line 78 of authorisation.ts), the temp files written to `os.tmpdir()` remain on disk, since `uploadCentralized` is never reached to clean them up.
Consider either:
- Moving `ensureNodeAccess` before `centralizedUploadHandler`, or
- Adding cleanup logic in `ensureNodeAccess` to remove orphan uploads before returning an error.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@desci-server/src/routes/v1/data.ts` at line 32, The current middleware order
runs Multer (centralizedUploadHandler) before authorization (ensureNodeAccess),
causing orphan temp files on rejection; either move ensureNodeAccess to run
before centralizedUploadHandler in the router registration (so Multer never
writes unless authorized), or add cleanup logic inside ensureNodeAccess to
delete any uploaded temp files (inspect req.file/req.files or the Multer temp
paths and unlink them) before returning the missing-uuid or unauthorized errors
referenced in authorisation.ts (lines referenced around the missing uuid check
and unauthorized path). Ensure uploadCentralized remains the final handler and
that cleanup uses the same temp file locations Multer writes to.
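If the cleanup route is chosen, the helper could look like this (a minimal sketch; the `UploadedFile` shape mirrors Multer's disk-storage `file.path` field, and the function name is hypothetical):

```typescript
import fs from 'fs';

// Best-effort removal of temp files Multer already wrote, for use before an
// authorization middleware returns an error response.
type UploadedFile = { path: string };

function cleanupOrphanUploads(files: UploadedFile[] | undefined): void {
  for (const file of files ?? []) {
    try {
      fs.unlinkSync(file.path);
    } catch {
      // Ignore: the file may already have been removed.
    }
  }
}
```

`ensureNodeAccess` would call this with the Multer file array from `req.files` before sending its missing-uuid or unauthorized response, so rejected requests leave nothing behind in `os.tmpdir()`.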
Before the deployed dev environment will work, we need to add these to Vault at secrets/desci-server/dev/app:
Once deployed, the test UI will be at:
https://nodes-api-dev.desci.com/test/centralizedData.html
Summary by CodeRabbit
New Features
Tests
Chores
Documentation
Bug Fixes