Note: Reviews paused — this branch is under active development, so CodeRabbit has automatically paused this review to avoid overwhelming you with review comments due to an influx of new commits.
📝 Walkthrough

Adds Cloudflare R2-backed centralized file storage: new R2 service, upload/tree/download/zip controllers and routes, access-grant model and APIs, DB migrations, dev tooling and test UI, environment and deployment updates, and an archiver dependency.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Client as Client
    participant Server as Server (Express)
    participant DB as Database (Postgres)
    participant R2 as Cloudflare R2
    participant Arch as Archiver
    Client->>Server: POST /v1/uploadCentralized (files + auth)
    Server->>DB: verify node ownership / share/grant access
    DB-->>Server: access allowed
    Server->>R2: PUT object(s)
    R2-->>Server: OK
    Server-->>Client: 200 {uploaded files meta}
    Client->>Server: GET /v1/centralizedTree/:nodeUuid
    Server->>R2: List objects (prefix)
    R2-->>Server: object list
    Server->>Server: build hierarchical tree
    Server-->>Client: 200 {tree}
    Client->>Server: GET /v1/downloadCentralized/:nodeUuid/zip
    Server->>DB: verify access
    DB-->>Server: allowed
    Server->>R2: stream objects
    R2-->>Server: streams
    Server->>Arch: append streams to ZIP
    Arch-->>Server: zip stream -> response
    Server-->>Client: streaming ZIP
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Add private data storage via Cloudflare R2, completely parallel to the existing IPFS pipeline. Stakeholders can upload, browse, share, and download research data through a self-contained test UI.

Key changes:

- R2 service (`r2.ts`): S3-compatible client for upload, download, list, and delete operations using `@aws-sdk/client-s3`
- Upload endpoint (`POST /v1/data/uploadCentralized`): multipart file upload to R2 with SHA-256 content hashing stored as object metadata
- Tree endpoint (`GET /v1/data/centralizedTree/:uuid`): public file listing built from R2 object keys, no auth required
- Download endpoint (`GET /v1/data/downloadCentralized/:uuid/*`): gated single-file download — requires node ownership or valid shareId
- Zip download (`GET /v1/data/downloadCentralized/:uuid/zip`): streams all node files as a zip archive using archiver, same access controls
- Share endpoint reused from existing `POST /v1/nodes/share/:uuid`
- Magic link auth in test UI: uses existing `POST /v1/auth/magic`, with devCode auto-fill in non-production for easier local testing
- `draftCreate` made IPFS-resilient: node creation succeeds even when IPFS/sync-server are unavailable — only the DB record is required for R2 operations
- Dev script (`scripts/dev-local.sh`): starts Postgres, Redis, and IPFS via Docker; overrides IPFS URLs to localhost for native execution
- URL-encodes file paths with spaces in download links and share URLs
- Removed dead `buildTree()` function from `centralizedTree.ts`
- Removed leftover debugger comment from `magic.ts`
- Added `CENTRALIZED_DATA.md` developer documentation

No changes to existing IPFS flows. All R2 endpoints verified: upload, tree, download, zip, share link, and access control (403s).
R2_ACCOUNT_ID, R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, and
R2_BUCKET_NAME must be added to Vault at secrets/desci-server/{dev,prod}/app
for the centralized data endpoints to work in deployed environments.
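The note above implies the server should fail fast when these variables are missing. A minimal startup guard could look like the sketch below; the helper name is hypothetical, and only the four variable names come from the note:

```typescript
// Sketch of a fail-fast startup check for the R2 configuration.
// Only the variable names come from the deployment note above.
const REQUIRED_R2_VARS = [
  'R2_ACCOUNT_ID',
  'R2_ACCESS_KEY_ID',
  'R2_SECRET_ACCESS_KEY',
  'R2_BUCKET_NAME',
] as const;

// Return the names of required variables that are unset or blank.
function missingR2Vars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_R2_VARS.filter((name) => !env[name] || env[name]!.trim() === '');
}
```

Calling `missingR2Vars(process.env)` at boot lets the server either throw or disable the R2 routes before any request reaches them.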
Force-pushed from 3460219 to 8ed77ad.
Actionable comments posted: 8
🧹 Nitpick comments (5)
desci-server/src/controllers/auth/magic.ts (1)
55-58: Ordering inconsistency with `auth.ts`.

This query uses `orderBy: { createdAt: 'desc' }`, while `auth.ts` (lines 52, 58) uses `orderBy: { id: 'desc' }` for the same `MagicLink` model. Both typically return the same result since `id` is auto-increment and `createdAt` defaults to `now()`, but this inconsistency could cause subtle bugs if records are ever bulk-inserted or have explicit timestamps. Consider aligning with the existing pattern in `auth.ts`:

♻️ Suggested alignment

```diff
 const latest = await prismaClient.magicLink.findFirst({
   where: { email: { equals: cleanEmail, mode: 'insensitive' } },
-  orderBy: { createdAt: 'desc' },
+  orderBy: { id: 'desc' },
 });
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@desci-server/src/controllers/auth/magic.ts` around lines 55 - 58, The query in prismaClient.magicLink.findFirst within magic.ts uses orderBy: { createdAt: 'desc' } which is inconsistent with auth.ts (which uses orderBy: { id: 'desc' }); change the orderBy to { id: 'desc' } to match the existing pattern and avoid subtle differences from bulk inserts or manual timestamps—update the prismaClient.magicLink.findFirst call that assigns latest to use id descending so the same record selection logic is used across auth.ts and magic.ts.

desci-server/src/services/r2.ts (1)
33-48: Consider documenting stream upload limitations.

`uploadToR2` accepts `Readable` streams, but the current implementation doesn't specify `ContentLength`. While AWS SDK v3 handles this for many cases, very large stream uploads may require multipart upload handling. The current usage in `uploadCentralized.ts` reads files into buffers first, which is safe but could be documented.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@desci-server/src/services/r2.ts` around lines 33 - 48, The uploadToR2 function accepts Readable streams but does not set ContentLength or handle multipart uploads, which can fail for very large streams; update the code/comments for uploadToR2 and PutObjectCommand to document this limitation and recommend one of two fixes: (1) require callers to provide ContentLength and handle single-part uploads, or (2) buffer streams to memory (as done in uploadCentralized.ts) or implement multipart upload logic for large objects; reference uploadToR2 and PutObjectCommand in the comment so callers know to either pass a Buffer/known-length stream or use a multipart strategy.

desci-server/src/controllers/data/downloadCentralizedZip.ts (1)
28-54: Consider extracting shared access control logic.

The access control pattern (lines 28-54) is duplicated from `downloadCentralized.ts`. Consider extracting to a shared utility or middleware to reduce duplication and ensure consistent behavior.

Example shared utility

```typescript
// In a shared utility file
export async function checkCentralizedDataAccess(
  normalizedUuid: string,
  shareId: string | undefined,
  userId: number | undefined
): Promise<boolean> {
  if (shareId) {
    const privateShare = await prisma.privateShare.findFirst({
      where: { shareId, nodeUUID: normalizedUuid },
    });
    if (privateShare) return true;
  }
  if (userId) {
    const node = await prisma.node.findFirst({
      where: { uuid: normalizedUuid, ownerId: userId },
    });
    if (node) return true;
  }
  return false;
}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@desci-server/src/controllers/data/downloadCentralizedZip.ts` around lines 28 - 54, The access-check block duplicated from downloadCentralized.ts should be extracted into a shared utility (e.g., checkCentralizedDataAccess) or middleware and then used from downloadCentralizedZip; implement a function that takes normalizedUuid, shareId and user?.id, performs the two checks (privateShare by shareId AND nodeUUID match, and node by uuid+ownerId) and returns boolean, replace the inline logic in downloadCentralizedZip with a call to that utility, remove the ensureUuidEndsWithDot/duplicate checks from the controller and ensure the utility performs the normalized nodeUUID comparison consistently so both endpoints share identical behavior.

desci-server/src/controllers/data/downloadCentralized.ts (1)
34-43: Consider combining shareId and nodeUUID in a single query.

The current pattern queries by `shareId` alone, then validates `nodeUUID` separately. While secure, a single query with both conditions (as done in `desci-server/src/controllers/nodes/thumbnails.ts:50-56`) would be more efficient:

```typescript
const privateShare = await prisma.privateShare.findFirst({
  where: { shareId, nodeUUID: normalizedUuid },
});
if (privateShare) hasAccess = true;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@desci-server/src/controllers/data/downloadCentralized.ts` around lines 34 - 43, Combine the two-step check into a single DB query: call prisma.privateShare.findFirst with where: { shareId, nodeUUID: normalizedUuid } (instead of querying by shareId only and then validating nodeUUID with ensureUuidEndsWithDot), then set hasAccess = true if privateShare is returned; remove the separate privateShare.node and ensureUuidEndsWithDot(...) conditional. This uses the existing variables shareId, normalizedUuid, privateShare and hasAccess and the prisma.privateShare.findFirst call.

desci-server/src/controllers/data/uploadCentralized.ts (1)
52-70: Memory concern: entire file loaded into buffer for large uploads.

Line 58 reads the entire file into memory before uploading. For large research datasets, this could cause memory pressure. Consider streaming uploads directly from disk for files above a size threshold.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@desci-server/src/controllers/data/uploadCentralized.ts` around lines 52 - 70, The current loop in uploadCentralized.ts loads each file fully via fs.promises.readFile (file.path) and then passes fileBuffer to uploadToR2, which will OOM for large files; change to stream large files instead: for files over a size threshold (e.g., file.size > X) use fs.createReadStream(file.path) and a streaming upload path from uploadToR2 (or the R2 client) while computing the SHA-256 by piping the stream through a crypto.createHash instance (or using a Transform that updates the hash) so you never hold the full file in memory; keep small files using the current readFile path for compatibility, and still unlink file.path after the upload completes; reference uploadToR2, the for (const file of files) loop, nodeUuid/contextPath, and the read/unlink of file.path when locating where to implement this change.
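The hash-while-streaming idea in the prompt above can be sketched with Node's standard library alone. This is an illustrative helper, not code from the PR; wiring it into `uploadToR2` (e.g. via a `PassThrough` tee so the upload also receives the data) is left out:

```typescript
import { createHash } from 'node:crypto';
import { Readable } from 'node:stream';

// Incrementally hash a stream without buffering it: each chunk updates
// the digest and is then released, so memory stays bounded regardless
// of file size.
async function sha256OfStream(stream: Readable): Promise<string> {
  const hash = createHash('sha256');
  for await (const chunk of stream) {
    hash.update(chunk);
  }
  return hash.digest('hex');
}
```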
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.env.example:
- Around line 80-84: The committed R2 credentials must be removed from
.env.example: replace the real-looking values for R2_ACCOUNT_ID,
R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, and R2_BUCKET_NAME with clearly fake or
blank placeholder values (e.g., EMPTY or YOUR_R2_ACCOUNT_ID) so the file
contains no real secrets; update the lines for the variables R2_ACCOUNT_ID,
R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, and R2_BUCKET_NAME accordingly and
ensure no other secret-like strings remain in the file.
In `@desci-server/scripts/dev-local.sh`:
- Around line 15-29: The script currently only starts the single "ipfs" service
but advertises PUBLIC_IPFS_RESOLVER and GUEST_IPFS_NODE_URL endpoints, so update
the docker compose invocation to also bring up ipfs_public and ipfs_guest (i.e.,
change the compose up command in the block that runs in "$NODES_DIR" to include
ipfs_public and ipfs_guest), and add readiness checks similar to the existing
"until docker exec ipfs ipfs id" loops for the containers ipfs_public and
ipfs_guest (or equivalent HTTP/health checks against the advertised ports 5002
and 5005) so those advertised endpoints are actually up before the script
reports "IPFS ready."
- Around line 40-55: The script currently hardcodes the Sentry profiler filename
to darwin-arm64; update the logic that sets NODE_ABI, SENTRY_LIB and TARGET so
the TARGET filename is built from the host tuple (process.platform and
process.arch as well as process.versions.modules) instead of a fixed
"darwin-arm64" token: run node to emit platform and arch (similar to how
NODE_ABI is derived), compose a platform-arch string and use it when
constructing TARGET and when searching/copying LATEST, and keep the existing
fallback behavior (checking if TARGET exists, finding the highest ABI with the
pattern sentry_cpu_profiler-<platform>-<arch>-*.node, copying LATEST to TARGET,
and emitting the same log messages).
In `@desci-server/src/controllers/data/downloadCentralized.ts`:
- Around line 67-68: The Content-Disposition uses the raw fileName which can
allow header injection; in the handler that sets headers (the res.setHeader
calls around Content-Type/Content-Disposition) sanitize the fileName before use
by stripping or replacing control characters (CR/LF), quotes, backslashes and
other header-breaking characters, fall back to a safe default name if the result
is empty, and when possible set both a plain filename and an RFC5987 encoded
filename* (e.g., filename*=UTF-8''<encoded>) to preserve non-ASCII characters;
update the code referencing res.setHeader('Content-Disposition', `attachment;
filename="${fileName}"`) to use the sanitized and encoded value instead.
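The sanitization described above can be a small pure helper; the function name is hypothetical, and `encodeURIComponent` is only an approximation of RFC 5987 encoding (characters like `'` and `*` would still need escaping for strict compliance):

```typescript
// Build a safe Content-Disposition value: strip control characters,
// quotes and backslashes, fall back to a default name if nothing
// survives, and add an RFC 5987-style filename* parameter so
// non-ASCII names are preserved.
function contentDisposition(fileName: string): string {
  const safe = fileName.replace(/[\x00-\x1f\x7f"\\]/g, '').trim() || 'download';
  const encoded = encodeURIComponent(safe); // close to, not exactly, RFC 5987
  return `attachment; filename="${safe}"; filename*=UTF-8''${encoded}`;
}
```

The controller would then call `res.setHeader('Content-Disposition', contentDisposition(fileName))` instead of interpolating the raw name.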
In `@desci-server/src/controllers/data/uploadCentralized.ts`:
- Around line 54-55: Sanitize and validate the incoming contextPath before
building relativePath/r2Key: reject or normalize any input containing path
traversal patterns (e.g., "../" or leading "/" ), strip or replace path
separators, enforce an allowlist of safe characters and a max length, and fall
back to an empty string if validation fails; then construct relativePath as
`${contextPath}/${file.originalname}` (or just file.originalname) using the
sanitized value so r2Key (`${nodeUuid}/${relativePath}`) cannot be manipulated
to escape the intended bucket namespace. Ensure the checks target the variables
contextPath, relativePath and r2Key construction and also validate
file.originalname for unsafe characters.
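The validation described above can be sketched as a pure helper; the function name, character allowlist, and length limit are illustrative assumptions, not code from the PR:

```typescript
const MAX_CONTEXT_PATH = 512; // illustrative limit

// Validate a user-supplied context path: strip leading/trailing
// slashes, cap the length, and require every segment to be non-empty,
// not '.' or '..', and drawn from a conservative character allowlist.
// Returns '' when validation fails, per the fallback described above.
function sanitizeContextPath(input: string): string {
  const trimmed = input.replace(/^\/+/, '').replace(/\/+$/, '');
  if (trimmed.length === 0 || trimmed.length > MAX_CONTEXT_PATH) return '';
  const segments = trimmed.split('/');
  const ok = segments.every(
    (s) => s.length > 0 && s !== '.' && s !== '..' && /^[A-Za-z0-9._ -]+$/.test(s)
  );
  return ok ? segments.join('/') : '';
}
```

With this in place, `r2Key` is built only from the sanitized value, so `../` sequences can never escape the `${nodeUuid}/` prefix.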
In `@desci-server/src/routes/v1/auth.ts`:
- Around line 52-55: The route handler for router.get('/google/client-id')
checks process.env.GOOGLE_CLIENT_ID_AUTH but deployments expose
GOOGLE_CLIENT_ID; update the handler to read process.env.GOOGLE_CLIENT_ID
instead (and keep the same error message/context), i.e., replace references to
GOOGLE_CLIENT_ID_AUTH with GOOGLE_CLIENT_ID in that handler so /google/client-id
returns the configured client ID in deployed environments.
In `@desci-server/src/server.ts`:
- Around line 141-144: The middleware that skips helmet for paths starting with
'/test/' should only be active in non-production; update the this.app.use
middleware and any other similar block (the one that checks
req.path.startsWith('/test/')) to first check the environment (e.g.,
process.env.NODE_ENV !== 'production' or an existing isProduction/isDev flag)
and only mount the /test UI and short-circuit helmet when running in
non-production; ensure helmet() is applied normally in production for all paths
and that the /test route itself is only registered when not in production.
In `@desci-server/src/services/r2.ts`:
- Around line 50-62: getStreamFromR2 may cast response.Body to Readable even
when it's undefined; update getStreamFromR2 to defensively check that r2Client
exists (r2Client) and that response.Body is non-null before casting: if
response.Body is undefined throw a descriptive error or return a controlled
failure (include key and bucket in message), and ensure metadata
(response.Metadata) is still returned; reference GetObjectCommand, response.Body
and the getStreamFromR2 function when locating and updating the code.
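The defensive check the prompt asks for reduces to a small guard. The response shape below mirrors what `GetObjectCommand` resolves to, but the helper name and exact error wording are assumptions:

```typescript
// Throw a descriptive error instead of blindly casting an absent Body.
function requireBody<T>(
  response: { Body?: T; Metadata?: Record<string, string> },
  bucket: string,
  key: string
): { body: T; metadata?: Record<string, string> } {
  if (response.Body == null) {
    throw new Error(`R2 returned no body for object in bucket "${bucket}" (key: ${key})`);
  }
  return { body: response.Body, metadata: response.Metadata };
}
```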
---
Nitpick comments:
In `@desci-server/src/controllers/auth/magic.ts`:
- Around line 55-58: The query in prismaClient.magicLink.findFirst within
magic.ts uses orderBy: { createdAt: 'desc' } which is inconsistent with auth.ts
(which uses orderBy: { id: 'desc' }); change the orderBy to { id: 'desc' } to
match the existing pattern and avoid subtle differences from bulk inserts or
manual timestamps—update the prismaClient.magicLink.findFirst call that assigns
latest to use id descending so the same record selection logic is used across
auth.ts and magic.ts.
In `@desci-server/src/controllers/data/downloadCentralized.ts`:
- Around line 34-43: Combine the two-step check into a single DB query: call
prisma.privateShare.findFirst with where: { shareId, nodeUUID: normalizedUuid }
(instead of querying by shareId only and then validating nodeUUID with
ensureUuidEndsWithDot), then set hasAccess = true if privateShare is returned;
remove the separate privateShare.node and ensureUuidEndsWithDot(...)
conditional. This uses the existing variables shareId, normalizedUuid,
privateShare and hasAccess and the prisma.privateShare.findFirst call.
In `@desci-server/src/controllers/data/downloadCentralizedZip.ts`:
- Around line 28-54: The access-check block duplicated from
downloadCentralized.ts should be extracted into a shared utility (e.g.,
checkCentralizedDataAccess) or middleware and then used from
downloadCentralizedZip; implement a function that takes normalizedUuid, shareId
and user?.id, performs the two checks (privateShare by shareId AND nodeUUID
match, and node by uuid+ownerId) and returns boolean, replace the inline logic
in downloadCentralizedZip with a call to that utility, remove the
ensureUuidEndsWithDot/duplicate checks from the controller and ensure the
utility performs the normalized nodeUUID comparison consistently so both
endpoints share identical behavior.
In `@desci-server/src/controllers/data/uploadCentralized.ts`:
- Around line 52-70: The current loop in uploadCentralized.ts loads each file
fully via fs.promises.readFile (file.path) and then passes fileBuffer to
uploadToR2, which will OOM for large files; change to stream large files
instead: for files over a size threshold (e.g., file.size > X) use
fs.createReadStream(file.path) and a streaming upload path from uploadToR2 (or
the R2 client) while computing the SHA-256 by piping the stream through a
crypto.createHash instance (or using a Transform that updates the hash) so you
never hold the full file in memory; keep small files using the current readFile
path for compatibility, and still unlink file.path after the upload completes;
reference uploadToR2, the for (const file of files) loop, nodeUuid/contextPath,
and the read/unlink of file.path when locating where to implement this change.
In `@desci-server/src/services/r2.ts`:
- Around line 33-48: The uploadToR2 function accepts Readable streams but does
not set ContentLength or handle multipart uploads, which can fail for very large
streams; update the code/comments for uploadToR2 and PutObjectCommand to
document this limitation and recommend one of two fixes: (1) require callers to
provide ContentLength and handle single-part uploads, or (2) buffer streams to
memory (as done in uploadCentralized.ts) or implement multipart upload logic for
large objects; reference uploadToR2 and PutObjectCommand in the comment so
callers know to either pass a Buffer/known-length stream or use a multipart
strategy.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e7775363-d5d2-4389-be89-06fb648846fb
⛔ Files ignored due to path filters (1)
desci-server/yarn.lock is excluded by `!**/yarn.lock`, `!**/*.lock`
📒 Files selected for processing (17)
- .env.example
- desci-server/kubernetes/deployment_dev.yaml
- desci-server/kubernetes/deployment_prod.yaml
- desci-server/package.json
- desci-server/scripts/dev-local.sh
- desci-server/src/controllers/auth/magic.ts
- desci-server/src/controllers/data/centralizedTree.ts
- desci-server/src/controllers/data/downloadCentralized.ts
- desci-server/src/controllers/data/downloadCentralizedZip.ts
- desci-server/src/controllers/data/uploadCentralized.ts
- desci-server/src/controllers/nodes/draftCreate.ts
- desci-server/src/routes/v1/auth.ts
- desci-server/src/routes/v1/data.ts
- desci-server/src/server.ts
- desci-server/src/services/r2.ts
- desci-server/src/test/centralizedData.html
- docker-compose.dev.yml
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
desci-server/src/server.ts (1)
190-192: ⚠️ Potential issue | 🔴 Critical

Infinite recursion in `httpServer` getter.

The getter references itself (`return this.httpServer`) instead of `this.server`, causing stack overflow if accessed.

🐛 Proposed fix

```diff
 get httpServer() {
-  return this.httpServer;
+  return this.server;
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@desci-server/src/server.ts` around lines 190 - 192, The httpServer getter currently returns itself causing infinite recursion; change the getter in the class that defines httpServer so it returns the underlying server instance (this.server) instead of this.httpServer, i.e., update the httpServer getter implementation to return this.server and ensure any backing field named server exists and is used consistently.
🧹 Nitpick comments (1)
desci-server/scripts/dev-local.sh (1)
87-89: TypeScript compilation error handling may be misleading.

The pipeline `npx tsc 2>&1 | grep -c "error TS"` will output `0` when there are no TypeScript errors, displaying "(0 type errors, non-blocking)" even when compilation succeeds cleanly. Consider showing a cleaner message for successful compilation.

💡 Suggested improvement

```diff
-npx tsc 2>&1 | grep -c "error TS" | xargs -I{} echo "  TypeScript compiled ({} type errors, non-blocking)"
+TS_OUTPUT=$(npx tsc 2>&1)
+TS_ERRORS=$(echo "$TS_OUTPUT" | grep -c "error TS" || true)
+if [ "$TS_ERRORS" -gt 0 ]; then
+  echo "  TypeScript compiled with $TS_ERRORS type errors (non-blocking)"
+else
+  echo "  TypeScript compiled successfully"
+fi
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@desci-server/scripts/dev-local.sh` around lines 87 - 89, The current pipeline using `npx tsc 2>&1 | grep -c "error TS"` always prints "(0 type errors, non-blocking)" even on a clean build; change it to capture the grep count into a variable (run `npx tsc` and pipe to `grep -c "error TS"`), then branch on that variable: if zero print a clear "TypeScript compiled successfully" message, otherwise print the existing "(N type errors, non-blocking)" message; keep the final `exec node ./dist/index.js` behavior unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@desci-server/CENTRALIZED_DATA.md`:
- Around line 119-125: The Deployed URLs table in CENTRALIZED_DATA.md exposes a
production Test UI URL which conflicts with the requirement to gate the /test
route from production; update the documentation to either remove the Prod row
value (`https://nodes-api.desci.com/test/centralizedData.html`) or add a
clarifying note next to the Prod entry (or under the "Deployed URLs" heading)
stating that the /test route is disabled/gated in production and the listed URL
is intentionally non-functional in Prod, ensuring the table accurately reflects
the runtime gating behavior.
In `@desci-server/src/controllers/nodes/draftCreate.ts`:
- Around line 104-122: The current draft creation can write an empty string into
manifestDocumentId which breaks downstream code; update draftCreate.ts so you
only call prisma.node.update({ data: { manifestDocumentId } }) when
result.documentId is a non-empty string (i.e., truthy and length > 0) and
otherwise leave manifestDocumentId undefined/null on the node; keep the existing
logging paths (Automerge document created / creation returned empty result /
creation failed) but ensure you do not persist an empty string. Also audit
callers like getLatestManifestFromNode, getDraftManifest, getNodeManifest, and
dispatchAction to expect possible undefined/null manifestDocumentId (add guards
where they assume a DocumentId) so downstream casts do not receive an empty
string.
In `@desci-server/src/routes/v1/auth.ts`:
- Around line 50-51: The file registers the same route twice:
router.get('/orcid/validate', validateOrcid) appears on consecutive lines;
remove the duplicate registration so the route is only registered once (keep a
single router.get('/orcid/validate', validateOrcid) and delete the redundant
line), referencing the validateOrcid handler to ensure the route still points to
the correct function.
In `@desci-server/src/services/r2.ts`:
- Around line 79-100: The current implementation in the do/while loop
accumulates every object into the in-memory results array (results) while
calling r2Client.send(new ListObjectsV2Command(...)), which can OOM for large
prefixes; change the API of the enclosing service function to stream or return a
paginated response instead of materializing all entries: stop pushing every obj
into results, and instead emit or return each page (the response.Contents) to
the caller along with response.NextContinuationToken/IsTruncated so callers can
request further pages; update the function signature to return either an async
iterator/stream of pages or a {items:..., continuationToken:...} page object and
adjust callers to iterate/paginate rather than expecting a full list.
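The page-at-a-time API the prompt describes can be expressed as a generic async generator. The `fetchPage` callback is where `r2Client.send(new ListObjectsV2Command(...))` would go; the names and types here are illustrative, not from the PR:

```typescript
// One page of results plus the token needed to fetch the next page.
type Page<T> = { items: T[]; nextToken?: string };

// Yield pages as they arrive instead of accumulating every object in
// memory; callers can stop early without listing the whole prefix.
async function* paginate<T>(
  fetchPage: (token?: string) => Promise<Page<T>>
): AsyncGenerator<T[], void, void> {
  let token: string | undefined;
  do {
    const page = await fetchPage(token);
    yield page.items;
    token = page.nextToken;
  } while (token);
}
```

A caller then writes `for await (const page of paginate(...))`, mirroring the `NextContinuationToken`/`IsTruncated` loop without materializing the full list.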
---
Outside diff comments:
In `@desci-server/src/server.ts`:
- Around line 190-192: The httpServer getter currently returns itself causing
infinite recursion; change the getter in the class that defines httpServer so it
returns the underlying server instance (this.server) instead of this.httpServer,
i.e., update the httpServer getter implementation to return this.server and
ensure any backing field named server exists and is used consistently.
---
Nitpick comments:
In `@desci-server/scripts/dev-local.sh`:
- Around line 87-89: The current pipeline using `npx tsc 2>&1 | grep -c "error
TS"` always prints "(0 type errors, non-blocking)" even on a clean build; change
it to capture the grep count into a variable (run `npx tsc` and pipe to `grep -c
"error TS"`), then branch on that variable: if zero print a clear "TypeScript
compiled successfully" message, otherwise print the existing "(N type errors,
non-blocking)" message; keep the final `exec node ./dist/index.js` behavior
unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6794dad7-1a10-4db4-9130-e767f86a3b1f
⛔ Files ignored due to path filters (1)
desci-server/yarn.lock is excluded by `!**/yarn.lock`, `!**/*.lock`
📒 Files selected for processing (18)
- .env.example
- desci-server/CENTRALIZED_DATA.md
- desci-server/kubernetes/deployment_dev.yaml
- desci-server/kubernetes/deployment_prod.yaml
- desci-server/package.json
- desci-server/scripts/dev-local.sh
- desci-server/src/controllers/auth/magic.ts
- desci-server/src/controllers/data/centralizedTree.ts
- desci-server/src/controllers/data/downloadCentralized.ts
- desci-server/src/controllers/data/downloadCentralizedZip.ts
- desci-server/src/controllers/data/uploadCentralized.ts
- desci-server/src/controllers/nodes/draftCreate.ts
- desci-server/src/routes/v1/auth.ts
- desci-server/src/routes/v1/data.ts
- desci-server/src/server.ts
- desci-server/src/services/r2.ts
- desci-server/src/test/centralizedData.html
- docker-compose.dev.yml
🚧 Files skipped from review as they are similar to previous changes (9)
- desci-server/src/controllers/data/downloadCentralized.ts
- desci-server/src/controllers/data/uploadCentralized.ts
- desci-server/kubernetes/deployment_dev.yaml
- desci-server/src/controllers/data/downloadCentralizedZip.ts
- desci-server/package.json
- desci-server/kubernetes/deployment_prod.yaml
- docker-compose.dev.yml
- desci-server/src/test/centralizedData.html
- desci-server/src/controllers/auth/magic.ts
Actionable comments posted: 3
♻️ Duplicate comments (1)
desci-server/src/services/r2.ts (1)
51-52: ⚠️ Potential issue | 🟠 Major

Avoid logging raw R2 object keys for private data.
The R2 keys are derived from user-provided filenames/paths. Logging them directly exposes private dataset structure in application logs. Consider logging a redacted form, a hash, or omitting the key entirely.
Also applies to line 79.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@desci-server/src/services/r2.ts` around lines 51 - 52, The code logs the raw R2 object key via logger.info({ key }, 'Uploaded to R2') which can leak private filenames; update the logging to avoid printing the raw "key" variable by either omitting it or replacing it with a redacted form (e.g., a hash or truncated mask) before calling logger.info; locate the logger.info calls in this file (the instances that pass the "key" variable) and replace them with a call that logs a safe identifier (e.g., compute and log a SHA256/hex of key or log only the first/last N chars or a static redaction token) while retaining context in the message.
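One way to keep log lines correlatable without exposing the raw key is the hash-based redaction the comment suggests. The helper below is a sketch, not code from the PR:

```typescript
import { createHash } from 'node:crypto';

// Replace a private R2 key with a short, stable digest so log lines
// for the same object can still be correlated without revealing the
// dataset's file structure.
function redactKey(key: string): string {
  return `key#${createHash('sha256').update(key).digest('hex').slice(0, 12)}`;
}

// e.g. logger.info({ key: redactKey(key) }, 'Uploaded to R2');
```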
🧹 Nitpick comments (6)
desci-server/CENTRALIZED_DATA.md (1)
34-37: Consider adding language identifiers to fenced code blocks.

Several code blocks (lines 34, 41, 57, 62, 82, 93) lack language specifiers. Adding identifiers like `http`, `bash`, or `text` would improve syntax highlighting and satisfy markdown lint rules.

Example fix for endpoint blocks

````diff
-```
+```http
 POST /v1/auth/magic  { email } → sends verification code
````

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@desci-server/CENTRALIZED_DATA.md` around lines 34 - 37, The fenced code blocks showing HTTP endpoint examples such as "POST /v1/auth/magic { email } → sends verification code" and similar blocks around the file should include language identifiers for syntax highlighting and markdown-lint compliance; update those backtick-fenced blocks (e.g., the endpoint examples at lines showing POST /v1/auth/magic and the other blocks referenced) to use an appropriate language tag like ```http or ```text so the blocks render correctly and pass lint rules.

desci-server/src/controllers/data/downloadCentralized.ts (1)
57-65: Add error handling for stream pipe failures.

If the R2 stream encounters an error mid-transfer (network issue, client disconnect), the error is not caught, which could leave the response in an inconsistent state.
Suggested improvement
```diff
-      stream.pipe(res);
+      stream.on('error', (err) => {
+        logger.error({ err }, 'R2 stream error during download');
+        if (!res.headersSent) {
+          res.status(500).json({ ok: false, message: 'Download failed' });
+        } else {
+          res.destroy();
+        }
+      });
+      stream.pipe(res);
     } catch (err: any) {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@desci-server/src/controllers/data/downloadCentralized.ts` around lines 57 - 65, The current download handler pipes the R2 Readable (stream) to the Express response via stream.pipe(res) but doesn't handle stream or response errors/aborts, so mid-transfer failures are uncaught; update the code around stream.pipe(res) and the catch block (referencing stream.pipe(res), the surrounding try/catch and logger.error) to attach listeners: listen for 'error' on the R2 stream to log the error (using logger.error) and respond/cleanup (destroy/unpipe the stream and end the response if not finished), listen for 'close'/'aborted' on res to destroy the stream to stop work, and ensure any listeners are removed after finish to avoid leaks; also convert any uncaught stream errors to a 500 JSON response when possible and include the error details in logs.

desci-server/prisma/schema.prisma (1)
584-601: NodeDataGrant model is well-designed.

The soft-revoke pattern (`revokedAt` instead of deletion) is a good choice for audit trails. The composite unique constraint on `[nodeUUID, granteeId]` correctly enforces one grant per user per node.

Consider adding an index on `grantedById` if you anticipate queries like "list all grants I've given" becoming common:

```diff
+  @@index([grantedById])
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@desci-server/prisma/schema.prisma` around lines 584 - 601, The schema's NodeDataGrant model is missing an index on grantedById which will slow queries that list grants by the grantor; update the NodeDataGrant model (the block defining NodeDataGrant with fields id, createdAt, updatedAt, nodeUUID, granteeId, grantedById, etc.) to add an @@index([grantedById]) directive alongside the existing @@unique and @@index entries so queries like "list all grants I've given" are efficient.
desci-server/src/controllers/data/uploadCentralized.ts (2)
60-66: Unsafe filename rejection silently skips files. When a file has an unsafe name, it's skipped with a warning log but the upload continues. The response will show fewer files uploaded than provided. Consider returning an error for the specific file in the response so clients know which files were rejected.
🔧 Suggested improvement
```diff
+  const rejected: { name: string; reason: string }[] = [];
+
   try {
     for (const file of files) {
       if (UNSAFE_PATH_PATTERN.test(file.originalname) || file.originalname.startsWith('/')) {
         logger.warn({ fileName: file.originalname }, 'Rejected unsafe filename');
+        rejected.push({ name: file.originalname, reason: 'unsafe filename' });
         await fs.promises.unlink(file.path).catch(() => {});
         continue;
       }
       // ... rest of loop
     }
-  return res.status(200).json({ ok: true, files: uploaded });
+  return res.status(200).json({ ok: true, files: uploaded, rejected });
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@desci-server/src/controllers/data/uploadCentralized.ts` around lines 60 - 66, The loop that checks UNSAFE_PATH_PATTERN currently logs and unlinks unsafe files but silently skips them; update the upload handling in the same function (the for (const file of files) loop that calls UNSAFE_PATH_PATTERN.test and logger.warn) to record each rejected file (e.g., push { fileName: file.originalname, error: 'unsafe filename' } into a rejectedFiles array or add an error entry into the per-file results array) before unlinking, and ensure the final response includes that rejectedFiles/errors list alongside successfully uploaded files so clients can see which files were rejected and why.
73-90: Streaming logic doesn't provide memory benefits for large files.

The streaming branch for files >50MB still accumulates all chunks in memory (`chunks.push(chunk)`) before calling `Buffer.concat`. This negates the memory-efficiency benefit of streaming.

For true streaming uploads, consider piping directly to R2 with a pass-through hash stream, or accepting the current implementation if the R2 SDK requires a complete buffer.
🔧 If R2 SDK supports stream uploads
```diff
       if (file.size > STREAM_THRESHOLD) {
         const hashStream = crypto.createHash('sha256');
-        const chunks: Buffer[] = [];
-        const collector = new Transform({
-          transform(chunk, _encoding, cb) {
-            hashStream.update(chunk);
-            chunks.push(chunk);
-            cb(null, chunk);
-          },
-        });
         const readStream = fs.createReadStream(file.path);
-        await pipeline(readStream, collector);
+        // Compute hash first
+        await pipeline(readStream, hashStream);
         hash = hashStream.digest('hex');
-        const fileBuffer = Buffer.concat(chunks);
+        // Then read file for upload (or use stream if R2 SDK supports it)
+        const fileBuffer = await fs.promises.readFile(file.path);
         await uploadToR2(r2Key, fileBuffer, {
```
🤖 Prompt for AI Agents
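The hash-then-upload idea can be sketched independently of the controller, using only Node's standard library (nothing here is code from the PR):

```typescript
import { createHash } from 'node:crypto';
import { Readable } from 'node:stream';

// Hash a stream incrementally: only one chunk is in memory at a time,
// so memory use stays flat regardless of file size.
async function sha256OfStream(source: Readable): Promise<string> {
  const hash = createHash('sha256');
  for await (const chunk of source) {
    hash.update(chunk);
  }
  return hash.digest('hex');
}
```

A second pass over the file (or a `PassThrough` teed into the uploader) then supplies the bytes for upload, which is the two-pass trade-off the diff accepts.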
Verify each finding against the current code and only fix it if needed. In `@desci-server/src/controllers/data/uploadCentralized.ts` around lines 73 - 90, The streaming branch for files > STREAM_THRESHOLD is defeating memory savings by collecting chunks into the chunks array and calling Buffer.concat; modify the logic so the Transform "collector" (or a PassThrough) pipes directly into uploadToR2 without accumulating buffers while still updating the crypto.createHash('sha256') as data flows, compute hashStream.digest('hex') after the stream ends, and pass the resulting stream (or PassThrough) to uploadToR2 (r2Key, stream, { 'content-hash': hash, 'mime-type': ... }); if the uploadToR2 API requires a full Buffer, keep the current fallback but document/branch clearly so large-file streaming uses the stream path.
desci-server/src/controllers/nodes/dataGrants.ts (1)
131-155: `listMyGrants` may return grants for deleted nodes. Consider filtering out grants where the associated node is deleted:
🔧 Suggested filter
```diff
   const grants = await prisma.nodeDataGrant.findMany({
-    where: { granteeId: user.id, revokedAt: null },
+    where: { granteeId: user.id, revokedAt: null, node: { isDeleted: false } },
     include: {
       node: { select: { uuid: true, title: true } },
       grantedBy: { select: { email: true, name: true } },
     },
     orderBy: { createdAt: 'desc' },
   });
```
🤖 Prompt for AI Agents
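The defensive-mapping half of the suggestion can be shown with a plain function; the `Grant` shape below is a simplified stand-in for the Prisma result, not the generated client type:

```typescript
// Simplified stand-in for the Prisma query result.
type Grant = {
  node: { uuid: string; title: string } | null;
  grantedBy: { email: string };
};

// Keep only grants whose node relation is present; a node deleted
// out-of-band yields null and is dropped instead of crashing the map.
function visibleGrants(grants: Grant[]): { uuid: string; title: string; grantedBy: string }[] {
  return grants.flatMap((g) =>
    g.node ? [{ uuid: g.node.uuid, title: g.node.title, grantedBy: g.grantedBy.email }] : [],
  );
}
```

The relation filter in the query and this mapping guard are complementary: the filter excludes soft-deleted nodes, and the guard covers any row that still arrives without its relation.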
Verify each finding against the current code and only fix it if needed. In `@desci-server/src/controllers/nodes/dataGrants.ts` around lines 131 - 155, The current listMyGrants returns grants even if the related node was soft-deleted; update the prisma query inside listMyGrants to only fetch grants whose node is not deleted by adding a relation filter (e.g., include node: { deletedAt: null } in the where clause) and defensively handle the mapping (ensure g.node exists before using g.node.uuid/title) so deleted or missing nodes are excluded from the response.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@desci-server/src/controllers/auth/magic.ts`:
- Around line 54-63: The current logic fetches a development token into devCode
unconditionally in non-production, which can be returned even if the send failed
(ok is false); modify the block around prismaClient.magicLink.findFirst so that
you only assign or include devCode when ok is true (i.e., after the send
succeeded). Concretely: after determining ok, if process.env.NODE_ENV !==
'production' and ok is true, call prismaClient.magicLink.findFirst({ where: {
email: { equals: cleanEmail, mode: 'insensitive' } }, orderBy: { id: 'desc' } })
and set devCode = latest?.token, then return res.send({ ok: !!ok, ...(devCode ?
{ devCode } : {}) }) as before so stale tokens are not exposed when ok is false.
In `@desci-server/src/controllers/nodes/dataGrants.ts`:
- Around line 71-73: The prisma.node.findFirst lookup in revokeDataGrant and
listDataGrants is missing the same isDeleted: false condition used in
createDataGrant, which allows operating on deleted nodes; update the where
clause in the node lookup inside revokeDataGrant and listDataGrants to include
isDeleted: false (matching the filter in createDataGrant) so both
revokeDataGrant and listDataGrants ignore deleted nodes when resolving the node
by normalizedUuid and owner.id.
In `@desci-server/src/test/centralizedData.html`:
- Around line 1166-1170: testTreeDeniedNoAuth currently treats 200 and 403 as
success so it never fails; update the test so that it only returns success when
access is denied (i.e. resp.status === 403) instead of allowing resp.ok, and if
you also need to verify published nodes return 200 create a separate test (e.g.,
testTreeAllowedWithAuth or testTreePublishedReturns200) that asserts resp.ok for
the published-node scenario; locate the change in the async function
testTreeDeniedNoAuth and adjust the return condition accordingly.
---
Duplicate comments:
In `@desci-server/src/services/r2.ts`:
- Around line 51-52: The code logs the raw R2 object key via logger.info({ key
}, 'Uploaded to R2') which can leak private filenames; update the logging to
avoid printing the raw "key" variable by either omitting it or replacing it with
a redacted form (e.g., a hash or truncated mask) before calling logger.info;
locate the logger.info calls in this file (the instances that pass the "key"
variable) and replace them with a call that logs a safe identifier (e.g.,
compute and log a SHA256/hex of key or log only the first/last N chars or a
static redaction token) while retaining context in the message.
---
Nitpick comments:
In `@desci-server/CENTRALIZED_DATA.md`:
- Around line 34-37: The fenced code blocks showing HTTP endpoint examples such
as "POST /v1/auth/magic { email } → sends verification code" and
similar blocks around the file should include language identifiers for syntax
highlighting and markdown-lint compliance; update those backtick-fenced blocks
(e.g., the endpoint examples at lines showing POST /v1/auth/magic and the other
blocks referenced) to use an appropriate language tag like ```http or ```text so
the blocks render correctly and pass lint rules.
In `@desci-server/prisma/schema.prisma`:
- Around line 584-601: The schema's NodeDataGrant model is missing an index on
grantedById which will slow queries that list grants by the grantor; update the
NodeDataGrant model (the block defining NodeDataGrant with fields id, createdAt,
updatedAt, nodeUUID, granteeId, grantedById, etc.) to add an
@@index([grantedById]) directive alongside the existing @@unique and @@index
entries so queries like "list all grants I've given" are efficient.
In `@desci-server/src/controllers/data/downloadCentralized.ts`:
- Around line 57-65: The current download handler pipes the R2 Readable (stream)
to the Express response via stream.pipe(res) but doesn't handle stream or
response errors/aborts, so mid-transfer failures are uncaught; update the code
around stream.pipe(res) and the catch block (referencing stream.pipe(res), the
surrounding try/catch and logger.error) to attach listeners: listen for 'error'
on the R2 stream to log the error (using logger.error) and respond/cleanup
(destroy/unpipe the stream and end the response if not finished), listen for
'close'/'aborted' on res to destroy the stream to stop work, and ensure any
listeners are removed after finish to avoid leaks; also convert any uncaught
stream errors to a 500 JSON response when possible and include the error details
in logs.
In `@desci-server/src/controllers/data/uploadCentralized.ts`:
- Around line 60-66: The loop that checks UNSAFE_PATH_PATTERN currently logs and
unlinks unsafe files but silently skips them; update the upload handling in the
same function (the for (const file of files) loop that calls
UNSAFE_PATH_PATTERN.test and logger.warn) to record each rejected file (e.g.,
push { fileName: file.originalname, error: 'unsafe filename' } into a
rejectedFiles array or add an error entry into the per-file results array)
before unlinking, and ensure the final response includes that
rejectedFiles/errors list alongside successfully uploaded files so clients can
see which files were rejected and why.
- Around line 73-90: The streaming branch for files > STREAM_THRESHOLD is
defeating memory savings by collecting chunks into the chunks array and calling
Buffer.concat; modify the logic so the Transform "collector" (or a PassThrough)
pipes directly into uploadToR2 without accumulating buffers while still updating
the crypto.createHash('sha256') as data flows, compute hashStream.digest('hex')
after the stream ends, and pass the resulting stream (or PassThrough) to
uploadToR2 (r2Key, stream, { 'content-hash': hash, 'mime-type': ... }); if the
uploadToR2 API requires a full Buffer, keep the current fallback but
document/branch clearly so large-file streaming uses the stream path.
In `@desci-server/src/controllers/nodes/dataGrants.ts`:
- Around line 131-155: The current listMyGrants returns grants even if the
related node was soft-deleted; update the prisma query inside listMyGrants to
only fetch grants whose node is not deleted by adding a relation filter (e.g.,
include node: { deletedAt: null } in the where clause) and defensively handle
the mapping (ensure g.node exists before using g.node.uuid/title) so deleted or
missing nodes are excluded from the response.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 39e419e5-03c8-4fa6-a7f2-fea6495d5514
📒 Files selected for processing (19)
- .env.example
- desci-server/CENTRALIZED_DATA.md
- desci-server/prisma/migrations/20260317105928_add_node_data_grant/migration.sql
- desci-server/prisma/schema.prisma
- desci-server/scripts/dev-local.sh
- desci-server/src/controllers/auth/magic.ts
- desci-server/src/controllers/data/centralizedTree.ts
- desci-server/src/controllers/data/downloadCentralized.ts
- desci-server/src/controllers/data/downloadCentralizedZip.ts
- desci-server/src/controllers/data/uploadCentralized.ts
- desci-server/src/controllers/nodes/dataGrants.ts
- desci-server/src/controllers/nodes/draftCreate.ts
- desci-server/src/routes/v1/auth.ts
- desci-server/src/routes/v1/data.ts
- desci-server/src/routes/v1/nodes.ts
- desci-server/src/server.ts
- desci-server/src/services/centralizedDataAccess.ts
- desci-server/src/services/r2.ts
- desci-server/src/test/centralizedData.html
🚧 Files skipped from review as they are similar to previous changes (3)
- desci-server/src/routes/v1/auth.ts
- desci-server/scripts/dev-local.sh
- desci-server/src/controllers/data/centralizedTree.ts
…s + test UI overhaul

Add user-level access grants for centralized data so workflows can authenticate as users who have been explicitly granted access by the node owner, solving the problem of shareId exposure in FDOs.

New features:
- NodeDataGrant Prisma model with soft-revoke (revokedAt)
- Shared access check helper (centralizedDataAccess.ts) replacing duplicated auth logic in download endpoints
- Grant management CRUD endpoints (POST/DELETE/GET /v1/nodes/grant)
- Tree endpoint gated for unpublished nodes
- Test UI: "Shared With Me" panel, grant management UI, curl reference

Security fixes:
- Path traversal validation on upload contextPath and filenames
- Content-Disposition header injection sanitization (RFC5987)
- Helmet + test UI gated from production (NODE_ENV check)
- R2 getStreamFromR2 null body check
- httpServer getter infinite recursion fix
- Real R2 credentials removed from .env.example

Bug fixes:
- Duplicate /orcid/validate route removed
- Empty manifestDocumentId guard in draftCreate
- GOOGLE_CLIENT_ID_AUTH → GOOGLE_CLIENT_ID
- magic.ts orderBy consistency (id desc)

Improvements:
- listR2Objects converted to async generator (listR2ObjectsPages)
- Streaming upload for files >50MB
- dev-local.sh: platform-aware Sentry binary, ipfs_public/ipfs_guest services, improved TypeScript error reporting
- magic.ts: only fetch devCode when send succeeded (ok is true)
- dataGrants: add isDeleted:false to revokeDataGrant/listDataGrants, filter deleted nodes from listMyGrants
- downloadCentralized: add stream error + response close handlers
- uploadCentralized: report rejected files in response, fix streaming branch to not accumulate chunks in memory (two-pass: hash then stream)
- r2.ts: redact file keys in log output (show only prefix)
- schema: add @@index([grantedById]) to NodeDataGrant
- testTreeDeniedNoAuth: only pass on 403 (not 200)
- CENTRALIZED_DATA.md: add language tags to fenced code blocks
4c3dd6e to 3ce6d79 (Compare)
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@desci-server/src/controllers/data/downloadCentralized.ts`:
- Around line 31-38: Move the await checkCentralizedDataAccess(normalizedUuid,
shareId, user) call into the existing try/catch inside the downloadCentralized
controller so any thrown errors are caught by the existing catch handler;
specifically, call and await checkCentralizedDataAccess within the try block,
preserve the hasAccess check and the res.status(403).json({ ok: false, message:
... }) early-return behavior, and rely on the current catch block to handle
unexpected exceptions from checkCentralizedDataAccess.
In `@desci-server/src/controllers/data/uploadCentralized.ts`:
- Around line 34-38: The validation currently assumes req.body.contextPath is a
string and calls contextPath.startsWith(...) which will throw for non-strings;
update the check around contextPath (used in this file and the variable
contextPath and the UNSAFE_PATH_PATTERN) to first ensure typeof
req.body.contextPath === 'string' (or coerce safely) before assigning/using
contextPath, and return the same 400 validation response for non-string values;
adjust the if condition that uses contextPath.startsWith('/') and
UNSAFE_PATH_PATTERN.test(contextPath) to only run when contextPath is a string
to avoid a 500 error.
- Around line 12-13: The multer setup (localUpload and uploadMiddleware) has no
limits and can exhaust tmp storage; update the multer configuration to include a
limits object (e.g., limits: { fileSize: <maxBytes>, files: <maxFiles> }) or use
multer(...).array('files', <maxFiles>) to cap per-request file count and
per-file size, keeping preservePath and dest as-is; adjust <maxBytes>/<maxFiles>
to values appropriate for your app and ensure uploadMiddleware still references
the same symbol (uploadMiddleware) after the change.
- Line 36: Replace the hex escape sequence in the UNSAFE_PATH_PATTERN regex with
a unicode escape to satisfy Biome linting: update the constant
UNSAFE_PATH_PATTERN (currently /(\.\.|\\|[\x00-\x1f])/) so the control-character
range uses unicode escapes (e.g., [\u0000-\u001f]) while keeping the same
overall pattern and behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9340fbc9-9f42-4073-8dd1-092fbca1fc48
📒 Files selected for processing (9)
- desci-server/CENTRALIZED_DATA.md
- desci-server/prisma/migrations/20260317112936_add_granted_by_index/migration.sql
- desci-server/prisma/schema.prisma
- desci-server/src/controllers/auth/magic.ts
- desci-server/src/controllers/data/downloadCentralized.ts
- desci-server/src/controllers/data/uploadCentralized.ts
- desci-server/src/controllers/nodes/dataGrants.ts
- desci-server/src/services/r2.ts
- desci-server/src/test/centralizedData.html
🚧 Files skipped from review as they are similar to previous changes (4)
- desci-server/src/controllers/auth/magic.ts
- desci-server/src/test/centralizedData.html
- desci-server/src/controllers/nodes/dataGrants.ts
- desci-server/CENTRALIZED_DATA.md
```typescript
const localUpload = multer({ dest: os.tmpdir(), preservePath: true });
const uploadMiddleware = localUpload.array('files');
```
🧩 Analysis chain
🏁 Script executed:
fd -t f "uploadCentralized.ts" --exec wc -l {}
Repository: desci-labs/nodes
Length of output: 120
🏁 Script executed:
fd -t f "uploadCentralized.ts" --exec cat -n {}
Repository: desci-labs/nodes
Length of output: 5286
🏁 Script executed:
rg "CENTRALIZED_UPLOAD_MAX" -t ts -t js
Repository: desci-labs/nodes
Length of output: 42
🏁 Script executed:
rg "process\.env\." desci-server/src/controllers/data/uploadCentralized.ts -A 2 -B 2
Repository: desci-labs/nodes
Length of output: 42
Add upload limits to prevent tmp-dir exhaustion.
multer({ dest: os.tmpdir(), preservePath: true }) currently has no limits on file count or file size. A single request can write unlimited data to the temporary directory before files are uploaded to R2, risking disk exhaustion and denial of service.
Suggested fix
```diff
-const localUpload = multer({ dest: os.tmpdir(), preservePath: true });
+const localUpload = multer({
+  dest: os.tmpdir(),
+  preservePath: true,
+  limits: {
+    files: Number(process.env.CENTRALIZED_UPLOAD_MAX_FILES ?? 100),
+    fileSize: Number(process.env.CENTRALIZED_UPLOAD_MAX_FILE_SIZE ?? 1024 * 1024 * 1024), // 1 GiB default
+  },
+});
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```typescript
const localUpload = multer({
  dest: os.tmpdir(),
  preservePath: true,
  limits: {
    files: Number(process.env.CENTRALIZED_UPLOAD_MAX_FILES ?? 100),
    fileSize: Number(process.env.CENTRALIZED_UPLOAD_MAX_FILE_SIZE ?? 1024 * 1024 * 1024), // 1 GiB default
  },
});
const uploadMiddleware = localUpload.array('files');
```
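The env-var defaults in the suggestion boil down to a small parsing rule; `parseByteLimit` is a hypothetical helper sketched here, not part of the PR:

```typescript
// Parse an env var into a positive numeric limit, falling back to a
// default when the value is unset, empty, non-numeric, or non-positive.
function parseByteLimit(raw: string | undefined, fallback: number): number {
  const n = Number(raw);
  return Number.isFinite(n) && n > 0 ? n : fallback;
}
```

Note that `Number('')` is `0` and `Number(undefined)` is `NaN`; both fall back here, which is slightly stricter than a plain `?? default`, since `??` would let an empty string through as a zero-byte limit.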
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@desci-server/src/controllers/data/uploadCentralized.ts` around lines 12 - 13,
The multer setup (localUpload and uploadMiddleware) has no limits and can
exhaust tmp storage; update the multer configuration to include a limits object
(e.g., limits: { fileSize: <maxBytes>, files: <maxFiles> }) or use
multer(...).array('files', <maxFiles>) to cap per-request file count and
per-file size, keeping preservePath and dest as-is; adjust <maxBytes>/<maxFiles>
to values appropriate for your app and ensure uploadMiddleware still references
the same symbol (uploadMiddleware) after the change.
```typescript
const contextPath = req.body.contextPath || '';

const UNSAFE_PATH_PATTERN = /(\.\.|\\|[\x00-\x1f])/;
if (contextPath && (contextPath.startsWith('/') || UNSAFE_PATH_PATTERN.test(contextPath))) {
  return res.status(400).send({ ok: false, message: 'Invalid contextPath: must not contain path traversal sequences, backslashes, or control characters' });
```
Guard contextPath type before string operations.
If req.body.contextPath is non-string (e.g., array/object), contextPath.startsWith(...) throws and returns a 500 instead of a 400 validation error.
Suggested fix
```diff
-  const contextPath = req.body.contextPath || '';
+  const rawContextPath = req.body.contextPath;
+  if (rawContextPath != null && typeof rawContextPath !== 'string') {
+    return res.status(400).send({ ok: false, message: 'Invalid contextPath: must be a string' });
+  }
+  const contextPath = rawContextPath ?? '';
```
📝 Committable suggestion
```typescript
const rawContextPath = req.body.contextPath;
if (rawContextPath != null && typeof rawContextPath !== 'string') {
  return res.status(400).send({ ok: false, message: 'Invalid contextPath: must be a string' });
}
const contextPath = rawContextPath ?? '';

const UNSAFE_PATH_PATTERN = /(\.\.|\\|[\x00-\x1f])/;
if (contextPath && (contextPath.startsWith('/') || UNSAFE_PATH_PATTERN.test(contextPath))) {
  return res.status(400).send({ ok: false, message: 'Invalid contextPath: must not contain path traversal sequences, backslashes, or control characters' });
```
🧰 Tools
🪛 Biome (2.4.7)
[error] 36-36: Unexpected control character in a regular expression.
(lint/suspicious/noControlCharactersInRegex)
[error] 36-36: Unexpected control character in a regular expression.
(lint/suspicious/noControlCharactersInRegex)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@desci-server/src/controllers/data/uploadCentralized.ts` around lines 34 - 38,
The validation currently assumes req.body.contextPath is a string and calls
contextPath.startsWith(...) which will throw for non-strings; update the check
around contextPath (used in this file and the variable contextPath and the
UNSAFE_PATH_PATTERN) to first ensure typeof req.body.contextPath === 'string'
(or coerce safely) before assigning/using contextPath, and return the same 400
validation response for non-string values; adjust the if condition that uses
contextPath.startsWith('/') and UNSAFE_PATH_PATTERN.test(contextPath) to only
run when contextPath is a string to avoid a 500 error.
```typescript
const node = (req as any).node;
const contextPath = req.body.contextPath || '';

const UNSAFE_PATH_PATTERN = /(\.\.|\\|[\x00-\x1f])/;
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, let's find and examine the file
find . -name "uploadCentralized.ts" -type f
Repository: desci-labs/nodes
Length of output: 116
🏁 Script executed:
#!/bin/bash
# Check the file content around line 36
cat -n desci-server/src/controllers/data/uploadCentralized.ts | sed -n '30,45p'
Repository: desci-labs/nodes
Length of output: 850
🏁 Script executed:
#!/bin/bash
# Look for Biome configuration in the repository
find . -name "biome.json" -o -name "biome.jsonc" -o -name ".biomerc*" | head -10
Repository: desci-labs/nodes
Length of output: 72
🏁 Script executed:
#!/bin/bash
# Check the Biome configuration
cat biome.json
Repository: desci-labs/nodes
Length of output: 1834
🏁 Script executed:
#!/bin/bash
# Search for any lint rules related to control characters or regex in Biome config
cat biome.json | jq . 2>/dev/null || cat biome.json
Repository: desci-labs/nodes
Length of output: 1834
🏁 Script executed:
#!/bin/bash
# Let's verify that the regex patterns are functionally equivalent
# Testing if \x00-\x1f and \u0000-\u001f produce the same character ranges
python3 << 'EOF'
import re
# Create regex patterns using hex and unicode escapes
hex_pattern = r'[\x00-\x1f]'
unicode_pattern = r'[\u0000-\u001f]'
# Create actual compiled regex objects
# Note: In Python, we need to use raw strings carefully
hex_regex = re.compile('[\x00-\x1f]')
unicode_regex = re.compile('[\u0000-\u001f]')
# Test characters in the range
test_chars = [
'\x00', # null
'\x0a', # newline
'\x1f', # unit separator (last control char)
'\x20', # space (first non-control)
'a' # regular char
]
print("Testing character matching:")
print(f"{'Char':<10} {'Hex Match':<15} {'Unicode Match':<15}")
print("-" * 40)
all_match = True
for char in test_chars:
hex_match = bool(hex_regex.search(char))
unicode_match = bool(unicode_regex.search(char))
match_status = "✓" if hex_match == unicode_match else "✗"
print(f"{repr(char):<10} {str(hex_match):<15} {str(unicode_match):<15} {match_status}")
if hex_match != unicode_match:
all_match = False
print("\nBoth patterns are functionally equivalent:" if all_match else "\nPatterns differ!")
EOF
Repository: desci-labs/nodes
Length of output: 440
Use unicode escapes instead of hex escapes in regex to comply with Biome linting.
The regex pattern with hex escape sequences [\x00-\x1f] triggers Biome's control character lint rule. Replace with unicode escapes [\u0000-\u001f] for better readability and compliance—both patterns are functionally identical.
Suggested fix
```diff
-const UNSAFE_PATH_PATTERN = /(\.\.|\\|[\x00-\x1f])/;
+const UNSAFE_PATH_PATTERN = /(\.\.|\\|[\u0000-\u001f])/;
```
📝 Committable suggestion
```typescript
const UNSAFE_PATH_PATTERN = /(\.\.|\\|[\u0000-\u001f])/;
```
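A quick sanity check of the suggested pattern's behavior; the test strings are illustrative, not from the PR:

```typescript
// Same semantics as the hex-escape spelling: flags parent-directory
// segments, backslashes, and C0 control characters.
const UNSAFE_PATH_PATTERN = /(\.\.|\\|[\u0000-\u001f])/;

const samples = ['data/run1.csv', '../etc/passwd', 'a\\b', 'line\nbreak'];
const verdicts = samples.map((s) => UNSAFE_PATH_PATTERN.test(s));
// verdicts: [false, true, true, true]
```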
🧰 Tools
🪛 Biome (2.4.7)
[error] 36-36: Unexpected control character in a regular expression.
(lint/suspicious/noControlCharactersInRegex)
[error] 36-36: Unexpected control character in a regular expression.
(lint/suspicious/noControlCharactersInRegex)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@desci-server/src/controllers/data/uploadCentralized.ts` at line 36, Replace
the hex escape sequence in the UNSAFE_PATH_PATTERN regex with a unicode escape
to satisfy Biome linting: update the constant UNSAFE_PATH_PATTERN (currently
/(\.\.|\\|[\x00-\x1f])/) so the control-character range uses unicode escapes
(e.g., [\u0000-\u001f]) while keeping the same overall pattern and behavior.
Actionable comments posted: 9
♻️ Duplicate comments (4)
desci-server/src/controllers/data/uploadCentralized.ts (3)
34-38: ⚠️ Potential issue | 🟠 Major

Validate `contextPath` is a string before using string methods. A non-string multipart value here still falls through to `startsWith()`/`.test()`, which turns a bad request into a 500 instead of a 400.

Minimal guard:

```diff
-  const contextPath = req.body.contextPath || '';
+  const rawContextPath = req.body.contextPath;
+  if (rawContextPath != null && typeof rawContextPath !== 'string') {
+    return res.status(400).send({ ok: false, message: 'Invalid contextPath: must be a string' });
+  }
+  const contextPath = rawContextPath ?? '';
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@desci-server/src/controllers/data/uploadCentralized.ts` around lines 34 - 38, The handler currently assumes req.body.contextPath is a string and calls contextPath.startsWith and UNSAFE_PATH_PATTERN.test, which throws on non-strings; update the validation around the contextPath variable in uploadCentralized.ts to first check typeof contextPath === 'string' (or explicitly coerce only after that check) and return a 400 error for non-string values before running contextPath.startsWith('/') or UNSAFE_PATH_PATTERN.test(contextPath), referencing the existing contextPath variable and UNSAFE_PATH_PATTERN to locate where to add the guard.
12-13: ⚠️ Potential issue | 🟠 Major

Cap the temp upload size/count before Multer writes to disk. This middleware still accepts an unbounded number of files and per-file size, so a single request can exhaust `os.tmpdir()` before the controller gets a chance to reject it.

Add Multer limits:

```diff
-const localUpload = multer({ dest: os.tmpdir(), preservePath: true });
+const localUpload = multer({
+  dest: os.tmpdir(),
+  preservePath: true,
+  limits: {
+    files: Number(process.env.CENTRALIZED_UPLOAD_MAX_FILES ?? 100),
+    fileSize: Number(process.env.CENTRALIZED_UPLOAD_MAX_FILE_SIZE ?? 1024 * 1024 * 1024),
+  },
+});
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@desci-server/src/controllers/data/uploadCentralized.ts` around lines 12 - 13, The multer configuration (localUpload / uploadMiddleware) currently allows an unbounded number of files and per-file size which can exhaust os.tmpdir(); update the localUpload initialization to include a sensible limits object (e.g., limits: { files: <maxFiles>, fileSize: <maxBytesPerFile> } or use configured constants) so Multer enforces the cap before writing to disk, and keep uploadMiddleware = localUpload.array('files') unchanged; ensure you choose appropriate maxFiles and maxBytesPerFile values (or read them from config/env) and handle the resulting Multer errors in the controller.
36-36: ⚠️ Potential issue | 🟠 Major

Biome will reject `UNSAFE_PATH_PATTERN` as written. Line 36 still uses control-character hex escapes that Biome flags as `noControlCharactersInRegex`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@desci-server/src/controllers/data/uploadCentralized.ts` at line 36, UNSAFE_PATH_PATTERN uses hex control-character escapes ([\x00-\x1f]) which Biome flags; replace that character class with a Unicode property escape (e.g. use \p{Cc} to match control characters) and enable the Unicode flag on the regex (change /.../ to /.../u) so the pattern becomes equivalent but Biome-safe; update the constant UNSAFE_PATH_PATTERN accordingly and keep the existing checks for ".." and backslash intact.
desci-server/src/controllers/data/downloadCentralized.ts (1)
31-38: ⚠️ Potential issue | 🟠 Major

Wrap the access check in the existing error boundary. `checkCentralizedDataAccess()` still runs before the `try`, so a thrown auth/data-grant lookup skips the controller's catch path and bubbles out as an unhandled failure.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@desci-server/src/controllers/data/downloadCentralized.ts` around lines 31 - 38, The call to checkCentralizedDataAccess (which returns hasAccess) is executed outside the controller's try block so any exceptions it throws bypass the controller's catch; move the checkCentralizedDataAccess invocation (using normalizedUuid, shareId, user) inside the existing try block or wrap it in the same try so thrown errors are caught by the controller's catch, and retain the existing behavior that sends res.status(403).json(...) when hasAccess is false; ensure the catch then handles and logs unexpected errors consistently with the rest of the controller.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@desci-server/scripts/dev-local.sh`:
- Line 2: The script's error handling is masking failures because pipelines with
tail can return a zero exit status; update the shell options to enable pipefail
so upstream command failures (e.g., npx prisma generate and npx prisma migrate
deploy) cause the script to exit. Replace the top-level "set -e" with a form
that enables pipefail (e.g., add "-o pipefail" or use "set -euo pipefail") and
also update the later "set" invocation around the pipelines (the lines near the
prisma commands/tail usage) to include pipefail so the pipeline exit status
reflects failures.
In `@desci-server/src/controllers/data/centralizedTree.ts`:
- Around line 91-108: The Prisma query prisma.node.findFirst(...) and the access
check call checkCentralizedDataAccess(...) should be executed inside the
existing try block so any thrown errors are caught and handled by the structured
500 response; move the retrieval of node (using normalizedUuid) and the
subsequent isPublished logic (including reading req.query.shareId and invoking
checkCentralizedDataAccess) into the try scope and ensure that on unexpected
exceptions you return the same res.status(500).json(...) path as other try/catch
branches.
- Around line 114-124: The path filter currently uses
startsWith(`${normalizedUuid}/${pathFilter}`) which matches partial names (e.g.,
"foo" matches "foobar.txt"); change the filtering to anchor to a directory
boundary by normalizing the filterPrefix to include a trailing slash (e.g.,
`${normalizedUuid}/${pathFilter}/`) when treating pathFilter as a directory, or
add an exact-file branch that checks equality against the full key when
pathFilter refers to a file; update the computation of filterPrefix,
filteredObjects, and treePrefix/normalizedTreePrefix (the variables and
functions involved are pathFilter, normalizedUuid, filteredObjects, treePrefix,
normalizedTreePrefix, and buildTreeSimple) so buildTreeSimple receives only keys
rooted at the correct directory or the single exact file.
In `@desci-server/src/controllers/data/downloadCentralized.ts`:
- Around line 45-47: The regex in the sanitizedFileName computation (the two
.replace calls on fileName) contains control-character hex escapes that trigger
Biome's noControlCharactersInRegex rule; either add a Biome ignore directive
immediately above the offending line (e.g., an inline biome-ignore for
noControlCharactersInRegex) or refactor the sanitization to avoid explicit
control-character ranges (for example, use an explicit allowlist of safe
printable characters or Unicode property classes) so the replace calls no longer
contain \x00-\x1f or similar ranges; update the code around
sanitizedFileName/fileName accordingly.
In `@desci-server/src/controllers/data/downloadCentralizedZip.ts`:
- Around line 29-36: Move the await checkCentralizedDataAccess(normalizedUuid,
shareId, user) call inside the existing try block so any exceptions from
auth/data-grant lookup are caught by the controller's catch; specifically, wrap
the invocation of checkCentralizedDataAccess (and the subsequent hasAccess check
that sends the 403 via res.status(403).json(...)) inside the try in
downloadCentralizedZip, preserving the existing 403 response when hasAccess is
false and ensuring thrown errors flow to the catch handler.
- Around line 68-74: The code currently appends untrusted relativePath into
archive (see variables relativePath and archive.append inside the loop that
iterates objects and calls getStreamFromR2), so sanitize and validate the entry
name before appending: convert backslashes to forward slashes, run
path.posix.normalize on the result, strip any leading slashes, remove control
characters (e.g. non-printable chars via a regex), then reject or skip the entry
if normalize produces empty string or contains any ".." path segment (or starts
with "..") to prevent traversal; only after these checks use the sanitized name
when calling archive.append(stream, { name: safeName }). Ensure this logic lives
just before archive.append in downloadCentralizedZip.ts.
- Around line 48-50: The regexes use control-character hex escapes that violate
Biome's noControlCharactersInRegex rule; update the sanitize logic (the
sanitizedZipName assignment that reads zipName...) to use Unicode escapes
instead, e.g. replace /[\x00-\x1f\x7f"\\]/g with /[\u0000-\u001F\u007F"\\]/g and
/[^\x20-\x7e]/g with /[^\u0020-\u007E]/g, and make the same change in
downloadCentralized.ts so the file lints cleanly under the recommended Biome
rules.
In `@desci-server/src/controllers/nodes/draftCreate.ts`:
- Around line 39-52: The current try/catch wraps both makeManifest and
addBufferToIpfs so manifest construction errors are misclassified as IPFS
outages; split the logic so makeManifest(...) is called first and allowed to
throw (so invalid input/regressions surface) and then call addBufferToIpfs(...)
inside its own try/catch. Concretely: call makeManifest({ title, researchFields,
defaultLicense: defaultLicense || '', ipfsNode: getNodeToUse(isGuest) }) and
assign researchObject from manifestResult; then in a separate try block call
addBufferToIpfs(manifestResult.manifest, '', getNodeToUse(isGuest)) and on
failure catch only IPFS errors (log via logger.warn with err) and leave hash/uri
unset/default. Ensure references to makeManifest, manifestResult,
addBufferToIpfs, researchObject, hash, uri and getNodeToUse(isGuest) are updated
accordingly.
In `@desci-server/src/routes/v1/data.ts`:
- Line 32: The current middleware order runs Multer (centralizedUploadHandler)
before authorization (ensureNodeAccess), causing orphan temp files on rejection;
either move ensureNodeAccess to run before centralizedUploadHandler in the
router registration (so Multer never writes unless authorized), or add cleanup
logic inside ensureNodeAccess to delete any uploaded temp files (inspect
req.file/req.files or the Multer temp paths and unlink them) before returning
the missing-uuid or unauthorized errors referenced in authorisation.ts (lines
referenced around the missing uuid check and unauthorized path). Ensure
uploadCentralized remains the final handler and that cleanup uses the same temp
file locations Multer writes to.
---
Duplicate comments:
In `@desci-server/src/controllers/data/downloadCentralized.ts`:
- Around line 31-38: The call to checkCentralizedDataAccess (which returns
hasAccess) is executed outside the controller's try block so any exceptions it
throws bypass the controller's catch; move the checkCentralizedDataAccess
invocation (using normalizedUuid, shareId, user) inside the existing try block
or wrap it in the same try so thrown errors are caught by the controller's
catch, and retain the existing behavior that sends res.status(403).json(...)
when hasAccess is false; ensure the catch then handles and logs unexpected
errors consistently with the rest of the controller.
In `@desci-server/src/controllers/data/uploadCentralized.ts`:
- Around line 34-38: The handler currently assumes req.body.contextPath is a
string and calls contextPath.startsWith and UNSAFE_PATH_PATTERN.test, which
throws on non-strings; update the validation around the contextPath variable in
uploadCentralized.ts to first check typeof contextPath === 'string' (or
explicitly coerce only after that check) and return a 400 error for non-string
values before running contextPath.startsWith('/') or
UNSAFE_PATH_PATTERN.test(contextPath), referencing the existing contextPath
variable and UNSAFE_PATH_PATTERN to locate where to add the guard.
- Around line 12-13: The multer configuration (localUpload / uploadMiddleware)
currently allows an unbounded number of files and per-file size which can
exhaust os.tmpdir(); update the localUpload initialization to include a sensible
limits object (e.g., limits: { files: <maxFiles>, fileSize: <maxBytesPerFile> }
or use configured constants) so Multer enforces the cap before writing to disk,
and keep uploadMiddleware = localUpload.array('files') unchanged; ensure you
choose appropriate maxFiles and maxBytesPerFile values (or read them from
config/env) and handle the resulting Multer errors in the controller.
- Line 36: UNSAFE_PATH_PATTERN uses hex control-character escapes ([\x00-\x1f])
which Biome flags; replace that character class with a Unicode property escape
(e.g. use \p{Cc} to match control characters) and enable the Unicode flag on the
regex (change /.../ to /.../u) so the pattern becomes equivalent but Biome-safe;
update the constant UNSAFE_PATH_PATTERN accordingly and keep the existing checks
for ".." and backslash intact.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b0a7db1a-117b-435b-866f-42d0b8722f4e
📒 Files selected for processing (20)
- .env.example
- desci-server/CENTRALIZED_DATA.md
- desci-server/prisma/migrations/20260317105928_add_node_data_grant/migration.sql
- desci-server/prisma/migrations/20260317112936_add_granted_by_index/migration.sql
- desci-server/prisma/schema.prisma
- desci-server/scripts/dev-local.sh
- desci-server/src/controllers/auth/magic.ts
- desci-server/src/controllers/data/centralizedTree.ts
- desci-server/src/controllers/data/downloadCentralized.ts
- desci-server/src/controllers/data/downloadCentralizedZip.ts
- desci-server/src/controllers/data/uploadCentralized.ts
- desci-server/src/controllers/nodes/dataGrants.ts
- desci-server/src/controllers/nodes/draftCreate.ts
- desci-server/src/routes/v1/auth.ts
- desci-server/src/routes/v1/data.ts
- desci-server/src/routes/v1/nodes.ts
- desci-server/src/server.ts
- desci-server/src/services/centralizedDataAccess.ts
- desci-server/src/services/r2.ts
- desci-server/src/test/centralizedData.html
🚧 Files skipped from review as they are similar to previous changes (10)
- desci-server/src/services/centralizedDataAccess.ts
- desci-server/CENTRALIZED_DATA.md
- desci-server/prisma/migrations/20260317105928_add_node_data_grant/migration.sql
- desci-server/src/routes/v1/nodes.ts
- desci-server/src/controllers/auth/magic.ts
- desci-server/src/controllers/nodes/dataGrants.ts
- desci-server/src/test/centralizedData.html
- desci-server/src/services/r2.ts
- desci-server/src/routes/v1/auth.ts
- desci-server/prisma/migrations/20260317112936_add_granted_by_index/migration.sql
```
@@ -0,0 +1,103 @@
#!/bin/bash
set -e
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
set -e
echo "Without pipefail:"
bash -lc 'set -e; false | tail -1; echo "pipeline survived"' || true
echo
echo "With pipefail:"
bash -lc 'set -e -o pipefail; false | tail -1; echo "unreachable"' || echo "pipeline stopped as expected"
```
Repository: desci-labs/nodes
Length of output: 355
🏁 Script executed:
```shell
cat -n desci-server/scripts/dev-local.sh | head -80
```
Repository: desci-labs/nodes
Length of output: 3382
Prisma failures are currently masked by the tail pipelines.
set -e alone does not stop the script if npx prisma generate or npx prisma migrate deploy fails. When a command fails but is piped to another command, only the final command's exit code matters—tail will exit 0 even if Prisma fails, allowing the script to continue with a broken setup.
Add pipefail to the set options (lines 2 and 67-69):
Make the pipelines fail correctly
```diff
-#!/bin/bash
-set -e
+#!/bin/bash
+set -euo pipefail
```

Without pipefail, a failing Prisma command masked by tail will allow the script to proceed. With pipefail, the script will correctly exit when Prisma fails.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```
#!/bin/bash
set -euo pipefail
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@desci-server/scripts/dev-local.sh` at line 2, The script's error handling is
masking failures because pipelines with tail can return a zero exit status;
update the shell options to enable pipefail so upstream command failures (e.g.,
npx prisma generate and npx prisma migrate deploy) cause the script to exit.
Replace the top-level "set -e" with a form that enables pipefail (e.g., add "-o
pipefail" or use "set -euo pipefail") and also update the later "set" invocation
around the pipelines (the lines near the prisma commands/tail usage) to include
pipefail so the pipeline exit status reflects failures.
```typescript
const node = await prisma.node.findFirst({
  where: { uuid: normalizedUuid },
  include: { versions: { where: { transactionId: { not: null } }, take: 1 } },
});

if (!node) {
  return res.status(404).json({ ok: false, message: 'Node not found' });
}

const isPublished = node.versions.length > 0;
if (!isPublished) {
  const shareId = req.query.shareId as string | undefined;
  const user = (req as any).user;
  const { hasAccess } = await checkCentralizedDataAccess(normalizedUuid, shareId, user);
  if (!hasAccess) {
    return res.status(403).json({ ok: false, message: 'Access denied. This node is unpublished — provide a valid shareId, authenticate as owner, or request a data grant.' });
  }
}
```
Move the Prisma lookup and access check under the try.
prisma.node.findFirst() and checkCentralizedDataAccess() can both throw here. Right now those failures skip the catch block entirely and never hit the structured 500 response.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@desci-server/src/controllers/data/centralizedTree.ts` around lines 91 - 108,
The Prisma query prisma.node.findFirst(...) and the access check call
checkCentralizedDataAccess(...) should be executed inside the existing try block
so any thrown errors are caught and handled by the structured 500 response; move
the retrieval of node (using normalizedUuid) and the subsequent isPublished
logic (including reading req.query.shareId and invoking
checkCentralizedDataAccess) into the try scope and ensure that on unexpected
exceptions you return the same res.status(500).json(...) path as other try/catch
branches.
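The shape of the fix can be sketched generically. Everything below is illustrative: `treeHandler`, `lookupNode`, and `checkAccess` are stand-ins for the real controller, `prisma.node.findFirst`, and `checkCentralizedDataAccess`, showing only that both awaits sit inside the same `try` as the structured responses.

```typescript
type Result = { status: number; body: { ok: boolean; message?: string } };

// Generic sketch: the lookup and the access check both run inside the try,
// so a throw from either one reaches the structured 500 branch.
async function treeHandler(
  lookupNode: () => Promise<{ published: boolean } | null>,
  checkAccess: () => Promise<boolean>,
): Promise<Result> {
  try {
    const node = await lookupNode();
    if (!node) return { status: 404, body: { ok: false, message: 'Node not found' } };
    if (!node.published && !(await checkAccess())) {
      return { status: 403, body: { ok: false, message: 'Access denied' } };
    }
    return { status: 200, body: { ok: true } };
  } catch {
    // Database or access-check failures now land here instead of escaping the handler.
    return { status: 500, body: { ok: false, message: 'Internal error' } };
  }
}
```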
```typescript
const pathFilter = req.query.path as string | undefined;
let filteredObjects = objects;
if (pathFilter) {
  const filterPrefix = `${normalizedUuid}/${pathFilter}`;
  filteredObjects = objects.filter((obj) => obj.key.startsWith(filterPrefix));
}

const treePrefix = pathFilter ? `${normalizedUuid}/${pathFilter}` : prefix;
// Ensure treePrefix ends with /
const normalizedTreePrefix = treePrefix.endsWith('/') ? treePrefix : treePrefix + '/';
const tree = buildTreeSimple(filteredObjects, normalizedTreePrefix);
```
Anchor path filters to a directory boundary.
Because the filter uses `startsWith` on `${normalizedUuid}/${pathFilter}`, a request for `path=foo` also pulls in siblings like `foobar.txt`, and `buildTreeSimple()` then slices them against `foo/`, returning mangled paths. Normalize the filter to `.../foo/` or add a separate exact-file branch.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@desci-server/src/controllers/data/centralizedTree.ts` around lines 114 - 124,
The path filter currently uses startsWith(`${normalizedUuid}/${pathFilter}`)
which matches partial names (e.g., "foo" matches "foobar.txt"); change the
filtering to anchor to a directory boundary by normalizing the filterPrefix to
include a trailing slash (e.g., `${normalizedUuid}/${pathFilter}/`) when
treating pathFilter as a directory, or add an exact-file branch that checks
equality against the full key when pathFilter refers to a file; update the
computation of filterPrefix, filteredObjects, and
treePrefix/normalizedTreePrefix (the variables and functions involved are
pathFilter, normalizedUuid, filteredObjects, treePrefix, normalizedTreePrefix,
and buildTreeSimple) so buildTreeSimple receives only keys rooted at the correct
directory or the single exact file.
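The directory-boundary rule can be sketched as a pure function. Names here are illustrative, not the controller's actual code: a key survives only when it equals the filtered path exactly (a file) or sits under it as a directory.

```typescript
// Illustrative sketch: "foo" can no longer match "foobar.txt" because keys
// must either equal the base exactly or continue with a '/' separator.
function filterKeys(keys: string[], uuidPrefix: string, pathFilter: string): string[] {
  const base = `${uuidPrefix}/${pathFilter.replace(/\/+$/, '')}`;
  return keys.filter((k) => k === base || k.startsWith(`${base}/`));
}
```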
```typescript
const sanitizedFileName = fileName
  .replace(/[\x00-\x1f\x7f"\\]/g, '_')
  .replace(/[^\x20-\x7e]/g, '_')
```
🧩 Analysis chain
🏁 Script executed:
```shell
find . -name "downloadCentralized.ts" -type f
```
Repository: desci-labs/nodes
Length of output: 118
🏁 Script executed:
```shell
cat -n desci-server/src/controllers/data/downloadCentralized.ts | head -60
```
Repository: desci-labs/nodes
Length of output: 2828
🌐 Web query:
Biome noControlCharactersInRegex rule regex control characters
💡 Result:
Biome’s lint/suspicious/noControlCharactersInRegex rule flags control characters (ASCII 0–31, i.e. U+0000–U+001F) inside regex literals because they’re usually accidental/unreadable in JS regexes. (biomejs.dev)
What it disallows
In a regex literal like `/.../`, it reports:
- `\x00`…`\x1F`
- `\u0000`…`\u001F`
- `\u{0}`…`\u{1F}` (with the `u` flag)
- Any raw unescaped character in the range U+0000–U+001F (biomejs.dev)
What it allows
Common “control escapes” such as \t and \n are explicitly allowed. (biomejs.dev)
Example
Invalid (flagged):
```js
var pattern = /\x0C/;
```
Valid:
```js
var pattern = /\n/;
```
(biomejs.dev)
Config (biome.json):
```json
{
  "linter": {
    "rules": {
      "suspicious": {
        "noControlCharactersInRegex": "error"
      }
    }
  }
}
```
([biomejs.dev](https://biomejs.dev/linter/rules/no-control-characters-in-regex/))
Citations:
- https://biomejs.dev/linter/rules/no-control-characters-in-regex/
---
🏁 Script executed:
```shell
find . -name "biome.json" -o -name "biome.jsonc" -o -name ".biomeignore" | head -20
```
Repository: desci-labs/nodes
Length of output: 72
🏁 Script executed:
```shell
rg "noControlCharactersInRegex" --type json --type jsonc
```
Repository: desci-labs/nodes
Length of output: 89
🏁 Script executed:
```shell
# Let's see what characters the regex is actually trying to filter
# \x00-\x1f = control characters (0-31)
# \x7f = DEL character (127)
# So the first regex removes control chars + DEL, then second removes non-ASCII printable
# The real question: what's the intended behavior and proper fix?
# Let's see if there are other similar patterns in the codebase
rg "\\\\x[0-9a-f]" --type ts --type js
```
Repository: desci-labs/nodes
Length of output: 1532
🏁 Script executed:
```shell
cat biome.json
```
Repository: desci-labs/nodes
Length of output: 1834
🏁 Script executed:
```shell
# Check if the rule is disabled or configured in any way
cat biome.json | jq '.linter.rules' 2>/dev/null || cat biome.json
```
Repository: desci-labs/nodes
Length of output: 1273
Remove control character hex escapes from regex to satisfy Biome linting.
Line 46 uses hex escapes \x00-\x1f which trigger Biome's noControlCharactersInRegex rule. However, converting to unicode escapes (e.g., \u0000-\u001F) won't resolve this—Biome also flags those ranges. Instead, add a biome-ignore directive above the line or refactor the regex to avoid character ranges that include control characters.
Current code:
```typescript
const sanitizedFileName = fileName
  .replace(/[\x00-\x1f\x7f"\\]/g, '_')
  .replace(/[^\x20-\x7e]/g, '_')
```
🧰 Tools
🪛 Biome (2.4.7)
[error] 46-46: Unexpected control character in a regular expression.
(lint/suspicious/noControlCharactersInRegex)
[error] 46-46: Unexpected control character in a regular expression.
(lint/suspicious/noControlCharactersInRegex)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@desci-server/src/controllers/data/downloadCentralized.ts` around lines 45 -
47, The regex in the sanitizedFileName computation (the two .replace calls on
fileName) contains control-character hex escapes that trigger Biome's
noControlCharactersInRegex rule; either add a Biome ignore directive immediately
above the offending line (e.g., an inline biome-ignore for
noControlCharactersInRegex) or refactor the sanitization to avoid explicit
control-character ranges (for example, use an explicit allowlist of safe
printable characters or Unicode property classes) so the replace calls no longer
contain \x00-\x1f or similar ranges; update the code around
sanitizedFileName/fileName accordingly.
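One allowlist-style alternative the prompt mentions can be sketched like this. It is an illustration under the assumption that only conservative ASCII names are wanted; the helper name is hypothetical, not the project's code.

```typescript
// Replace anything outside a small allowlist with '_'. Control characters,
// quotes, backslashes, and non-ASCII all fall through to '_', so the pattern
// never needs a \x00-\x1f style range for Biome to flag.
const sanitizeFileName = (name: string): string =>
  name.replace(/[^A-Za-z0-9._ ()-]/g, '_');
```

The trade-off versus the two-step replace is that a single allowlist is easier to audit but slightly more aggressive (it also replaces characters like `#` or `&`).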
```typescript
const { hasAccess } = await checkCentralizedDataAccess(normalizedUuid, shareId, user);

if (!hasAccess) {
  res.status(403).json({ ok: false, message: 'Access denied. Provide a valid shareId, authenticate as the node owner, or request a data grant.' });
  return;
}

try {
```
Wrap the access check in the zip handler's try.
checkCentralizedDataAccess() is awaited before the error boundary, so a thrown auth/data-grant lookup bypasses the controller's catch path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@desci-server/src/controllers/data/downloadCentralizedZip.ts` around lines 29
- 36, Move the await checkCentralizedDataAccess(normalizedUuid, shareId, user)
call inside the existing try block so any exceptions from auth/data-grant lookup
are caught by the controller's catch; specifically, wrap the invocation of
checkCentralizedDataAccess (and the subsequent hasAccess check that sends the
403 via res.status(403).json(...)) inside the try in downloadCentralizedZip,
preserving the existing 403 response when hasAccess is false and ensuring thrown
errors flow to the catch handler.
```typescript
const sanitizedZipName = zipName
  .replace(/[\x00-\x1f\x7f"\\]/g, '_')
  .replace(/[^\x20-\x7e]/g, '_')
```
🧩 Analysis chain
🏁 Script executed:
```shell
# First, let's examine the file and the specific lines mentioned
head -60 desci-server/src/controllers/data/downloadCentralizedZip.ts | tail -20
```
Repository: desci-labs/nodes
Length of output: 927
🏁 Script executed:
```shell
# Get more context around lines 48-50
sed -n '40,60p' desci-server/src/controllers/data/downloadCentralizedZip.ts
```
Repository: desci-labs/nodes
Length of output: 959
🏁 Script executed:
```shell
# Check if Biome is configured in the project and what rules are enabled
find . -name "biome.json*" -o -name ".biomerc*" | head -5
```
Repository: desci-labs/nodes
Length of output: 72
🏁 Script executed:
```shell
# Search for other centralized-data handlers mentioned
fd "centralized" desci-server/src/controllers/data/ -type f
```
Repository: desci-labs/nodes
Length of output: 230
🏁 Script executed:
```shell
# Look for similar regex patterns in centralized-data handlers
rg "noControlCharacters" --type ts --type js
```
Repository: desci-labs/nodes
Length of output: 42
🏁 Script executed:
```shell
# Check biome.json configuration
cat biome.json
```
Repository: desci-labs/nodes
Length of output: 1834
🏁 Script executed:
```shell
# Find other handlers in the data directory
ls -la desci-server/src/controllers/data/
```
Repository: desci-labs/nodes
Length of output: 1134
🏁 Script executed:
```shell
# Search for similar regex patterns with control characters in other files
rg "\\\\x00-\\\\x1f" desci-server/src/controllers/
```
Repository: desci-labs/nodes
Length of output: 372
🏁 Script executed:
```shell
# Look for noControlCharacters rule mentions in biome config or docs
rg "noControlCharacters" .
```
Repository: desci-labs/nodes
Length of output: 42
🏁 Script executed:
```shell
# Let's check if there are any linter errors or configuration that might indicate the issue
# First, check the entire downloadCentralizedZip.ts file
cat desci-server/src/controllers/data/downloadCentralizedZip.ts
```
Repository: desci-labs/nodes
Length of output: 3199
🏁 Script executed:
```shell
# Check downloadCentralized.ts to see the comparison
cat desci-server/src/controllers/data/downloadCentralized.ts
```
Repository: desci-labs/nodes
Length of output: 2998
🏁 Script executed:
```shell
# Search for any comments or issues about regex/control characters in the repo
rg -i "control.?char|biome.*regex|regex.*biome" desci-server/
```
Repository: desci-labs/nodes
Length of output: 396
🌐 Web query:
Biome noControlCharactersInRegex rule regex security
💡 Result:
Biome’s lint/suspicious/noControlCharactersInRegex rule flags control characters (ASCII 0–31 / U+0000–U+001F) in regular-expression literals, because they’re hidden/unusual and are usually a mistake or accidental copy/paste. It disallows: \x00–\x1F, \u0000–\u001F, \u{0}–\u{1F} (with u), and raw unescaped control chars in the regex. It allows common “control escapes” like \t and \n. [1]
Security relevance: control characters are invisible and can obfuscate a pattern, making reviews/audits miss what the regex really matches (and potentially enabling filter/validation bypasses). This is why both Biome and ESLint treat them as “probably a mistake” and recommend the rule by default. [1][2]
How to configure (Biome):
```json
{
  "linter": {
    "rules": {
      "suspicious": {
        "noControlCharactersInRegex": "error"
      }
    }
  }
}
```
[1]
Equivalent ESLint rule: no-control-regex. [1][2]
Sources
- Biome rule docs (`noControlCharactersInRegex`): https://biomejs.dev/linter/rules/no-control-characters-in-regex/ [1]
- ESLint rule docs (`no-control-regex`): https://eslint.org/docs/latest/rules/no-control-regex [2]
Biome will fail on the noControlCharactersInRegex rule at lines 49–50.
The regex patterns /[\x00-\x1f\x7f"\\]/g and /[^\x20-\x7e]/g contain control character escapes that Biome's recommended linting rules disallow. Since biome.json enables "recommended": true, this file will not lint cleanly without fixing these patterns. The same issue exists in downloadCentralized.ts.
🧰 Tools
🪛 Biome (2.4.7)
[error] 49-49: Unexpected control character in a regular expression.
(lint/suspicious/noControlCharactersInRegex)
[error] 49-49: Unexpected control character in a regular expression.
(lint/suspicious/noControlCharactersInRegex)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@desci-server/src/controllers/data/downloadCentralizedZip.ts` around lines 48
- 50, The regexes use control-character hex escapes that violate Biome's
noControlCharactersInRegex rule; update the sanitize logic (the sanitizedZipName
assignment that reads zipName...) to use Unicode escapes instead, e.g. replace
/[\x00-\x1f\x7f"\\]/g with /[\u0000-\u001F\u007F"\\]/g and /[^\x20-\x7e]/g with
/[^\u0020-\u007E]/g, and make the same change in downloadCentralized.ts so the
file lints cleanly under the recommended Biome rules.
```typescript
for (const obj of objects) {
  const relativePath = obj.key.slice(prefix.length);
  if (!relativePath) continue;

  try {
    const { stream } = await getStreamFromR2(obj.key);
    archive.append(stream, { name: relativePath });
```
Don't trust stored R2 keys as ZIP entry names.
`relativePath` is copied straight into the archive. If the bucket already contains a key with `../`, a leading slash, backslashes, or control characters, this endpoint will generate a traversal entry inside the ZIP.
Guard the archive entry name before appending
```diff
 for (const obj of objects) {
   const relativePath = obj.key.slice(prefix.length);
   if (!relativePath) continue;
+  if (
+    relativePath.startsWith('/') ||
+    relativePath.includes('\\') ||
+    /(^|\/)\.\.(\/|$)|[\u0000-\u001f\u007f]/.test(relativePath)
+  ) {
+    logger.warn({ key: obj.key }, 'Skipping unsafe zip entry name');
+    continue;
+  }
   try {
     const { stream } = await getStreamFromR2(obj.key);
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```typescript
for (const obj of objects) {
  const relativePath = obj.key.slice(prefix.length);
  if (!relativePath) continue;
  if (
    relativePath.startsWith('/') ||
    relativePath.includes('\\') ||
    /(^|\/)\.\.(\/|$)|[\u0000-\u001f\u007f]/.test(relativePath)
  ) {
    logger.warn({ key: obj.key }, 'Skipping unsafe zip entry name');
    continue;
  }
  try {
    const { stream } = await getStreamFromR2(obj.key);
    archive.append(stream, { name: relativePath });
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@desci-server/src/controllers/data/downloadCentralizedZip.ts` around lines 68
- 74, The code currently appends untrusted relativePath into archive (see
variables relativePath and archive.append inside the loop that iterates objects
and calls getStreamFromR2), so sanitize and validate the entry name before
appending: convert backslashes to forward slashes, run path.posix.normalize on
the result, strip any leading slashes, remove control characters (e.g.
non-printable chars via a regex), then reject or skip the entry if normalize
produces empty string or contains any ".." path segment (or starts with "..") to
prevent traversal; only after these checks use the sanitized name when calling
archive.append(stream, { name: safeName }). Ensure this logic lives just before
archive.append in downloadCentralizedZip.ts.
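The steps listed above (slash conversion, `path.posix.normalize`, stripping leading slashes and control characters, rejecting `..` segments) can be sketched as a small helper. The function name `safeZipEntryName` is hypothetical; only `node:path` is assumed.

```typescript
import path from 'node:path';

// Hypothetical guard mirroring the review's steps: normalize separators,
// strip leading slashes and control characters, and reject traversal
// segments. Returns null when the entry should be skipped.
function safeZipEntryName(relativePath: string): string | null {
  const normalized = path.posix
    .normalize(relativePath.replace(/\\/g, '/'))
    .replace(/^\/+/, '')
    .replace(/\p{Cc}/gu, '');
  if (!normalized || normalized === '.' || normalized.split('/').includes('..')) return null;
  return normalized;
}
```

A caller would then skip the object when the helper returns `null` and otherwise pass the sanitized name to `archive.append(stream, { name: safeName })`.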
```typescript
try {
  const manifestResult = await makeManifest({
    title,
    researchFields,
    defaultLicense: defaultLicense || '',
    ipfsNode: getNodeToUse(isGuest),
  });
  researchObject = manifestResult.researchObject;
  const ipfsResult = await addBufferToIpfs(manifestResult.manifest, '', getNodeToUse(isGuest));
  hash = ipfsResult.cid;
  uri = `${hash}`;
} catch (ipfsErr) {
  logger.warn({ err: ipfsErr }, 'IPFS unavailable, creating node without manifest');
}
```
Separate manifest building from the IPFS-availability fallback.
This catch wraps both makeManifest() and addBufferToIpfs(). If manifest construction throws because of invalid input or a local regression, the request still creates the node and logs it as an IPFS outage. Only the external IPFS write should be fail-soft here.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@desci-server/src/controllers/nodes/draftCreate.ts` around lines 39 - 52, The
current try/catch wraps both makeManifest and addBufferToIpfs so manifest
construction errors are misclassified as IPFS outages; split the logic so
makeManifest(...) is called first and allowed to throw (so invalid
input/regressions surface) and then call addBufferToIpfs(...) inside its own
try/catch. Concretely: call makeManifest({ title, researchFields,
defaultLicense: defaultLicense || '', ipfsNode: getNodeToUse(isGuest) }) and
assign researchObject from manifestResult; then in a separate try block call
addBufferToIpfs(manifestResult.manifest, '', getNodeToUse(isGuest)) and on
failure catch only IPFS errors (log via logger.warn with err) and leave hash/uri
unset/default. Ensure references to makeManifest, manifestResult,
addBufferToIpfs, researchObject, hash, uri and getNodeToUse(isGuest) are updated
accordingly.
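The restructuring the prompt asks for can be sketched as follows (a minimal sketch with stubbed helpers passed in as parameters; the real `makeManifest`/`addBufferToIpfs` signatures in `draftCreate.ts` differ):

```typescript
// Sketch: only the external IPFS write is fail-soft; manifest-construction
// errors propagate to the caller. The parameter shapes stand in for the
// real makeManifest/addBufferToIpfs and are assumptions, not the repo's API.
type ManifestResult = { researchObject: { title: string }; manifest: string };

async function createDraftManifest(
  makeManifest: () => Promise<ManifestResult>,
  addBufferToIpfs: (manifest: string) => Promise<{ cid: string }>,
): Promise<{ researchObject: { title: string }; hash?: string; uri?: string }> {
  // 1. Let manifest errors throw: an invalid-input bug should fail the
  //    request instead of being logged as an IPFS outage.
  const manifestResult = await makeManifest();

  let hash: string | undefined;
  let uri: string | undefined;
  try {
    // 2. Only the IPFS write gets the fail-soft catch.
    const ipfsResult = await addBufferToIpfs(manifestResult.manifest);
    hash = ipfsResult.cid;
    uri = `${hash}`;
  } catch (ipfsErr) {
    console.warn('IPFS unavailable, creating node without manifest', ipfsErr);
  }
  return { researchObject: manifestResult.researchObject, hash, uri };
}
```

With this shape, a failing `addBufferToIpfs` still yields a node record (with `hash`/`uri` unset), while a failing `makeManifest` surfaces to the request handler.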
```typescript
router.post('/google/import', [ensureUser, ensureNodeAccess], googleImport);

// Centralized data (R2) routes
router.post('/uploadCentralized', [ensureUser, centralizedUploadHandler, ensureNodeAccess], uploadCentralized);
```
🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Check if centralizedUploadHandler has any cleanup logic on error
ast-grep --pattern $'centralizedUploadHandler'
# Look for Multer configuration and temp file handling
rg -n -C5 'multer' --type=ts -g '!node_modules'
```

Repository: desci-labs/nodes
Length of output: 17246

🏁 Script executed:

```bash
cat -n desci-server/src/controllers/data/uploadCentralized.ts
```

Repository: desci-labs/nodes
Length of output: 5286

🏁 Script executed:

```bash
rg -n 'ensureNodeAccess' --type=ts -A 10 | head -100
```

Repository: desci-labs/nodes
Length of output: 7692

🏁 Script executed:

```bash
rg -n 'fs.unlink|fs.rm|cleanup|unlinkSync' desci-server/src/controllers/data/uploadCentralized.ts
```

Repository: desci-labs/nodes
Length of output: 42

🏁 Script executed:

```bash
sed -n '51,120p' desci-server/src/middleware/authorisation.ts
```

Repository: desci-labs/nodes
Length of output: 2245
Multer runs before authorization: orphan temp files on rejection.
The middleware order `[ensureUser, centralizedUploadHandler, ensureNodeAccess]` means Multer writes uploaded files to disk before `ensureNodeAccess` verifies that the user owns the node. If `ensureNodeAccess` rejects the request (missing uuid on line 67, or unauthorized on line 78 of authorisation.ts), the temp files written to `os.tmpdir()` remain on disk, since `uploadCentralized` is never reached to clean them up.
Consider either:
- Moving `ensureNodeAccess` before `centralizedUploadHandler`, or
- Adding cleanup logic in `ensureNodeAccess` to remove orphan uploads before returning an error.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@desci-server/src/routes/v1/data.ts` at line 32, The current middleware order
runs Multer (centralizedUploadHandler) before authorization (ensureNodeAccess),
causing orphan temp files on rejection; either move ensureNodeAccess to run
before centralizedUploadHandler in the router registration (so Multer never
writes unless authorized), or add cleanup logic inside ensureNodeAccess to
delete any uploaded temp files (inspect req.file/req.files or the Multer temp
paths and unlink them) before returning the missing-uuid or unauthorized errors
referenced in authorisation.ts (lines referenced around the missing uuid check
and unauthorized path). Ensure uploadCentralized remains the final handler and
that cleanup uses the same temp file locations Multer writes to.
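If the cleanup route is chosen, the helper could look like this (a minimal sketch; the `UploadedFile` shape mirrors Multer's disk-storage `file.path` field, and the function name is hypothetical):

```typescript
import fs from 'fs';

// Best-effort removal of temp files Multer already wrote, for use before an
// authorization middleware returns an error response.
type UploadedFile = { path: string };

function cleanupOrphanUploads(files: UploadedFile[] | undefined): void {
  for (const file of files ?? []) {
    try {
      fs.unlinkSync(file.path);
    } catch {
      // Ignore: the file may already have been removed.
    }
  }
}
```

`ensureNodeAccess` would call this with the Multer file array from `req.files` before sending its missing-uuid or unauthorized response, so rejected requests leave nothing behind in `os.tmpdir()`.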
Before the deployed dev environment will work, we need to add these to Vault at secrets/desci-server/dev/app:
Once deployed, the test UI will be at:
https://nodes-api-dev.desci.com/test/centralizedData.html
Summary by CodeRabbit
New Features
Tests
Chores
Documentation
Bug Fixes