Conversation
WalkthroughThis pull request migrates URL parsing logic from Node.js legacy Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Comment |
|
|
||
| urlParts.search = "?host=" + encodeURIComponent(urlParts.host || ""); | ||
| urlParts.host = null as any; | ||
| const originalHost = urlObj.host; |
There was a problem hiding this comment.
Bug: originalHost is captured from urlObj.host after urlObj.hostname and urlObj.port are modified, resulting in the new host being captured instead of the original one.
Severity: HIGH
Suggested Fix
Capture the original host into a variable immediately after the URL object is created with new URL(urlRemote), before any properties like hostname or port are modified.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/Cache.ts#L584
Potential issue: In the `forceRedirect` function, `urlObj.hostname` and `urlObj.port`
are modified based on `options`. Due to the dynamic nature of the WHATWG `URL` API, the
`urlObj.host` property automatically reflects these changes. The code captures
`urlObj.host` into the `originalHost` variable *after* these modifications have
occurred. This causes `originalHost` to contain the new, modified host instead of the
original one. This incorrect value is then used in the `?host=` query parameter,
breaking downstream logic that depends on receiving the original host.
Did we get this right? 👍 / 👎 to inform future reviews.
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/Address.ts`:
- Around line 8-35: The fallback path currently uses new URL(urlRemote,
"http://dummy") which injects a dummy host and declares origin with var; instead
detect scheme-less inputs and normalize before parsing: if urlRemote does not
start with "http://", "https://", or "//" prepend "http://" (or the desired
default scheme) and then construct new URL(normalized); also replace
function-scoped var origin with let/const and keep the same origin-building
logic (use urlObj.host and urlObj.pathname) so the host is not replaced by a
dummy value and cache keys stay stable.
In `@src/Cache.ts`:
- Around line 572-586: The code captures originalHost after mutating
urlObj.hostname/port, causing the ?host query to contain the forced host; move
the capture of originalHost to immediately after urlObj is created (before any
changes to urlObj.hostname or urlObj.port) so it holds the true original remote
host, and remove the redundant manual host reconstruction that follows (relying
on the WHATWG URL API to keep urlObj.host consistent when hostname/port are
changed); update references around urlObj, originalHost, options.forceHost and
options.forcePort accordingly.
In `@test/serve.ts`:
- Around line 88-89: Wrap the URL construction that creates urlObj (const urlObj
= new URL(req.url || "/", `http://${req.headers.host || "localhost"}`)) in a
try-catch to guard against malformed req.url or req.headers.host; on error call
reportError with a 400 response and a brief message and return immediately so
parseArgs is never called on an invalid urlObj. Ensure the catch captures the
thrown error (e) and includes enough context in the reportError call to identify
the bad URL/host and stops further request handling.
- Around line 107-109: Assigning undefined to URL.host (variable urlObj.host)
makes the WHATWG URL stringify it as "undefined" and corrupts the cache lookup;
change the assignment to use a safe empty-string fallback (use host || "" when
setting urlObj.host) so missing host won't produce the literal "undefined" and
will keep the cache key consistent for the subsequent cache lookup that uses
urlObj.
| try { | ||
| const urlObj = new URL(urlRemote); | ||
| var origin = urlObj.host || ""; | ||
|
|
||
| if ((urlParts.pathname || "").charAt(0) != "/") origin += "/"; | ||
| if ((urlObj.pathname || "").charAt(0) != "/") origin += "/"; | ||
|
|
||
| origin += urlParts.pathname; | ||
| return [ | ||
| urlParts.protocol || "http:", | ||
| "//", | ||
| url.resolve("", origin), | ||
| urlParts.search || "" | ||
| ].join(""); | ||
| origin += urlObj.pathname; | ||
| return [ | ||
| urlObj.protocol || "http:", | ||
| "//", | ||
| url.resolve("", origin), | ||
| urlObj.search || "" | ||
| ].join(""); | ||
| } catch (e) { | ||
| // If URL is invalid, try to construct it with a base | ||
| const urlObj = new URL(urlRemote, "http://dummy"); | ||
| var origin = urlObj.host || ""; | ||
|
|
||
| if ((urlObj.pathname || "").charAt(0) != "/") origin += "/"; | ||
|
|
||
| origin += urlObj.pathname; | ||
| return [ | ||
| urlObj.protocol || "http:", | ||
| "//", | ||
| url.resolve("", origin), | ||
| urlObj.search || "" | ||
| ].join(""); | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
find . -name "Address.ts" -o -name "address.ts" | head -20Repository: loanlink-nl/cget
Length of output: 76
🏁 Script executed:
cat -n src/Address.ts | head -50Repository: loanlink-nl/cget
Length of output: 1741
🏁 Script executed:
rg -t ts -t js "sanitizeUrl" --max-count=20 -B 2 -A 2Repository: loanlink-nl/cget
Length of output: 1104
🏁 Script executed:
fd -e test.ts -e test.js -e spec.ts -e spec.js | head -20Repository: loanlink-nl/cget
Length of output: 97
🏁 Script executed:
cat -n src/Address.test.ts | head -100Repository: loanlink-nl/cget
Length of output: 3253
🏁 Script executed:
cat -n src/Address.test.ts | tail -100Repository: loanlink-nl/cget
Length of output: 3535
🏁 Script executed:
rg "new URL" -B 3 -A 3 -t tsRepository: loanlink-nl/cget
Length of output: 1430
🏁 Script executed:
sed -n '38,150p' src/Address.tsRepository: loanlink-nl/cget
Length of output: 2349
Fallback path injects dummy host into sanitized URLs, corrupting URL identity.
Lines 23-35 use new URL(urlRemote, "http://dummy") as a fallback for invalid URLs. When triggered (e.g., scheme-less input like "www.example.org"), the URL object's host becomes "dummy" and the input becomes a pathname. This corrupts the URL identity, breaking cache keys and remote fetch operations. The block also redeclares origin at line 24 using function-scoped var.
Suggested fix
export function sanitizeUrl(urlRemote: string) {
+ const toSanitized = (urlObj: URL, stripDummyHost = false) => {
+ let origin =
+ stripDummyHost && urlObj.host === "dummy" ? "" : urlObj.host || "";
+
+ if ((urlObj.pathname || "").charAt(0) != "/") origin += "/";
+ origin += urlObj.pathname;
+
+ return [
+ urlObj.protocol || "http:",
+ "//",
+ url.resolve("", origin),
+ urlObj.search || ""
+ ].join("");
+ };
+
try {
- const urlObj = new URL(urlRemote);
- var origin = urlObj.host || "";
-
- if ((urlObj.pathname || "").charAt(0) != "/") origin += "/";
-
- origin += urlObj.pathname;
- return [
- urlObj.protocol || "http:",
- "//",
- url.resolve("", origin),
- urlObj.search || ""
- ].join("");
+ return toSanitized(new URL(urlRemote));
} catch (e) {
- // If URL is invalid, try to construct it with a base
- const urlObj = new URL(urlRemote, "http://dummy");
- var origin = urlObj.host || "";
-
- if ((urlObj.pathname || "").charAt(0) != "/") origin += "/";
-
- origin += urlObj.pathname;
- return [
- urlObj.protocol || "http:",
- "//",
- url.resolve("", origin),
- urlObj.search || ""
- ].join("");
+ // If URL is invalid, try to construct it with a base.
+ return toSanitized(new URL(urlRemote, "http://dummy"), true);
}
}🧰 Tools
🪛 Biome (2.4.4)
[error] 24-24: Shouldn't redeclare 'origin'. Consider to delete it or rename it.
(lint/suspicious/noRedeclare)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/Address.ts` around lines 8 - 35, The fallback path currently uses new
URL(urlRemote, "http://dummy") which injects a dummy host and declares origin
with var; instead detect scheme-less inputs and normalize before parsing: if
urlRemote does not start with "http://", "https://", or "//" prepend "http://"
(or the desired default scheme) and then construct new URL(normalized); also
replace function-scoped var origin with let/const and keep the same
origin-building logic (use urlObj.host and urlObj.pathname) so the host is not
replaced by a dummy value and cache keys stay stable.
| if (options.forceHost && urlObj.hostname != options.forceHost) { | ||
| urlObj.hostname = options.forceHost; | ||
| changed = true; | ||
| } | ||
|
|
||
| if (options.forcePort && urlParts.port != "" + options.forcePort) { | ||
| urlParts.port = "" + options.forcePort; | ||
| if (options.forcePort && urlObj.port != "" + options.forcePort) { | ||
| urlObj.port = "" + options.forcePort; | ||
| changed = true; | ||
| } | ||
|
|
||
| if (!changed) return urlRemote; | ||
|
|
||
| urlParts.search = "?host=" + encodeURIComponent(urlParts.host || ""); | ||
| urlParts.host = null as any; | ||
| const originalHost = urlObj.host; | ||
| urlObj.search = "?host=" + encodeURIComponent(originalHost || ""); | ||
| // Remove the host to use the modified hostname/port |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n src/Cache.ts | sed -n '560,600p'Repository: loanlink-nl/cget
Length of output: 1463
🏁 Script executed:
cat > /tmp/test_url.js << 'EOF'
// Test WHATWG URL API behavior
const url = new URL("http://original.example.com:8080/path");
console.log("Original host:", url.host);
console.log("Original hostname:", url.hostname);
console.log("Original port:", url.port);
// Mutate hostname
url.hostname = "forced.example.com";
console.log("\nAfter hostname mutation:");
console.log("host:", url.host);
console.log("hostname:", url.hostname);
console.log("port:", url.port);
// Mutate port
url.port = "9090";
console.log("\nAfter port mutation:");
console.log("host:", url.host);
console.log("hostname:", url.hostname);
console.log("port:", url.port);
EOF
node /tmp/test_url.jsRepository: loanlink-nl/cget
Length of output: 348
Capture the original host before rewriting hostname/port.
At line 584, originalHost is captured after mutations at lines 572-580, so ?host= contains the forced host instead of the original remote host. Move the capture to before the mutations (after line 567).
Additionally, line 587's manual host reconstruction is redundant—the WHATWG URL API automatically updates the host property when hostname or port change. Remove it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/Cache.ts` around lines 572 - 586, The code captures originalHost after
mutating urlObj.hostname/port, causing the ?host query to contain the forced
host; move the capture of originalHost to immediately after urlObj is created
(before any changes to urlObj.hostname or urlObj.port) so it holds the true
original remote host, and remove the redundant manual host reconstruction that
follows (relying on the WHATWG URL API to keep urlObj.host consistent when
hostname/port are changed); update references around urlObj, originalHost,
options.forceHost and options.forcePort accordingly.
| const urlObj = new URL(req.url || "/", `http://${req.headers.host || "localhost"}`); | ||
| var args = parseArgs(urlObj.search.substring(1)); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n test/serve.ts | sed -n '80,120p'Repository: loanlink-nl/cget
Length of output: 1486
🏁 Script executed:
grep -n "reportError" test/serve.tsRepository: loanlink-nl/cget
Length of output: 198
🏁 Script executed:
grep -n "function.*reportError\|const reportError\|var reportError" test/serve.tsRepository: loanlink-nl/cget
Length of output: 84
🏁 Script executed:
cat -n test/serve.ts | sed -n '47,60p'Repository: loanlink-nl/cget
Length of output: 454
Guard URL construction to prevent unhandled exceptions from crashing the test server.
At line 88, new URL() can throw on malformed req.url or req.headers.host values. Wrap the call in a try-catch block and return a 400 response via reportError().
Proposed fix
- const urlObj = new URL(req.url || "/", `http://${req.headers.host || "localhost"}`);
- var args = parseArgs(urlObj.search.substring(1));
+ let urlObj: URL;
+ try {
+ urlObj = new URL(req.url || "/", `http://${req.headers.host || "localhost"}`);
+ } catch {
+ reportError(res, 400);
+ return;
+ }
+ var args = parseArgs(urlObj.search.substring(1));📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const urlObj = new URL(req.url || "/", `http://${req.headers.host || "localhost"}`); | |
| var args = parseArgs(urlObj.search.substring(1)); | |
| let urlObj: URL; | |
| try { | |
| urlObj = new URL(req.url || "/", `http://${req.headers.host || "localhost"}`); | |
| } catch { | |
| reportError(res, 400); | |
| return; | |
| } | |
| var args = parseArgs(urlObj.search.substring(1)); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/serve.ts` around lines 88 - 89, Wrap the URL construction that creates
urlObj (const urlObj = new URL(req.url || "/", `http://${req.headers.host ||
"localhost"}`)) in a try-catch to guard against malformed req.url or
req.headers.host; on error call reportError with a 400 response and a brief
message and return immediately so parseArgs is never called on an invalid
urlObj. Ensure the catch captures the thrown error (e) and includes enough
context in the reportError call to identify the bad URL/host and stops further
request handling.
| urlObj.protocol = "http"; | ||
| urlObj.search = ""; | ||
| urlObj.host = host; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n test/serve.ts | sed -n '100,120p'Repository: loanlink-nl/cget
Length of output: 710
🏁 Script executed:
cat -n test/serve.ts | sed -n '70,110p'Repository: loanlink-nl/cget
Length of output: 1534
🌐 Web query:
WHATWG URL host property undefined assignment behavior
💡 Result:
Assigning undefined to URL.prototype.host does not clear the host.
URL.hostis an IDLUSVString, so the JS valueundefinedis first converted via Web IDL string conversion (i.e., effectivelyToString(undefined)→"undefined"), then to a scalar-value string. [2]- The
hostsetter then runs Basic URL parse on that resulting string, using the current URL record and the host state. [1]- So
url.host = undefinedbehaves the same asurl.host = "undefined"(it attempts to set the host to the domain"undefined"). [1][2]
- So
- If the URL has an opaque path (e.g.,
mailto:), thehostsetter just returns without changing anything. [1]
Assigning undefined to URL.host produces "undefined" as the literal host string.
When host is missing but the guard allows localhost connections, assigning undefined to urlObj.host at line 109 causes WHATWG URL to convert it to the string "undefined" (as per IDL USVString conversion rules). This corrupts the URL used in the cache lookup at line 112.
urlObj.protocol = "http";
urlObj.search = "";
- urlObj.host = host;
+ urlObj.host = host || "";Use host || "" to prevent cache key corruption.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| urlObj.protocol = "http"; | |
| urlObj.search = ""; | |
| urlObj.host = host; | |
| urlObj.protocol = "http"; | |
| urlObj.search = ""; | |
| urlObj.host = host || ""; |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/serve.ts` around lines 107 - 109, Assigning undefined to URL.host
(variable urlObj.host) makes the WHATWG URL stringify it as "undefined" and
corrupts the cache lookup; change the assignment to use a safe empty-string
fallback (use host || "" when setting urlObj.host) so missing host won't produce
the literal "undefined" and will keep the cache key consistent for the
subsequent cache lookup that uses urlObj.
| const originalHost = urlObj.host; | ||
| urlObj.search = "?host=" + encodeURIComponent(originalHost || ""); |
There was a problem hiding this comment.
🔴 originalHost captured after hostname/port mutation, preserving the new values instead of the original
In forceRedirect, the originalHost variable is captured on line 584 after urlObj.hostname and urlObj.port have already been mutated (lines 573, 578). This means originalHost contains the modified host (e.g., localhost:12345) instead of the original host (e.g., example.com). The ?host= query parameter is then set to the wrong value.
Root Cause and Impact
With url.parse, the old code saved the original host before setting host = null:
urlParts.search = "?host=" + encodeURIComponent(urlParts.host || ""); // original host
urlParts.host = null;In the new code, the WHATWG URL object updates .host reactively when .hostname or .port are set. So by the time line 584 executes:
const originalHost = urlObj.host; // ALREADY 'localhost:12345', not 'example.com'This breaks the test infrastructure's forceHost/forcePort mechanism, which relies on passing the original host in the ?host= query parameter so the test server knows which virtual host was originally requested.
Impact: The ?host= parameter will always contain the forced (new) host instead of the original host, breaking all redirect-based test scenarios.
Prompt for agents
In src/Cache.ts, in the forceRedirect method, the original host must be captured BEFORE mutating hostname and port. Move the line `const originalHost = urlObj.host;` to before the hostname/port modification block (i.e., before line 572). Specifically, insert `const originalHost = urlObj.host;` right after line 568 (`var changed = false;`), and remove line 584 (`const originalHost = urlObj.host;`).
Was this helpful? React with 👍 or 👎 to provide feedback.
| const urlObj = new URL(urlRemote, "http://dummy"); | ||
| var origin = urlObj.host || ""; | ||
|
|
||
| if ((urlObj.pathname || "").charAt(0) != "/") origin += "/"; | ||
|
|
||
| origin += urlObj.pathname; | ||
| return [ | ||
| urlObj.protocol || "http:", | ||
| "//", | ||
| url.resolve("", origin), | ||
| urlObj.search || "" | ||
| ].join(""); |
There was a problem hiding this comment.
🔴 sanitizeUrl catch branch leaks dummy host into output for protocol-less URLs
When sanitizeUrl receives a URL without a protocol (e.g., example.com/path), new URL() throws on line 9, and the catch branch on line 23 uses new URL(urlRemote, "http://dummy"). This resolves the input relative to the dummy base, causing urlObj.host to be "dummy" and the pathname to be /example.com/path.
Root Cause and Impact
The old code used url.parse(urlRemote, false, true) where the third argument (true) enabled slashesDenotingHost mode. For input example.com/path, this correctly parsed it as host: null, pathname: 'example.com/path', producing http:///example.com/path.
The new catch branch with new URL('example.com/path', 'http://dummy') instead produces:
host: 'dummy'pathname: '/example.com/path'- Final result:
http://dummy/example.com/path
This means any protocol-less URL passed to sanitizeUrl will have the fake host dummy injected into the output. While the test suite primarily tests full URLs with protocols (so existing tests pass), this is a behavioral regression for any real-world usage with bare hostnames or protocol-relative URLs like //example.com/path where the host is correctly handled but the protocol is inherited from the dummy base.
Impact: Protocol-less URLs (if any are used in practice) will be incorrectly resolved against a fake dummy host, producing invalid URLs.
Prompt for agents
In src/Address.ts, the catch branch of sanitizeUrl (lines 21-35) uses `new URL(urlRemote, 'http://dummy')` as a fallback, but this causes the dummy host to leak into the output for protocol-less inputs like 'example.com/path'. To replicate the old url.parse(urlRemote, false, true) behavior, you need to handle protocol-less URLs differently. One approach: manually prepend 'http://' to the input and re-parse with `new URL('http://' + urlRemote)` if the input doesn't start with '//', or use `new URL('http:' + urlRemote)` for protocol-relative URLs starting with '//'. This preserves the original host parsing semantics without injecting a dummy host.
Was this helpful? React with 👍 or 👎 to provide feedback.
| urlParts.host = host; | ||
| urlObj.protocol = "http"; | ||
| urlObj.search = ""; | ||
| urlObj.host = host; |
There was a problem hiding this comment.
🔴 Setting urlObj.host = undefined converts host to string "undefined" in serve.ts
In serve.ts line 109, urlObj.host = host is executed where host comes from args["host"] which may be undefined (when there's no ?host= query parameter). With the WHATWG URL API, setting .host to undefined coerces it to the string "undefined", producing URLs like http://undefined:12345/path.
Root Cause and Impact
The old code used urlParts.host = host with url.parse's result object. When host was undefined or null, url.format() would fall back to hostname and port properties, effectively keeping the original host.
With the WHATWG URL object, the .host setter converts any value to a string. So urlObj.host = undefined sets the host to literally "undefined". This happens in the localhost testing flow where no ?host= parameter is present but the request passes the localhost kludge check (lines 97-105).
The resulting URL http://undefined:12345/example.invalid/index.html is passed to new Address(...) which will fail to resolve to any valid cache path.
Impact: The test server will fail to serve any requests from localhost when no ?host= query parameter is provided.
| urlObj.host = host; | |
| if (host) urlObj.host = host; |
Was this helpful? React with 👍 or 👎 to provide feedback.
Summary by CodeRabbit