-
Notifications
You must be signed in to change notification settings - Fork 1
scanCsapiLinks() stores server-returned href values without URL scheme validation — defense-in-depth gap in helpers.ts #147
Description
Finding
scanCsapiLinks() stores server-returned href values directly into the resourceUrls map without validating the URL scheme. A malicious or misconfigured server returning href: "javascript:...", href: "data:...", or href: "//evil.com/systems" would cause the library to construct all CSAPI query URLs against those dangerous bases.
Review Source: Senior developer code review of clean-pr — docs/code-review/011-pending-p3-server-href-scheme-validation.md
Severity: P3-Minor
Category: Security / Defense-in-Depth
Ownership: Ours
Problem Statement
scanCsapiLinks() in src/ogc-api/csapi/helpers.ts (lines 129–171) scans server-returned link objects for CSAPI resource URLs using three conventions (ogc-cs: prefix, plain resource name, items with resource href). At each of the three storage points, the href value is placed directly into the result map without any URL scheme validation:
Affected code — scanCsapiLinks() (lines 129–171):
export function scanCsapiLinks(
links: Array<{ rel?: string; href?: string }>
): Map<string, string> {
const result = new Map<string, string>();
// ...
for (const link of links) {
const rel = link.rel;
const href = link.href;
// ...
// Convention 1: ogc-cs: prefixed
const match = rel.match(/^ogc-cs:(.+)$/);
if (match) {
result.set(match[1], typeof href === 'string' ? href : ''); // ← no scheme check
continue;
}
// Convention 2: plain resource name
if (knownTypes.has(rel)) {
result.set(rel, typeof href === 'string' ? href : ''); // ← no scheme check
continue;
}
// Convention 3: rel: "items" with resource type in href
if (rel === 'items' && typeof href === 'string') {
// ...normalize...
if (normalized && knownTypes.has(normalized)) {
result.set(normalized, href); // ← no scheme check
}
}
}
return result;
}Data flow: The returned map flows directly to CSAPIQueryBuilder via factory.ts → createCSAPIBuilder(), where it becomes the base URL for all 80+ query methods via buildResourceUrl(). Every constructed URL inherits whatever scheme was in the original href.
Scenario:
// A malicious/misconfigured server returns:
const links = [
{ rel: "ogc-cs:systems", href: "javascript:alert('xss')" },
{ rel: "datastreams", href: "data:text/html,<script>..." },
{ rel: "items", href: "//evil.com/observations" }
];
const resourceUrls = scanCsapiLinks(links);
// resourceUrls.get('systems') === "javascript:alert('xss')"
// This becomes the base URL for getSystems(), getSystemById(), etc.Impact: The risk is bounded — this library constructs URLs but does not fetch them itself. The real risk surfaces in consuming applications:
- SSRF if a server-side consumer follows the constructed URL
- XSS if a browser consumer renders the URL in a link
hreforsrcattribute - Data exfiltration if a
//evil.comprotocol-relative URL is used for API calls
Relative URLs are safe — they resolve against the trusted base URL that the consumer originally provided. Only absolute non-HTTP(S) URLs are dangerous.
Ownership Verification
$ git log upstream/main --oneline -- src/ogc-api/csapi/helpers.ts
(empty — zero commits on upstream/main)
$ git blame --line-porcelain src/ogc-api/csapi/helpers.ts | Select-String "^author " | Group-Object
Count: 225, Name: "author Sam-Bolling"
Conclusion: This code is ours. The file src/ogc-api/csapi/helpers.ts does not exist on upstream/main — the entire file is our contribution, with 100% sole authorship. scanCsapiLinks() was written from scratch in Phase 2.
Files to Modify
| File | Action | Est. Lines | Purpose |
|---|---|---|---|
src/ogc-api/csapi/helpers.ts |
Modify | ~15 | Add isTrustedHref() helper; apply filter before each result.set() in scanCsapiLinks() |
src/ogc-api/csapi/helpers.spec.ts |
Modify | ~30 | Add test cases for javascript:, data:, protocol-relative, and valid HTTP(S) scheme handling |
Proposed Solutions
Option A: Filter on isTrustedHref before storing (Recommended)
/**
* Returns true if the href is safe to use as a resource URL base.
* Relative URLs are safe (they resolve against the trusted base).
* Absolute URLs must use http: or https: schemes.
*/
function isTrustedHref(href: string): boolean {
try {
const url = new URL(href);
return url.protocol === 'https:' || url.protocol === 'http:';
} catch {
return true; // relative URL — safe, resolves against trusted base
}
}Applied in scanCsapiLinks():
// Convention 1: ogc-cs: prefixed
if (match) {
const value = typeof href === 'string' ? href : '';
if (isTrustedHref(value)) result.set(match[1], value);
continue;
}
// Convention 2: plain resource name
if (knownTypes.has(rel)) {
const value = typeof href === 'string' ? href : '';
if (isTrustedHref(value)) result.set(rel, value);
continue;
}
// Convention 3: rel: "items" with resource type in href
if (rel === 'items' && typeof href === 'string' && isTrustedHref(href)) {
// ...existing normalization...
}Pros: Small, focused change. Uses the standard new URL() constructor (WHATWG URL spec) for scheme parsing. Relative URLs pass through unchanged. No behavioral change for any real-world server observed in testing (all known servers return http:// or https:// hrefs, or relative URLs).
Cons: Adds one helper function and one try/catch per href. Negligible performance impact.
Effort: Small | Risk: None
Option B: No change — document that consumers are responsible for trusting their servers
Add a JSDoc @security note to scanCsapiLinks() stating that consumers must trust the server they connect to.
Pros: Zero risk; trivial effort.
Cons: Does NOT harden the library. The issue remains for any consumer that renders URLs from untrusted sources.
Effort: Trivial | Risk: None (but attack surface remains)
Not recommended. The fix is small enough that defense-in-depth is worth the effort.
Scope — What NOT to Touch
- ❌ Do NOT modify any file outside
src/ogc-api/csapi/— per the upstream maintainer's isolation requirement - ❌ Do NOT add scheme validation to
buildResourceUrl()orCSAPIQueryBuilder— the check belongs at the ingestion point (scanCsapiLinks), not downstream - ❌ Do NOT add scheme validation to the
CSAPIQueryBuilderconstructor'sresourceUrlsparameter — that is a consumer-supplied escape hatch; trust the consumer - ❌ Do NOT reject or warn on
http://URLs — only non-HTTP(S) absolute schemes should be filtered - ❌ Do NOT reject empty strings — the existing
typeof href === 'string' ? href : ''fallback is a separate minor concern (empty string stored as resource URL) - ❌ Do NOT combine this with finding 004/issue subPath appended to URLs without encoding or type constraint #142 (
subPathencoding) — different function, different attack surface
Acceptance Criteria
-
isTrustedHref()(or equivalent) validates URL scheme before storage -
scanCsapiLinks()skips hrefs withjavascript:,data:,ftp:, or other non-HTTP(S) absolute schemes - Relative hrefs are still stored unchanged (they resolve against the trusted base)
- Absolute
http://andhttps://hrefs are stored as before - Protocol-relative URLs (
//evil.com/...) are rejected (they parse as absolute with the page's scheme) - New test cases cover:
javascript:rejected,data:rejected,//evil.comrejected,http://accepted,https://accepted, relative URL accepted - All existing
scanCsapiLinkstests pass unchanged - Existing tests still pass (
npm test) - No lint errors (
npm run lint) - All modified files pass
npx prettier --check
Dependencies
Blocked by: Nothing (can be done independently)
Blocks: Nothing
Coordinate with:
- DEFERRED — No
@link/@idresolution utilities for cross-resource reference following #110 — Cross-reference resolution utilities. If implemented, the proposedresolveResourceRefHref()would also consume server hrefs and should apply the same scheme check. Independent concerns — this issue hardens the existing ingestion point.
Related: - subPath appended to URLs without encoding or type constraint #142 —
subPathencoding (finding 004). Same defense-in-depth category but different function (buildResourceUrl()) and different attack surface (path traversal vs scheme injection). - Finding 005 —
OgcApiEndpointconstructor acceptshttp://without warning. Related upstream scheme concern — the initial URL's scheme is also unvalidated, but that is upstream code.
Operational Constraints
⚠️ MANDATORY: Before starting work on this issue, reviewdocs/governance/AI_OPERATIONAL_CONSTRAINTS.md.
Key constraints:
- Precedence: OGC specifications → AI Collaboration Agreement → This issue description → Existing code → Conversational context
- No scope expansion: Add the scheme check to
scanCsapiLinks, nothing more - Minimal diffs: Prefer the smallest change that satisfies the acceptance criteria
- Ask when unclear: If intent is ambiguous, stop and ask for clarification
- Postel's Law (Phase 3 Lesson 2): The check must not reject valid server responses. All real-world servers observed in testing return
http://orhttps://hrefs, or relative URLs. The filter only blocks exotic schemes (javascript:,data:, etc.) that no spec-compliant server would return.
Upstream Isolation Constraint
Per the upstream maintainer's comment on PR #136:
- Anything in
src/ogc-api/csapimust NOT be included in the rootindex.ts - Anything NOT in
src/ogc-api/csapimust NOT import from CSAPI code - All files in this issue's "Files to Modify" table are within
src/ogc-api/csapi/— this fix is purely internal to the isolated CSAPI module
Ownership-Specific Constraints
Ownership: Ours
- Fix on the working branch
- Include in the next commit to
clean-prif the PR is still open - Add tests that cover the finding — specifically, tests for rejected and accepted URL schemes
References
Original Code Review Finding
| # | Document | What It Provides |
|---|---|---|
| 0 | docs/code-review/011-pending-p3-server-href-scheme-validation.md |
The original finding from the senior developer code review. Authoritative source for the problem statement, proposed isTrustedHref solution (Option A), and acceptance criteria. This issue tracks the resolution of that finding. |
Directly Affected Code
| # | Document | What It Provides |
|---|---|---|
| 1 | src/ogc-api/csapi/helpers.ts lines 129–171 |
scanCsapiLinks() — the function that stores server-returned href values without scheme validation |
| 2 | src/ogc-api/csapi/factory.ts lines 58–68 |
createCSAPIBuilder() — primary consumer of scanCsapiLinks(), passes the map directly to CSAPIQueryBuilder |
| 3 | src/ogc-api/csapi/url_builder.ts — resourceUrls_ field (~line 142) |
Private field storing the unvalidated map from scanCsapiLinks() |
| 4 | src/ogc-api/csapi/url_builder.ts — buildResourceUrl() (~lines 256–272) |
Core URL construction method that uses resourceUrls_.get() as the URL base for all 80+ query methods |
| 5 | src/ogc-api/csapi/helpers.spec.ts lines 137–248 (primary), 440–484 (edge cases) |
Existing scanCsapiLinks unit tests — 12 cases in the primary describe block, 5 additional edge cases (17 total). No current tests for scheme validation/rejection. |
Related Open Issues & Findings
| # | Document | What It Provides |
|---|---|---|
| 6 | Issue #142 — subPath appended to URLs without encoding |
Same defense-in-depth category — input to URL construction is not validated/encoded. Different function and attack surface (path traversal vs scheme injection). |
| 7 | Issue #110 — No @link/@id resolution utilities |
Proposes resolveResourceRefHref() which would also consume server hrefs — future coordination point for scheme validation. |
| 8 | Finding 005 — OgcApiEndpoint constructor accepts http:// without warning |
Related upstream scheme concern — the initial URL's scheme is also unvalidated, but that is upstream code we do not modify. |
Standards
| # | Document | What It Provides |
|---|---|---|
| 9 | OGC API — Connected Systems Part 1 (23-001) | Defines the HATEOAS link patterns (ogc-cs: prefix, resource type relations) from which scanCsapiLinks extracts hrefs. |
| 10 | OGC API — Connected Systems Part 2 (23-002) | Extends Part 1 with dynamic data resources discovered via the same link scanning pattern. |
| 11 | OGC API — Common (19-072) | Foundation spec defining HATEOAS link objects — the href field in link objects originates from this spec. |
| 12 | RFC 3986 — URI Generic Syntax | Defines URI scheme component (§3.1) and absolute vs relative reference distinction. The fix's new URL() check relies on scheme parsing per this RFC. |
| 13 | WHATWG URL Standard | Specifies how new URL() parses URLs in browsers and Node.js. The proposed isTrustedHref uses new URL() which follows this standard. |
| 32 | OGC API — Connected Systems Part 1: OpenAPI Specification | Machine-readable OpenAPI definition of CSAPI Part 1 REST API. Response schemas define link objects with href fields — the authoritative schema for what servers are expected to return. |
| 33 | OGC API — Connected Systems Part 2: OpenAPI Specification | Machine-readable OpenAPI definition of CSAPI Part 2 REST API. Defines link object schemas for dynamic data resources (DataStreams, Observations, ControlStreams, Commands). |
| 34 | OGC API — Features (17-069r4) | REST API for geospatial features. Part 1 resources use Features API patterns — link objects and GeoJSON encoding inherited by CSAPI. Defines the link array structure that scanCsapiLinks consumes. |
| 35 | IANA Link Relations Registry | Official registry of link relation types (self, alternate, collection, item, next, prev). scanCsapiLinks checks rel values against known CSAPI-specific relations — this registry defines the standard ones. |
| 36 | GeoJSON (RFC 7946) | Primary encoding format for CSAPI Part 1 resources. RFC 7946 itself does not define links arrays — those are added by OGC API conventions (Common/Features). Relevant as the carrier format for server responses that contain link objects processed elsewhere in the pipeline. |
Architecture & Design
| # | Document | What It Provides |
|---|---|---|
| 14 | CSAPI Implementation Guide | Format handler and helper function design requirements — defines architectural expectations for helpers.ts. |
| 15 | Cross-Server Interoperability Analysis | Documents bugs previously found and fixed in scanCsapiLinks() (query params in href, featuresOfInterest alias). Confirms "this function we wrote from scratch" ownership. |
| 16 | EDR Integration Pattern Analysis | Deep dive into scanCsapiLinks() architecture and data flow from root document → scanCsapiLinks → resourceUrls → CSAPIQueryBuilder. |
| 37 | URL Building Architecture | Documents URL building patterns in ogc-client — base URL strategy, query parameter assembly, resource path structure. Provides architectural context for the URL construction pipeline downstream of scanCsapiLinks, though the document itself predates that function and does not reference it by name. |
| 38 | QueryBuilder Pattern Analysis | Documents the QueryBuilder lifecycle, state management, and caching in the upstream library. Provides the design context for how CSAPIQueryBuilder manages state, including the resourceUrls map populated by scanCsapiLinks. The document does not reference scanCsapiLinks by name. |
| 39 | Error Handling Design Analysis | Documents error handling patterns in ogc-client — error classes, validation errors, missing resource errors. Provides design guidance for how the scheme validation fix should handle rejected hrefs (silent skip vs error throw). The document does not reference scanCsapiLinks or isTrustedHref by name. |
| 40 | Code Reuse vs Duplication Strategy | Defines when to reuse upstream utilities vs duplicate for CSAPI. Establishes the isolation principle — new helpers like the proposed isTrustedHref should be CSAPI-internal per the isolation constraint. The document does not reference scanCsapiLinks or isTrustedHref by name. |
| 41 | PR #114 (EDR Implementation) Analysis | Primary blueprint for CSAPI implementation. Documents the EDR factory method pattern (endpoint.edr() → EDRQueryBuilder) that our createCSAPIBuilder() follows. The document does not mention scanCsapiLinks or createCSAPIBuilder by name — those are CSAPI-specific functions that follow the patterns documented here. |
| 42 | Integration with Existing Code | Documents code changes for CSAPI integration into endpoint.ts, info.ts, and index.ts. Describes the factory method → CSAPIQueryBuilder flow following the EDR pattern. The document does not mention factory.ts or scanCsapiLinks — those are implementation details that emerged during development. |
| 43 | Architecture Patterns Analysis | Documents consistent architectural patterns in ogc-client — endpoint extension, conformance checking, caching, lazy loading. Provides the architectural context within which scanCsapiLinks operates, though the document does not mention it by name. |
| 44 | CSAPI Architecture Decisions | Architecture choices for 9 CSAPI resource types within ogc-client patterns. All resource types have their endpoint URLs resolved via the same link scanning mechanism, though this document does not reference scanCsapiLinks by name. |
| 45 | Format Negotiation Architecture | Documents format negotiation patterns (Accept header, query parameter, link-based discovery) for CSAPI resources. Notes that CSAPI does not need link-based format discovery. Relevant as background context — format requests are issued against URLs whose base originates from scanCsapiLinks output, though the document does not reference scanCsapiLinks by name. |
| 46 | Design Strategy Research | High-level design strategy covering integration approach, format abstraction, and architectural trade-offs for the overall CSAPI module. Provides strategic context for the implementation, though it does not reference scanCsapiLinks by name. |
| 47 | CSAPIQueryBuilder Architecture Decisions | 22-entry research series for CSAPIQueryBuilder design. Includes EDR pattern analysis, QueryBuilder pattern, scope analysis. Provides the design rationale for the QueryBuilder that consumes resourceUrls, though the series does not reference scanCsapiLinks by name. |
| 48 | OGCAPIEndpoint Integration Research | Factory method integration pattern and conformance detection research. Documents the endpoint.csapi() entry point that leads to CSAPIQueryBuilder instantiation, though the documents do not reference scanCsapiLinks by name. |
Requirements Research
| # | Document | What It Provides |
|---|---|---|
| 49 | 52°North Server Implementation Analysis | Analyzes 52°North's CSAPI server — documents multi-server compatibility requirements, pagination differences, and format support variations. Contains example link objects with href values (e.g., pagination links), providing real-world evidence of server response patterns. |
| 50 | OpenSensorHub Server Analysis | Comprehensive analysis of the primary CSAPI reference implementation. Contains example responses showing relative hrefs (e.g., /api/systems/abc123), demonstrating the server's convention of returning relative URLs. Provides real-world evidence of what href values servers actually return. |
| 51 | CSAPI Part 1 Requirements | Documents 5 resource types, 70+ operations, 30+ query parameters — all of which construct URLs on top of scanCsapiLinks output. Defines the breadth of impact if a malicious href propagates. |
| 52 | CSAPI Part 2 Requirements | Documents DataStreams, Observations, ControlStreams, Commands — all dynamic data resources whose URLs build on scanCsapiLinks output via buildResourceUrl(). |
| 53 | Query Parameter Requirements | Catalogs ALL query parameters from Part 1 and Part 2 — every parameter is appended to base URLs from scanCsapiLinks. Defines the full downstream impact surface. |
| 54 | Sub-Resource Navigation Requirements | Documents nested endpoint URL construction (e.g., /systems/{id}/subsystems) — all nested paths are built on scanCsapiLinks base URLs. |
| 55 | Gap Analysis from Previous Iteration | Identifies gaps and lessons from the first implementation attempt — documents error handling improvements needed, relevant to why defense-in-depth was missing. |
| 56 | Lessons Learned from Previous Iterations | Documents what worked and what didn't — over-engineering warnings relevant to keeping the isTrustedHref fix minimal. |
| 57 | Upstream Library Expectations | Defines camptocamp/ogc-client quality expectations — the upstream quality bar that motivates defense-in-depth hardening. |
| 58 | OWSLib CSAPI Implementation Analysis | Mature Python client reference — documents class-per-resource architecture, URL construction patterns, and authentication abstraction in a peer CSAPI client library. |
| 59 | OSHConnect-Python Client Analysis (Detailed) | Builder pattern and request construction in Python CSAPI client. Documents client-side URL construction (base_url + path) rather than link-following — a contrasting approach to our scanCsapiLinks server-href-based URL resolution. |
| 60 | Usage Scenarios and Priorities | 15 core usage scenarios, 8 common error patterns — all workflows that consume URLs built on scanCsapiLinks output. Error handling requirements inform the silent-skip behavior. |
| 61 | Full Implementation Scope Definition | Defines complete implementation scope — production-ready requirements that motivate security hardening like scheme validation. |
Testing
| # | Document | What It Provides |
|---|---|---|
| 17 | Live Smoke Test ST#7 (post-2.2) | First live verification of scanCsapiLinks against real OSH servers — confirms Convention 2 and 3 detection patterns work with real http:// hrefs. |
| 18 | Live Smoke Test ST#8 (post-2.3) | Re-verified scanCsapiLinks, found properties resource type not discoverable via root links on either server. |
| 19 | Live Smoke Test ST#9 (post-2.4) | Confirmed Convention 3 fix (featuresOfInterest normalization, query param stripping) works on both servers. |
| 62 | Testing Strategy Research | Comprehensive testing strategy for CSAPI — coverage targets, fixture strategy, test organization. Defines the quality bar for new isTrustedHref tests. |
| 63 | Testing Research Plans (20 Plans) | Systematic 20-plan series. Plan 12 (QueryBuilder testing strategy) and Plan 18 (error condition testing) are directly relevant to testing the scheme validation behavior. |
| 64 | Testing Strategy Review Phases | Multi-phase testing strategy review, including notes-why-models-default-to-server-validation.md — directly relevant to the question of whether to validate/reject server-returned data or trust it. |
| 65 | Test Fixtures Guide | Guide for fixture creation and management — informs how to create test fixtures for scheme validation test cases. |
Governance
| # | Document | What It Provides |
|---|---|---|
| 20 | AI Operational Constraints | Mandatory operational constraints for implementation |
| 21 | AI Collaboration Agreement | Collaboration framework — precedence rules, scope boundaries |
| 22 | Phase 3 Lessons Learned | Lesson 2 (Postel's Law — never gate on validation that blocks real-world data). The scheme check must not reject valid server responses. |
| 23 | Phase 2 Lessons Learned | Lesson 9 (DRY violation in link scanning that led to scanCsapiLinks creation), Lesson 10 (Convention 3 bugs found via multi-vendor testing). |
| 24 | Known Server Quirks | Documents actual href values returned by real servers — all observed values are http:///https:// or relative. No exotic schemes observed. |
| 66 | Code Review Prompt Templates | Phase 6 code review methodology template for AI-conducted reviews. Defines the systematic review process and quality criteria used across implementation phases. This finding was produced by a separate human senior developer review, not from this AI template. |
| 67 | Issue Creation Prompt Template (Code Review) | Template C — the code review issue creation template used to file this issue. Defines ownership verification workflow and severity assessment framework. |
Implementation Records
| # | Document | What It Provides |
|---|---|---|
| 25 | Phase 2.2 Code Review | F2 finding identified DRY violation in link scanning, recommending extraction of scanCsapiLinks. Original review of helpers.ts. |
| 26 | CSAPI Code Audit Phase 6 | Comprehensive code audit cataloguing helpers.ts (226 lines), [P-6] POSITIVE for scanCsapiLinks normalizing featuresOfInterest alias. Did not flag scheme validation gap. |
| 27 | Final Project Code Review | §6.5 reviews helpers.ts, rates "Excellent" with 72 test cases. Did not flag scheme validation gap. |
| 68 | Phase 2 Overview | Phase 2 implementation overview — documents when scanCsapiLinks() was originally created and the design decisions made at that time. |
Planning
| # | Document | What It Provides |
|---|---|---|
| 28 | Contribution Goal & Definition | "Zero-breaking-change integration with existing library functionality" — the fix must not break existing consumers. |
| 29 | Phase 6 Implementation Guide | Documents that scanCsapiLinks stays in CSAPI, its internal-only import classification, and the factory module data flow. |
| 70 | Phase 6 Roadmap | Phase 6 roadmap for upstream acceptance refactoring — the current phase context in which this finding was discovered. |
| 71 | Phase 6 Contribution Goal & Definition | PR acceptance readiness definition — security findings like this one need resolution before the PR is considered complete. |
Upstream PR Context
| # | Document | What It Provides |
|---|---|---|
| 30 | Upstream PR #136 | Draft PR on camptocamp/ogc-client that includes helpers.ts — any fix must be reflected here. |
| 31 | jahow's isolation comment | Upstream maintainer's requirement that CSAPI must be fully isolated — confirms all changes are within the allowed boundary. |
Code Repositories
| # | Document | What It Provides |
|---|---|---|
| 72 | camptocamp/ogc-client | The upstream library we're extending. Provides URL handling patterns, existing OGC API utilities, and the architectural foundation that scanCsapiLinks operates within. |
Live Infrastructure
| # | Document | What It Provides |
|---|---|---|
| 73 | OpenSensorHub Demo Server | Primary live CSAPI server for validation — returns real http:// hrefs in link objects. Use to verify that isTrustedHref does not reject any real-world server responses. |
| 74 | 52°North Demo Server | Secondary live CSAPI server (degraded, expired SSL). Returns different href patterns than OSH. Multi-vendor validation target for ensuring the scheme filter works across implementations. |