-
Notifications
You must be signed in to change notification settings - Fork 1
parseDatastream() and parseControlStream() share ~30 lines of identical base-stream extraction logic — DRY violation in part2.ts #146
Description
Finding
parseDatastream() and parseControlStream() in part2.ts share ~30 lines of identical extraction logic for 7 base-stream fields. Any bug fix or new field must be applied twice.
Review Source: Senior developer code review of clean-pr — docs/code-review/010-pending-p2-datastream-controlstream-base-duplication.md
Severity: P2-Important
Category: Code Quality / DRY / Architecture
Ownership: Ours
Problem Statement
parseDatastream() and parseControlStream() in src/ogc-api/csapi/formats/part2.ts both extract the same 7 base-stream fields from raw JSON using near-identical code. The OGC 23-002 spec defines a shared baseStream schema for both resource types (§9.2 DataStream, §10.2 ControlStream), and our code duplicates the extraction of that shared schema rather than factoring it out.
Duplicated blocks (line-by-line mapping):
| Field(s) | parseDatastream |
parseControlStream |
|---|---|---|
| Null guard + cast | Lines 114–118 | Lines 214–219 |
id, name, description |
Lines 142–147 | Lines 222–227 |
validTime |
Lines 122–123 | Lines 228–229 |
formats |
Lines 148–150 | Lines 232–234 |
systemId cross-ref |
Lines 160–162 | Lines 248–250 |
links |
Lines 166 | Lines 252 |
The JSDoc on parseControlStream already acknowledges this:
Structurally parallel to
parseDatastream— both share thebaseStreamschema in the OpenAPI spec.
Only the resource-specific fields differ:
- Datastream:
observedProperties,phenomenonTime,resultTime,resultType,outputName,live,type - ControlStream:
controlledProperties,issueTime,executionTime,inputName,live,async
Affected code — parseDatastream (lines 114–167):
export function parseDatastream(json: unknown): Datastream {
if (typeof json !== 'object' || json === null) {
throw new Error('parseDatastream: input must be a non-null object');
}
const obj = json as Record<string, unknown>;
// ...time fields...
return {
id: typeof obj.id === 'string' ? obj.id : '',
name: typeof obj.name === 'string' ? obj.name : '',
...(typeof obj.description === 'string' ? { description: obj.description } : {}),
...(validTime !== undefined ? { validTime } : {}),
formats: Array.isArray(obj.formats)
? (obj.formats.filter((f) => typeof f === 'string') as string[]) : [],
// ...Datastream-specific fields...
...(typeof obj['system@id'] === 'string' ? { systemId: obj['system@id'] as string } : {}),
links: Array.isArray(obj.links) ? (obj.links as ResourceLink[]) : [],
} satisfies Datastream;
}Affected code — parseControlStream (lines 214–253):
export function parseControlStream(json: unknown): ControlStream {
if (typeof json !== 'object' || json === null) {
throw new Error('parseControlStream: input must be a non-null object');
}
const obj = json as Record<string, unknown>;
// ...time fields...
return {
id: typeof obj.id === 'string' ? obj.id : '',
name: typeof obj.name === 'string' ? obj.name : '',
...(typeof obj.description === 'string' ? { description: obj.description } : {}),
...(validTime !== undefined ? { validTime } : {}),
formats: Array.isArray(obj.formats)
? (obj.formats.filter((f) => typeof f === 'string') as string[]) : [],
// ...ControlStream-specific fields...
...(typeof obj['system@id'] === 'string' ? { systemId: obj['system@id'] as string } : {}),
links: Array.isArray(obj.links) ? (obj.links as ResourceLink[]) : [],
} satisfies ControlStream;
}Impact: The repetition means any bug fix or new shared field (e.g., a new cross-reference field from the spec) must be applied in two places. Issue #103 (cross-reference fields) would require updating both functions identically — a parseBaseStream helper would consolidate this.
Ownership Verification
$ git log upstream/main --oneline -- src/ogc-api/csapi/formats/part2.ts
(empty — zero commits on upstream/main)
$ git blame --line-porcelain src/ogc-api/csapi/formats/part2.ts | Select-String "^author " | Group-Object
Count: 517, Name: "author Sam-Bolling"
Conclusion: This code is ours. The file src/ogc-api/csapi/formats/part2.ts does not exist on upstream/main — the entire file is our contribution, with 100% sole authorship.
Files to Modify
| File | Action | Est. Lines | Purpose |
|---|---|---|---|
src/ogc-api/csapi/formats/part2.ts |
Modify | ~40 | Add parseBaseStream helper; update both parsers to call it |
src/ogc-api/csapi/formats/part2.spec.ts |
None | 0 | All existing tests should pass unchanged — pure internal refactor |
Proposed Solutions
Option A: Extract parseBaseStream helper (Recommended)
interface BaseStream {
id: string;
name: string;
description?: string;
validTime?: TimeInterval;
formats: string[];
systemId?: string;
links: ResourceLink[];
}
function parseBaseStream(
fn: string,
json: unknown
): { base: BaseStream; obj: Record<string, unknown> } {
if (typeof json !== 'object' || json === null)
throw new Error(`${fn}: input must be a non-null object`);
const obj = json as Record<string, unknown>;
return {
obj,
base: {
id: typeof obj.id === 'string' ? obj.id : '',
name: typeof obj.name === 'string' ? obj.name : '',
...(typeof obj.description === 'string'
? { description: obj.description }
: {}),
validTime: parseValidTime(obj.validTime),
formats: Array.isArray(obj.formats)
? obj.formats.filter((f): f is string => typeof f === 'string')
: [],
...(typeof obj['system@id'] === 'string'
? { systemId: obj['system@id'] as string }
: {}),
links: Array.isArray(obj.links) ? (obj.links as ResourceLink[]) : [],
},
};
}Both parsers then become:
export function parseDatastream(json: unknown): Datastream {
const { base, obj } = parseBaseStream('parseDatastream', json);
// ...only Datastream-specific fields below
return { ...base, observedProperties, phenomenonTime, resultTime, resultType, live, ... };
}
export function parseControlStream(json: unknown): ControlStream {
const { base, obj } = parseBaseStream('parseControlStream', json);
// ...only ControlStream-specific fields below
return { ...base, controlledProperties, issueTime, executionTime, live, async, ... };
}Pros: Eliminates ~30 lines of duplicated extraction. Single point of change for shared base-stream fields. Follows the Postel's Law tolerant extraction pattern established in Phase 3. Proven pattern — Phase 5.4 used the same approach with parseComponentEntry (saved 39 net lines).
Cons: Adds one helper function. The validTime extraction moves into the helper, so the resource-specific time fields (phenomenonTime/resultTime vs issueTime/executionTime) must still be parsed in each caller.
Effort: Medium | Risk: Low (pure refactor, no behavioral change)
Option B: Leave as-is with a comment linking the two functions
Add a @see cross-reference comment to each function pointing to the other.
Pros: Zero risk; trivial effort.
Cons: Does NOT reduce code. The duplication remains. Bug fixes still need to be applied twice.
Effort: Trivial | Risk: None
Not recommended. The JSDoc already notes the parallelism — adding more comments doesn't solve the DRY problem.
Scope — What NOT to Touch
- ❌ Do NOT modify any file outside
src/ogc-api/csapi/— per the upstream maintainer's isolation requirement - ❌ Do NOT change the public API signatures of
parseDatastream()orparseControlStream()— return types must remainDatastreamandControlStreamrespectively - ❌ Do NOT export
parseBaseStream— it is an internal helper only - ❌ Do NOT refactor the resource-specific fields into the helper — only the 7 shared base-stream fields belong there
- ❌ Do NOT combine this with finding 011 (
requireObjectnull-guard extraction) in the same PR — those are independent concerns even though they share the null-guard code
Acceptance Criteria
-
parseBaseStreamhelper (or equivalent) extracts the 7 common fields (id,name,description,validTime,formats,systemId,links) -
parseDatastream()callsparseBaseStreamand adds only Datastream-specific fields -
parseControlStream()callsparseBaseStreamand adds only ControlStream-specific fields - All Part 2 parser tests pass (
part2.spec.ts) - Existing tests still pass (
npm test) - No lint errors (
npm run lint) - All modified files pass
npx prettier --check
Dependencies
Blocked by: Nothing (can be done independently)
Blocks: Nothing
Coordinate with:
- Finding 011 —
requireObjectnull-guard extraction. The null guard inparseBaseStreamcould later be replaced by arequireObjectutility if finding 011 is implemented. Independent concerns — do them in either order. - Parsed Part 2 models discard all cross-reference fields — parent resource navigation impossible without raw JSON #103 — Cross-reference fields (
system@id,system@link). Adding new cross-ref fields would benefit from theparseBaseStreamhelper existing first, but Parsed Part 2 models discard all cross-reference fields — parent resource navigation impossible without raw JSON #103 can also be done independently.
Related: parseControlStreamSchemaResponse()does not acceptparamsSchema— silent data loss on older OSH servers #140 —parseControlStreamSchemaResponse()paramsSchemasilent data loss. Same file (part2.ts) but different function and different concern.- [SPLIT] parseCollectionResponse casts raw JSON to T[] without element validation → #154 + #155 #141 —
parseCollectionResponse<T>generic cast. Upstream boundary that feeds into both parsers — independent concern. - Finding 008 / SensorML parsers spread raw JSON into typed results — bypasses type system #144 — SensorML raw JSON spread. Same pattern category (parser DRY) but different file.
Operational Constraints
⚠️ MANDATORY: Before starting work on this issue, reviewdocs/governance/AI_OPERATIONAL_CONSTRAINTS.md.
Key constraints:
- Precedence: OGC specifications → AI Collaboration Agreement → This issue description → Existing code → Conversational context
- No scope expansion: Extract the shared base-stream fields, nothing more
- Minimal diffs: Prefer the smallest change that satisfies the acceptance criteria
- Ask when unclear: If intent is ambiguous, stop and ask for clarification
- Tolerant extraction: The
parseBaseStreamhelper must follow Postel's Law — accept malformed input gracefully, map missing/null fields to defaults (per Phase 3 Lesson 2)
Upstream Isolation Constraint
Per the upstream maintainer's comment on PR #136:
- Anything in
src/ogc-api/csapimust NOT be included in the rootindex.ts - Anything NOT in
src/ogc-api/csapimust NOT import from CSAPI code - All files in this issue's "Files to Modify" table are within
src/ogc-api/csapi/— this fix is purely internal to the isolated CSAPI module
Ownership-Specific Constraints
Ownership: Ours
- Fix on the working branch
- Include in the next commit to
clean-prif the PR is still open - All existing tests must continue to pass — this is a pure internal refactor with no public API changes
References
Original Code Review Finding
| # | Document | What It Provides |
|---|---|---|
| 0 | docs/code-review/010-pending-p2-datastream-controlstream-base-duplication.md |
The original finding from the senior developer code review. Authoritative source for the problem statement, line-by-line mapping, proposed parseBaseStream helper (Option A), and acceptance criteria. This issue tracks the resolution of that finding. |
Directly Affected Code
| # | Document | What It Provides |
|---|---|---|
| 1 | src/ogc-api/csapi/formats/part2.ts lines 114–167 |
parseDatastream() — one half of the duplicated extraction pattern |
| 2 | src/ogc-api/csapi/formats/part2.ts lines 214–253 |
parseControlStream() — the other half of the duplicated extraction pattern |
| 3 | src/ogc-api/csapi/model.ts — Datastream interface (~line 532) |
Type definition — 8 fields shared with ControlStream (including live); 7 of these are proposed for the parseBaseStream helper (live handled as resource-specific in each caller) |
| 4 | src/ogc-api/csapi/model.ts — ControlStream interface (~line 604) |
Type definition — 8 fields shared with Datastream (including live); 7 of these are proposed for the parseBaseStream helper (live handled as resource-specific in each caller) |
| 5 | src/ogc-api/csapi/formats/part2.spec.ts |
Test suite — must pass unchanged after refactor |
| 6 | src/ogc-api/csapi/formats/geojson.ts — parseValidTime() |
Shared helper already used by both parsers for validTime extraction — the parseBaseStream helper would call this |
Related Open Issues & Findings
| # | Document | What It Provides |
|---|---|---|
| 7 | Issue #140 — parseControlStreamSchemaResponse() paramsSchema silent data loss |
Same file, different function. Independent concern but shows part2.ts has multiple pending improvements. |
| 8 | Issue #141 — parseCollectionResponse<T> generic cast |
Upstream boundary that feeds unvalidated items into both parseDatastream and parseControlStream. Independent concern. |
| 9 | Issue #144 — SensorML raw JSON spread | Same pattern category (parser DRY / raw JSON handling) but different file (sensorml.ts). |
| 10 | Finding 011 — requireObject null-guard extraction |
Referenced in finding 010's acceptance criteria. The null-guard portion of parseBaseStream could be further factored into a shared requireObject utility. |
| 11 | Demo App Finding — Issue #103: Cross-reference fields | Documents system@id / system@link cross-reference fields shared between both parsers. Adding new cross-ref fields would need to be done twice without parseBaseStream. |
| 12 | Demo App Finding — Issue #11: Generic CRUD Methods | Identified structural repetition in consumer-side dispatch code across all CSAPI resource types (Part 1 and Part 2) — proposed generic CRUD to reduce consumer boilerplate. Same DRY motivation as this finding but at the consumer abstraction level, not library-internal. |
| 13 | Issue #143 — extractCSAPIFeature casts properties to Record without null check | Same pattern category (parser safety) — the null-guard pattern in parseBaseStream relates to this class of concern. |
Standards
| # | Document | What It Provides |
|---|---|---|
| 14 | OGC API — Connected Systems Part 2 (23-002) | Primary spec. §9.2 Table 5 (DataStream) and §10.2 Table 10 (ControlStream) share the baseStream schema — the spec-level justification for the shared helper. |
| 15 | CSAPI Part 2 OpenAPI Spec | Machine-readable schema confirming the shared baseStream component between DataStream and ControlStream response schemas. |
| 16 | OGC API — Connected Systems Part 1 (23-001) | Part 1 defines the collection patterns that Part 2 extends. DataStream/ControlStream are Part 2 resources but inherit the general resource discovery and collection metadata patterns from Part 1. |
| 17 | CSAPI Part 1 OpenAPI Spec | Machine-readable Part 1 endpoint schemas — Part 2 resources follow the same structural patterns for collections and items. |
| 18 | GeoJSON (RFC 7946) | Part 2 resources are encoded as JSON objects. The parseValidTime() helper used by both parsers lives in geojson.ts. GeoJSON conventions influence the field extraction patterns. |
| 19 | ISO 8601 (Date and Time Format) | All time fields in both parsers (validTime, phenomenonTime, resultTime, issueTime, executionTime) use ISO 8601 intervals parsed by parseValidTime(). |
Architecture & Design
| # | Document | What It Provides |
|---|---|---|
| 20 | CSAPI Implementation Guide | Format handler design requirements and endpoint coverage — defines the architectural expectations for parser functions. |
| 21 | Code Reuse vs Duplication Strategy | Governs DRY principle vs isolation — when to extract shared utilities within the CSAPI module. Directly applicable to the parseBaseStream extraction decision. |
| 22 | Validation/Extraction Decoupling Design Notes | Architectural decision to separate validation from extraction — parseBaseStream follows this pattern (extraction only, tolerant). |
| 23 | Component Design Sequence | Dependency chains between format handlers, query builder, and integration layer — confirms format handlers are independently refactorable. |
| 24 | CSAPI Architecture Decisions | Part 1 vs Part 2 architecture decisions, resource type organization — documents DataStream and ControlStream as parallel Part 2 resources with similar URL patterns and format support. Covers shared URL-building helper methods across all 9 resource types. |
| 25 | TypeScript Type System Design | Interface definition patterns and type hierarchy design — informs the BaseStream interface design for the shared helper. |
| 26 | Error Handling Design Analysis | Error handling patterns — the null-guard throw new Error() in parseBaseStream must follow established conventions. |
| 27 | Design Strategy Research | Early-phase design strategy that advocated a URL-building-only approach with no parsing. Superseded by later implementation decisions that added format handlers. Provides historical context on how the parser architecture evolved beyond the original strategy. |
| 28 | Architecture Patterns Analysis | Upstream patterns for shared utilities — how the existing ogc-client organizes reusable helpers. |
Requirements Research
| # | Document | What It Provides |
|---|---|---|
| 29 | CSAPI Part 2 Requirements | Comprehensive Part 2 spec analysis covering DataStream and ControlStream resource types — documents the shared field set that justifies parseBaseStream. |
| 30 | Data Type and Schema Requirements | TypeScript interface definitions including Datastream and ControlStream — the BaseStream interface proposed in the fix needs to align with these established type patterns. |
| 31 | Comprehensive Format Requirements | GeoJSON, SensorML, and SWE Common format requirements for both resource types — parsing strategy that both parsers implement. |
| 32 | Common Format Requirements | Format negotiation and media type handling — the formats field extracted by both parsers maps to these format identifiers. |
| 33 | OpenSensorHub Server Analysis | Primary test server — documents actual DataStream/ControlStream response shapes from live data that both parsers handle. |
| 34 | 52°North Server Analysis | Multi-server compatibility research — 52°North has not implemented ControlStreams and DataStream support is partial ("In Development"). Documents general server differences (pagination, error patterns, conformance) relevant to tolerant parsing design. |
| 35 | Sub-Resource Navigation Requirements | Nested endpoint patterns — DataStreams and ControlStreams are sub-resources of Systems, documenting the navigation context. |
| 36 | Gap Analysis from Previous Iteration | Lessons from failed iteration — format parsing completeness requirements that motivated building both parsers comprehensively. |
| 37 | Lessons Learned from Previous Iterations | Mistakes to avoid from previous attempts — over-engineering vs under-engineering balance relevant to the extraction scope. |
| 38 | Upstream Library Expectations | Format abstraction alignment with existing upstream parsers — the shared helper pattern should match upstream conventions. |
| 39 | OSHConnect-Python Client Analysis | Python CSAPI client's approach to DataStream parsing (ControlStreams not implemented in OSHConnect-Python) — Pydantic model patterns and type system design for comparison. |
| 40 | OWSLib Analysis | Mature Python client library patterns — class-per-resource architecture with shared base extraction as a reference approach. |
Testing References
| # | Document | What It Provides |
|---|---|---|
| 41 | Testing Strategy Research | Overall testing approach — refactor-safe test design ensuring the parseBaseStream extraction doesn't break behavior-level tests. |
| 42 | Live Smoke Test ST#20 (post-5.1) | parseDatastream() validated against live OSH data (4 samples, PASS) — establishes the baseline behavior that must be preserved. |
| 43 | Live Smoke Test ST#21 (post-5.2) | parseControlStream() validated against live OSH data (18 CS surveyed, 2 traced, PASS). Confirmed both parsers share normalizeObservedProperties() reuse. |
| 44 | Live Smoke Test ST#22 (post-5.3) | Confirmed Part 2 parsers stable and no regressions after Phase 5.3 schema-response additions. |
| 45 | Testing Plan 06 — Meaningful vs Trivial | Defines meaningful test coverage — confirms Part 2 parser tests are behavior-level (parsed output), not implementation-level (which private helper is called). Refactor-safe by design. |
| 46 | Parser Testing vs Spec Validation Notes | Parser testing philosophy — distinguishes between testing parser correctness vs spec validation. Relevant for understanding what the existing tests actually verify. |
| 47 | Testing Plan 11 — GeoJSON CSAPI Testing Requirements | GeoJSON CSAPI extension testing — both parsers handle JSON objects; the parseValidTime() dependency lives in the GeoJSON module. |
| 48 | Test Fixtures Guide | Fixture management for Part 2 resources — DataStream and ControlStream fixtures used in part2.spec.ts. |
Governance
| # | Document | What It Provides |
|---|---|---|
| 49 | AI Operational Constraints | Mandatory operational constraints for implementation |
| 50 | AI Collaboration Agreement | Collaboration framework — precedence rules, scope boundaries |
| 51 | Phase 3 Lessons Learned | Postel's Law tolerant extraction (Lesson 2), don't couple validation to extraction (Lesson 3) — governs how parseBaseStream must handle missing/malformed fields |
| 52 | Phase 2 Lessons Learned | Testing strategy refinement and code review process improvements from earlier implementation phases. |
| 53 | Known Server Quirks | Server-side non-compliant behaviors affecting parsing — the parseBaseStream helper must handle these gracefully (e.g., missing fields, unexpected types). |
| 54 | Code Review Prompt Template (Phase 5) | Template used for Phase 5 code reviews that originally examined both parsers — provides the quality criteria applied at the time. |
Implementation Records
| # | Document | What It Provides |
|---|---|---|
| 55 | Phase 5.1 Code Review | Review of parseDatastream() implementation (Task 2a) — documents field extraction patterns, normalizeObservedProperties() helper, cross-reference handling, and parser completion scorecard. |
| 56 | Phase 5.2 Code Review | F13: noted structural parallelism between both parsers. At the time assessed as correct (spec-driven). Finding 010 re-evaluates this as a DRY concern now that both parsers are stable. |
| 57 | Phase 5.3 Code Review | Schema-response parser additions. L7 DRY check PASS with discussion of parseComponentEntry duplication as acceptable at the time. Documents parseDatastreamSchemaResponse/parseControlStreamSchemaResponse delegation patterns — same file family. |
| 58 | Phase 5.4 Code Review | Precedent: F27 — parseComponentEntry DRY extraction saved 39 net lines. Same pattern of shared-helper extraction proposed here. Proves the pattern works and is accepted. |
| 59 | Final Project Code Review | F-NEW-04: identified 3 duplication groups in Part 2 format handlers. This issue addresses group 1 (validTime/properties extraction). Classified as "DESIGN, minor — deferred to follow-up PR." |
| 60 | CSAPI Code Audit Phase 6 | Comprehensive code audit that catalogued formats/part2.ts (518 lines), noted D-3 through D-6 duplication findings across format handlers, and documented the time field parsing asymmetry (C-2) between parseDatastream/parseControlStream and parseObservation. |
| 61 | P5 Findings Coverage Analysis | Phase 5 findings tracking — documents F33 (parseControlStreamSchemaResponse) and other Part 2 parser findings. |
| 62 | Cross-Server Interoperability Analysis | Multi-server DataStream/ControlStream response variations — the shared helper must handle format differences across OpenSensorHub and 52°North. |
Planning
| # | Document | What It Provides |
|---|---|---|
| 63 | Phase 5 Roadmap | Task sequencing — parseDatastream (Task 2a), parseControlStream (Task 4). Documents why both were implemented as separate functions. |
| 64 | Phase 5 Implementation Guide | Parser function signatures, field mapping specs — the definitive reference for what each parser must extract. Confirms the 7 shared fields. |
| 65 | Phase 5 Contribution Goal | Per-parser acceptance criteria — defines what "complete" means for parseDatastream and parseControlStream. |
| 66 | Contribution Goal & Definition | Overall contribution scope — format handlers are a core deliverable. |
Upstream PR Context
| # | Document | What It Provides |
|---|---|---|
| 67 | Upstream PR #136 | Draft PR on camptocamp/ogc-client that includes part2.ts — any refactor must be reflected here. |
| 68 | jahow's isolation comment | Upstream maintainer's requirement that CSAPI must be fully isolated — confirms all changes are within the allowed boundary. |