Skip to content

parseDatastream() and parseControlStream() share ~30 lines of identical base-stream extraction logic — DRY violation in part2.ts #146

@Sam-Bolling

Description

@Sam-Bolling

Finding

parseDatastream() and parseControlStream() in part2.ts share ~30 lines of identical extraction logic for 7 base-stream fields. Any bug fix or new field must be applied twice.

Review Source: Senior developer code review of clean-prdocs/code-review/010-pending-p2-datastream-controlstream-base-duplication.md
Severity: P2-Important
Category: Code Quality / DRY / Architecture
Ownership: Ours


Problem Statement

parseDatastream() and parseControlStream() in src/ogc-api/csapi/formats/part2.ts both extract the same 7 base-stream fields from raw JSON using near-identical code. The OGC 23-002 spec defines a shared baseStream schema for both resource types (§9.2 DataStream, §10.2 ControlStream), and our code duplicates the extraction of that shared schema rather than factoring it out.

Duplicated blocks (line-by-line mapping):

Field(s) parseDatastream parseControlStream
Null guard + cast Lines 114–118 Lines 214–219
id, name, description Lines 142–147 Lines 222–227
validTime Lines 122–123 Lines 228–229
formats Lines 148–150 Lines 232–234
systemId cross-ref Lines 160–162 Lines 248–250
links Lines 166 Lines 252

The JSDoc on parseControlStream already acknowledges this:

Structurally parallel to parseDatastream — both share the baseStream schema in the OpenAPI spec.

Only the resource-specific fields differ:

  • Datastream: observedProperties, phenomenonTime, resultTime, resultType, outputName, live, type
  • ControlStream: controlledProperties, issueTime, executionTime, inputName, live, async

Affected code — parseDatastream (lines 114–167):

export function parseDatastream(json: unknown): Datastream {
  if (typeof json !== 'object' || json === null) {
    throw new Error('parseDatastream: input must be a non-null object');
  }
  const obj = json as Record<string, unknown>;
  // ...time fields...
  return {
    id: typeof obj.id === 'string' ? obj.id : '',
    name: typeof obj.name === 'string' ? obj.name : '',
    ...(typeof obj.description === 'string' ? { description: obj.description } : {}),
    ...(validTime !== undefined ? { validTime } : {}),
    formats: Array.isArray(obj.formats)
      ? (obj.formats.filter((f) => typeof f === 'string') as string[]) : [],
    // ...Datastream-specific fields...
    ...(typeof obj['system@id'] === 'string' ? { systemId: obj['system@id'] as string } : {}),
    links: Array.isArray(obj.links) ? (obj.links as ResourceLink[]) : [],
  } satisfies Datastream;
}

Affected code — parseControlStream (lines 214–253):

export function parseControlStream(json: unknown): ControlStream {
  if (typeof json !== 'object' || json === null) {
    throw new Error('parseControlStream: input must be a non-null object');
  }
  const obj = json as Record<string, unknown>;
  // ...time fields...
  return {
    id: typeof obj.id === 'string' ? obj.id : '',
    name: typeof obj.name === 'string' ? obj.name : '',
    ...(typeof obj.description === 'string' ? { description: obj.description } : {}),
    ...(validTime !== undefined ? { validTime } : {}),
    formats: Array.isArray(obj.formats)
      ? (obj.formats.filter((f) => typeof f === 'string') as string[]) : [],
    // ...ControlStream-specific fields...
    ...(typeof obj['system@id'] === 'string' ? { systemId: obj['system@id'] as string } : {}),
    links: Array.isArray(obj.links) ? (obj.links as ResourceLink[]) : [],
  } satisfies ControlStream;
}

Impact: The repetition means any bug fix or new shared field (e.g., a new cross-reference field from the spec) must be applied in two places. Issue #103 (cross-reference fields) would require updating both functions identically — a parseBaseStream helper would consolidate this.

Ownership Verification

$ git log upstream/main --oneline -- src/ogc-api/csapi/formats/part2.ts
(empty — zero commits on upstream/main)

$ git blame --line-porcelain src/ogc-api/csapi/formats/part2.ts | Select-String "^author " | Group-Object
Count: 517, Name: "author Sam-Bolling"

Conclusion: This code is ours. The file src/ogc-api/csapi/formats/part2.ts does not exist on upstream/main — the entire file is our contribution, with 100% sole authorship.

Files to Modify

File Action Est. Lines Purpose
src/ogc-api/csapi/formats/part2.ts Modify ~40 Add parseBaseStream helper; update both parsers to call it
src/ogc-api/csapi/formats/part2.spec.ts None 0 All existing tests should pass unchanged — pure internal refactor

Proposed Solutions

Option A: Extract parseBaseStream helper (Recommended)

interface BaseStream {
  id: string;
  name: string;
  description?: string;
  validTime?: TimeInterval;
  formats: string[];
  systemId?: string;
  links: ResourceLink[];
}

function parseBaseStream(
  fn: string,
  json: unknown
): { base: BaseStream; obj: Record<string, unknown> } {
  if (typeof json !== 'object' || json === null)
    throw new Error(`${fn}: input must be a non-null object`);
  const obj = json as Record<string, unknown>;
  return {
    obj,
    base: {
      id: typeof obj.id === 'string' ? obj.id : '',
      name: typeof obj.name === 'string' ? obj.name : '',
      ...(typeof obj.description === 'string'
        ? { description: obj.description }
        : {}),
      validTime: parseValidTime(obj.validTime),
      formats: Array.isArray(obj.formats)
        ? obj.formats.filter((f): f is string => typeof f === 'string')
        : [],
      ...(typeof obj['system@id'] === 'string'
        ? { systemId: obj['system@id'] as string }
        : {}),
      links: Array.isArray(obj.links) ? (obj.links as ResourceLink[]) : [],
    },
  };
}

Both parsers then become:

export function parseDatastream(json: unknown): Datastream {
  const { base, obj } = parseBaseStream('parseDatastream', json);
  // ...only Datastream-specific fields below
  return { ...base, observedProperties, phenomenonTime, resultTime, resultType, live, ... };
}

export function parseControlStream(json: unknown): ControlStream {
  const { base, obj } = parseBaseStream('parseControlStream', json);
  // ...only ControlStream-specific fields below
  return { ...base, controlledProperties, issueTime, executionTime, live, async, ... };
}

Pros: Eliminates ~30 lines of duplicated extraction. Single point of change for shared base-stream fields. Follows the Postel's Law tolerant extraction pattern established in Phase 3. Proven pattern — Phase 5.4 used the same approach with parseComponentEntry (saved 39 net lines).
Cons: Adds one helper function. The validTime extraction moves into the helper, so the resource-specific time fields (phenomenonTime/resultTime vs issueTime/executionTime) must still be parsed in each caller.
Effort: Medium | Risk: Low (pure refactor, no behavioral change)

Option B: Leave as-is with a comment linking the two functions

Add a @see cross-reference comment to each function pointing to the other.

Pros: Zero risk; trivial effort.
Cons: Does NOT reduce code. The duplication remains. Bug fixes still need to be applied twice.
Effort: Trivial | Risk: None

Not recommended. The JSDoc already notes the parallelism — adding more comments doesn't solve the DRY problem.

Scope — What NOT to Touch

  • ❌ Do NOT modify any file outside src/ogc-api/csapi/ — per the upstream maintainer's isolation requirement
  • ❌ Do NOT change the public API signatures of parseDatastream() or parseControlStream() — return types must remain Datastream and ControlStream respectively
  • ❌ Do NOT export parseBaseStream — it is an internal helper only
  • ❌ Do NOT refactor the resource-specific fields into the helper — only the 7 shared base-stream fields belong there
  • ❌ Do NOT combine this with finding 011 (requireObject null-guard extraction) in the same PR — those are independent concerns even though they share the null-guard code

Acceptance Criteria

  • parseBaseStream helper (or equivalent) extracts the 7 common fields (id, name, description, validTime, formats, systemId, links)
  • parseDatastream() calls parseBaseStream and adds only Datastream-specific fields
  • parseControlStream() calls parseBaseStream and adds only ControlStream-specific fields
  • All Part 2 parser tests pass (part2.spec.ts)
  • Existing tests still pass (npm test)
  • No lint errors (npm run lint)
  • All modified files pass npx prettier --check

Dependencies

Blocked by: Nothing (can be done independently)
Blocks: Nothing
Coordinate with:


Operational Constraints

⚠️ MANDATORY: Before starting work on this issue, review docs/governance/AI_OPERATIONAL_CONSTRAINTS.md.

Key constraints:

  • Precedence: OGC specifications → AI Collaboration Agreement → This issue description → Existing code → Conversational context
  • No scope expansion: Extract the shared base-stream fields, nothing more
  • Minimal diffs: Prefer the smallest change that satisfies the acceptance criteria
  • Ask when unclear: If intent is ambiguous, stop and ask for clarification
  • Tolerant extraction: The parseBaseStream helper must follow Postel's Law — accept malformed input gracefully, map missing/null fields to defaults (per Phase 3 Lesson 2)

Upstream Isolation Constraint

Per the upstream maintainer's comment on PR #136:

  • Anything in src/ogc-api/csapi must NOT be included in the root index.ts
  • Anything NOT in src/ogc-api/csapi must NOT import from CSAPI code
  • All files in this issue's "Files to Modify" table are within src/ogc-api/csapi/ — this fix is purely internal to the isolated CSAPI module

Ownership-Specific Constraints

Ownership: Ours

  • Fix on the working branch
  • Include in the next commit to clean-pr if the PR is still open
  • All existing tests must continue to pass — this is a pure internal refactor with no public API changes

References

Original Code Review Finding

# Document What It Provides
0 docs/code-review/010-pending-p2-datastream-controlstream-base-duplication.md The original finding from the senior developer code review. Authoritative source for the problem statement, line-by-line mapping, proposed parseBaseStream helper (Option A), and acceptance criteria. This issue tracks the resolution of that finding.

Directly Affected Code

# Document What It Provides
1 src/ogc-api/csapi/formats/part2.ts lines 114–167 parseDatastream() — one half of the duplicated extraction pattern
2 src/ogc-api/csapi/formats/part2.ts lines 214–253 parseControlStream() — the other half of the duplicated extraction pattern
3 src/ogc-api/csapi/model.tsDatastream interface (~line 532) Type definition — 8 fields shared with ControlStream (including live); 7 of these are proposed for the parseBaseStream helper (live handled as resource-specific in each caller)
4 src/ogc-api/csapi/model.tsControlStream interface (~line 604) Type definition — 8 fields shared with Datastream (including live); 7 of these are proposed for the parseBaseStream helper (live handled as resource-specific in each caller)
5 src/ogc-api/csapi/formats/part2.spec.ts Test suite — must pass unchanged after refactor
6 src/ogc-api/csapi/formats/geojson.tsparseValidTime() Shared helper already used by both parsers for validTime extraction — the parseBaseStream helper would call this

Related Open Issues & Findings

# Document What It Provides
7 Issue #140 — parseControlStreamSchemaResponse() paramsSchema silent data loss Same file, different function. Independent concern but shows part2.ts has multiple pending improvements.
8 Issue #141 — parseCollectionResponse<T> generic cast Upstream boundary that feeds unvalidated items into both parseDatastream and parseControlStream. Independent concern.
9 Issue #144 — SensorML raw JSON spread Same pattern category (parser DRY / raw JSON handling) but different file (sensorml.ts).
10 Finding 011 — requireObject null-guard extraction Referenced in finding 010's acceptance criteria. The null-guard portion of parseBaseStream could be further factored into a shared requireObject utility.
11 Demo App Finding — Issue #103: Cross-reference fields Documents system@id / system@link cross-reference fields shared between both parsers. Adding new cross-ref fields would need to be done twice without parseBaseStream.
12 Demo App Finding — Issue #11: Generic CRUD Methods Identified structural repetition in consumer-side dispatch code across all CSAPI resource types (Part 1 and Part 2) — proposed generic CRUD to reduce consumer boilerplate. Same DRY motivation as this finding but at the consumer abstraction level, not library-internal.
13 Issue #143 — extractCSAPIFeature casts properties to Record without null check Same pattern category (parser safety) — the null-guard pattern in parseBaseStream relates to this class of concern.

Standards

# Document What It Provides
14 OGC API — Connected Systems Part 2 (23-002) Primary spec. §9.2 Table 5 (DataStream) and §10.2 Table 10 (ControlStream) share the baseStream schema — the spec-level justification for the shared helper.
15 CSAPI Part 2 OpenAPI Spec Machine-readable schema confirming the shared baseStream component between DataStream and ControlStream response schemas.
16 OGC API — Connected Systems Part 1 (23-001) Part 1 defines the collection patterns that Part 2 extends. DataStream/ControlStream are Part 2 resources but inherit the general resource discovery and collection metadata patterns from Part 1.
17 CSAPI Part 1 OpenAPI Spec Machine-readable Part 1 endpoint schemas — Part 2 resources follow the same structural patterns for collections and items.
18 GeoJSON (RFC 7946) Part 2 resources are encoded as JSON objects. The parseValidTime() helper used by both parsers lives in geojson.ts. GeoJSON conventions influence the field extraction patterns.
19 ISO 8601 (Date and Time Format) All time fields in both parsers (validTime, phenomenonTime, resultTime, issueTime, executionTime) use ISO 8601 intervals parsed by parseValidTime().

Architecture & Design

# Document What It Provides
20 CSAPI Implementation Guide Format handler design requirements and endpoint coverage — defines the architectural expectations for parser functions.
21 Code Reuse vs Duplication Strategy Governs DRY principle vs isolation — when to extract shared utilities within the CSAPI module. Directly applicable to the parseBaseStream extraction decision.
22 Validation/Extraction Decoupling Design Notes Architectural decision to separate validation from extraction — parseBaseStream follows this pattern (extraction only, tolerant).
23 Component Design Sequence Dependency chains between format handlers, query builder, and integration layer — confirms format handlers are independently refactorable.
24 CSAPI Architecture Decisions Part 1 vs Part 2 architecture decisions, resource type organization — documents DataStream and ControlStream as parallel Part 2 resources with similar URL patterns and format support. Covers shared URL-building helper methods across all 9 resource types.
25 TypeScript Type System Design Interface definition patterns and type hierarchy design — informs the BaseStream interface design for the shared helper.
26 Error Handling Design Analysis Error handling patterns — the null-guard throw new Error() in parseBaseStream must follow established conventions.
27 Design Strategy Research Early-phase design strategy that advocated a URL-building-only approach with no parsing. Superseded by later implementation decisions that added format handlers. Provides historical context on how the parser architecture evolved beyond the original strategy.
28 Architecture Patterns Analysis Upstream patterns for shared utilities — how the existing ogc-client organizes reusable helpers.

Requirements Research

# Document What It Provides
29 CSAPI Part 2 Requirements Comprehensive Part 2 spec analysis covering DataStream and ControlStream resource types — documents the shared field set that justifies parseBaseStream.
30 Data Type and Schema Requirements TypeScript interface definitions including Datastream and ControlStream — the BaseStream interface proposed in the fix needs to align with these established type patterns.
31 Comprehensive Format Requirements GeoJSON, SensorML, and SWE Common format requirements for both resource types — parsing strategy that both parsers implement.
32 Common Format Requirements Format negotiation and media type handling — the formats field extracted by both parsers maps to these format identifiers.
33 OpenSensorHub Server Analysis Primary test server — documents actual DataStream/ControlStream response shapes from live data that both parsers handle.
34 52°North Server Analysis Multi-server compatibility research — 52°North has not implemented ControlStreams and DataStream support is partial ("In Development"). Documents general server differences (pagination, error patterns, conformance) relevant to tolerant parsing design.
35 Sub-Resource Navigation Requirements Nested endpoint patterns — DataStreams and ControlStreams are sub-resources of Systems, documenting the navigation context.
36 Gap Analysis from Previous Iteration Lessons from failed iteration — format parsing completeness requirements that motivated building both parsers comprehensively.
37 Lessons Learned from Previous Iterations Mistakes to avoid from previous attempts — over-engineering vs under-engineering balance relevant to the extraction scope.
38 Upstream Library Expectations Format abstraction alignment with existing upstream parsers — the shared helper pattern should match upstream conventions.
39 OSHConnect-Python Client Analysis Python CSAPI client's approach to DataStream parsing (ControlStreams not implemented in OSHConnect-Python) — Pydantic model patterns and type system design for comparison.
40 OWSLib Analysis Mature Python client library patterns — class-per-resource architecture with shared base extraction as a reference approach.

Testing References

# Document What It Provides
41 Testing Strategy Research Overall testing approach — refactor-safe test design ensuring the parseBaseStream extraction doesn't break behavior-level tests.
42 Live Smoke Test ST#20 (post-5.1) parseDatastream() validated against live OSH data (4 samples, PASS) — establishes the baseline behavior that must be preserved.
43 Live Smoke Test ST#21 (post-5.2) parseControlStream() validated against live OSH data (18 CS surveyed, 2 traced, PASS). Confirmed both parsers share normalizeObservedProperties() reuse.
44 Live Smoke Test ST#22 (post-5.3) Confirmed Part 2 parsers stable and no regressions after Phase 5.3 schema-response additions.
45 Testing Plan 06 — Meaningful vs Trivial Defines meaningful test coverage — confirms Part 2 parser tests are behavior-level (parsed output), not implementation-level (which private helper is called). Refactor-safe by design.
46 Parser Testing vs Spec Validation Notes Parser testing philosophy — distinguishes between testing parser correctness vs spec validation. Relevant for understanding what the existing tests actually verify.
47 Testing Plan 11 — GeoJSON CSAPI Testing Requirements GeoJSON CSAPI extension testing — both parsers handle JSON objects; the parseValidTime() dependency lives in the GeoJSON module.
48 Test Fixtures Guide Fixture management for Part 2 resources — DataStream and ControlStream fixtures used in part2.spec.ts.

Governance

# Document What It Provides
49 AI Operational Constraints Mandatory operational constraints for implementation
50 AI Collaboration Agreement Collaboration framework — precedence rules, scope boundaries
51 Phase 3 Lessons Learned Postel's Law tolerant extraction (Lesson 2), don't couple validation to extraction (Lesson 3) — governs how parseBaseStream must handle missing/malformed fields
52 Phase 2 Lessons Learned Testing strategy refinement and code review process improvements from earlier implementation phases.
53 Known Server Quirks Server-side non-compliant behaviors affecting parsing — the parseBaseStream helper must handle these gracefully (e.g., missing fields, unexpected types).
54 Code Review Prompt Template (Phase 5) Template used for Phase 5 code reviews that originally examined both parsers — provides the quality criteria applied at the time.

Implementation Records

# Document What It Provides
55 Phase 5.1 Code Review Review of parseDatastream() implementation (Task 2a) — documents field extraction patterns, normalizeObservedProperties() helper, cross-reference handling, and parser completion scorecard.
56 Phase 5.2 Code Review F13: noted structural parallelism between both parsers. At the time assessed as correct (spec-driven). Finding 010 re-evaluates this as a DRY concern now that both parsers are stable.
57 Phase 5.3 Code Review Schema-response parser additions. L7 DRY check PASS with discussion of parseComponentEntry duplication as acceptable at the time. Documents parseDatastreamSchemaResponse/parseControlStreamSchemaResponse delegation patterns — same file family.
58 Phase 5.4 Code Review Precedent: F27 — parseComponentEntry DRY extraction saved 39 net lines. Same pattern of shared-helper extraction proposed here. Proves the pattern works and is accepted.
59 Final Project Code Review F-NEW-04: identified 3 duplication groups in Part 2 format handlers. This issue addresses group 1 (validTime/properties extraction). Classified as "DESIGN, minor — deferred to follow-up PR."
60 CSAPI Code Audit Phase 6 Comprehensive code audit that catalogued formats/part2.ts (518 lines), noted D-3 through D-6 duplication findings across format handlers, and documented the time field parsing asymmetry (C-2) between parseDatastream/parseControlStream and parseObservation.
61 P5 Findings Coverage Analysis Phase 5 findings tracking — documents F33 (parseControlStreamSchemaResponse) and other Part 2 parser findings.
62 Cross-Server Interoperability Analysis Multi-server DataStream/ControlStream response variations — the shared helper must handle format differences across OpenSensorHub and 52°North.

Planning

# Document What It Provides
63 Phase 5 Roadmap Task sequencing — parseDatastream (Task 2a), parseControlStream (Task 4). Documents why both were implemented as separate functions.
64 Phase 5 Implementation Guide Parser function signatures, field mapping specs — the definitive reference for what each parser must extract. Confirms the 7 shared fields.
65 Phase 5 Contribution Goal Per-parser acceptance criteria — defines what "complete" means for parseDatastream and parseControlStream.
66 Contribution Goal & Definition Overall contribution scope — format handlers are a core deliverable.

Upstream PR Context

# Document What It Provides
67 Upstream PR #136 Draft PR on camptocamp/ogc-client that includes part2.ts — any refactor must be reflected here.
68 jahow's isolation comment Upstream maintainer's requirement that CSAPI must be fully isolated — confirms all changes are within the allowed boundary.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions