Design: Binary Format and Type Codec Registry #139
Replies: 6 comments 1 reply
---
Review Notes

Verified Claims

All claims about the current codebase were verified against source. Highlights:

Design Strengths

The phased approach is sound — isolating the parameter-side breaking change (Phase 1) from the result format change (Phase 2) is the right call. The parameter type widening is source-compatible for existing users, since String remains a member of the widened parameter type.

Issues

1. Package docstring contains examples that will go stale
2.
| Decision | Assessment |
|---|---|
| Result format: Approach A in phases | Agree. Phased approach is the key — Phase 1 is low-risk, Phase 2 can be gated on codec coverage |
| Unknown OID fallback: Array[U8] val | Agree for alpha. Document the bytea ambiguity and future Option C cost |
| Encode on Codec: Option A | Agree. Single interface, extends cleanly to custom codecs |
| Registry on Session: Option B | Agree. Zero-config default is right for common case |
| RowDescription storage | Agree to defer. All-binary shorthand eliminates the need |
Suggested Additions
- Phase 1 steps: Add "Update package docstring in postgres.pony", "Add error handling for bind-time encode failures in try_run_query", and "Register text codecs for all currently-parsed types in the default registry"
- Phase 2 prerequisites: Enumerate the exact OID list for binary-to-String codecs and specify whether output must be text-format-identical
- Codec implementation guidance: "Binary codecs for fixed-width types must verify payload length and error on mismatch" and "Bool binary decoder should treat any nonzero byte as true"
- Specify Phase 2 Bind result format: Confirm it's Int16(1) Int16(1) (single-format shorthand for all binary)
Open Questions
- Binary-to-String format equivalence: Must Phase 2 decoders for timestamp, numeric, etc. produce exactly the same string as PostgreSQL's text format? This determines whether Phase 2 is purely internal or user-visible.
- Phase 2 OID coverage: The exact list of OIDs needing binary codecs before Phase 2 ships. "numeric, date, timestamp, timestamptz, uuid, jsonb, interval" may not be complete.
---
Updated Plan (v2)

This incorporates all feedback from the review notes. Changes from the original design are tagged [NEW].

Resolved Open Questions

Must Phase 2 decoders produce text-format-identical strings? For types that decode to String, yes — output must match PostgreSQL's text format exactly, enforced by integration tests. Temporal types instead decode to native classes, whose string() output is informational only.

Exact OID list for Phase 2 binary codecs? Enumerated in Phase 2 Steps 1–3. Covers text passthrough types (7 OIDs), temporal types (5 OIDs → native types), and String-producing types (4 OIDs). Types not on the list fall back to Array[U8] val.

Codec Implementation Guidance

These rules apply to all codec implementations across all phases:
Phase 1: Binary Parameters + Codec Infrastructure

Goal: Ship typed parameters with binary encoding for numeric/bool/bytea types. Results stay text format — no user-visible regression on the result side. Breaking change to the parameter API. Classification: changed.

Step 1: Define the Codec interface

Step 2: Implement built-in binary codecs
| Primitive | OID | Pony Type | Binary Wire Format |
|---|---|---|---|
| _BoolBinaryCodec | 16 | Bool | 1 byte: encode 0x01/0x00. Decode: any nonzero = true |
| _ByteaBinaryCodec | 17 | Array[U8] val | Raw bytes (0-length valid) |
| _Int2BinaryCodec | 21 | I16 | 2 bytes big-endian signed |
| _Int4BinaryCodec | 23 | I32 | 4 bytes big-endian signed |
| _Int8BinaryCodec | 20 | I64 | 8 bytes big-endian signed |
| _Float4BinaryCodec | 700 | F32 | 4 bytes IEEE 754 big-endian |
| _Float8BinaryCodec | 701 | F64 | 8 bytes IEEE 754 big-endian |
Length validation in decode():
- _BoolBinaryCodec: error if data.size() != 1
- _Int2BinaryCodec: error if data.size() != 2
- _Int4BinaryCodec / _Float4BinaryCodec: error if data.size() != 4
- _Int8BinaryCodec / _Float8BinaryCodec: error if data.size() != 8
- _ByteaBinaryCodec: any length valid (including 0)
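The validation rules above are language-neutral; here is a Python sketch of the fixed-width decode path (helper names are illustrative, not the Pony API):

```python
import struct

# Illustrative sketch (not the Pony API) of the fixed-width binary
# decode rule: verify payload length, then read big-endian.
_FIXED = {
    21:  (2, ">h"),   # int2   -> I16
    23:  (4, ">i"),   # int4   -> I32
    20:  (8, ">q"),   # int8   -> I64
    700: (4, ">f"),   # float4 -> F32
    701: (8, ">d"),   # float8 -> F64
}

def decode_binary(oid: int, data: bytes):
    if oid == 16:                 # bool: exactly 1 byte, any nonzero is true
        if len(data) != 1:
            raise ValueError("bool payload must be 1 byte")
        return data[0] != 0
    if oid == 17:                 # bytea: raw bytes, any length (including 0)
        return data
    size, fmt = _FIXED[oid]
    if len(data) != size:         # error on length mismatch
        raise ValueError(f"oid {oid}: expected {size} bytes, got {len(data)}")
    return struct.unpack(fmt, data)[0]
```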
Step 3: Implement built-in text codecs
New file: postgres/_text_codecs.pony
| Primitive | OID | Decoding (mirrors current _field_to_type logic) |
|---|---|---|
| _BoolTextCodec | 16 | String.from_array(data).at("t") |
| _ByteaTextCodec | 17 | Hex decode — extract current _decode_hex_bytea logic |
| _Int2TextCodec | 21 | String.from_array(data).i16()? |
| _Int4TextCodec | 23 | String.from_array(data).i32()? |
| _Int8TextCodec | 20 | String.from_array(data).i64()? |
| _Float4TextCodec | 700 | String.from_array(data).f32()? |
| _Float8TextCodec | 701 | String.from_array(data).f64()? |
Text codecs receive Array[U8] val (raw wire bytes) even though the content is UTF-8. encode converts the Pony type to its text representation.
Step 4: Implement CodecRegistry
New file: postgres/codec_registry.pony
```pony
class val CodecRegistry
  """
  Maps PostgreSQL type OIDs to codecs. Immutable — adding a codec
  produces a new registry.
  """
  new val create()
    """Registry with all built-in text and binary codecs."""
  new val _with_codec(base: CodecRegistry, oid: U32, codec: Codec)
    """New registry adding or replacing the codec for a given OID.
    Package-private in Phase 1; public in Phase 3."""
  fun decode(oid: U32, format: U16, data: Array[U8] val): FieldDataTypes ?
    """Decode result column data. Format 0 = text codec, format 1 = binary codec.
    Text fallback: String.from_array(data). Binary fallback: raw Array[U8] val."""
  fun has_binary_codec(oid: U32): Bool
    """Whether a binary codec is registered for this OID."""
```

Internal storage: two Map[U32, Codec] val — one for text codecs, one for binary. The default constructor populates both with all built-in codecs from Steps 2–3.
No encode method on the registry in Phase 1 — parameter encoding matches directly on FieldDataTypes. [NEW] encode is deferred to Phase 3 for custom codec support.
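The registry semantics — copy-on-add immutability plus the two decode fallbacks — can be sketched in Python (hypothetical names, not the Pony API):

```python
class CodecRegistry:
    """Sketch: immutable OID -> codec maps; decode falls back per format."""
    def __init__(self, text=None, binary=None):
        self._text = dict(text or {})      # oid -> decode fn (text format)
        self._binary = dict(binary or {})  # oid -> decode fn (binary format)

    def with_codec(self, oid, fmt, fn):
        # Copy-on-add: returns a NEW registry, the original is unchanged.
        text, binary = dict(self._text), dict(self._binary)
        (text if fmt == 0 else binary)[oid] = fn
        return CodecRegistry(text, binary)

    def has_binary_codec(self, oid):
        return oid in self._binary

    def decode(self, oid, fmt, data: bytes):
        codecs = self._text if fmt == 0 else self._binary
        if oid in codecs:
            return codecs[oid](data)
        # Fallbacks from the design: text -> string, binary -> raw bytes.
        return data.decode("utf-8") if fmt == 0 else data
```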
Step 5: Change parameter type
PreparedQuery.params and NamedPreparedQuery.params change from Array[(String | None)] val to Array[FieldDataTypes] val. Update class docstrings in prepared_query.pony (line 5: "Values are sent in text format") and named_prepared_query.pony (line 7: same) to describe typed parameters — binary format for typed values, text format with server inference for String.
Step 6: Update _FrontendMessage.bind() — per-parameter format codes and binary encoding
[NEW] bind() becomes partial (?). The internal try ... else _Unreachable() end pattern inside bind() is replaced by ? propagation — encode errors flow to the caller instead of panicking. Update the bind() docstring (currently lines 137–146 of _frontend_message.pony: "All parameters use text format") to reflect per-parameter format codes and binary encoding.
```pony
fun bind(portal: String, stmt: String,
  params: Array[FieldDataTypes] val): Array[U8] val ?
```

Wire format changes:
- Parameter format codes: Int16(0) (all-text shorthand) → Int16(N) + N per-parameter format codes
- Parameter values: Binary for typed params, text for String, NULL unchanged
- Result format codes: Remain Int16(0) (all text) in Phase 1
| Pony Type | Format Code | OID | Encoding |
|---|---|---|---|
| I16 | 1 (binary) | 21 | 2 bytes big-endian |
| I32 | 1 (binary) | 23 | 4 bytes big-endian |
| I64 | 1 (binary) | 20 | 8 bytes big-endian |
| F32 | 1 (binary) | 700 | 4 bytes IEEE 754 BE |
| F64 | 1 (binary) | 701 | 8 bytes IEEE 754 BE |
| Bool | 1 (binary) | 16 | 1 byte: 0x01 or 0x00 |
| Array[U8] val | 1 (binary) | 17 | Raw bytes |
| String | 0 (text) | 0 | UTF-8 bytes, server infers type |
| None | — | 0 | Length -1, no data |
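A Python sketch of just the parameter-related sections of the Bind body under these rules (message framing, portal, and statement-name fields omitted; names are illustrative):

```python
import struct

def bind_param_sections(params):
    """Sketch of the Phase 1 Bind rules: per-parameter format codes,
    binary payloads for typed params, UTF-8 text for strings,
    length -1 for NULL, and Int16(0) (all text) for result formats.

    params: list of ('binary', bytes) | ('text', str) | None.
    """
    out = struct.pack(">h", len(params))        # Int16(N), then N format codes
    for p in params:
        fmt = 1 if (p is not None and p[0] == "binary") else 0
        out += struct.pack(">h", fmt)
    out += struct.pack(">h", len(params))       # Int16(N), then N values
    for p in params:
        if p is None:
            out += struct.pack(">i", -1)        # NULL: length -1, no data
        else:
            data = p[1] if p[0] == "binary" else p[1].encode("utf-8")
            out += struct.pack(">i", len(data)) + data
    out += struct.pack(">h", 0)                 # result formats: all-text shorthand
    return out
```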
Step 7: Update _FrontendMessage.parse() — send OIDs
No signature change. The existing param_type_oids: Array[U32] val parameter (currently always empty) gets populated from the params. A helper maps each FieldDataTypes variant to its OID (same table as Step 6). String and None get OID 0 (server infers).
Step 8: Add encode error handling in _QueryReady.try_run_query() [NEW]
Currently, try_run_query wraps everything in try ... else _Unreachable() end. With bind() partial, encode errors would hit _Unreachable() — wrong.
The fix: extract bind construction for each extended query path so that encode failures are caught separately from genuinely unreachable errors. On encode failure: deliver DataError to the appropriate receiver, dequeue the failed item, and call try_run_query again for the next queue item.
All 6 dispatch paths that call bind() need this (identified by match branch context, not line numbers, since earlier steps shift line positions):
- PreparedQuery under _QueuedQuery match
- NamedPreparedQuery under _QueuedQuery match
- PreparedQuery under _QueuedStreamingQuery match
- NamedPreparedQuery under _QueuedStreamingQuery match
- PreparedQuery inside _QueuedPipeline loop
- NamedPreparedQuery inside _QueuedPipeline loop
Pipeline encode errors: If any query in a pipeline fails encoding, fail the entire pipeline — deliver pg_pipeline_failed with DataError for every query, then pg_pipeline_complete, dequeue, and move on. This avoids the complexity of partial pipeline sends. Since encode errors for built-in types should never happen (the type match is exhaustive), this is fine for Phase 1. Phase 3 (custom codecs) can refine with per-query error isolation if needed.
The _Unreachable() in the outer try remains for genuinely unreachable paths (queue empty, wrong item type).
Update DataError docstring in query_error.pony to cover both directions — currently says "data that came back from a query" (inbound only), but it now also covers outbound encode failures.
Step 9: Update all query dispatch paths
All bind() call sites pass Array[FieldDataTypes] val. All parse() call sites pass derived OID arrays. _DataRowMessage.columns stays Array[(String|None)] val in Phase 1 — no changes to in-flight state accumulation types.
Step 10: Keep _RowsBuilder unchanged in Phase 1
Results still arrive as text. The existing _field_to_type() handles text decoding. Switching to registry-based decoding in Phase 1 would require a wasteful String → Array[U8] val round-trip since _data_row still outputs String. Phase 2 changes _DataRowMessage to raw bytes, at which point _RowsBuilder naturally switches to the registry.
Step 11: Add CodecRegistry to Session and _SessionLoggedIn
Session gets a CodecRegistry val field, defaulting to CodecRegistry.create(). Not user-configurable in Phase 1 (that's Phase 3). Threaded through to _SessionLoggedIn.
In Phase 1, _FrontendMessage.bind() doesn't use the registry — it matches directly on FieldDataTypes variants. The registry is created here so text codecs are registered and ready for Phase 2's result decoding switch.
Step 12: Update package docstring in postgres.pony [NEW]
- Line 105: replace "Parameters are text-format strings or None for NULL" with typed parameter description
- Lines 168–169: NamedPreparedQuery(name, ["42"]) → update array literal to [as FieldDataTypes: "42"] (needed because params type changes to Array[FieldDataTypes] val)
- Line 229: Array[(String | None)] → Array[FieldDataTypes]
- Lines 264–270: [as (String | None): "1"] → [as FieldDataTypes: I32(1)]
Step 13: Update all examples
Every example creating PreparedQuery or NamedPreparedQuery needs the new param type:
- examples/prepared-query/ — [as (String | None): "Pony"; "10"; None] → [as FieldDataTypes: "Pony"; I32(10); None]
- examples/named-prepared-query/
- examples/crud/
- examples/streaming/
- examples/pipeline/
(bytea and cancel use SimpleQuery — no changes needed.)
Update examples/README.md — the prepared-query entry references Array[(String | None)] val which must change to Array[FieldDataTypes] val.
Step 14: Update all tests
Update param types in all test files that construct PreparedQuery / NamedPreparedQuery.
New unit tests:
- Codec encode/decode round-trip for each built-in binary codec
- Binary codec length validation (wrong-length payloads error)
- Bool binary decode: nonzero bytes other than 0x01 → true
- Text codec decode for each built-in text codec
- CodecRegistry.decode() with known/unknown OIDs, both formats
- Bind wire format with typed params (per-parameter format codes, binary values)
- Parse wire format with OIDs from typed params
- Encode error handling: DataError delivery when bind() fails
Integration tests:
- Queries with typed params (I32, I64, Bool, F32, F64, String, Array[U8] val, None)
- Mixed typed and String params
- Pipeline with typed params
- Streaming with typed params
Build/run: make ssl=3.0.x for all, make unit-tests ssl=3.0.x for unit only, make integration-tests ssl=3.0.x for integration (requires make start-pg-containers).
Step 15: Update CLAUDE.md
- PreparedQuery / NamedPreparedQuery param type
- Codec architecture (interface, registry, built-in codecs)
- Binary encoding strategy
- Encode error handling in Query Execution Flow
- Updated _FrontendMessage.bind() and parse() notes
Step 16: Release notes
Classification: changed. Label: changelog - changed.
Phase 2: Binary Results
Goal: Switch result format to all-binary for extended query protocol. Common types that currently return readable strings must continue to do so.
Classification: changed.
Prerequisites [NEW]
Phase 2 does not ship until:
- Every OID in the lists below has a binary codec
- String-producing binary codecs (numeric, uuid, oid, jsonb) produce text-format-identical output
- Temporal binary codecs produce correct native type instances
- Integration tests verify all of the above
Step 1: Text passthrough binary codecs
Single _TextPassthroughBinaryCodec primitive registered for all 7 OIDs. PostgreSQL's binary format for these IS raw UTF-8.
| OID | PostgreSQL Type |
|---|---|
| 18 | char |
| 19 | name |
| 25 | text |
| 114 | json |
| 142 | xml |
| 1042 | bpchar |
| 1043 | varchar |
Step 2: Add native temporal types to FieldDataTypes [NEW]
Every major binary-capable PostgreSQL driver (pgx, rust-postgres, asyncpg, Postgrex, psycopg3) decodes temporal types to native language types, not strings. Binary format is always UTC microseconds (timestamptz) or three-component (interval) — ParameterStatus tracking for timezone/interval style is unnecessary for binary decoding.
New val classes:
```pony
class val Timestamp
  """PostgreSQL timestamp/timestamptz. Microseconds since 2000-01-01 00:00:00 UTC."""
  let microseconds: I64

class val Time
  """PostgreSQL time. Microseconds since midnight."""
  let microseconds: I64

class val Interval
  """PostgreSQL interval. Three independent components (months have variable
  length, days have variable length due to DST)."""
  let microseconds: I64
  let days: I32
  let months: I32

class val Date
  """PostgreSQL date. Days since 2000-01-01."""
  let days: I32
```

Each class provides a string() method producing a human-readable representation. These are informational — they don't need to match PostgreSQL's text format exactly (that would require tracking IntervalStyle and DateStyle server parameters). Formats:
- Date.string(): "2024-01-15" (ISO 8601). Infinity: "infinity" / "-infinity"
- Time.string(): "14:30:00" or "14:30:00.123456" (fractional only when non-zero)
- Timestamp.string(): "2024-01-15 14:30:00" or with .NNNNNN. Infinity: "infinity" / "-infinity"
- Interval.string(): "1 year 2 mons 3 days 04:05:06" (PostgreSQL postgres style)
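The Date and Time formatting rules can be sketched in Python (illustrative only — function names and structure are not the Pony implementation):

```python
from datetime import date, timedelta

PG_DATE_EPOCH = date(2000, 1, 1)        # day zero of the PostgreSQL date type
I32_MAX, I32_MIN = 2**31 - 1, -(2**31)  # infinity sentinels for date

def date_string(days: int) -> str:
    """Sketch of Date.string(): ISO 8601; sentinel values render as infinity."""
    if days == I32_MAX:
        return "infinity"
    if days == I32_MIN:
        return "-infinity"
    return (PG_DATE_EPOCH + timedelta(days=days)).isoformat()

def time_string(micros: int) -> str:
    """Sketch of Time.string(): fractional part printed only when nonzero."""
    secs, frac = divmod(micros, 1_000_000)
    h, rem = divmod(secs, 3600)
    m, s = divmod(rem, 60)
    base = f"{h:02d}:{m:02d}:{s:02d}"
    return base + (f".{frac:06d}" if frac else "")
```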
FieldDataTypes expands:
```pony
type FieldDataTypes is
  ( Array[U8] val | Bool | Date | F32 | F64 | I16 | I32 | I64
  | Interval | None | String | Time | Timestamp )
```

This is a breaking change — all match expressions on FieldDataTypes in user code need updating. Acceptable at alpha (v0.2.x). Document in release notes.
Text codecs for temporal types are also needed — SimpleQuery always returns text format, and those OIDs should decode to native types too (parse the text representation). Parsing details:
- date text: "2024-01-15" or "infinity" / "-infinity". ISO 8601 date format (PostgreSQL default DateStyle = 'ISO').
- time text: "14:30:00" or "14:30:00.123456". Parse hours:minutes:seconds, optional fractional.
- timestamp text: "2024-01-15 14:30:00" or with .NNNNNN. Parse date + time, convert to microseconds since epoch.
- timestamptz text: "2024-01-15 14:30:00+00" or similar timezone suffix. Parse and convert to UTC microseconds. The timezone offset varies with session TimeZone setting — the text codec must handle the offset.
- interval text: depends on IntervalStyle server parameter. The text codec assumes the default postgres style ("1 year 2 mons 3 days 04:05:06"). Other styles (sql_standard, iso_8601, postgres_verbose) are not supported by the text codec — users with non-default IntervalStyle should use extended query protocol (binary) where the format is style-independent.
Step 3: Binary codecs for common types [NEW — exact list]
Codecs that decode to String:
| OID | PostgreSQL Type | Binary Format | Text Output |
|---|---|---|---|
| 26 | oid | 4 bytes U32 BE | "12345" |
| 1700 | numeric | Variable: ndigits (I16) + weight (I16) + sign (I16) + dscale (I16) + base-10000 digits | "123.45" |
| 2950 | uuid | 16 bytes raw | "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" (lowercase, 4-2-2-2-6 grouping) |
| 3802 | jsonb | 1 version byte (0x01) + JSON UTF-8 | Strip version byte, rest is text |
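The numeric entry is the only non-trivial one in this table; here is a Python sketch of the decode, assuming the header/digit layout described above (illustrative, not the Pony implementation):

```python
import struct
from decimal import Decimal

def decode_numeric(data: bytes) -> str:
    """Sketch of numeric binary decode: four I16 header fields
    (ndigits, weight, sign, dscale) followed by base-10000 digits.
    Preserves dscale so 1.00 stays "1.00"; sign 0xC000 means NaN."""
    ndigits, weight, sign, dscale = struct.unpack_from(">hhHh", data, 0)
    if sign == 0xC000:
        return "NaN"
    digits = struct.unpack_from(f">{ndigits}h", data, 8)
    # value = sum(d * 10000^(weight - i)); scaleb keeps the math exact
    value = Decimal(0)
    for i, d in enumerate(digits):
        value += Decimal(d).scaleb(4 * (weight - i))
    if sign == 0x4000:
        value = -value
    # Quantize to exactly dscale fractional digits (keeps trailing zeros)
    return str(value.quantize(Decimal(1).scaleb(-dscale)))
```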
Codecs that decode to native types:
| OID | PostgreSQL Type | Binary Format | Pony Type |
|---|---|---|---|
| 1082 | date | 4 bytes: days since 2000-01-01 (signed I32 BE) | Date |
| 1083 | time | 8 bytes: microseconds since midnight (I64 BE) | Time |
| 1114 | timestamp | 8 bytes: microseconds since 2000-01-01 00:00:00 (I64 BE) | Timestamp |
| 1184 | timestamptz | 8 bytes: microseconds since 2000-01-01 00:00:00 UTC (I64 BE) | Timestamp |
Both timestamp (1114) and timestamptz (1184) decode to Timestamp. The binary format is identical — the server converts timestamptz to UTC before sending. The distinction between "with timezone" and "without" is a server-side storage concern, not a wire format difference.
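The epoch arithmetic here is easy to get wrong by a constant; a Python sketch (the 946,684,800-second gap between the Unix and PostgreSQL epochs is the key value; the helper name is hypothetical):

```python
from datetime import datetime, timezone

# Seconds between the Unix epoch (1970-01-01) and the PostgreSQL
# timestamp epoch (2000-01-01), both UTC: 10,957 days * 86,400 s.
PG_EPOCH_UNIX_SECONDS = 946_684_800

def pg_micros_to_datetime(micros: int) -> datetime:
    """Sketch: I64 microseconds since 2000-01-01 00:00:00 UTC -> UTC datetime.
    Applies identically to timestamp (1114) and timestamptz (1184) payloads."""
    secs, frac = divmod(micros + PG_EPOCH_UNIX_SECONDS * 1_000_000, 1_000_000)
    return datetime.fromtimestamp(secs, tz=timezone.utc).replace(microsecond=frac)
```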
| OID | PostgreSQL Type | Binary Format | Pony Type |
|---|---|---|---|
| 1186 | interval | 16 bytes: microseconds (I64 BE) + days (I32 BE) + months (I32 BE) | Interval |
Deferred: timetz (OID 1266) — complex timezone offset component. Array types (int2[], int4[], text[], int8[]) — require recursive codec dispatch. Both fall back to Array[U8] val until Phase 3.
Correctness invariants for special values:
- Infinity timestamps: I64.max_value() → Timestamp with microseconds = I64.max_value(). The string() method produces "infinity". Same for I64.min_value() → "-infinity"
- Infinity dates: I32.max_value() → Date with days = I32.max_value(), string() → "infinity". Same for I32.min_value() → "-infinity"
- Numeric NaN: sign field 0xC000 → "NaN". Sign 0x0000 = positive, 0x4000 = negative
- Numeric precision: 1.00 stays "1.00" (preserve dscale), not "1"
- Time.microseconds is always non-negative (range 0 to 86,400,000,000)
Text-format-identical verification (for codecs that produce String):
- uuid: lowercase hex with dashes
- jsonb: JSON text after stripping the version byte
Step 4: Change _DataRowMessage columns to raw bytes
```pony
// Before
let columns: Array[(String|None)] val
// After
let columns: Array[(Array[U8] val | None)] val
```

Step 5: Update _ResponseParser._data_row()
Store raw bytes instead of converting to String. The column_length == 0 case becomes an empty Array[U8] val (not empty string).
Step 6: Update _RowDescriptionMessage to store format code
```pony
// Before
let columns: Array[(String, U32)] val // (name, oid)
// After
let columns: Array[(String, U32, U16)] val // (name, oid, format_code)
```

Update _row_description: change reader.skip(8)? to reader.skip(6)? (type_size=2 + type_modifier=4) then let format_code = reader.u16_be()?.
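A Python sketch of the per-field tail parse after this change (field layout assumed from the PostgreSQL protocol docs; the helper is illustrative, not the Pony reader API):

```python
import struct

def parse_field_tail(buf: bytes, offset: int):
    """Sketch of one RowDescription field after the name string.
    Assumed layout: table OID (4), column attnum (2), type OID (4),
    type size (2), type modifier (4), format code (2).
    Mirrors the change above: skip size + modifier (6 bytes), then
    READ the format code instead of skipping it."""
    type_oid = struct.unpack_from(">I", buf, offset + 6)[0]
    # offset + 6 (table OID + attnum) + 4 (type OID) + 6 (size + modifier)
    fmt_code = struct.unpack_from(">H", buf, offset + 16)[0]
    return type_oid, fmt_code
```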
Step 7: Switch Bind result format to all binary
Change Bind message result format from Int16(0) (zero codes = all text shorthand) to Int16(1) Int16(1) (one code applied to all columns: binary). Message length increases by 2 bytes.
SimpleQuery is unaffected — no Bind message, always text. Text codecs handle that path.
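The two shorthands differ by exactly the two bytes noted above; a minimal Python sketch (illustrative):

```python
import struct

def result_format_section(all_binary: bool) -> bytes:
    """Sketch of the Bind result-format section: Phase 1 sends zero
    format codes (all-text shorthand); Phase 2 sends one code applied
    to all columns (all binary)."""
    if all_binary:
        return struct.pack(">hh", 1, 1)   # Int16(1) count + Int16(1) binary
    return struct.pack(">h", 0)           # Int16(0): all-text shorthand
```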
Step 8: Update _RowsBuilder for codec-based decoding
```pony
fun apply(rows': Array[Array[(Array[U8] val | None)] val] val,
  row_descriptions': Array[(String, U32, U16)] val,
  registry: CodecRegistry): Rows ?
```

Decoding: None → None. Everything else → registry.decode(oid, format_code, data). Remove _decode_hex_bytea and _hex_digit from _RowsBuilder (now in _ByteaTextCodec).
Step 9: Update bind() parameter encoding for new types
Phase 2 adds Timestamp, Time, Interval, and Date to FieldDataTypes. The bind() function matches on FieldDataTypes variants to determine format code, OID, and encoding. It must handle the new variants:
| Pony Type | Format Code | OID | Encoding |
|---|---|---|---|
| Timestamp | 1 (binary) | 1114 (timestamp) | 8 bytes I64 BE |
| Time | 1 (binary) | 1083 | 8 bytes I64 BE |
| Date | 1 (binary) | 1082 | 4 bytes I32 BE |
| Interval | 1 (binary) | 1186 | 16 bytes: I64 BE + I32 BE + I32 BE |
Note: Timestamp uses OID 1114 (timestamp without timezone), not 1184 (timestamptz). The binary encoding is identical — the server handles the timezone semantics based on the column type. Using 1114 avoids implicit timezone conversion surprises.
Step 10: Update all in-flight state accumulation types
All _data_rows fields change from Array[Array[(String|None)] val] to Array[Array[(Array[U8] val | None)] val] in:
- _SimpleQueryInFlight
- _ExtendedQueryInFlight
- _StreamingQueryInFlight
- _PipelineInFlight
All _RowsBuilder call sites gain a registry parameter (from _SessionLoggedIn). All _RowDescriptionMessage consumers update for the 3-tuple.
Step 11: Tests
Unit tests: binary decode for all new codecs, length validation, special values (infinity timestamps, NaN numeric, negative intervals), DataRow parsing with binary data, RowDescription format code parsing, _RowsBuilder with binary data + registry. Native type tests: Timestamp, Time, Interval, Date construction, string() output, equality, special values.
Integration tests: For each String-producing codec (oid, numeric, uuid, jsonb), same query via SimpleQuery and PreparedQuery → assert identical String values. For temporal types, verify binary decoding produces correct native type instances (compare field values, not string representations). Unknown OID columns → Array[U8] val. Pipeline and streaming with binary results.
Step 12: Update CLAUDE.md
_DataRowMessage, _RowDescriptionMessage, Type Conversion section, result format strategy, _RowsBuilder documentation. Add Timestamp, Time, Interval, Date to Public API Types. Update FieldDataTypes definition.
Step 13: Release notes
Classification: changed. Label: changelog - changed.
Phase 3: Public Custom Codec API
Goal: Users can register custom codecs for PostgreSQL types not covered by built-ins.
Classification: added.
Step 1: Make CodecRegistry user-configurable
Rename _with_codec → with_codec (public). Add encode method:
```pony
fun encode(oid: U32, value: FieldDataTypes): Array[U8] val ?
```

Add optional registry parameter to Session constructor:
```pony
new create(
  server_connect_info': ServerConnectInfo,
  database_connect_info': DatabaseConnectInfo,
  notify': SessionStatusNotify,
  registry': CodecRegistry = CodecRegistry)
```

Step 2: Custom codec documentation
Package docstring gains a "Custom Codecs" section: how to implement Codec, create a registry with with_codec, pass it to Session. Correctness requirements (length validation, type mismatch errors).
Step 3: Custom codec example
New examples/custom-codec/. Update examples/README.md.
Step 4: Document FieldDataTypes expansion cost [NEW]
If a RawBinaryData wrapper type (to distinguish unknown-OID raw bytes from bytea) is added later, it adds a variant to FieldDataTypes — a breaking change for all match expressions in user code. Document this as a known future cost of using Array[U8] val for unknown OIDs.
Step 5: Tests, CLAUDE.md, release notes
Classification: added. Label: changelog - added.
Decisions
| # | Decision | Choice | Confidence | Reasoning |
|---|---|---|---|---|
| 1 | Result format | All binary (Approach A) in phases | High | Industry norm. Phase 1 = text results (no regression), Phase 2 gated on codec coverage |
| 2 | Unknown OID fallback | Array[U8] val | High | Simple for alpha. Bytea ambiguity documented. Wrapper type deferred |
| 3 | Encode location | On Codec interface | High | Natural home for both operations. Extends to custom codecs in Phase 3 |
| 4 | Registry location | Session with global default | High | Zero-config common case |
| 5 | Text codecs | Explicit registry entries from Phase 1 | High | SimpleQuery needs them. Avoids two parallel decode systems |
| 6 | Encode errors | DataError path from Phase 1 | High | Structurally needed even though unreachable for built-in types. Note: DataError carries no context about which parameter/codec failed — adequate for alpha, but Phase 3 may need a richer error type |
| 7 | Phase 2 String format | Text-format-identical | High | Prevents behavioral regression. Enforced by integration tests |
| 8 | Phase 2 Bind format | Int16(1) Int16(1) shorthand | High | No extra roundtrip or metadata caching |
| 9 | Array types | Deferred to Phase 3 | Medium | Recursive codec dispatch too complex for Phase 2 |
| 10 | Temporal types | Native Pony types, not String | High | Every major binary-capable driver (pgx, rust-postgres, asyncpg, Postgrex, psycopg3) decodes these to native language types, not strings. Binary format is always UTC microseconds (timestamptz) or three-component (interval) — ParameterStatus tracking is unnecessary. Producing String would create an API we'd break when adding native types. Add Timestamp, Time, Interval, and Date val classes to FieldDataTypes in Phase 2 |
Open Questions
None — all prior open questions have been resolved. See Decision 10.
---
Review of Updated Plan (v2)

All factual claims verified against source. Line references, wire protocol details, and current code behavior descriptions are accurate.

Phase 2 Gaps

1.
2.
3. State transition ordering in try_run_query. Currently,
4. Example updates: 9 examples use exhaustive match on FieldDataTypes
5. Package docstring update for temporal types.
6. Existing test updates: Existing tests with exhaustive match on FieldDataTypes
7. Phase 2 Step 9 describes
8. Test builder updates for binary results.
9. Unit tests for temporal text codec parsing. Text parsing for

Phase 1 Observations (Non-blocking)
---
Updated Plan (v3)

This incorporates all feedback from the v2 review. Changes from v1 are tagged [NEW]; changes from v2 are tagged [v3]. Each phase is a separate PR. Phase 2 grows from 13 steps (v2) to 17 steps (v3).

Resolved Open Questions

Must Phase 2 decoders produce text-format-identical strings? For types that decode to String, yes — output must match PostgreSQL's text format exactly, enforced by integration tests. Temporal types instead decode to native classes, whose string() output is informational only.

Exact OID list for Phase 2 binary codecs? Enumerated in Phase 2 Steps 1, 3, and 4. Covers text passthrough types (7 OIDs), temporal types (5 OIDs to native types), and String-producing types (4 OIDs). Types not on the list fall back to Array[U8] val.

Codec Implementation Guidance

These rules apply to all codec implementations across all phases:
Phase 1: Binary Parameters + Codec Infrastructure

Goal: Ship typed parameters with binary encoding for numeric/bool/bytea types. Results stay text format — no user-visible regression on the result side. Breaking change to the parameter API. Classification: changed.

Step 1: Define the Codec interface

Step 2: Implement built-in binary codecs
| Primitive | OID | Pony Type | Binary Wire Format |
|---|---|---|---|
| _BoolBinaryCodec | 16 | Bool | 1 byte: encode 0x01/0x00. Decode: any nonzero = true |
| _ByteaBinaryCodec | 17 | Array[U8] val | Raw bytes (0-length valid) |
| _Int2BinaryCodec | 21 | I16 | 2 bytes big-endian signed |
| _Int4BinaryCodec | 23 | I32 | 4 bytes big-endian signed |
| _Int8BinaryCodec | 20 | I64 | 8 bytes big-endian signed |
| _Float4BinaryCodec | 700 | F32 | 4 bytes IEEE 754 big-endian |
| _Float8BinaryCodec | 701 | F64 | 8 bytes IEEE 754 big-endian |
Length validation in decode():
- _BoolBinaryCodec: error if data.size() != 1
- _Int2BinaryCodec: error if data.size() != 2
- _Int4BinaryCodec / _Float4BinaryCodec: error if data.size() != 4
- _Int8BinaryCodec / _Float8BinaryCodec: error if data.size() != 8
- _ByteaBinaryCodec: any length valid (including 0)
Step 3: Implement built-in text codecs
New file: postgres/_text_codecs.pony
| Primitive | OID | Decoding (mirrors current _field_to_type logic) |
|---|---|---|
| _BoolTextCodec | 16 | String.from_array(data).at("t") |
| _ByteaTextCodec | 17 | Hex decode — extract current _decode_hex_bytea logic |
| _Int2TextCodec | 21 | String.from_array(data).i16()? |
| _Int4TextCodec | 23 | String.from_array(data).i32()? |
| _Int8TextCodec | 20 | String.from_array(data).i64()? |
| _Float4TextCodec | 700 | String.from_array(data).f32()? |
| _Float8TextCodec | 701 | String.from_array(data).f64()? |
Text codecs receive Array[U8] val (raw wire bytes) even though the content is UTF-8. encode converts the Pony type to its text representation.
Step 4: Implement CodecRegistry
New file: postgres/codec_registry.pony
```pony
class val CodecRegistry
  """
  Maps PostgreSQL type OIDs to codecs. Immutable — adding a codec
  produces a new registry.
  """
  new val create()
    """Registry with all built-in text and binary codecs."""
  new val _with_codec(base: CodecRegistry, oid: U32, codec: Codec)
    """New registry adding or replacing the codec for a given OID.
    Package-private in Phase 1; public in Phase 3."""
  fun decode(oid: U32, format: U16, data: Array[U8] val): FieldDataTypes ?
    """Decode result column data. Format 0 = text codec, format 1 = binary codec.
    Text fallback: String.from_array(data). Binary fallback: raw Array[U8] val."""
  fun has_binary_codec(oid: U32): Bool
    """Whether a binary codec is registered for this OID."""
```

Internal storage: two Map[U32, Codec] val — one for text codecs, one for binary. The default constructor populates both with all built-in codecs from Steps 2-3.
No encode method on the registry in Phase 1 — parameter encoding matches directly on FieldDataTypes. [NEW] encode is deferred to Phase 3 for custom codec support.
Step 5: Change parameter type
PreparedQuery.params and NamedPreparedQuery.params change from Array[(String | None)] val to Array[FieldDataTypes] val. Update class docstrings in prepared_query.pony (line 5: "Values are sent in text format") and named_prepared_query.pony (line 7: same) to describe typed parameters — binary format for typed values, text format with server inference for String.
Step 6: Update _FrontendMessage.bind() — per-parameter format codes and binary encoding
[NEW] bind() becomes partial (?). The internal try ... else _Unreachable() end pattern inside bind() is replaced by ? propagation — encode errors flow to the caller instead of panicking. Update the bind() docstring (currently lines 137-146 of _frontend_message.pony: "All parameters use text format") to reflect per-parameter format codes and binary encoding.
```pony
fun bind(portal: String, stmt: String,
  params: Array[FieldDataTypes] val): Array[U8] val ?
```

Wire format changes:
- Parameter format codes: Int16(0) (all-text shorthand) to Int16(N) + N per-parameter format codes
- Parameter values: Binary for typed params, text for String, NULL unchanged
- Result format codes: Remain Int16(0) (all text) in Phase 1
| Pony Type | Format Code | OID | Encoding |
|---|---|---|---|
| I16 | 1 (binary) | 21 | 2 bytes big-endian |
| I32 | 1 (binary) | 23 | 4 bytes big-endian |
| I64 | 1 (binary) | 20 | 8 bytes big-endian |
| F32 | 1 (binary) | 700 | 4 bytes IEEE 754 BE |
| F64 | 1 (binary) | 701 | 8 bytes IEEE 754 BE |
| Bool | 1 (binary) | 16 | 1 byte: 0x01 or 0x00 |
| Array[U8] val | 1 (binary) | 17 | Raw bytes |
| String | 0 (text) | 0 | UTF-8 bytes, server infers type |
| None | n/a | 0 | Length -1, no data |
Step 7: Update _FrontendMessage.parse() — send OIDs
No signature change. The existing param_type_oids: Array[U32] val parameter (currently always empty) gets populated from the params. A helper maps each FieldDataTypes variant to its OID (same table as Step 6). String and None get OID 0 (server infers).
Step 8: Add encode error handling in _QueryReady.try_run_query() [NEW] [v3]
Currently, try_run_query wraps everything in try ... else _Unreachable() end. With bind() partial, encode errors would hit _Unreachable() — wrong.
The fix uses the build-before-transition pattern (Codec Implementation Guidance rule 5): construct all wire-format messages (including the partial bind() call) BEFORE setting li.query_state to an in-flight state, so an encode error leaves the state machine in _QueryReady. [v3] On encode failure: deliver DataError to the appropriate receiver, dequeue the failed item, and call try_run_query again for the next queue item.
Concrete restructuring for each of the 6 dispatch paths that call `bind()`:

- `PreparedQuery` under `_QueuedQuery`: Build parse, bind, describe, execute, sync into `combined` inside a `try` block. Only after success: set `li.query_state = _ExtendedQueryInFlight.create()` and send. On `bind()` error: deliver `pg_query_failed(s, qry.query, DataError)`, shift the queue, and recurse.
- `NamedPreparedQuery` under `_QueuedQuery`: Same pattern — build bind+describe+execute+sync in `try`, then set state and send on success.
- `PreparedQuery` under `_QueuedStreamingQuery`: Build parse+bind+describe+execute+flush in `try`, then set `li.query_state = _StreamingQueryInFlight.create()` and send on success. On error: deliver `pg_stream_failed` with `DataError`.
- `NamedPreparedQuery` under `_QueuedStreamingQuery`: Same pattern with bind+describe+execute+flush.
- `_QueuedPipeline` loop: The pipeline path requires special handling because `bind()` is called inside a `recover val` block. Inside `recover val`, mutable state from the enclosing scope is inaccessible, so an error from `bind()` cannot communicate which query failed — the error just propagates out of the block as an undifferentiated `?`.

  Pre-validation approach: Before entering the `recover val` block, iterate the queries array and pre-validate all parameters by encoding each param (using the same encode logic that `bind()` uses internally — a shared helper or direct match on `FieldDataTypes`). If any encode fails, report `pg_pipeline_failed` with `DataError` for that specific query index (and all subsequent indices, since no partial pipeline is sent), then `pg_pipeline_complete`, shift the queue, and recurse. If all params validate, proceed to the `recover val` block to build the wire message — `bind()` calls inside the block are now guaranteed to succeed (same params, same encode logic), so any error there is genuinely unreachable.

  This gives per-query error reporting for encode failures while keeping the `recover val` block clean. The `_Unreachable()` in the `else` of the `recover val` block is now justified — pre-validation ensures `bind()` won't fail inside it.

  Only after successful construction: set `li.query_state = _PipelineInFlight.create()` and send.
The _Unreachable() in the outer try remains for genuinely unreachable paths (queue access, non-partial message construction).
Update DataError docstring in query_error.pony to cover both directions — currently says "data that came back from a query" (inbound only), but it now also covers outbound encode failures.
Pipeline encode errors summary: If any query in a pipeline fails encoding, fail that query and all subsequent queries (since no partial pipeline is sent) — deliver pg_pipeline_failed with DataError for each, then pg_pipeline_complete, dequeue, and move on. Pre-validation before the recover val block enables per-query error attribution. Since encode errors for built-in types should never happen (the type match is exhaustive), this is primarily infrastructure for Phase 3 (custom codecs) where encode errors become realistic.
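The pre-validation flow can be sketched in a few lines of Python (the driver is Pony; `prevalidate` and `fussy_encode` are hypothetical names used only for illustration):

```python
def prevalidate(queries, encode):
    """Return the index of the first query whose params fail to encode,
    or None if every query encodes cleanly. Runs BEFORE building the
    wire message, so a failure is attributable to a specific query."""
    for i, params in enumerate(queries):
        try:
            for p in params:
                encode(p)  # same encode logic bind() uses internally
        except ValueError:
            return i       # fail this query and all subsequent ones
    return None

def fussy_encode(p):
    # stand-in encoder that rejects one sentinel value
    if p == "bad":
        raise ValueError(p)

assert prevalidate([["a"], ["bad"], ["c"]], fussy_encode) == 1
assert prevalidate([["a"], ["b"]], fussy_encode) is None
```

On a non-`None` result, the session would fail that index and everything after it, then dequeue and continue — mirroring the summary above.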
Step 9: Update all query dispatch paths
All bind() call sites pass Array[FieldDataTypes] val. All parse() call sites pass derived OID arrays. _DataRowMessage.columns stays Array[(String|None)] val in Phase 1 — no changes to in-flight state accumulation types.
Step 10: Keep _RowsBuilder unchanged in Phase 1
Results still arrive as text. The existing _field_to_type() handles text decoding. Switching to registry-based decoding in Phase 1 would require a wasteful String to Array[U8] val round-trip since _data_row still outputs String. Phase 2 changes _DataRowMessage to raw bytes, at which point _RowsBuilder naturally switches to the registry.
Step 11: Add CodecRegistry to Session and _SessionLoggedIn
Session gets a CodecRegistry val field, defaulting to CodecRegistry.create(). Not user-configurable in Phase 1 (that's Phase 3). Threaded through to _SessionLoggedIn.
In Phase 1, _FrontendMessage.bind() doesn't use the registry — it matches directly on FieldDataTypes variants. The registry is created here so text codecs are registered and ready for Phase 2's result decoding switch.
Step 12: Update package docstring in postgres.pony [NEW]
- Line 105: replace "Parameters are text-format strings or `None` for NULL" with typed parameter description
- Lines 168-169: `NamedPreparedQuery(name, ["42"])` — update array literal to `[as FieldDataTypes: "42"]` (needed because `params` type changes to `Array[FieldDataTypes] val`)
- Line 229: `Array[(String | None)]` to `Array[FieldDataTypes]`
- Lines 264-270: `[as (String | None): "1"]` to `[as FieldDataTypes: I32(1)]`
Step 13: Update all examples
Every example creating PreparedQuery or NamedPreparedQuery needs the new param type:
- `examples/prepared-query/` — `[as (String | None): "Pony"; "10"; None]` to `[as FieldDataTypes: "Pony"; I32(10); None]`
- `examples/named-prepared-query/`
- `examples/crud/`
- `examples/streaming/`
- `examples/pipeline/`
(bytea and cancel use SimpleQuery — no changes needed.)
Update `examples/README.md` — the prepared-query entry references `Array[(String | None)] val`, which must change to `Array[FieldDataTypes] val`.
Step 14: Update all tests
Update param types in all test files that construct PreparedQuery / NamedPreparedQuery.
New unit tests:
- Codec encode/decode round-trip for each built-in binary codec
- Binary codec length validation (wrong-length payloads error)
- Bool binary decode: nonzero bytes other than 0x01 produce `true`
- Text codec decode for each built-in text codec
- Text codec encode for each built-in text codec (encode methods are written in this phase, so they should be tested here rather than deferred to Phase 3) [v3]
- `CodecRegistry.decode()` with known/unknown OIDs, both formats
- Bind wire format with typed params (per-parameter format codes, binary values)
- Parse wire format with OIDs from typed params
- Encode error handling: `DataError` delivery when `bind()` fails
- Pipeline pre-validation: `DataError` delivery with correct per-query index attribution
Integration tests:
- Queries with typed params (I32, I64, Bool, F32, F64, String, Array[U8] val, None)
- Mixed typed and String params
- Pipeline with typed params
- Streaming with typed params
Build/run: `make ssl=3.0.x` for all, `make unit-tests ssl=3.0.x` for unit only, `make integration-tests ssl=3.0.x` for integration (requires `make start-pg-containers`).
Step 15: Update CLAUDE.md
- `PreparedQuery` / `NamedPreparedQuery` param type
- Codec architecture (interface, registry, built-in codecs)
- Binary encoding strategy
- Encode error handling in Query Execution Flow
- Updated `_FrontendMessage.bind()` and `parse()` notes
Step 16: Release notes
Classification: changed. Label: changelog - changed.
Phase 2: Binary Results
Goal: Switch result format to all-binary for extended query protocol. Common types that currently return readable strings must continue to do so.
Classification: changed.
Prerequisites [NEW]
Phase 2 does not ship until:
- Every OID in the lists below has a binary codec
- String-producing binary codecs (numeric, uuid, oid, jsonb) produce text-format-identical output
- Temporal binary codecs produce correct native type instances
- Integration tests verify all of the above
Step 1: Text passthrough binary codecs
Single _TextPassthroughBinaryCodec primitive registered for all 7 OIDs. PostgreSQL's binary format for these IS raw UTF-8.
| OID | PostgreSQL Type |
|---|---|
| 18 | char |
| 19 | name |
| 25 | text |
| 114 | json |
| 142 | xml |
| 1042 | bpchar |
| 1043 | varchar |
Step 2: Add native temporal types to FieldDataTypes [NEW]
Every major binary-capable PostgreSQL driver (pgx, rust-postgres, asyncpg, Postgrex, psycopg3) decodes temporal types to native language types, not strings. Binary format is always UTC microseconds (timestamptz) or three-component (interval) — ParameterStatus tracking for timezone/interval style is unnecessary for binary decoding.
New val classes:
```pony
class val Timestamp
  """PostgreSQL timestamp/timestamptz. Microseconds since 2000-01-01 00:00:00 UTC."""
  let microseconds: I64

class val Time
  """PostgreSQL time. Microseconds since midnight."""
  let microseconds: I64

class val Interval
  """PostgreSQL interval. Three independent components (months have variable
  length, days have variable length due to DST)."""
  let microseconds: I64
  let days: I32
  let months: I32

class val Date
  """PostgreSQL date. Days since 2000-01-01."""
  let days: I32
```

Each class provides a `string()` method producing a human-readable representation. These are informational — they don't need to match PostgreSQL's text format exactly (that would require tracking IntervalStyle and DateStyle server parameters). Formats:

- `Date.string()`: `"2024-01-15"` (ISO 8601). Infinity: `"infinity"` / `"-infinity"`
- `Time.string()`: `"14:30:00"` or `"14:30:00.123456"` (fractional only when non-zero)
- `Timestamp.string()`: `"2024-01-15 14:30:00"` or with `.NNNNNN`. Infinity: `"infinity"` / `"-infinity"`
- `Interval.string()`: `"1 year 2 mons 3 days 04:05:06"` (PostgreSQL `postgres` style)
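The `Timestamp.string()` rules above can be illustrated with a short Python sketch (the driver is Pony; `timestamp_string` is a hypothetical stand-in): microseconds since the PostgreSQL epoch 2000-01-01, fractional part only when non-zero, I64 extremes rendered as infinity.

```python
from datetime import datetime, timedelta

PG_EPOCH = datetime(2000, 1, 1)  # PostgreSQL timestamp epoch
I64_MAX = 2**63 - 1
I64_MIN = -(2**63)

def timestamp_string(micros):
    if micros == I64_MAX:
        return "infinity"
    if micros == I64_MIN:
        return "-infinity"
    dt = PG_EPOCH + timedelta(microseconds=micros)
    out = dt.strftime("%Y-%m-%d %H:%M:%S")
    if dt.microsecond != 0:
        out += ".%06d" % dt.microsecond  # fractional only when non-zero
    return out

assert timestamp_string(0) == "2000-01-01 00:00:00"
assert timestamp_string(I64_MAX) == "infinity"
```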
All four implement Equatable[T] so they work in direct comparisons and Field.eq(). [v3]
FieldDataTypes expands:
```pony
type FieldDataTypes is
  ( Array[U8] val | Bool | Date | F32 | F64 | I16 | I32 | I64
  | Interval | None | String | Time | Timestamp )
```

This is a breaking change — all exhaustive match expressions on `FieldDataTypes` in user code need updating. Acceptable at alpha (v0.2.x). Document in release notes.
Text codecs for temporal types are also needed — SimpleQuery always returns text format, and those OIDs should decode to native types too (parse the text representation). Parsing details:
- `date` text: `"2024-01-15"` or `"infinity"` / `"-infinity"`. ISO 8601 date format (PostgreSQL default `DateStyle = 'ISO'`).
- `time` text: `"14:30:00"` or `"14:30:00.123456"`. Parse hours:minutes:seconds, optional fractional.
- `timestamp` text: `"2024-01-15 14:30:00"` or with `.NNNNNN`. Parse date + time, convert to microseconds since epoch.
- `timestamptz` text: `"2024-01-15 14:30:00+00"` or similar timezone suffix. Parse and convert to UTC microseconds. The timezone offset varies with session `TimeZone` setting — the text codec must handle the offset.
- `interval` text: depends on `IntervalStyle` server parameter. The text codec assumes the default `postgres` style (`"1 year 2 mons 3 days 04:05:06"`). Other styles (`sql_standard`, `iso_8601`, `postgres_verbose`) are not supported by the text codec — users with non-default `IntervalStyle` should use extended query protocol (binary) where the format is style-independent.
Step 3: Update Field.eq() with temporal match arms [v3]
field.pony lines 21-44: Field.eq() explicitly matches on (value, that.value) pairs for all 9 current FieldDataTypes variants. The else false catch-all means new types silently compare as unequal — a correctness bug, not a compile error. Add four new match arms:
```pony
| (let a: Timestamp, let b: Timestamp) => a == b
| (let a: Time, let b: Time) => a == b
| (let a: Date, let b: Date) => a == b
| (let a: Interval, let b: Interval) => a == b
```

These delegate to each type's `Equatable.eq()`.
Step 4: Binary codecs for common types [NEW — exact list]
Codecs that decode to String:
| OID | PostgreSQL Type | Binary Format | Text Output |
|---|---|---|---|
| 26 | oid | 4 bytes U32 BE | "12345" |
| 1700 | numeric | Variable: ndigits (I16) + weight (I16) + sign (I16) + dscale (I16) + base-10000 digits | "123.45" |
| 2950 | uuid | 16 bytes raw | "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" (lowercase, 4-2-2-2-6 grouping) |
| 3802 | jsonb | 1 version byte (0x01) + JSON UTF-8 | Strip version byte, rest is text |
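Two of these codecs are simple enough to sketch directly in Python (the driver is Pony; `decode_uuid` and `decode_jsonb` are hypothetical names for illustration):

```python
def decode_uuid(data):
    """16 raw bytes -> lowercase hex with 8-4-4-4-12 grouping."""
    if len(data) != 16:
        raise ValueError("uuid payload must be 16 bytes")
    h = data.hex()  # already lowercase
    return "-".join((h[0:8], h[8:12], h[12:16], h[16:20], h[20:32]))

def decode_jsonb(data):
    """Strip the one-byte version header (0x01); the rest is JSON text."""
    if not data or data[0] != 0x01:
        raise ValueError("unsupported jsonb version")
    return data[1:].decode("utf-8")

assert decode_uuid(bytes(range(16))) == "00010203-0405-0607-0809-0a0b0c0d0e0f"
assert decode_jsonb(b'\x01{"a":1}') == '{"a":1}'
```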
Codecs that decode to native types:
| OID | PostgreSQL Type | Binary Format | Pony Type |
|---|---|---|---|
| 1082 | date | 4 bytes: days since 2000-01-01 (signed I32 BE) | Date |
| 1083 | time | 8 bytes: microseconds since midnight (I64 BE) | Time |
| 1114 | timestamp | 8 bytes: microseconds since 2000-01-01 00:00:00 (I64 BE) | Timestamp |
| 1184 | timestamptz | 8 bytes: microseconds since 2000-01-01 00:00:00 UTC (I64 BE) | Timestamp |
Both timestamp (1114) and timestamptz (1184) decode to Timestamp. The binary format is identical — the server converts timestamptz to UTC before sending. The distinction between "with timezone" and "without" is a server-side storage concern, not a wire format difference.
| OID | PostgreSQL Type | Binary Format | Pony Type |
|---|---|---|---|
| 1186 | interval | 16 bytes: microseconds (I64 BE) + days (I32 BE) + months (I32 BE) | Interval |
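The interval layout can be sketched in Python (the driver is Pony; `decode_interval` is a hypothetical name): 16 bytes split into three independent components, never normalized into one number.

```python
import struct

def decode_interval(data):
    """16 bytes = microseconds (I64 BE) + days (I32 BE) + months (I32 BE)."""
    if len(data) != 16:
        raise ValueError("interval payload must be 16 bytes")
    return struct.unpack(">qii", data)  # (microseconds, days, months)

# e.g. an interval of 1 month, 2 days, 3 seconds on the wire:
assert decode_interval(struct.pack(">qii", 3_000_000, 2, 1)) == (3_000_000, 2, 1)
```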
Deferred: timetz (OID 1266) — complex timezone offset component. Array types (int2[], int4[], text[], int8[]) — require recursive codec dispatch. Both fall back to Array[U8] val until Phase 3.
Correctness invariants for special values:
- Infinity timestamps: `I64.max_value()` produces `Timestamp` with `microseconds = I64.max_value()`. The `string()` method produces `"infinity"`. Same for `I64.min_value()` producing `"-infinity"`
- Infinity dates: `I32.max_value()` produces `Date` with `days = I32.max_value()`, `string()` produces `"infinity"`. Same for `I32.min_value()` producing `"-infinity"`
- Numeric NaN: sign field `0xC000` produces `"NaN"`. Sign `0x0000` = positive, `0x4000` = negative
- Numeric precision: `1.00` stays `"1.00"` (preserve dscale), not `"1"`
- `Time.microseconds` is always non-negative (range 0 to 86,400,000,000)
Text-format-identical verification (for codecs that produce String):
- `uuid`: lowercase hex with dashes
- `jsonb`: JSON text after stripping the version byte
Step 5: Change _DataRowMessage columns to raw bytes
```pony
// Before
let columns: Array[(String|None)] val

// After
let columns: Array[(Array[U8] val | None)] val
```

Step 6: Update _ResponseParser._data_row()
Store raw bytes instead of converting to String. The column_length == 0 case becomes an empty Array[U8] val (not empty string).
Step 7: Update _RowDescriptionMessage to store format code
```pony
// Before
let columns: Array[(String, U32)] val // (name, oid)

// After
let columns: Array[(String, U32, U16)] val // (name, oid, format_code)
```

Update `_row_description`: change `reader.skip(8)?` to `reader.skip(6)?` (type_size=2 + type_modifier=4) then `let format_code = reader.u16_be()?`.
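The skip arithmetic follows from the RowDescription field layout in the PostgreSQL protocol: name (cstring), table OID (4), attribute number (2), type OID (4), type size (2), type modifier (4), format code (2). A Python sketch (the driver is Pony; `parse_field` is a hypothetical name):

```python
import io
import struct

def parse_field(buf):
    """Parse one RowDescription field block, keeping the format code
    instead of skipping it."""
    name = bytearray()
    while (b := buf.read(1)) != b"\x00":   # NUL-terminated column name
        name += b
    buf.read(4)                                # table OID (unused here)
    buf.read(2)                                # attribute number (unused)
    oid = struct.unpack(">I", buf.read(4))[0]  # type OID
    buf.read(6)                                # type size (2) + type modifier (4)
    fmt = struct.unpack(">H", buf.read(2))[0]  # format code: 0 text, 1 binary
    return (name.decode(), oid, fmt)

# an int4 column named "id" reported in binary format:
raw = b"id\x00" + struct.pack(">IhIhiH", 0, 1, 23, 4, -1, 1)
assert parse_field(io.BytesIO(raw)) == ("id", 23, 1)
```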
Step 8: Switch Bind result format to all binary
Change Bind message result format from Int16(0) (zero codes = all text shorthand) to Int16(1) Int16(1) (one code applied to all columns: binary). Message length increases by 2 bytes.
SimpleQuery is unaffected — no Bind message, always text. Text codecs handle that path.
Step 9: Update _RowsBuilder for codec-based decoding
```pony
fun apply(rows': Array[Array[(Array[U8] val | None)] val] val,
  row_descriptions': Array[(String, U32, U16)] val,
  registry: CodecRegistry): Rows ?
```

Decoding: `None` produces `None`. Everything else uses `registry.decode(oid, format_code, data)`. Remove `_decode_hex_bytea` and `_hex_digit` from `_RowsBuilder` (now in `_ByteaTextCodec`).
Step 10: Update bind() and parse() parameter encoding for new types [v3: expanded to cover both bind and parse]
Phase 2 adds Timestamp, Time, Interval, and Date to FieldDataTypes. Both bind() and the parse() OID-mapping helper must handle the new variants.
bind() encoding:
| Pony Type | Format Code | OID | Encoding |
|---|---|---|---|
| `Timestamp` | 1 (binary) | 1114 (timestamp) | 8 bytes I64 BE |
| `Time` | 1 (binary) | 1083 | 8 bytes I64 BE |
| `Date` | 1 (binary) | 1082 | 4 bytes I32 BE |
| `Interval` | 1 (binary) | 1186 | 16 bytes: I64 BE + I32 BE + I32 BE |
Note: Timestamp uses OID 1114 (timestamp without timezone), not 1184 (timestamptz). The binary encoding is identical — the server handles the timezone semantics based on the column type. Using 1114 avoids implicit timezone conversion surprises.
parse() OID mapping: The helper that maps FieldDataTypes variants to OIDs must also include the temporal types: Timestamp to 1114, Time to 1083, Date to 1082, Interval to 1186. Without this, temporal parameters would be sent with OID 0 (server infers), which works but forgoes type safety. [v3]
Step 11: Update all in-flight state accumulation types
In _SimpleQueryInFlight, _ExtendedQueryInFlight, _StreamingQueryInFlight, and _PipelineInFlight:
- `_data_rows` changes from `Array[Array[(String|None)] val] iso` to `Array[Array[(Array[U8] val | None)] val] iso`
- `_row_description` changes from `(Array[(String, U32)] val | None)` to `(Array[(String, U32, U16)] val | None)` to carry the format code from Step 7's `_RowDescriptionMessage` change [v3]
All _RowsBuilder call sites (5 total in session.pony) gain a registry parameter (from _SessionLoggedIn). All _RowDescriptionMessage consumers update for the 3-tuple.
Step 12: Update all examples for FieldDataTypes expansion [v3]
10 examples match on `field.value`; 9 use exhaustive matches and will fail to compile when temporal types are added. All 10 need temporal type match arms. At minimum, call `string()` on the temporal type:

- `examples/query/query-example.pony`
- `examples/prepared-query/prepared-query-example.pony`
- `examples/named-prepared-query/named-prepared-query-example.pony`
- `examples/ssl-query/ssl-query-example.pony`
- `examples/ssl-preferred-query/ssl-preferred-query-example.pony`
- `examples/crud/crud-example.pony`
- `examples/copy-in/copy-in-example.pony`
- `examples/streaming/streaming-example.pony`
- `examples/pipeline/pipeline-example.pony` (non-exhaustive match, but should still be updated for completeness)
- `examples/bytea/bytea-example.pony`
Update examples/README.md entries for any examples whose descriptions reference FieldDataTypes variants.
Step 13: Update package docstring for temporal types [v3]
postgres.pony lines 130-157 need updating:
- The example `match field.value` block (lines 138-147) needs arms for temporal types (at minimum: `| let t: Timestamp => _env.out.print(field.name + ": " + t.string())`, and similarly for `Date`, `Time`, `Interval`)
- The `// Also: I16, I64, F32, F64` comment (line 145) needs expanding to include temporal types
- The type mapping summary (lines 154-157: "bytea to `Array[U8] val`, bool to `Bool`, ... everything else to `String`") needs updating: add timestamp/timestamptz to `Timestamp`, date to `Date`, time to `Time`, interval to `Interval`, and clarify that unknown OIDs produce `Array[U8] val` (binary) or `String` (text)
Step 14: Update test builders for binary results [v3]
_IncomingRowDescriptionTestMessage (in _test_response_parser.pony): currently takes Array[(String, String)] val (column name, type name) and hardcodes format code 0 (text) at the end of each column's field block (wb.u16_be(0) at line 1021). Two changes needed:
- Format code parameter: change the column tuple to `(String, String, U16)` (name, type-name, format-code) so tests can construct RowDescription messages with binary format codes. Alternatively, change the tuple to `(String, U32, U16)` (name, raw-OID, format-code) to avoid maintaining a growing string-to-OID map — this is cleaner since it eliminates the type-name lookup entirely. [v3]
- OID map expansion (if keeping the string-based API): the type name lookup table (lines 997-1008) currently only covers 8 types (text, bytea, bool, int2, int4, int8, float4, float8). Phase 2 tests need entries for: oid, numeric, uuid, jsonb, timestamp, timestamptz, date, time, interval. Without these, temporal and Phase 2 type tests cannot construct valid RowDescription messages. [v3]
_IncomingDataRowTestMessage: currently takes Array[(String | None)] val and writes column values as text strings. Add a new _IncomingBinaryDataRowTestMessage class that takes Array[(Array[U8] val | None)] val and writes raw binary column data. Keeping the existing builder avoids churning all existing text-format tests.
These test builder updates are shared infrastructure used across 9 test files (_test_session.pony, _test_pipeline.pony, _test_streaming.pony, _test_copy_in.pony, _test_copy_out.pony, _test_notice.pony, _test_notification.pony, _test_parameter_status.pony, _test_response_parser.pony). The existing tests continue using format code 0; only new binary result tests use format code 1.
Step 15: Tests [v3: expanded with items from review findings 2, 6, 8, 9]
Existing test updates for FieldDataTypes expansion: [v3]
These existing tests must be updated alongside the new tests — they are not "nice to have" but will fail to compile or silently lose coverage:
- `_test_equality.pony`: `_TestFieldEqualityReflexive` (lines 14-27) and `_TestFieldEqualityStructural` (lines 36-47) enumerate all 9 variants — add all 4 temporal types.
- `_TestFieldInequality` (lines 82-116) — add cross-type inequality cases for temporal types (e.g., `Timestamp` vs `Date`, `Time` vs `I64`).
- `_test_equality.pony` generators:
  - `_FieldDataTypesGen` (lines 204-227) — add 4 new `frequency` entries, one per temporal type, each producing values with randomized fields.
  - `_RowGen._random_field_value` (lines 248-267) — expand `match rnd.usize(0, 8)` to `match rnd.usize(0, 12)` and add 4 arms for temporal types.
  - `_RowsGen._random_field_value` (lines 287-306) — same expansion (duplicated because Pony traits can't have iso fields).
New unit tests:
Binary codecs:
- Binary decode for all new codecs (text passthrough, oid, numeric, uuid, jsonb, temporal)
- Length validation for fixed-width binary codecs (date=4, time=8, timestamp=8, interval=16)
- Special values: infinity timestamps/dates, NaN numeric, negative intervals, zero-length bytea
Temporal text codecs: [v3]
- `timestamptz` text codec: timezone offsets (`+00`, `+05:30`, `-07`, `+00:00`), fractional seconds with varying precision (1-6 digits), edge cases (midnight, year boundaries, epoch 2000-01-01)
- `timestamp` text codec: no timezone suffix, fractional seconds
- `interval` text codec: full `postgres`-style format (`"1 year 2 mons 3 days 04:05:06"`), partial components (`"1 day"`, `"04:05:06"`, `"1 year"`), negative components (`"-1 days +02:00:00"`), zero intervals (`"00:00:00"`)
- `date` text codec: standard dates, infinity values (`"infinity"`, `"-infinity"`), epoch boundary
- `time` text codec: fractional seconds, midnight, maximum precision
Result parsing:
- DataRow parsing with binary data (using `_IncomingBinaryDataRowTestMessage`)
- RowDescription format code parsing (using `_IncomingRowDescriptionTestMessage` with format code 1)
- `_RowsBuilder` with binary data + registry
- `_RowsBuilder` with text data + registry (SimpleQuery path unchanged)
Native type and equality:
- `Timestamp`, `Time`, `Interval`, `Date` construction, `string()` output, equality (Equatable), special values (infinity, negative intervals)
- `Field.eq()` with temporal types — reflexive, structural, symmetric, inequality (cross-type comparisons with non-temporal types) [v3]
- Property-based equality tests exercise temporal variants via updated generators [v3]
Integration tests:
- For each String-producing codec (oid, numeric, uuid, jsonb), same query via SimpleQuery and PreparedQuery to assert identical String values
- For temporal types, verify binary decoding produces correct native type instances (compare field values, not string representations)
- SimpleQuery returning temporal types to verify text codecs produce correct native types
- Unknown OID columns produce `Array[U8] val`
- Pipeline and streaming with binary results
Build/run: `make ssl=3.0.x` for all, `make unit-tests ssl=3.0.x` for unit only, `make integration-tests ssl=3.0.x` for integration (requires `make start-pg-containers`).
Step 16: Update CLAUDE.md
- `_DataRowMessage`, `_RowDescriptionMessage` types
- Type Conversion section — replace OID table with codec-based description, add temporal types
- Result format strategy (text for SimpleQuery, binary for extended query)
- `_RowsBuilder` receives registry
- Add `Timestamp`, `Time`, `Interval`, `Date` to Public API Types
- Update `FieldDataTypes` definition (13 variants, not 9)
- Note `Field.eq()` covers all 13 variants
Step 17: Release notes
Classification: changed. Label: changelog - changed.
Phase 3: Public Custom Codec API
Goal: Users can register custom codecs for PostgreSQL types not covered by built-ins.
Classification: added.
Step 1: Make CodecRegistry user-configurable
Rename `_with_codec` to `with_codec` (public). Add encode method:

```pony
fun encode(oid: U32, value: FieldDataTypes): Array[U8] val ?
```

Add optional registry parameter to Session constructor:

```pony
new create(
  server_connect_info': ServerConnectInfo,
  database_connect_info': DatabaseConnectInfo,
  notify': SessionStatusNotify,
  registry': CodecRegistry = CodecRegistry)
```

Step 2: Custom codec documentation
Package docstring gains a "Custom Codecs" section: how to implement Codec, create a registry with with_codec, pass it to Session. Correctness requirements (length validation, type mismatch errors).
Step 3: Custom codec example
New examples/custom-codec/. Update examples/README.md.
Step 4: Document FieldDataTypes expansion cost [NEW]
If a RawBinaryData wrapper type (to distinguish unknown-OID raw bytes from bytea) is added later, it adds a variant to FieldDataTypes — a breaking change for all match expressions in user code. Document this as a known future cost of using Array[U8] val for unknown OIDs.
Step 5: Tests, CLAUDE.md, release notes
Classification: added. Label: changelog - added.
Decisions
| # | Decision | Choice | Confidence | Reasoning |
|---|---|---|---|---|
| 1 | Result format | All binary (Approach A) in phases | High | Industry norm. Phase 1 = text results (no regression), Phase 2 gated on codec coverage |
| 2 | Unknown OID fallback | `Array[U8] val` | High | Simple for alpha. Bytea ambiguity documented. Wrapper type deferred |
| 3 | Encode location | On `Codec` interface | High | Natural home for both operations. Extends to custom codecs in Phase 3 |
| 4 | Registry location | Session with global default | High | Zero-config common case |
| 5 | Text codecs | Explicit registry entries from Phase 1 | High | SimpleQuery needs them. Avoids two parallel decode systems |
| 6 | Encode errors | `DataError` path from Phase 1 | High | Structurally needed even though unreachable for built-in types. Note: `DataError` carries no context about which parameter/codec failed — adequate for alpha, but Phase 3 may need a richer error type |
| 7 | Phase 2 String format | Text-format-identical | High | Prevents behavioral regression. Enforced by integration tests |
| 8 | Phase 2 Bind format | `Int16(1) Int16(1)` shorthand | High | No extra roundtrip or metadata caching |
| 9 | Array types | Deferred to Phase 3 | Medium | Recursive codec dispatch too complex for Phase 2 |
| 10 | Temporal types | Native Pony types, not String | High | Every major binary-capable driver (pgx, rust-postgres, asyncpg, Postgrex, psycopg3) decodes these to native language types, not strings. Binary format is always UTC microseconds (timestamptz) or three-component (interval) — ParameterStatus tracking is unnecessary. Producing String would create an API we'd break when adding native types. Add `Timestamp`, `Time`, `Interval`, and `Date` val classes to `FieldDataTypes` in Phase 2 |
| 11 | State transition ordering | Build messages before transitioning query state | High | Prevents state machine inconsistency when `bind()` errors. The alternative (reset state on failure) adds complexity and risks orphaned state. Building first is cleaner: the query state only transitions when a message has been successfully constructed and is about to be sent. [v3] |
| 12 | `Time` type name | Needs decision: accept collision with stdlib `time.Time` | Medium | Pony's stdlib has a `Time` primitive in the `time` package. If a user does `use "time"` and `use "postgres"`, the names collide. Options: (a) accept it — users can alias with `use pg = "postgres"` or `use t = "time"`, which is standard Pony practice; (b) prefix with Pg — `PgTime`, but not `PgTimestamp`/`PgDate`/`PgInterval` since those don't collide, creating inconsistency; (c) prefix all four — `PgTime`, `PgTimestamp`, `PgDate`, `PgInterval` for consistency. The stdlib collision only affects `Time`. Other postgres driver types (`Row`, `Rows`, `Field`, `Result`) could also collide with user types but have been accepted as-is since alpha. Sean should decide. [v3] |
Open Questions
Time type naming — see Decision 12. Sean should decide between accepting the stdlib collision (option a) or prefixing (options b/c). [v3]
phase 2 implementation PR: #139
Phases 1 and 2 are complete (PRs #141 and #144). Phase 3 design continues in #146.
Design for binary wire format support and the type codec registry — roadmap items #21 and #19. These are designed together because the codec IS the binary encoder/decoder; they're the same feature at different abstraction layers.
Background
The driver currently uses text format exclusively for both parameters and results. All parameter values are `(String | None)`, sent with format code 0 (text). All result columns arrive as text strings, parsed into Pony types by `_RowsBuilder._field_to_type()` based on OID.

Binary format is more efficient for numeric types — an `int4` is 4 bytes on the wire instead of a variable-length decimal string, with no parse/format overhead. For `bytea`, binary eliminates hex encoding entirely. 4 of 5 major drivers that support binary (pgx, tokio-postgres, asyncpg, Postgrex) default to it.

Architecture Overview
Three concerns, designed together:

- `Codec` interface and `CodecRegistry` that map between PostgreSQL wire format and Pony types

The codec is the central concept. Each codec knows how to encode a Pony value to wire bytes (for parameters) and decode wire bytes to a Pony value (for results). The registry maps OIDs to codecs — a single codec can serve multiple OIDs.
Codec Design
Codec Interface
The interface deliberately has no `oid()` method. The OID-to-codec mapping is the registry's responsibility, not the codec's. This allows a single codec implementation to serve multiple OIDs — for example, one `_TextPassthroughBinaryCodec` primitive registered for all 7 text-like OIDs (text, varchar, char, bpchar, name, json, xml).

Codecs are `val` (immutable, shareable across actors). Built-in codecs are primitives (zero allocation, global singletons).

Each codec handles one format. A type could have separate text and binary codecs — the registry selects which one is active. This follows psycopg3's model (separate text/binary dumpers) rather than pgx's (single codec with format parameter), which is simpler per-codec.
Encoding Error Handling
For built-in codecs, `encode()` errors when the wrong `FieldDataTypes` variant is passed (e.g., `String` to `_Int4BinaryCodec`). In practice this can't happen for built-in types because the parameter encoder matches on the Pony type and selects the correct codec. For custom codecs (Phase 3), `encode()` failures surface as `DataError` to the `ResultReceiver` / `PipelineReceiver` / `StreamingResultReceiver` — the query is not sent to the server.
- `_BoolBinaryCodec` — `Bool`
- `_ByteaBinaryCodec` — `Array[U8] val`
- `_Int8BinaryCodec` — `I64`
- `_Int2BinaryCodec` — `I16`
- `_Int4BinaryCodec` — `I32`
- `_Float4BinaryCodec` — `F32`
- `_Float8BinaryCodec` — `F64`
These types have binary representations that are identical to their text representations (UTF-8 bytes). A passthrough codec converts the bytes to `String`.

All share a single `_TextPassthroughBinaryCodec` primitive registered for each OID, since the codec interface has no `oid()` method. The primitive does `String.from_array(data)`.
The current `_field_to_type()` text parsing logic is preserved as text codecs, used when results arrive in text format (SimpleQuery, or when binary is not enabled):

- `_BoolTextCodec` — `field.at("t")`
- `_ByteaTextCodec` — `\xDEADBEEF` → bytes
- `_Int8TextCodec` — `field.i64()?`
- `_Int2TextCodec` — `field.i16()?`
- `_Int4TextCodec` — `field.i32()?`
- `_Float4TextCodec` — `field.f32()?`
- `_Float8TextCodec` — `field.f64()?`
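The bytea text decode is the only non-trivial one of these; a Python sketch (the driver is Pony; `decode_bytea_text` is a hypothetical name) of PostgreSQL's hex output format:

```python
def decode_bytea_text(s):
    """PostgreSQL bytea text output is "\\x" followed by hex digits."""
    if not s.startswith("\\x"):
        raise ValueError("expected hex bytea format")
    return bytes.fromhex(s[2:])

assert decode_bytea_text("\\xDEADBEEF") == b"\xde\xad\xbe\xef"
```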
The registry holds both text and binary codecs per OID. `decode()` selects the right one based on the format code. `has_binary_codec()` is used to determine result format strategy.

The default registry contains all built-in codecs listed above. Users extend it with `with_codec()` to handle additional PostgreSQL types.
Parameter Type Change
Breaking change: The parameter type changes from `(String | None)` to `FieldDataTypes`. Same change for `NamedPreparedQuery`.

Migration is straightforward — replace string-encoded numbers with typed values.

Note that explicit type casts (`::int4`) are no longer needed when using typed parameters, since the driver sends the OID in the Parse message.
Each `FieldDataTypes` variant determines the wire format: `I16`, `I32`, `I64`, `F32`, `F64`, `Bool`, `Array[U8] val`, `String`, `None`.

`String` parameters use text format (format code 0) with OID 0 (server infers). This preserves backward compatibility — users can still pass `"42"` as a string for an int4 column and the server will parse it. This also handles PostgreSQL types that don't have Pony equivalents (dates, timestamps, UUIDs, etc.) — send the text representation as a `String`.
### Wire Format Changes

**Bind message**: Currently uses the shorthand `Int16(0)` for parameter format codes (meaning "all text"). Changes to per-parameter format codes.

**Parse message**: Currently sends an empty OID array. Changes to send OIDs derived from parameter types.
Sending explicit OIDs in Parse eliminates the need for `::type` casts in SQL for typed parameters.

`_FrontendMessage.bind()` — signature change, to accept typed parameters and per-parameter format codes.

`_FrontendMessage.parse()` — no signature change needed. The caller derives OIDs from the params and passes them in the existing `param_type_oids` parameter, which is currently always empty.
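The Bind-message sections affected by this change can be sketched byte-for-byte. This is a Python sketch of the PostgreSQL extended-protocol layout (format-code list, parameter values, result format codes); the helper name is invented:

```python
# Sketch of the three Bind-message sections this design changes:
# per-parameter format codes, parameter values, and result format codes.
import struct

def bind_param_sections(params, all_binary_results=False):
    """params: list of (format_code, payload-or-None) pairs."""
    out = bytearray()
    out += struct.pack(">h", len(params))       # one format code per parameter
    for fmt, _ in params:
        out += struct.pack(">h", fmt)
    out += struct.pack(">h", len(params))       # parameter values
    for _, payload in params:
        if payload is None:
            out += struct.pack(">i", -1)        # NULL: length -1, no bytes
        else:
            out += struct.pack(">i", len(payload)) + payload
    if all_binary_results:
        out += struct.pack(">hh", 1, 1)         # shorthand: all columns binary
    else:
        out += struct.pack(">h", 0)             # zero codes: all text
    return bytes(out)

# one binary int4 parameter (42) and one text parameter ("hello")
wire = bind_param_sections([(1, struct.pack(">i", 42)), (0, b"hello")])
```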
### Type Safety Note

Binary parameters require the Pony type to match the PostgreSQL column type exactly. Sending `I32(42)` to an `int8` column produces a server error (4 bytes received, 8 expected). `String` parameters are more forgiving — the server parses the text representation for whatever type the column expects. When in doubt about the PostgreSQL type, use `String`.

## Binary Results
### DataRow Message Change
Currently, `_ResponseParser._data_row()` converts each column to `String`. For binary results, columns contain raw bytes that may not be valid UTF-8. The parser changes to store raw bytes.

The parser becomes format-agnostic — it reads raw bytes without conversion. All format-aware decoding moves to `_RowsBuilder`.

### RowDescription Change
Currently, `_RowDescriptionMessage` stores `(column_name, type_oid)` per column. The format code field is parsed but discarded (`reader.skip(8)?` skips type_length, type_modifier, and format_code together). The change: store the format code per column.

The format code tells the decoder whether each column is text (0) or binary (1), regardless of what the client requested.
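For reference, one RowDescription column descriptor can be parsed like this Python sketch (the function name is invented; the field layout is the PostgreSQL protocol's, and the last three fields are the 8 bytes the current `reader.skip(8)?` discards):

```python
# Sketch of parsing one RowDescription column descriptor:
# name (C string), table OID (i32), attribute number (i16), type OID (i32),
# type length (i16), type modifier (i32), format code (i16).
import struct

def parse_column_descriptor(buf: bytes, pos: int):
    end = buf.index(b"\x00", pos)             # NUL-terminated column name
    name = buf[pos:end].decode("utf-8")
    table_oid, attnum, type_oid, type_len, type_mod, format_code = \
        struct.unpack_from(">ihihih", buf, end + 1)
    next_pos = end + 1 + 18                   # 18 bytes of fixed fields
    # keep the format code instead of skipping it
    return (name, type_oid, format_code), next_pos

# an int4 column "id" reported in binary format (format code 1)
buf = b"id\x00" + struct.pack(">ihihih", 16384, 1, 23, 4, -1, 1)
(col, _) = parse_column_descriptor(buf, 0)
```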
### Decoding Strategy
`_RowsBuilder._field_to_type()` changes to use the codec registry and format code. `_RowsBuilder.apply()` also changes to accept the new column types and pass the registry/format through.

The decoding logic in `_field_to_type`:

- `None` → `None` (NULL, format-independent)
- empty `Array[U8] val` (zero-length column) → pass to codec (e.g., empty string for text types, zero-length bytea for bytea)
- text format with no registered codec → `String.from_array(data)`
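The dispatch described above can be sketched with a registry keyed by (OID, format code). The registry shape, names, and the raw-bytes fallback for unknown binary OIDs (Option A under Decision 2) are illustrative, not the driver's API:

```python
# Sketch of codec-registry dispatch keyed by (type OID, format code).
import struct

TEXT, BINARY = 0, 1

registry = {
    (23, TEXT): lambda data: int(data.decode()),              # int4, text
    (23, BINARY): lambda data: struct.unpack(">i", data)[0],  # int4, binary
    (16, BINARY): lambda data: data != b"\x00",               # bool, binary
}

def decode_field(oid, fmt, data):
    if data is None:
        return None                    # NULL is format-independent
    codec = registry.get((oid, fmt))
    if codec is not None:
        return codec(data)
    if fmt == TEXT:
        return data.decode("utf-8")    # text fallback: plain String
    return data                        # binary fallback: raw bytes (Option A)

print(decode_field(23, BINARY, struct.pack(">i", 7)))  # 7
```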
### Result Format Strategy

This is the key design decision. The Bind message's result format codes tell the server how to encode result columns. The PostgreSQL protocol allows:

- `Int16(0)` — zero codes, all text (current behavior)
- `Int16(1), Int16(0)` — one code applied to all columns: text
- `Int16(1), Int16(1)` — one code applied to all columns: binary
- `Int16(N), Int16(f1), ..., Int16(fN)` — per-column codes (N must equal column count)

The shorthand (0 or 1 format code) works without knowing the number of result columns. Per-column codes require knowing the exact column count at Bind time.
For unnamed statements (PreparedQuery), the driver currently sends Parse+Bind+Describe+Execute+Sync in one shot. Column metadata isn't known until the Describe response, which arrives after Bind has already been sent. This means per-column format codes require either an extra Describe round trip before Bind, or column metadata stored from an earlier Prepare.
For named statements (NamedPreparedQuery), column metadata is available from the Prepare step's Describe response — but the driver doesn't currently store it.
This leaves three viable approaches:
#### Approach A: All binary (shorthand `Int16(1), Int16(1)`)

Request all columns in binary format. Simple, no extra roundtrips.
Con: types without binary codecs come back as `Array[U8] val` instead of `String` (e.g., a date column that today decodes to `"2024-01-15"`) — with all-binary and no codec for those OIDs, they'd return raw binary bytes.

The con is significant. To avoid regression, this approach requires binary codecs for all common PostgreSQL types, not just the 7 we currently decode to Pony primitives. The text passthrough codecs (text, varchar, json, etc.) handle some, but types with non-text binary representations (date, timestamp, numeric, uuid, jsonb, interval) need purpose-built binary-to-String decoders.
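As one concrete example of such a purpose-built decoder, PostgreSQL's binary `date` is a big-endian int32 counting days since 2000-01-01. A Python sketch (the function name is invented):

```python
# Sketch of a binary-to-String decoder for PostgreSQL `date`:
# payload is a big-endian int32 of days since the PostgreSQL epoch 2000-01-01.
import struct
from datetime import date, timedelta

PG_EPOCH = date(2000, 1, 1)

def decode_date_binary(data: bytes) -> str:
    if len(data) != 4:                  # fixed-width type: verify payload length
        raise ValueError("date payload must be 4 bytes")
    (days,) = struct.unpack(">i", data)
    return (PG_EPOCH + timedelta(days=days)).isoformat()

payload = struct.pack(">i", (date(2024, 1, 15) - PG_EPOCH).days)
print(decode_date_binary(payload))  # 2024-01-15
```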
#### Approach B: All text for results, binary for parameters only

Keep `Int16(0)` (all text) for result format codes. Binary encoding only on the parameter side.

#### Approach C: Configurable, text by default
Default to all text for results. Users opt into all binary per-session (psycopg3 approach). When binary is enabled, types without binary codecs return `Array[U8] val`.

## Impact on Existing Features
**SimpleQuery**: Always uses the simple query protocol, which has no Bind message. Results always arrive in text format. No change — text codecs handle decoding.

**Streaming queries**: Use the extended query protocol. Parameter encoding changes apply. Result format changes apply. `_StreamingQueryInFlight` accumulates `_DataRowMessage` columns and calls `_RowsBuilder` — the same pipeline, just with raw bytes and codec-based decoding.

**Pipeline queries**: Same as streaming — each query in the pipeline uses the extended query protocol. Per-query result delivery through `_RowsBuilder` with codecs.

**COPY IN/OUT**: Uses its own CopyData message format, not DataRow. Unaffected by this change.

## Breaking Changes
1. `PreparedQuery.params` type: `Array[(String | None)] val` → `Array[FieldDataTypes] val`
2. `NamedPreparedQuery.params` type: same change
3. Results may contain `Array[U8] val` instead of `String`

At v0.2.2 alpha, breaking changes are expected. The migration for items 1–3 is mechanical: replace `"42"` with `I32(42)` for typed params, keep `"hello"` as-is for text params.
Decision 1: Result format strategy (primary design question)
See the three approaches above (A: all binary, B: text results only, C: configurable). This is the primary design question.
My recommendation is Approach A (all binary) implemented in phases. Phase 1 delivers binary parameters with text results (effectively Approach B). Phase 2 adds binary result decoding with comprehensive codec coverage, then switches to all binary. This gets binary parameters shipped quickly while building toward the industry-standard all-binary default.
Decision 2: Fallback for unknown OIDs in binary mode
When results are in binary format and no codec is registered for a column's OID, what should the decoder return?
Option A —
Array[U8] val: Raw bytes. The user gets the data but must decode it themselves. Ambiguous with bytea (OID 17) in theFieldDataTypesunion.Option B —
DataError: Fail the entire query. Strict — forces users to register codecs for all types they query. Prevents silent data corruption but may be too aggressive.Option C — Distinct wrapper type: Add a type like
class val RawBinaryDatawrappingArray[U8] valtoFieldDataTypes, distinguishing it from bytea. Adds a new variant to the union.My recommendation is Option A for simplicity. The bytea ambiguity rarely matters in practice — users know their schema. If it becomes a problem, Option C can be added later.
Decision 3: Codec interface — encode on Codec or separate?
The
encodemethod onCodecis used for parameters. But for parameter encoding, the Pony type determines the codec (I32 → int4 codec), whereas for result decoding, the OID determines the codec (OID 23 → int4 codec). The lookup direction is reversed.Option A — encode on Codec: Single interface, conceptually clean. Parameter encoder does a match on
FieldDataTypes, selects the appropriate codec, callsencode. Slight indirection but keeps encode/decode together.Option B — separate
_ParamEncoderprimitive: Hardcoded match onFieldDataTypesvariants, encoding inline. No codec lookup for parameters. Simpler for the built-in types but doesn't extend to custom types.Option C — encode on Codec, but parameter encoding uses direct match for built-in types and falls back to codec lookup for custom types: Pragmatic hybrid.
My recommendation is Option A. The codec is the natural home for both operations. For custom codecs (#19), users will need encode capability. Better to design it in from the start.
Decision 4: Where does the CodecRegistry live?
The registry needs to be accessible during:
Option A — On Session: The Session actor holds a
CodecRegistry val. Passed to query states and_RowsBuilder. Users configure it at session creation viaSessionConfigor similar.Option B — Global default + per-session override: A default registry (all built-in codecs) is used unless the user provides a custom one at session creation.
My recommendation is Option B. The default registry with built-in codecs requires zero configuration for the common case. Users who need custom codecs create a new registry with
with_codec(base, oid, codec)and pass it to the session.Decision 5: When to store RowDescription from Prepare
For named prepared statements, the Describe response during Prepare includes
ParameterDescription(parameter OIDs) andRowDescription(result column metadata). Currently neither is stored.If we need per-column result format codes (not needed for the shorthand all-binary approach), we'd need to store
RowDescriptionfrom Prepare and look it up at Bind time forNamedPreparedQuery.Even without per-column codes, storing
ParameterDescriptioncould enable parameter type validation at Bind time (verify that the Pony types match the expected PostgreSQL types before sending).This decision depends on the result format strategy (Decision 1). With all-binary shorthand, storing is optional but useful for validation. With per-column codes, storing is required for named statements.
## Implementation Phases
### Phase 1: Binary Parameters + Codec Infrastructure
- Define the `Codec` interface
- Implement the text codecs (preserving the existing `_field_to_type` logic)
- Implement the `CodecRegistry`
- Change the parameter type from `(String | None)` to `FieldDataTypes`
- Update `_FrontendMessage.bind()` for per-parameter format codes and binary encoding
- Update `_FrontendMessage.parse()` to send OIDs derived from parameter types
- Update every example using `PreparedQuery` or `NamedPreparedQuery` (prepared-query, named-prepared-query, crud, bytea, streaming, pipeline, cancel, etc.) for the new param type
- CHANGELOG: `changed` (breaking param type change)

### Phase 2: Binary Results
- Change the `_DataRowMessage` column type to `(Array[U8] val | None)`
- Change `_RowDescriptionMessage` to store the format code per column
- Change `_RowsBuilder` to use the codec registry and format code for decoding
- CHANGELOG: `changed` (result format change; types without codecs return `Array[U8] val`)

### Phase 3: Custom Codec Registry (Roadmap #19)
- Make the `Codec` interface public
- Make the `CodecRegistry` configurable at session creation
- Expose `with_codec()` in the user-facing API
- CHANGELOG: `added` (new public codec registry API)