Design: Binary Format and Type Codec Registry #139
Replies: 6 comments 1 reply
---
Review Notes

Verified Claims

All claims about the current codebase were verified against source. Highlights:

Design Strengths

The phased approach is sound — isolating the parameter-side breaking change (Phase 1) from the result format change (Phase 2) is the right call. The parameter type widening is source-compatible for existing users, since String remains a member of the widened parameter type.

Issues

1. Package docstring contains examples that will go stale
2.
| Decision | Assessment |
|---|---|
| Result format: Approach A in phases | Agree. Phased approach is the key — Phase 1 is low-risk, Phase 2 can be gated on codec coverage |
| Unknown OID fallback: Array[U8] val | Agree for alpha. Document the bytea ambiguity and future Option C cost |
| Encode on Codec: Option A | Agree. Single interface, extends cleanly to custom codecs |
| Registry on Session: Option B | Agree. Zero-config default is right for common case |
| RowDescription storage | Agree to defer. All-binary shorthand eliminates the need |
Suggested Additions
- Phase 1 steps: Add "Update package docstring in postgres.pony", "Add error handling for bind-time encode failures in try_run_query", and "Register text codecs for all currently-parsed types in the default registry"
- Phase 2 prerequisites: Enumerate the exact OID list for binary-to-String codecs and specify whether output must be text-format-identical
- Codec implementation guidance: "Binary codecs for fixed-width types must verify payload length and error on mismatch" and "Bool binary decoder should treat any nonzero byte as true"
- Specify Phase 2 Bind result format: Confirm it's Int16(1) Int16(1) (single-format shorthand for all binary)
Open Questions
- Binary-to-String format equivalence: Must Phase 2 decoders for timestamp, numeric, etc. produce exactly the same string as PostgreSQL's text format? This determines whether Phase 2 is purely internal or user-visible.
- Phase 2 OID coverage: The exact list of OIDs needing binary codecs before Phase 2 ships. "numeric, date, timestamp, timestamptz, uuid, jsonb, interval" may not be complete.
---
Updated Plan (v2)

This incorporates all feedback from the review notes. Changes from the original design are tagged [NEW].

Resolved Open Questions

Must Phase 2 decoders produce text-format-identical strings? For types that decode to String, yes — output must match PostgreSQL's text format exactly, enforced by integration tests. Temporal types instead decode to native classes, whose string() output is informational only.

Exact OID list for Phase 2 binary codecs? Enumerated in Phase 2 Steps 1–3. Covers text passthrough types (7 OIDs), temporal types (5 OIDs → native types), and String-producing types (4 OIDs). Types not on the list fall back to Array[U8] val.

Codec Implementation Guidance

These rules apply to all codec implementations across all phases:
Phase 1: Binary Parameters + Codec Infrastructure

Goal: Ship typed parameters with binary encoding for numeric/bool/bytea types. Results stay text format — no user-visible regression on the result side. Breaking change to the parameter API. Classification: changed.

Step 1: Define the Codec interface

Step 2: Implement built-in binary codecs
| Primitive | OID | Pony Type | Binary Wire Format |
|---|---|---|---|
| _BoolBinaryCodec | 16 | Bool | 1 byte: encode 0x01/0x00. Decode: any nonzero = true |
| _ByteaBinaryCodec | 17 | Array[U8] val | Raw bytes (0-length valid) |
| _Int2BinaryCodec | 21 | I16 | 2 bytes big-endian signed |
| _Int4BinaryCodec | 23 | I32 | 4 bytes big-endian signed |
| _Int8BinaryCodec | 20 | I64 | 8 bytes big-endian signed |
| _Float4BinaryCodec | 700 | F32 | 4 bytes IEEE 754 big-endian |
| _Float8BinaryCodec | 701 | F64 | 8 bytes IEEE 754 big-endian |
Length validation in decode():
- _BoolBinaryCodec: error if data.size() != 1
- _Int2BinaryCodec: error if data.size() != 2
- _Int4BinaryCodec / _Float4BinaryCodec: error if data.size() != 4
- _Int8BinaryCodec / _Float8BinaryCodec: error if data.size() != 8
- _ByteaBinaryCodec: any length valid (including 0)
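The validation rules above are language-neutral; here is a Python sketch of the fixed-width decode path (helper names are illustrative, not the Pony API):

```python
import struct

# Illustrative sketch (not the Pony API) of the fixed-width binary
# decode rule: verify payload length, then read big-endian.
_FIXED = {
    21:  (2, ">h"),   # int2   -> I16
    23:  (4, ">i"),   # int4   -> I32
    20:  (8, ">q"),   # int8   -> I64
    700: (4, ">f"),   # float4 -> F32
    701: (8, ">d"),   # float8 -> F64
}

def decode_binary(oid: int, data: bytes):
    if oid == 16:                 # bool: exactly 1 byte, any nonzero is true
        if len(data) != 1:
            raise ValueError("bool payload must be 1 byte")
        return data[0] != 0
    if oid == 17:                 # bytea: raw bytes, any length (including 0)
        return data
    size, fmt = _FIXED[oid]
    if len(data) != size:         # error on length mismatch
        raise ValueError(f"oid {oid}: expected {size} bytes, got {len(data)}")
    return struct.unpack(fmt, data)[0]
```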
Step 3: Implement built-in text codecs
New file: postgres/_text_codecs.pony
| Primitive | OID | Decoding (mirrors current _field_to_type logic) |
|---|---|---|
| _BoolTextCodec | 16 | String.from_array(data).at("t") |
| _ByteaTextCodec | 17 | Hex decode — extract current _decode_hex_bytea logic |
| _Int2TextCodec | 21 | String.from_array(data).i16()? |
| _Int4TextCodec | 23 | String.from_array(data).i32()? |
| _Int8TextCodec | 20 | String.from_array(data).i64()? |
| _Float4TextCodec | 700 | String.from_array(data).f32()? |
| _Float8TextCodec | 701 | String.from_array(data).f64()? |
Text codecs receive Array[U8] val (raw wire bytes) even though the content is UTF-8. encode converts the Pony type to its text representation.
Step 4: Implement CodecRegistry
New file: postgres/codec_registry.pony
```pony
class val CodecRegistry
  """
  Maps PostgreSQL type OIDs to codecs. Immutable — adding a codec
  produces a new registry.
  """
  new val create()
    """Registry with all built-in text and binary codecs."""
  new val _with_codec(base: CodecRegistry, oid: U32, codec: Codec)
    """New registry adding or replacing the codec for a given OID.
    Package-private in Phase 1; public in Phase 3."""
  fun decode(oid: U32, format: U16, data: Array[U8] val): FieldDataTypes ?
    """Decode result column data. Format 0 = text codec, format 1 = binary codec.
    Text fallback: String.from_array(data). Binary fallback: raw Array[U8] val."""
  fun has_binary_codec(oid: U32): Bool
    """Whether a binary codec is registered for this OID."""
```

Internal storage: two Map[U32, Codec] val — one for text codecs, one for binary. The default constructor populates both with all built-in codecs from Steps 2–3.
No encode method on the registry in Phase 1 — parameter encoding matches directly on FieldDataTypes. [NEW] encode is deferred to Phase 3 for custom codec support.
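The registry semantics — copy-on-add immutability plus the two decode fallbacks — can be sketched in Python (hypothetical names, not the Pony API):

```python
class CodecRegistry:
    """Sketch: immutable OID -> codec maps; decode falls back per format."""
    def __init__(self, text=None, binary=None):
        self._text = dict(text or {})      # oid -> decode fn (text format)
        self._binary = dict(binary or {})  # oid -> decode fn (binary format)

    def with_codec(self, oid, fmt, fn):
        # Copy-on-add: returns a NEW registry, the original is unchanged.
        text, binary = dict(self._text), dict(self._binary)
        (text if fmt == 0 else binary)[oid] = fn
        return CodecRegistry(text, binary)

    def has_binary_codec(self, oid):
        return oid in self._binary

    def decode(self, oid, fmt, data: bytes):
        codecs = self._text if fmt == 0 else self._binary
        if oid in codecs:
            return codecs[oid](data)
        # Fallbacks from the design: text -> string, binary -> raw bytes.
        return data.decode("utf-8") if fmt == 0 else data
```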
Step 5: Change parameter type
PreparedQuery.params and NamedPreparedQuery.params change from Array[(String | None)] val to Array[FieldDataTypes] val. Update class docstrings in prepared_query.pony (line 5: "Values are sent in text format") and named_prepared_query.pony (line 7: same) to describe typed parameters — binary format for typed values, text format with server inference for String.
Step 6: Update _FrontendMessage.bind() — per-parameter format codes and binary encoding
[NEW] bind() becomes partial (?). The internal try ... else _Unreachable() end pattern inside bind() is replaced by ? propagation — encode errors flow to the caller instead of panicking. Update the bind() docstring (currently lines 137–146 of _frontend_message.pony: "All parameters use text format") to reflect per-parameter format codes and binary encoding.
```pony
fun bind(portal: String, stmt: String,
  params: Array[FieldDataTypes] val): Array[U8] val ?
```

Wire format changes:
- Parameter format codes: Int16(0) (all-text shorthand) → Int16(N) + N per-parameter format codes
- Parameter values: Binary for typed params, text for String, NULL unchanged
- Result format codes: Remain Int16(0) (all text) in Phase 1
| Pony Type | Format Code | OID | Encoding |
|---|---|---|---|
| I16 | 1 (binary) | 21 | 2 bytes big-endian |
| I32 | 1 (binary) | 23 | 4 bytes big-endian |
| I64 | 1 (binary) | 20 | 8 bytes big-endian |
| F32 | 1 (binary) | 700 | 4 bytes IEEE 754 BE |
| F64 | 1 (binary) | 701 | 8 bytes IEEE 754 BE |
| Bool | 1 (binary) | 16 | 1 byte: 0x01 or 0x00 |
| Array[U8] val | 1 (binary) | 17 | Raw bytes |
| String | 0 (text) | 0 | UTF-8 bytes, server infers type |
| None | — | 0 | Length -1, no data |
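A Python sketch of just the parameter-related sections of the Bind body under these rules (message framing, portal, and statement-name fields omitted; names are illustrative):

```python
import struct

def bind_param_sections(params):
    """Sketch of the Phase 1 Bind rules: per-parameter format codes,
    binary payloads for typed params, UTF-8 text for strings,
    length -1 for NULL, and Int16(0) (all text) for result formats.

    params: list of ('binary', bytes) | ('text', str) | None.
    """
    out = struct.pack(">h", len(params))        # Int16(N), then N format codes
    for p in params:
        fmt = 1 if (p is not None and p[0] == "binary") else 0
        out += struct.pack(">h", fmt)
    out += struct.pack(">h", len(params))       # Int16(N), then N values
    for p in params:
        if p is None:
            out += struct.pack(">i", -1)        # NULL: length -1, no data
        else:
            data = p[1] if p[0] == "binary" else p[1].encode("utf-8")
            out += struct.pack(">i", len(data)) + data
    out += struct.pack(">h", 0)                 # result formats: all-text shorthand
    return out
```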
Step 7: Update _FrontendMessage.parse() — send OIDs
No signature change. The existing param_type_oids: Array[U32] val parameter (currently always empty) gets populated from the params. A helper maps each FieldDataTypes variant to its OID (same table as Step 6). String and None get OID 0 (server infers).
Step 8: Add encode error handling in _QueryReady.try_run_query() [NEW]
Currently, try_run_query wraps everything in try ... else _Unreachable() end. With bind() partial, encode errors would hit _Unreachable() — wrong.
The fix: extract bind construction for each extended query path so that encode failures are caught separately from genuinely unreachable errors. On encode failure: deliver DataError to the appropriate receiver, dequeue the failed item, and call try_run_query again for the next queue item.
All 6 dispatch paths that call bind() need this (identified by match branch context, not line numbers, since earlier steps shift line positions):
- PreparedQuery under _QueuedQuery match
- NamedPreparedQuery under _QueuedQuery match
- PreparedQuery under _QueuedStreamingQuery match
- NamedPreparedQuery under _QueuedStreamingQuery match
- PreparedQuery inside _QueuedPipeline loop
- NamedPreparedQuery inside _QueuedPipeline loop
Pipeline encode errors: If any query in a pipeline fails encoding, fail the entire pipeline — deliver pg_pipeline_failed with DataError for every query, then pg_pipeline_complete, dequeue, and move on. This avoids the complexity of partial pipeline sends. Since encode errors for built-in types should never happen (the type match is exhaustive), this is fine for Phase 1. Phase 3 (custom codecs) can refine with per-query error isolation if needed.
The _Unreachable() in the outer try remains for genuinely unreachable paths (queue empty, wrong item type).
Update DataError docstring in query_error.pony to cover both directions — currently says "data that came back from a query" (inbound only), but it now also covers outbound encode failures.
Step 9: Update all query dispatch paths
All bind() call sites pass Array[FieldDataTypes] val. All parse() call sites pass derived OID arrays. _DataRowMessage.columns stays Array[(String|None)] val in Phase 1 — no changes to in-flight state accumulation types.
Step 10: Keep _RowsBuilder unchanged in Phase 1
Results still arrive as text. The existing _field_to_type() handles text decoding. Switching to registry-based decoding in Phase 1 would require a wasteful String → Array[U8] val round-trip since _data_row still outputs String. Phase 2 changes _DataRowMessage to raw bytes, at which point _RowsBuilder naturally switches to the registry.
Step 11: Add CodecRegistry to Session and _SessionLoggedIn
Session gets a CodecRegistry val field, defaulting to CodecRegistry.create(). Not user-configurable in Phase 1 (that's Phase 3). Threaded through to _SessionLoggedIn.
In Phase 1, _FrontendMessage.bind() doesn't use the registry — it matches directly on FieldDataTypes variants. The registry is created here so text codecs are registered and ready for Phase 2's result decoding switch.
Step 12: Update package docstring in postgres.pony [NEW]
- Line 105: replace "Parameters are text-format strings or None for NULL" with typed parameter description
- Lines 168–169: NamedPreparedQuery(name, ["42"]) → update array literal to [as FieldDataTypes: "42"] (needed because params type changes to Array[FieldDataTypes] val)
- Line 229: Array[(String | None)] → Array[FieldDataTypes]
- Lines 264–270: [as (String | None): "1"] → [as FieldDataTypes: I32(1)]
Step 13: Update all examples
Every example creating PreparedQuery or NamedPreparedQuery needs the new param type:
- examples/prepared-query/ — [as (String | None): "Pony"; "10"; None] → [as FieldDataTypes: "Pony"; I32(10); None]
- examples/named-prepared-query/
- examples/crud/
- examples/streaming/
- examples/pipeline/
(bytea and cancel use SimpleQuery — no changes needed.)
Update examples/README.md — the prepared-query entry references Array[(String | None)] val which must change to Array[FieldDataTypes] val.
Step 14: Update all tests
Update param types in all test files that construct PreparedQuery / NamedPreparedQuery.
New unit tests:
- Codec encode/decode round-trip for each built-in binary codec
- Binary codec length validation (wrong-length payloads error)
- Bool binary decode: nonzero bytes other than 0x01 → true
- Text codec decode for each built-in text codec
- CodecRegistry.decode() with known/unknown OIDs, both formats
- Bind wire format with typed params (per-parameter format codes, binary values)
- Parse wire format with OIDs from typed params
- Encode error handling: DataError delivery when bind() fails
Integration tests:
- Queries with typed params (I32, I64, Bool, F32, F64, String, Array[U8] val, None)
- Mixed typed and String params
- Pipeline with typed params
- Streaming with typed params
Build/run: make ssl=3.0.x for all, make unit-tests ssl=3.0.x for unit only, make integration-tests ssl=3.0.x for integration (requires make start-pg-containers).
Step 15: Update CLAUDE.md
- PreparedQuery / NamedPreparedQuery param type
- Codec architecture (interface, registry, built-in codecs)
- Binary encoding strategy
- Encode error handling in Query Execution Flow
- Updated _FrontendMessage.bind() and parse() notes
Step 16: Release notes
Classification: changed. Label: changelog - changed.
Phase 2: Binary Results
Goal: Switch result format to all-binary for extended query protocol. Common types that currently return readable strings must continue to do so.
Classification: changed.
Prerequisites [NEW]
Phase 2 does not ship until:
- Every OID in the lists below has a binary codec
- String-producing binary codecs (numeric, uuid, oid, jsonb) produce text-format-identical output
- Temporal binary codecs produce correct native type instances
- Integration tests verify all of the above
Step 1: Text passthrough binary codecs
Single _TextPassthroughBinaryCodec primitive registered for all 7 OIDs. PostgreSQL's binary format for these IS raw UTF-8.
| OID | PostgreSQL Type |
|---|---|
| 18 | char |
| 19 | name |
| 25 | text |
| 114 | json |
| 142 | xml |
| 1042 | bpchar |
| 1043 | varchar |
Step 2: Add native temporal types to FieldDataTypes [NEW]
Every major binary-capable PostgreSQL driver (pgx, rust-postgres, asyncpg, Postgrex, psycopg3) decodes temporal types to native language types, not strings. Binary format is always UTC microseconds (timestamptz) or three-component (interval) — ParameterStatus tracking for timezone/interval style is unnecessary for binary decoding.
New val classes:
```pony
class val Timestamp
  """PostgreSQL timestamp/timestamptz. Microseconds since 2000-01-01 00:00:00 UTC."""
  let microseconds: I64

class val Time
  """PostgreSQL time. Microseconds since midnight."""
  let microseconds: I64

class val Interval
  """PostgreSQL interval. Three independent components (months have variable
  length, days have variable length due to DST)."""
  let microseconds: I64
  let days: I32
  let months: I32

class val Date
  """PostgreSQL date. Days since 2000-01-01."""
  let days: I32
```

Each class provides a string() method producing a human-readable representation. These are informational — they don't need to match PostgreSQL's text format exactly (that would require tracking IntervalStyle and DateStyle server parameters). Formats:
- Date.string(): "2024-01-15" (ISO 8601). Infinity: "infinity" / "-infinity"
- Time.string(): "14:30:00" or "14:30:00.123456" (fractional only when non-zero)
- Timestamp.string(): "2024-01-15 14:30:00" or with .NNNNNN. Infinity: "infinity" / "-infinity"
- Interval.string(): "1 year 2 mons 3 days 04:05:06" (PostgreSQL postgres style)
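The Date and Time formatting rules can be sketched in Python (illustrative only — function names and structure are not the Pony implementation):

```python
from datetime import date, timedelta

PG_DATE_EPOCH = date(2000, 1, 1)        # day zero of the PostgreSQL date type
I32_MAX, I32_MIN = 2**31 - 1, -(2**31)  # infinity sentinels for date

def date_string(days: int) -> str:
    """Sketch of Date.string(): ISO 8601; sentinel values render as infinity."""
    if days == I32_MAX:
        return "infinity"
    if days == I32_MIN:
        return "-infinity"
    return (PG_DATE_EPOCH + timedelta(days=days)).isoformat()

def time_string(micros: int) -> str:
    """Sketch of Time.string(): fractional part printed only when nonzero."""
    secs, frac = divmod(micros, 1_000_000)
    h, rem = divmod(secs, 3600)
    m, s = divmod(rem, 60)
    base = f"{h:02d}:{m:02d}:{s:02d}"
    return base + (f".{frac:06d}" if frac else "")
```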
FieldDataTypes expands:
```pony
type FieldDataTypes is
  ( Array[U8] val | Bool | Date | F32 | F64 | I16 | I32 | I64
  | Interval | None | String | Time | Timestamp )
```

This is a breaking change — all match expressions on FieldDataTypes in user code need updating. Acceptable at alpha (v0.2.x). Document in release notes.
Text codecs for temporal types are also needed — SimpleQuery always returns text format, and those OIDs should decode to native types too (parse the text representation). Parsing details:
- date text: "2024-01-15" or "infinity" / "-infinity". ISO 8601 date format (PostgreSQL default DateStyle = 'ISO').
- time text: "14:30:00" or "14:30:00.123456". Parse hours:minutes:seconds, optional fractional.
- timestamp text: "2024-01-15 14:30:00" or with .NNNNNN. Parse date + time, convert to microseconds since epoch.
- timestamptz text: "2024-01-15 14:30:00+00" or similar timezone suffix. Parse and convert to UTC microseconds. The timezone offset varies with session TimeZone setting — the text codec must handle the offset.
- interval text: depends on IntervalStyle server parameter. The text codec assumes the default postgres style ("1 year 2 mons 3 days 04:05:06"). Other styles (sql_standard, iso_8601, postgres_verbose) are not supported by the text codec — users with non-default IntervalStyle should use extended query protocol (binary) where the format is style-independent.
Step 3: Binary codecs for common types [NEW — exact list]
Codecs that decode to String:
| OID | PostgreSQL Type | Binary Format | Text Output |
|---|---|---|---|
| 26 | oid | 4 bytes U32 BE | "12345" |
| 1700 | numeric | Variable: ndigits (I16) + weight (I16) + sign (I16) + dscale (I16) + base-10000 digits | "123.45" |
| 2950 | uuid | 16 bytes raw | "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" (lowercase, 4-2-2-2-6 grouping) |
| 3802 | jsonb | 1 version byte (0x01) + JSON UTF-8 | Strip version byte, rest is text |
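The numeric entry is the only non-trivial one in this table; here is a Python sketch of the decode, assuming the header/digit layout described above (illustrative, not the Pony implementation):

```python
import struct
from decimal import Decimal

def decode_numeric(data: bytes) -> str:
    """Sketch of numeric binary decode: four I16 header fields
    (ndigits, weight, sign, dscale) followed by base-10000 digits.
    Preserves dscale so 1.00 stays "1.00"; sign 0xC000 means NaN."""
    ndigits, weight, sign, dscale = struct.unpack_from(">hhHh", data, 0)
    if sign == 0xC000:
        return "NaN"
    digits = struct.unpack_from(f">{ndigits}h", data, 8)
    # value = sum(d * 10000^(weight - i)); scaleb keeps the math exact
    value = Decimal(0)
    for i, d in enumerate(digits):
        value += Decimal(d).scaleb(4 * (weight - i))
    if sign == 0x4000:
        value = -value
    # Quantize to exactly dscale fractional digits (keeps trailing zeros)
    return str(value.quantize(Decimal(1).scaleb(-dscale)))
```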
Codecs that decode to native types:
| OID | PostgreSQL Type | Binary Format | Pony Type |
|---|---|---|---|
| 1082 | date | 4 bytes: days since 2000-01-01 (signed I32 BE) | Date |
| 1083 | time | 8 bytes: microseconds since midnight (I64 BE) | Time |
| 1114 | timestamp | 8 bytes: microseconds since 2000-01-01 00:00:00 (I64 BE) | Timestamp |
| 1184 | timestamptz | 8 bytes: microseconds since 2000-01-01 00:00:00 UTC (I64 BE) | Timestamp |
Both timestamp (1114) and timestamptz (1184) decode to Timestamp. The binary format is identical — the server converts timestamptz to UTC before sending. The distinction between "with timezone" and "without" is a server-side storage concern, not a wire format difference.
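The epoch arithmetic here is easy to get wrong by a constant; a Python sketch (the 946,684,800-second gap between the Unix and PostgreSQL epochs is the key value; the helper name is hypothetical):

```python
from datetime import datetime, timezone

# Seconds between the Unix epoch (1970-01-01) and the PostgreSQL
# timestamp epoch (2000-01-01), both UTC: 10,957 days * 86,400 s.
PG_EPOCH_UNIX_SECONDS = 946_684_800

def pg_micros_to_datetime(micros: int) -> datetime:
    """Sketch: I64 microseconds since 2000-01-01 00:00:00 UTC -> UTC datetime.
    Applies identically to timestamp (1114) and timestamptz (1184) payloads."""
    secs, frac = divmod(micros + PG_EPOCH_UNIX_SECONDS * 1_000_000, 1_000_000)
    return datetime.fromtimestamp(secs, tz=timezone.utc).replace(microsecond=frac)
```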
| OID | PostgreSQL Type | Binary Format | Pony Type |
|---|---|---|---|
| 1186 | interval | 16 bytes: microseconds (I64 BE) + days (I32 BE) + months (I32 BE) | Interval |
Deferred: timetz (OID 1266) — complex timezone offset component. Array types (int2[], int4[], text[], int8[]) — require recursive codec dispatch. Both fall back to Array[U8] val until Phase 3.
Correctness invariants for special values:
- Infinity timestamps: I64.max_value() → Timestamp with microseconds = I64.max_value(). The string() method produces "infinity". Same for I64.min_value() → "-infinity"
- Infinity dates: I32.max_value() → Date with days = I32.max_value(), string() → "infinity". Same for I32.min_value() → "-infinity"
- Numeric NaN: sign field 0xC000 → "NaN". Sign 0x0000 = positive, 0x4000 = negative
- Numeric precision: 1.00 stays "1.00" (preserve dscale), not "1"
- Time.microseconds is always non-negative (range 0 to 86,400,000,000)
Text-format-identical verification (for codecs that produce String):
- uuid: lowercase hex with dashes
- jsonb: JSON text after stripping the version byte
Step 4: Change _DataRowMessage columns to raw bytes
```pony
// Before
let columns: Array[(String|None)] val
// After
let columns: Array[(Array[U8] val | None)] val
```

Step 5: Update _ResponseParser._data_row()
Store raw bytes instead of converting to String. The column_length == 0 case becomes an empty Array[U8] val (not empty string).
Step 6: Update _RowDescriptionMessage to store format code
```pony
// Before
let columns: Array[(String, U32)] val // (name, oid)
// After
let columns: Array[(String, U32, U16)] val // (name, oid, format_code)
```

Update _row_description: change reader.skip(8)? to reader.skip(6)? (type_size=2 + type_modifier=4) then let format_code = reader.u16_be()?.
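A Python sketch of the per-field tail parse after this change (field layout assumed from the PostgreSQL protocol docs; the helper is illustrative, not the Pony reader API):

```python
import struct

def parse_field_tail(buf: bytes, offset: int):
    """Sketch of one RowDescription field after the name string.
    Assumed layout: table OID (4), column attnum (2), type OID (4),
    type size (2), type modifier (4), format code (2).
    Mirrors the change above: skip size + modifier (6 bytes), then
    READ the format code instead of skipping it."""
    type_oid = struct.unpack_from(">I", buf, offset + 6)[0]
    # offset + 6 (table OID + attnum) + 4 (type OID) + 6 (size + modifier)
    fmt_code = struct.unpack_from(">H", buf, offset + 16)[0]
    return type_oid, fmt_code
```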
Step 7: Switch Bind result format to all binary
Change Bind message result format from Int16(0) (zero codes = all text shorthand) to Int16(1) Int16(1) (one code applied to all columns: binary). Message length increases by 2 bytes.
SimpleQuery is unaffected — no Bind message, always text. Text codecs handle that path.
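The two shorthands differ by exactly the two bytes noted above; a minimal Python sketch (illustrative):

```python
import struct

def result_format_section(all_binary: bool) -> bytes:
    """Sketch of the Bind result-format section: Phase 1 sends zero
    format codes (all-text shorthand); Phase 2 sends one code applied
    to all columns (all binary)."""
    if all_binary:
        return struct.pack(">hh", 1, 1)   # Int16(1) count + Int16(1) binary
    return struct.pack(">h", 0)           # Int16(0): all-text shorthand
```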
Step 8: Update _RowsBuilder for codec-based decoding
```pony
fun apply(rows': Array[Array[(Array[U8] val | None)] val] val,
  row_descriptions': Array[(String, U32, U16)] val,
  registry: CodecRegistry): Rows ?
```

Decoding: None → None. Everything else → registry.decode(oid, format_code, data). Remove _decode_hex_bytea and _hex_digit from _RowsBuilder (now in _ByteaTextCodec).
Step 9: Update bind() parameter encoding for new types
Phase 2 adds Timestamp, Time, Interval, and Date to FieldDataTypes. The bind() function matches on FieldDataTypes variants to determine format code, OID, and encoding. It must handle the new variants:
| Pony Type | Format Code | OID | Encoding |
|---|---|---|---|
| Timestamp | 1 (binary) | 1114 (timestamp) | 8 bytes I64 BE |
| Time | 1 (binary) | 1083 | 8 bytes I64 BE |
| Date | 1 (binary) | 1082 | 4 bytes I32 BE |
| Interval | 1 (binary) | 1186 | 16 bytes: I64 BE + I32 BE + I32 BE |
Note: Timestamp uses OID 1114 (timestamp without timezone), not 1184 (timestamptz). The binary encoding is identical — the server handles the timezone semantics based on the column type. Using 1114 avoids implicit timezone conversion surprises.
Step 10: Update all in-flight state accumulation types
All _data_rows fields change from Array[Array[(String|None)] val] to Array[Array[(Array[U8] val | None)] val] in:
- _SimpleQueryInFlight
- _ExtendedQueryInFlight
- _StreamingQueryInFlight
- _PipelineInFlight
All _RowsBuilder call sites gain a registry parameter (from _SessionLoggedIn). All _RowDescriptionMessage consumers update for the 3-tuple.
Step 11: Tests
Unit tests: binary decode for all new codecs, length validation, special values (infinity timestamps, NaN numeric, negative intervals), DataRow parsing with binary data, RowDescription format code parsing, _RowsBuilder with binary data + registry. Native type tests: Timestamp, Time, Interval, Date construction, string() output, equality, special values.
Integration tests: For each String-producing codec (oid, numeric, uuid, jsonb), same query via SimpleQuery and PreparedQuery → assert identical String values. For temporal types, verify binary decoding produces correct native type instances (compare field values, not string representations). Unknown OID columns → Array[U8] val. Pipeline and streaming with binary results.
Step 12: Update CLAUDE.md
_DataRowMessage, _RowDescriptionMessage, Type Conversion section, result format strategy, _RowsBuilder documentation. Add Timestamp, Time, Interval, Date to Public API Types. Update FieldDataTypes definition.
Step 13: Release notes
Classification: changed. Label: changelog - changed.
Phase 3: Public Custom Codec API
Goal: Users can register custom codecs for PostgreSQL types not covered by built-ins.
Classification: added.
Step 1: Make CodecRegistry user-configurable
Rename _with_codec → with_codec (public). Add encode method:
```pony
fun encode(oid: U32, value: FieldDataTypes): Array[U8] val ?
```

Add optional registry parameter to Session constructor:
```pony
new create(
  server_connect_info': ServerConnectInfo,
  database_connect_info': DatabaseConnectInfo,
  notify': SessionStatusNotify,
  registry': CodecRegistry = CodecRegistry)
```

Step 2: Custom codec documentation
Package docstring gains a "Custom Codecs" section: how to implement Codec, create a registry with with_codec, pass it to Session. Correctness requirements (length validation, type mismatch errors).
Step 3: Custom codec example
New examples/custom-codec/. Update examples/README.md.
Step 4: Document FieldDataTypes expansion cost [NEW]
If a RawBinaryData wrapper type (to distinguish unknown-OID raw bytes from bytea) is added later, it adds a variant to FieldDataTypes — a breaking change for all match expressions in user code. Document this as a known future cost of using Array[U8] val for unknown OIDs.
Step 5: Tests, CLAUDE.md, release notes
Classification: added. Label: changelog - added.
Decisions
| # | Decision | Choice | Confidence | Reasoning |
|---|---|---|---|---|
| 1 | Result format | All binary (Approach A) in phases | High | Industry norm. Phase 1 = text results (no regression), Phase 2 gated on codec coverage |
| 2 | Unknown OID fallback | Array[U8] val | High | Simple for alpha. Bytea ambiguity documented. Wrapper type deferred |
| 3 | Encode location | On Codec interface | High | Natural home for both operations. Extends to custom codecs in Phase 3 |
| 4 | Registry location | Session with global default | High | Zero-config common case |
| 5 | Text codecs | Explicit registry entries from Phase 1 | High | SimpleQuery needs them. Avoids two parallel decode systems |
| 6 | Encode errors | DataError path from Phase 1 | High | Structurally needed even though unreachable for built-in types. Note: DataError carries no context about which parameter/codec failed — adequate for alpha, but Phase 3 may need a richer error type |
| 7 | Phase 2 String format | Text-format-identical | High | Prevents behavioral regression. Enforced by integration tests |
| 8 | Phase 2 Bind format | Int16(1) Int16(1) shorthand | High | No extra roundtrip or metadata caching |
| 9 | Array types | Deferred to Phase 3 | Medium | Recursive codec dispatch too complex for Phase 2 |
| 10 | Temporal types | Native Pony types, not String | High | Every major binary-capable driver (pgx, rust-postgres, asyncpg, Postgrex, psycopg3) decodes these to native language types, not strings. Binary format is always UTC microseconds (timestamptz) or three-component (interval) — ParameterStatus tracking is unnecessary. Producing String would create an API we'd break when adding native types. Add Timestamp, Time, Interval, and Date val classes to FieldDataTypes in Phase 2 |
Open Questions
None — all prior open questions have been resolved. See Decision 10.
---
Review of Updated Plan (v2)

All factual claims verified against source. Line references, wire protocol details, and current code behavior descriptions are accurate.

Phase 2 Gaps

1.
2.
3. State transition ordering in try_run_query. Currently,
4. Example updates: 9 examples use exhaustive match on FieldDataTypes
5. Package docstring update for temporal types.
6. Existing test updates: Existing tests with exhaustive match on FieldDataTypes
7. Phase 2 Step 9 describes
8. Test builder updates for binary results.
9. Unit tests for temporal text codec parsing. Text parsing for

Phase 1 Observations (Non-blocking)
---
Updated Plan (v3)

This incorporates all feedback from the v2 review. Changes from v1 are tagged [NEW]; changes from v2 are tagged [v3]. Each phase is a separate PR. Phase 2 grows from 13 steps (v2) to 17 steps (v3).

Resolved Open Questions

Must Phase 2 decoders produce text-format-identical strings? For types that decode to String, yes — output must match PostgreSQL's text format exactly, enforced by integration tests. Temporal types instead decode to native classes, whose string() output is informational only.

Exact OID list for Phase 2 binary codecs? Enumerated in Phase 2 Steps 1, 3, and 4. Covers text passthrough types (7 OIDs), temporal types (5 OIDs to native types), and String-producing types (4 OIDs). Types not on the list fall back to Array[U8] val.

Codec Implementation Guidance

These rules apply to all codec implementations across all phases:
Phase 1: Binary Parameters + Codec Infrastructure

Goal: Ship typed parameters with binary encoding for numeric/bool/bytea types. Results stay text format — no user-visible regression on the result side. Breaking change to the parameter API. Classification: changed.

Step 1: Define the Codec interface

Step 2: Implement built-in binary codecs
| Primitive | OID | Pony Type | Binary Wire Format |
|---|---|---|---|
| _BoolBinaryCodec | 16 | Bool | 1 byte: encode 0x01/0x00. Decode: any nonzero = true |
| _ByteaBinaryCodec | 17 | Array[U8] val | Raw bytes (0-length valid) |
| _Int2BinaryCodec | 21 | I16 | 2 bytes big-endian signed |
| _Int4BinaryCodec | 23 | I32 | 4 bytes big-endian signed |
| _Int8BinaryCodec | 20 | I64 | 8 bytes big-endian signed |
| _Float4BinaryCodec | 700 | F32 | 4 bytes IEEE 754 big-endian |
| _Float8BinaryCodec | 701 | F64 | 8 bytes IEEE 754 big-endian |
Length validation in decode():
- _BoolBinaryCodec: error if data.size() != 1
- _Int2BinaryCodec: error if data.size() != 2
- _Int4BinaryCodec / _Float4BinaryCodec: error if data.size() != 4
- _Int8BinaryCodec / _Float8BinaryCodec: error if data.size() != 8
- _ByteaBinaryCodec: any length valid (including 0)
Step 3: Implement built-in text codecs
New file: postgres/_text_codecs.pony
| Primitive | OID | Decoding (mirrors current _field_to_type logic) |
|---|---|---|
| _BoolTextCodec | 16 | String.from_array(data).at("t") |
| _ByteaTextCodec | 17 | Hex decode — extract current _decode_hex_bytea logic |
| _Int2TextCodec | 21 | String.from_array(data).i16()? |
| _Int4TextCodec | 23 | String.from_array(data).i32()? |
| _Int8TextCodec | 20 | String.from_array(data).i64()? |
| _Float4TextCodec | 700 | String.from_array(data).f32()? |
| _Float8TextCodec | 701 | String.from_array(data).f64()? |
Text codecs receive Array[U8] val (raw wire bytes) even though the content is UTF-8. encode converts the Pony type to its text representation.
Step 4: Implement CodecRegistry
New file: postgres/codec_registry.pony
```pony
class val CodecRegistry
  """
  Maps PostgreSQL type OIDs to codecs. Immutable — adding a codec
  produces a new registry.
  """
  new val create()
    """Registry with all built-in text and binary codecs."""
  new val _with_codec(base: CodecRegistry, oid: U32, codec: Codec)
    """New registry adding or replacing the codec for a given OID.
    Package-private in Phase 1; public in Phase 3."""
  fun decode(oid: U32, format: U16, data: Array[U8] val): FieldDataTypes ?
    """Decode result column data. Format 0 = text codec, format 1 = binary codec.
    Text fallback: String.from_array(data). Binary fallback: raw Array[U8] val."""
  fun has_binary_codec(oid: U32): Bool
    """Whether a binary codec is registered for this OID."""
```

Internal storage: two Map[U32, Codec] val — one for text codecs, one for binary. The default constructor populates both with all built-in codecs from Steps 2-3.
No encode method on the registry in Phase 1 — parameter encoding matches directly on FieldDataTypes. [NEW] encode is deferred to Phase 3 for custom codec support.
Step 5: Change parameter type
PreparedQuery.params and NamedPreparedQuery.params change from Array[(String | None)] val to Array[FieldDataTypes] val. Update class docstrings in prepared_query.pony (line 5: "Values are sent in text format") and named_prepared_query.pony (line 7: same) to describe typed parameters — binary format for typed values, text format with server inference for String.
Step 6: Update _FrontendMessage.bind() — per-parameter format codes and binary encoding
[NEW] bind() becomes partial (?). The internal try ... else _Unreachable() end pattern inside bind() is replaced by ? propagation — encode errors flow to the caller instead of panicking. Update the bind() docstring (currently lines 137-146 of _frontend_message.pony: "All parameters use text format") to reflect per-parameter format codes and binary encoding.
```pony
fun bind(portal: String, stmt: String,
  params: Array[FieldDataTypes] val): Array[U8] val ?
```

Wire format changes:
- Parameter format codes: Int16(0) (all-text shorthand) to Int16(N) + N per-parameter format codes
- Parameter values: Binary for typed params, text for String, NULL unchanged
- Result format codes: Remain Int16(0) (all text) in Phase 1
| Pony Type | Format Code | OID | Encoding |
|---|---|---|---|
| I16 | 1 (binary) | 21 | 2 bytes big-endian |
| I32 | 1 (binary) | 23 | 4 bytes big-endian |
| I64 | 1 (binary) | 20 | 8 bytes big-endian |
| F32 | 1 (binary) | 700 | 4 bytes IEEE 754 BE |
| F64 | 1 (binary) | 701 | 8 bytes IEEE 754 BE |
| Bool | 1 (binary) | 16 | 1 byte: 0x01 or 0x00 |
| Array[U8] val | 1 (binary) | 17 | Raw bytes |
| String | 0 (text) | 0 | UTF-8 bytes, server infers type |
| None | n/a | 0 | Length -1, no data |
Step 7: Update _FrontendMessage.parse() — send OIDs
No signature change. The existing param_type_oids: Array[U32] val parameter (currently always empty) gets populated from the params. A helper maps each FieldDataTypes variant to its OID (same table as Step 6). String and None get OID 0 (server infers).
Step 8: Add encode error handling in _QueryReady.try_run_query() [NEW] [v3]
Currently, try_run_query wraps everything in try ... else _Unreachable() end. With bind() partial, encode errors would hit _Unreachable() — wrong.
The fix uses the build-before-transition pattern (Codec Implementation Guidance rule 5): construct all wire-format messages (including the partial bind() call) BEFORE setting li.query_state to an in-flight state, so an encode error leaves the state machine in _QueryReady. [v3] On encode failure: deliver DataError to the appropriate receiver, dequeue the failed item, and call try_run_query again for the next queue item.
Concrete restructuring for each of the 6 dispatch paths that call `bind()`:

- `PreparedQuery` under `_QueuedQuery`: Build parse, bind, describe, execute, sync into `combined` inside a `try` block. Only after success: set `li.query_state = _ExtendedQueryInFlight.create()` and send. On `bind()` error: deliver `pg_query_failed(s, qry.query, DataError)`, shift the queue, and recurse.
- `NamedPreparedQuery` under `_QueuedQuery`: Same pattern — build bind+describe+execute+sync in `try`, then set state and send on success.
- `PreparedQuery` under `_QueuedStreamingQuery`: Build parse+bind+describe+execute+flush in `try`, then set `li.query_state = _StreamingQueryInFlight.create()` and send on success. On error: deliver `pg_stream_failed` with `DataError`.
- `NamedPreparedQuery` under `_QueuedStreamingQuery`: Same pattern with bind+describe+execute+flush.
- `_QueuedPipeline` loop: The pipeline path requires special handling because `bind()` is called inside a `recover val` block. Inside `recover val`, mutable state from the enclosing scope is inaccessible, so an error from `bind()` cannot communicate which query failed — the error just propagates out of the block as an undifferentiated `?`.

  Pre-validation approach: Before entering the `recover val` block, iterate the queries array and pre-validate all parameters by encoding each param (using the same encode logic that `bind()` uses internally — a shared helper or direct match on `FieldDataTypes`). If any encode fails, report `pg_pipeline_failed` with `DataError` for that specific query index (and all subsequent indices, since no partial pipeline is sent), then `pg_pipeline_complete`, shift the queue, and recurse. If all params validate, proceed to the `recover val` block to build the wire message — `bind()` calls inside the block are now guaranteed to succeed (same params, same encode logic), so any error there is genuinely unreachable.

  This gives per-query error reporting for encode failures while keeping the `recover val` block clean. The `_Unreachable()` in the `else` of the `recover val` block is now justified — pre-validation ensures `bind()` won't fail inside it.

  Only after successful construction: set `li.query_state = _PipelineInFlight.create()` and send.
The _Unreachable() in the outer try remains for genuinely unreachable paths (queue access, non-partial message construction).
Update DataError docstring in query_error.pony to cover both directions — currently says "data that came back from a query" (inbound only), but it now also covers outbound encode failures.
Pipeline encode errors summary: If any query in a pipeline fails encoding, fail that query and all subsequent queries (since no partial pipeline is sent) — deliver pg_pipeline_failed with DataError for each, then pg_pipeline_complete, dequeue, and move on. Pre-validation before the recover val block enables per-query error attribution. Since encode errors for built-in types should never happen (the type match is exhaustive), this is primarily infrastructure for Phase 3 (custom codecs) where encode errors become realistic.
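The pre-validation flow can be sketched in a few lines of Python (the driver is Pony; `prevalidate` and `fussy_encode` are hypothetical names used only for illustration):

```python
def prevalidate(queries, encode):
    """Return the index of the first query whose params fail to encode,
    or None if every query encodes cleanly. Runs BEFORE building the
    wire message, so a failure is attributable to a specific query."""
    for i, params in enumerate(queries):
        try:
            for p in params:
                encode(p)  # same encode logic bind() uses internally
        except ValueError:
            return i       # fail this query and all subsequent ones
    return None

def fussy_encode(p):
    # stand-in encoder that rejects one sentinel value
    if p == "bad":
        raise ValueError(p)

assert prevalidate([["a"], ["bad"], ["c"]], fussy_encode) == 1
assert prevalidate([["a"], ["b"]], fussy_encode) is None
```

On a non-`None` result, the session would fail that index and everything after it, then dequeue and continue — mirroring the summary above.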
Step 9: Update all query dispatch paths
All bind() call sites pass Array[FieldDataTypes] val. All parse() call sites pass derived OID arrays. _DataRowMessage.columns stays Array[(String|None)] val in Phase 1 — no changes to in-flight state accumulation types.
Step 10: Keep _RowsBuilder unchanged in Phase 1
Results still arrive as text. The existing _field_to_type() handles text decoding. Switching to registry-based decoding in Phase 1 would require a wasteful String to Array[U8] val round-trip since _data_row still outputs String. Phase 2 changes _DataRowMessage to raw bytes, at which point _RowsBuilder naturally switches to the registry.
Step 11: Add CodecRegistry to Session and _SessionLoggedIn
Session gets a CodecRegistry val field, defaulting to CodecRegistry.create(). Not user-configurable in Phase 1 (that's Phase 3). Threaded through to _SessionLoggedIn.
In Phase 1, _FrontendMessage.bind() doesn't use the registry — it matches directly on FieldDataTypes variants. The registry is created here so text codecs are registered and ready for Phase 2's result decoding switch.
Step 12: Update package docstring in postgres.pony [NEW]
- Line 105: replace "Parameters are text-format strings or `None` for NULL" with typed parameter description
- Lines 168-169: `NamedPreparedQuery(name, ["42"])` — update array literal to `[as FieldDataTypes: "42"]` (needed because `params` type changes to `Array[FieldDataTypes] val`)
- Line 229: `Array[(String | None)]` to `Array[FieldDataTypes]`
- Lines 264-270: `[as (String | None): "1"]` to `[as FieldDataTypes: I32(1)]`
Step 13: Update all examples
Every example creating PreparedQuery or NamedPreparedQuery needs the new param type:
- `examples/prepared-query/` — `[as (String | None): "Pony"; "10"; None]` to `[as FieldDataTypes: "Pony"; I32(10); None]`
- `examples/named-prepared-query/`
- `examples/crud/`
- `examples/streaming/`
- `examples/pipeline/`
(bytea and cancel use SimpleQuery — no changes needed.)
Update `examples/README.md` — the prepared-query entry references `Array[(String | None)] val`, which must change to `Array[FieldDataTypes] val`.
Step 14: Update all tests
Update param types in all test files that construct PreparedQuery / NamedPreparedQuery.
New unit tests:
- Codec encode/decode round-trip for each built-in binary codec
- Binary codec length validation (wrong-length payloads error)
- Bool binary decode: nonzero bytes other than 0x01 produce `true`
- Text codec decode for each built-in text codec
- Text codec encode for each built-in text codec (encode methods are written in this phase, so they should be tested here rather than deferred to Phase 3) [v3]
- `CodecRegistry.decode()` with known/unknown OIDs, both formats
- Bind wire format with typed params (per-parameter format codes, binary values)
- Parse wire format with OIDs from typed params
- Encode error handling: `DataError` delivery when `bind()` fails
- Pipeline pre-validation: `DataError` delivery with correct per-query index attribution
Integration tests:
- Queries with typed params (I32, I64, Bool, F32, F64, String, Array[U8] val, None)
- Mixed typed and String params
- Pipeline with typed params
- Streaming with typed params
Build/run: `make ssl=3.0.x` for all, `make unit-tests ssl=3.0.x` for unit only, `make integration-tests ssl=3.0.x` for integration (requires `make start-pg-containers`).
Step 15: Update CLAUDE.md
- `PreparedQuery` / `NamedPreparedQuery` param type
- Codec architecture (interface, registry, built-in codecs)
- Binary encoding strategy
- Encode error handling in Query Execution Flow
- Updated `_FrontendMessage.bind()` and `parse()` notes
Step 16: Release notes
Classification: changed. Label: changelog - changed.
Phase 2: Binary Results
Goal: Switch result format to all-binary for extended query protocol. Common types that currently return readable strings must continue to do so.
Classification: changed.
Prerequisites [NEW]
Phase 2 does not ship until:
- Every OID in the lists below has a binary codec
- String-producing binary codecs (numeric, uuid, oid, jsonb) produce text-format-identical output
- Temporal binary codecs produce correct native type instances
- Integration tests verify all of the above
Step 1: Text passthrough binary codecs
Single _TextPassthroughBinaryCodec primitive registered for all 7 OIDs. PostgreSQL's binary format for these IS raw UTF-8.
| OID | PostgreSQL Type |
|---|---|
| 18 | char |
| 19 | name |
| 25 | text |
| 114 | json |
| 142 | xml |
| 1042 | bpchar |
| 1043 | varchar |
Step 2: Add native temporal types to FieldDataTypes [NEW]
Every major binary-capable PostgreSQL driver (pgx, rust-postgres, asyncpg, Postgrex, psycopg3) decodes temporal types to native language types, not strings. Binary format is always UTC microseconds (timestamptz) or three-component (interval) — ParameterStatus tracking for timezone/interval style is unnecessary for binary decoding.
New val classes:
```pony
class val Timestamp
  """PostgreSQL timestamp/timestamptz. Microseconds since 2000-01-01 00:00:00 UTC."""
  let microseconds: I64

class val Time
  """PostgreSQL time. Microseconds since midnight."""
  let microseconds: I64

class val Interval
  """PostgreSQL interval. Three independent components (months have variable
  length, days have variable length due to DST)."""
  let microseconds: I64
  let days: I32
  let months: I32

class val Date
  """PostgreSQL date. Days since 2000-01-01."""
  let days: I32
```

Each class provides a `string()` method producing a human-readable representation. These are informational — they don't need to match PostgreSQL's text format exactly (that would require tracking IntervalStyle and DateStyle server parameters). Formats:

- `Date.string()`: `"2024-01-15"` (ISO 8601). Infinity: `"infinity"` / `"-infinity"`
- `Time.string()`: `"14:30:00"` or `"14:30:00.123456"` (fractional only when non-zero)
- `Timestamp.string()`: `"2024-01-15 14:30:00"` or with `.NNNNNN`. Infinity: `"infinity"` / `"-infinity"`
- `Interval.string()`: `"1 year 2 mons 3 days 04:05:06"` (PostgreSQL `postgres` style)
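The `Timestamp.string()` rules above can be illustrated with a short Python sketch (the driver is Pony; `timestamp_string` is a hypothetical stand-in): microseconds since the PostgreSQL epoch 2000-01-01, fractional part only when non-zero, I64 extremes rendered as infinity.

```python
from datetime import datetime, timedelta

PG_EPOCH = datetime(2000, 1, 1)  # PostgreSQL timestamp epoch
I64_MAX = 2**63 - 1
I64_MIN = -(2**63)

def timestamp_string(micros):
    if micros == I64_MAX:
        return "infinity"
    if micros == I64_MIN:
        return "-infinity"
    dt = PG_EPOCH + timedelta(microseconds=micros)
    out = dt.strftime("%Y-%m-%d %H:%M:%S")
    if dt.microsecond != 0:
        out += ".%06d" % dt.microsecond  # fractional only when non-zero
    return out

assert timestamp_string(0) == "2000-01-01 00:00:00"
assert timestamp_string(I64_MAX) == "infinity"
```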
All four implement Equatable[T] so they work in direct comparisons and Field.eq(). [v3]
FieldDataTypes expands:
```pony
type FieldDataTypes is
  ( Array[U8] val | Bool | Date | F32 | F64 | I16 | I32 | I64
  | Interval | None | String | Time | Timestamp )
```

This is a breaking change — all exhaustive match expressions on `FieldDataTypes` in user code need updating. Acceptable at alpha (v0.2.x). Document in release notes.
Text codecs for temporal types are also needed — SimpleQuery always returns text format, and those OIDs should decode to native types too (parse the text representation). Parsing details:
- `date` text: `"2024-01-15"` or `"infinity"` / `"-infinity"`. ISO 8601 date format (PostgreSQL default `DateStyle = 'ISO'`).
- `time` text: `"14:30:00"` or `"14:30:00.123456"`. Parse hours:minutes:seconds, optional fractional.
- `timestamp` text: `"2024-01-15 14:30:00"` or with `.NNNNNN`. Parse date + time, convert to microseconds since epoch.
- `timestamptz` text: `"2024-01-15 14:30:00+00"` or similar timezone suffix. Parse and convert to UTC microseconds. The timezone offset varies with session `TimeZone` setting — the text codec must handle the offset.
- `interval` text: depends on `IntervalStyle` server parameter. The text codec assumes the default `postgres` style (`"1 year 2 mons 3 days 04:05:06"`). Other styles (`sql_standard`, `iso_8601`, `postgres_verbose`) are not supported by the text codec — users with non-default `IntervalStyle` should use extended query protocol (binary) where the format is style-independent.
Step 3: Update Field.eq() with temporal match arms [v3]
field.pony lines 21-44: Field.eq() explicitly matches on (value, that.value) pairs for all 9 current FieldDataTypes variants. The else false catch-all means new types silently compare as unequal — a correctness bug, not a compile error. Add four new match arms:
```pony
| (let a: Timestamp, let b: Timestamp) => a == b
| (let a: Time, let b: Time) => a == b
| (let a: Date, let b: Date) => a == b
| (let a: Interval, let b: Interval) => a == b
```

These delegate to each type's `Equatable.eq()`.
Step 4: Binary codecs for common types [NEW — exact list]
Codecs that decode to String:
| OID | PostgreSQL Type | Binary Format | Text Output |
|---|---|---|---|
| 26 | oid | 4 bytes U32 BE | "12345" |
| 1700 | numeric | Variable: ndigits (I16) + weight (I16) + sign (I16) + dscale (I16) + base-10000 digits | "123.45" |
| 2950 | uuid | 16 bytes raw | "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" (lowercase, 4-2-2-2-6 grouping) |
| 3802 | jsonb | 1 version byte (0x01) + JSON UTF-8 | Strip version byte, rest is text |
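Two of these codecs are simple enough to sketch directly in Python (the driver is Pony; `decode_uuid` and `decode_jsonb` are hypothetical names for illustration):

```python
def decode_uuid(data):
    """16 raw bytes -> lowercase hex with 8-4-4-4-12 grouping."""
    if len(data) != 16:
        raise ValueError("uuid payload must be 16 bytes")
    h = data.hex()  # already lowercase
    return "-".join((h[0:8], h[8:12], h[12:16], h[16:20], h[20:32]))

def decode_jsonb(data):
    """Strip the one-byte version header (0x01); the rest is JSON text."""
    if not data or data[0] != 0x01:
        raise ValueError("unsupported jsonb version")
    return data[1:].decode("utf-8")

assert decode_uuid(bytes(range(16))) == "00010203-0405-0607-0809-0a0b0c0d0e0f"
assert decode_jsonb(b'\x01{"a":1}') == '{"a":1}'
```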
Codecs that decode to native types:
| OID | PostgreSQL Type | Binary Format | Pony Type |
|---|---|---|---|
| 1082 | date | 4 bytes: days since 2000-01-01 (signed I32 BE) | Date |
| 1083 | time | 8 bytes: microseconds since midnight (I64 BE) | Time |
| 1114 | timestamp | 8 bytes: microseconds since 2000-01-01 00:00:00 (I64 BE) | Timestamp |
| 1184 | timestamptz | 8 bytes: microseconds since 2000-01-01 00:00:00 UTC (I64 BE) | Timestamp |
Both timestamp (1114) and timestamptz (1184) decode to Timestamp. The binary format is identical — the server converts timestamptz to UTC before sending. The distinction between "with timezone" and "without" is a server-side storage concern, not a wire format difference.
| OID | PostgreSQL Type | Binary Format | Pony Type |
|---|---|---|---|
| 1186 | interval | 16 bytes: microseconds (I64 BE) + days (I32 BE) + months (I32 BE) | Interval |
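The interval layout can be sketched in Python (the driver is Pony; `decode_interval` is a hypothetical name): 16 bytes split into three independent components, never normalized into one number.

```python
import struct

def decode_interval(data):
    """16 bytes = microseconds (I64 BE) + days (I32 BE) + months (I32 BE)."""
    if len(data) != 16:
        raise ValueError("interval payload must be 16 bytes")
    return struct.unpack(">qii", data)  # (microseconds, days, months)

# e.g. an interval of 1 month, 2 days, 3 seconds on the wire:
assert decode_interval(struct.pack(">qii", 3_000_000, 2, 1)) == (3_000_000, 2, 1)
```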
Deferred: timetz (OID 1266) — complex timezone offset component. Array types (int2[], int4[], text[], int8[]) — require recursive codec dispatch. Both fall back to Array[U8] val until Phase 3.
Correctness invariants for special values:
- Infinity timestamps: `I64.max_value()` produces `Timestamp` with `microseconds = I64.max_value()`. The `string()` method produces `"infinity"`. Same for `I64.min_value()` producing `"-infinity"`
- Infinity dates: `I32.max_value()` produces `Date` with `days = I32.max_value()`, `string()` produces `"infinity"`. Same for `I32.min_value()` producing `"-infinity"`
- Numeric NaN: sign field `0xC000` produces `"NaN"`. Sign `0x0000` = positive, `0x4000` = negative
- Numeric precision: `1.00` stays `"1.00"` (preserve dscale), not `"1"`
- `Time.microseconds` is always non-negative (range 0 to 86,400,000,000)
Text-format-identical verification (for codecs that produce String):
- `uuid`: lowercase hex with dashes
- `jsonb`: JSON text after stripping the version byte
Step 5: Change _DataRowMessage columns to raw bytes
```pony
// Before
let columns: Array[(String|None)] val

// After
let columns: Array[(Array[U8] val | None)] val
```

Step 6: Update _ResponseParser._data_row()
Store raw bytes instead of converting to String. The column_length == 0 case becomes an empty Array[U8] val (not empty string).
Step 7: Update _RowDescriptionMessage to store format code
```pony
// Before
let columns: Array[(String, U32)] val // (name, oid)

// After
let columns: Array[(String, U32, U16)] val // (name, oid, format_code)
```

Update `_row_description`: change `reader.skip(8)?` to `reader.skip(6)?` (type_size=2 + type_modifier=4) then `let format_code = reader.u16_be()?`.
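The skip arithmetic follows from the RowDescription field layout in the PostgreSQL protocol: name (cstring), table OID (4), attribute number (2), type OID (4), type size (2), type modifier (4), format code (2). A Python sketch (the driver is Pony; `parse_field` is a hypothetical name):

```python
import io
import struct

def parse_field(buf):
    """Parse one RowDescription field block, keeping the format code
    instead of skipping it."""
    name = bytearray()
    while (b := buf.read(1)) != b"\x00":   # NUL-terminated column name
        name += b
    buf.read(4)                                # table OID (unused here)
    buf.read(2)                                # attribute number (unused)
    oid = struct.unpack(">I", buf.read(4))[0]  # type OID
    buf.read(6)                                # type size (2) + type modifier (4)
    fmt = struct.unpack(">H", buf.read(2))[0]  # format code: 0 text, 1 binary
    return (name.decode(), oid, fmt)

# an int4 column named "id" reported in binary format:
raw = b"id\x00" + struct.pack(">IhIhiH", 0, 1, 23, 4, -1, 1)
assert parse_field(io.BytesIO(raw)) == ("id", 23, 1)
```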
Step 8: Switch Bind result format to all binary
Change Bind message result format from Int16(0) (zero codes = all text shorthand) to Int16(1) Int16(1) (one code applied to all columns: binary). Message length increases by 2 bytes.
SimpleQuery is unaffected — no Bind message, always text. Text codecs handle that path.
Step 9: Update _RowsBuilder for codec-based decoding
```pony
fun apply(rows': Array[Array[(Array[U8] val | None)] val] val,
  row_descriptions': Array[(String, U32, U16)] val,
  registry: CodecRegistry): Rows ?
```

Decoding: `None` produces `None`. Everything else uses `registry.decode(oid, format_code, data)`. Remove `_decode_hex_bytea` and `_hex_digit` from `_RowsBuilder` (now in `_ByteaTextCodec`).
Step 10: Update bind() and parse() parameter encoding for new types [v3: expanded to cover both bind and parse]
Phase 2 adds Timestamp, Time, Interval, and Date to FieldDataTypes. Both bind() and the parse() OID-mapping helper must handle the new variants.
bind() encoding:
| Pony Type | Format Code | OID | Encoding |
|---|---|---|---|
| `Timestamp` | 1 (binary) | 1114 (timestamp) | 8 bytes I64 BE |
| `Time` | 1 (binary) | 1083 | 8 bytes I64 BE |
| `Date` | 1 (binary) | 1082 | 4 bytes I32 BE |
| `Interval` | 1 (binary) | 1186 | 16 bytes: I64 BE + I32 BE + I32 BE |
Note: Timestamp uses OID 1114 (timestamp without timezone), not 1184 (timestamptz). The binary encoding is identical — the server handles the timezone semantics based on the column type. Using 1114 avoids implicit timezone conversion surprises.
parse() OID mapping: The helper that maps FieldDataTypes variants to OIDs must also include the temporal types: Timestamp to 1114, Time to 1083, Date to 1082, Interval to 1186. Without this, temporal parameters would be sent with OID 0 (server infers), which works but forgoes type safety. [v3]
Step 11: Update all in-flight state accumulation types
In _SimpleQueryInFlight, _ExtendedQueryInFlight, _StreamingQueryInFlight, and _PipelineInFlight:
- `_data_rows` changes from `Array[Array[(String|None)] val] iso` to `Array[Array[(Array[U8] val | None)] val] iso`
- `_row_description` changes from `(Array[(String, U32)] val | None)` to `(Array[(String, U32, U16)] val | None)` to carry the format code from Step 7's `_RowDescriptionMessage` change [v3]
All _RowsBuilder call sites (5 total in session.pony) gain a registry parameter (from _SessionLoggedIn). All _RowDescriptionMessage consumers update for the 3-tuple.
Step 12: Update all examples for FieldDataTypes expansion [v3]
10 examples match on `field.value`; 9 use exhaustive matches and will fail to compile when temporal types are added. All 10 need temporal type match arms. At minimum, call `string()` on the temporal type:

- `examples/query/query-example.pony`
- `examples/prepared-query/prepared-query-example.pony`
- `examples/named-prepared-query/named-prepared-query-example.pony`
- `examples/ssl-query/ssl-query-example.pony`
- `examples/ssl-preferred-query/ssl-preferred-query-example.pony`
- `examples/crud/crud-example.pony`
- `examples/copy-in/copy-in-example.pony`
- `examples/streaming/streaming-example.pony`
- `examples/pipeline/pipeline-example.pony` (non-exhaustive match, but should still be updated for completeness)
- `examples/bytea/bytea-example.pony`
Update examples/README.md entries for any examples whose descriptions reference FieldDataTypes variants.
Step 13: Update package docstring for temporal types [v3]
postgres.pony lines 130-157 need updating:
- The example `match field.value` block (lines 138-147) needs arms for temporal types (at minimum: `| let t: Timestamp => _env.out.print(field.name + ": " + t.string())`, and similarly for `Date`, `Time`, `Interval`)
- The `// Also: I16, I64, F32, F64` comment (line 145) needs expanding to include temporal types
- The type mapping summary (lines 154-157: "bytea to `Array[U8] val`, bool to `Bool`, ... everything else to `String`") needs updating: add timestamp/timestamptz to `Timestamp`, date to `Date`, time to `Time`, interval to `Interval`, and clarify that unknown OIDs produce `Array[U8] val` (binary) or `String` (text)
Step 14: Update test builders for binary results [v3]
_IncomingRowDescriptionTestMessage (in _test_response_parser.pony): currently takes Array[(String, String)] val (column name, type name) and hardcodes format code 0 (text) at the end of each column's field block (wb.u16_be(0) at line 1021). Two changes needed:
- Format code parameter: change the column tuple to `(String, String, U16)` (name, type-name, format-code) so tests can construct RowDescription messages with binary format codes. Alternatively, change the tuple to `(String, U32, U16)` (name, raw-OID, format-code) to avoid maintaining a growing string-to-OID map — this is cleaner since it eliminates the type-name lookup entirely. [v3]
- OID map expansion (if keeping the string-based API): the type name lookup table (lines 997-1008) currently only covers 8 types (text, bytea, bool, int2, int4, int8, float4, float8). Phase 2 tests need entries for: oid, numeric, uuid, jsonb, timestamp, timestamptz, date, time, interval. Without these, temporal and Phase 2 type tests cannot construct valid RowDescription messages. [v3]
_IncomingDataRowTestMessage: currently takes Array[(String | None)] val and writes column values as text strings. Add a new _IncomingBinaryDataRowTestMessage class that takes Array[(Array[U8] val | None)] val and writes raw binary column data. Keeping the existing builder avoids churning all existing text-format tests.
These test builder updates are shared infrastructure used across 9 test files (_test_session.pony, _test_pipeline.pony, _test_streaming.pony, _test_copy_in.pony, _test_copy_out.pony, _test_notice.pony, _test_notification.pony, _test_parameter_status.pony, _test_response_parser.pony). The existing tests continue using format code 0; only new binary result tests use format code 1.
Step 15: Tests [v3: expanded with items from review findings 2, 6, 8, 9]
Existing test updates for FieldDataTypes expansion: [v3]
These existing tests must be updated alongside the new tests — they are not "nice to have" but will fail to compile or silently lose coverage:
- `_test_equality.pony`: `_TestFieldEqualityReflexive` (lines 14-27) and `_TestFieldEqualityStructural` (lines 36-47) enumerate all 9 variants — add all 4 temporal types.
- `_TestFieldInequality` (lines 82-116) — add cross-type inequality cases for temporal types (e.g., `Timestamp` vs `Date`, `Time` vs `I64`).
- `_test_equality.pony` generators:
  - `_FieldDataTypesGen` (lines 204-227) — add 4 new `frequency` entries, one per temporal type, each producing values with randomized fields.
  - `_RowGen._random_field_value` (lines 248-267) — expand `match rnd.usize(0, 8)` to `match rnd.usize(0, 12)` and add 4 arms for temporal types.
  - `_RowsGen._random_field_value` (lines 287-306) — same expansion (duplicated because Pony traits can't have iso fields).
New unit tests:
Binary codecs:
- Binary decode for all new codecs (text passthrough, oid, numeric, uuid, jsonb, temporal)
- Length validation for fixed-width binary codecs (date=4, time=8, timestamp=8, interval=16)
- Special values: infinity timestamps/dates, NaN numeric, negative intervals, zero-length bytea
Temporal text codecs: [v3]
- `timestamptz` text codec: timezone offsets (`+00`, `+05:30`, `-07`, `+00:00`), fractional seconds with varying precision (1-6 digits), edge cases (midnight, year boundaries, epoch 2000-01-01)
- `timestamp` text codec: no timezone suffix, fractional seconds
- `interval` text codec: full `postgres`-style format (`"1 year 2 mons 3 days 04:05:06"`), partial components (`"1 day"`, `"04:05:06"`, `"1 year"`), negative components (`"-1 days +02:00:00"`), zero intervals (`"00:00:00"`)
- `date` text codec: standard dates, infinity values (`"infinity"`, `"-infinity"`), epoch boundary
- `time` text codec: fractional seconds, midnight, maximum precision
Result parsing:
- DataRow parsing with binary data (using `_IncomingBinaryDataRowTestMessage`)
- RowDescription format code parsing (using `_IncomingRowDescriptionTestMessage` with format code 1)
- `_RowsBuilder` with binary data + registry
- `_RowsBuilder` with text data + registry (SimpleQuery path unchanged)
Native type and equality:
- `Timestamp`, `Time`, `Interval`, `Date` construction, `string()` output, equality (Equatable), special values (infinity, negative intervals)
- `Field.eq()` with temporal types — reflexive, structural, symmetric, inequality (cross-type comparisons with non-temporal types) [v3]
- Property-based equality tests exercise temporal variants via updated generators [v3]
Integration tests:
- For each String-producing codec (oid, numeric, uuid, jsonb), same query via SimpleQuery and PreparedQuery to assert identical String values
- For temporal types, verify binary decoding produces correct native type instances (compare field values, not string representations)
- SimpleQuery returning temporal types to verify text codecs produce correct native types
- Unknown OID columns produce `Array[U8] val`
- Pipeline and streaming with binary results
Build/run: `make ssl=3.0.x` for all, `make unit-tests ssl=3.0.x` for unit only, `make integration-tests ssl=3.0.x` for integration (requires `make start-pg-containers`).
Step 16: Update CLAUDE.md
- `_DataRowMessage`, `_RowDescriptionMessage` types
- Type Conversion section — replace OID table with codec-based description, add temporal types
- Result format strategy (text for SimpleQuery, binary for extended query)
- `_RowsBuilder` receives registry
- Add `Timestamp`, `Time`, `Interval`, `Date` to Public API Types
- Update `FieldDataTypes` definition (13 variants, not 9)
- Note `Field.eq()` covers all 13 variants
Step 17: Release notes
Classification: changed. Label: changelog - changed.
Phase 3: Public Custom Codec API
Goal: Users can register custom codecs for PostgreSQL types not covered by built-ins.
Classification: added.
Step 1: Make CodecRegistry user-configurable
Rename `_with_codec` to `with_codec` (public). Add encode method:

```pony
fun encode(oid: U32, value: FieldDataTypes): Array[U8] val ?
```

Add optional registry parameter to Session constructor:

```pony
new create(
  server_connect_info': ServerConnectInfo,
  database_connect_info': DatabaseConnectInfo,
  notify': SessionStatusNotify,
  registry': CodecRegistry = CodecRegistry)
```

Step 2: Custom codec documentation
Package docstring gains a "Custom Codecs" section: how to implement Codec, create a registry with with_codec, pass it to Session. Correctness requirements (length validation, type mismatch errors).
Step 3: Custom codec example
New examples/custom-codec/. Update examples/README.md.
Step 4: Document FieldDataTypes expansion cost [NEW]
If a RawBinaryData wrapper type (to distinguish unknown-OID raw bytes from bytea) is added later, it adds a variant to FieldDataTypes — a breaking change for all match expressions in user code. Document this as a known future cost of using Array[U8] val for unknown OIDs.
Step 5: Tests, CLAUDE.md, release notes
Classification: added. Label: changelog - added.
Decisions
| # | Decision | Choice | Confidence | Reasoning |
|---|---|---|---|---|
| 1 | Result format | All binary (Approach A) in phases | High | Industry norm. Phase 1 = text results (no regression), Phase 2 gated on codec coverage |
| 2 | Unknown OID fallback | `Array[U8] val` | High | Simple for alpha. Bytea ambiguity documented. Wrapper type deferred |
| 3 | Encode location | On `Codec` interface | High | Natural home for both operations. Extends to custom codecs in Phase 3 |
| 4 | Registry location | Session with global default | High | Zero-config common case |
| 5 | Text codecs | Explicit registry entries from Phase 1 | High | SimpleQuery needs them. Avoids two parallel decode systems |
| 6 | Encode errors | `DataError` path from Phase 1 | High | Structurally needed even though unreachable for built-in types. Note: `DataError` carries no context about which parameter/codec failed — adequate for alpha, but Phase 3 may need a richer error type |
| 7 | Phase 2 String format | Text-format-identical | High | Prevents behavioral regression. Enforced by integration tests |
| 8 | Phase 2 Bind format | `Int16(1) Int16(1)` shorthand | High | No extra roundtrip or metadata caching |
| 9 | Array types | Deferred to Phase 3 | Medium | Recursive codec dispatch too complex for Phase 2 |
| 10 | Temporal types | Native Pony types, not String | High | Every major binary-capable driver (pgx, rust-postgres, asyncpg, Postgrex, psycopg3) decodes these to native language types, not strings. Binary format is always UTC microseconds (timestamptz) or three-component (interval) — ParameterStatus tracking is unnecessary. Producing String would create an API we'd break when adding native types. Add `Timestamp`, `Time`, `Interval`, and `Date` val classes to `FieldDataTypes` in Phase 2 |
| 11 | State transition ordering | Build messages before transitioning query state | High | Prevents state machine inconsistency when `bind()` errors. The alternative (reset state on failure) adds complexity and risks orphaned state. Building first is cleaner: the query state only transitions when a message has been successfully constructed and is about to be sent. [v3] |
| 12 | `Time` type name | Needs decision: accept collision with stdlib `time.Time` | Medium | Pony's stdlib has a `Time` primitive in the `time` package. If a user does `use "time"` and `use "postgres"`, the names collide. Options: (a) accept it — users can alias with `use pg = "postgres"` or `use t = "time"`, which is standard Pony practice; (b) prefix with Pg — `PgTime`, but not `PgTimestamp`/`PgDate`/`PgInterval` since those don't collide, creating inconsistency; (c) prefix all four — `PgTime`, `PgTimestamp`, `PgDate`, `PgInterval` for consistency. The stdlib collision only affects `Time`. Other postgres driver types (`Row`, `Rows`, `Field`, `Result`) could also collide with user types but have been accepted as-is since alpha. Sean should decide. [v3] |
Open Questions
Time type naming — see Decision 12. Sean should decide between accepting the stdlib collision (option a) or prefixing (options b/c). [v3]
phase 2 implementation PR: #139
Phases 1 and 2 are complete (PRs #141 and #144). Phase 3 design continues in #146.
Design for binary wire format support and the type codec registry — roadmap items #21 and #19. These are designed together because the codec IS the binary encoder/decoder; they're the same feature at different abstraction layers.
Background
The driver currently uses text format exclusively for both parameters and results. All parameter values are `(String | None)`, sent with format code 0 (text). All result columns arrive as text strings, parsed into Pony types by `_RowsBuilder._field_to_type()` based on OID.

Binary format is more efficient for numeric types — an `int4` is 4 bytes on the wire instead of a variable-length decimal string, with no parse/format overhead. For `bytea`, binary eliminates hex encoding entirely. 4 of 5 major drivers that support binary (pgx, tokio-postgres, asyncpg, Postgrex) default to it.

Architecture Overview
Three concerns, designed together:

- `Codec` interface and `CodecRegistry` that map between PostgreSQL wire format and Pony types

The codec is the central concept. Each codec knows how to encode a Pony value to wire bytes (for parameters) and decode wire bytes to a Pony value (for results). The registry maps OIDs to codecs — a single codec can serve multiple OIDs.
Codec Design
Codec Interface
The interface deliberately has no `oid()` method. The OID-to-codec mapping is the registry's responsibility, not the codec's. This allows a single codec implementation to serve multiple OIDs — for example, one `_TextPassthroughBinaryCodec` primitive registered for all 7 text-like OIDs (text, varchar, char, bpchar, name, json, xml).

Codecs are `val` (immutable, shareable across actors). Built-in codecs are primitives (zero allocation, global singletons).

Each codec handles one format. A type could have separate text and binary codecs — the registry selects which one is active. This follows psycopg3's model (separate text/binary dumpers) rather than pgx's (single codec with format parameter), which is simpler per-codec.
Encoding Error Handling
For built-in codecs, `encode()` errors when the wrong `FieldDataTypes` variant is passed (e.g., `String` to `_Int4BinaryCodec`). In practice this can't happen for built-in types because the parameter encoder matches on the Pony type and selects the correct codec. For custom codecs (Phase 3), `encode()` failures surface as `DataError` to the `ResultReceiver` / `PipelineReceiver` / `StreamingResultReceiver` — the query is not sent to the server.
- `_BoolBinaryCodec` — `Bool`
- `_ByteaBinaryCodec` — `Array[U8] val`
- `_Int8BinaryCodec` — `I64`
- `_Int2BinaryCodec` — `I16`
- `_Int4BinaryCodec` — `I32`
- `_Float4BinaryCodec` — `F32`
- `_Float8BinaryCodec` — `F64`
These types have binary representations that are identical to their text representations (UTF-8 bytes). A passthrough codec converts the bytes to `String`.

All share a single `_TextPassthroughBinaryCodec` primitive registered for each OID, since the codec interface has no `oid()` method. The primitive does `String.from_array(data)`.
The current `_field_to_type()` text parsing logic is preserved as text codecs, used when results arrive in text format (SimpleQuery, or when binary is not enabled):

- `_BoolTextCodec` — `field.at("t")`
- `_ByteaTextCodec` — `\xDEADBEEF` → bytes
- `_Int8TextCodec` — `field.i64()?`
- `_Int2TextCodec` — `field.i16()?`
- `_Int4TextCodec` — `field.i32()?`
- `_Float4TextCodec` — `field.f32()?`
- `_Float8TextCodec` — `field.f64()?`
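The bytea text decode is the only non-trivial one of these; a Python sketch (the driver is Pony; `decode_bytea_text` is a hypothetical name) of PostgreSQL's hex output format:

```python
def decode_bytea_text(s):
    """PostgreSQL bytea text output is "\\x" followed by hex digits."""
    if not s.startswith("\\x"):
        raise ValueError("expected hex bytea format")
    return bytes.fromhex(s[2:])

assert decode_bytea_text("\\xDEADBEEF") == b"\xde\xad\xbe\xef"
```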
The registry holds both text and binary codecs per OID. `decode()` selects the right one based on the format code. `has_binary_codec()` is used to determine result format strategy.

The default registry contains all built-in codecs listed above. Users extend it with `with_codec()` to handle additional PostgreSQL types.
Parameter Type Change
Breaking change: The parameter type changes from `(String | None)` to `FieldDataTypes`. Same change for `NamedPreparedQuery`.

Migration is straightforward — replace string-encoded numbers with typed values.

Note that explicit type casts (`::int4`) are no longer needed when using typed parameters, since the driver sends the OID in the Parse message.
Each `FieldDataTypes` variant determines the wire format: `I16`, `I32`, `I64`, `F32`, `F64`, `Bool`, `Array[U8] val`, `String`, `None`.

`String` parameters use text format (format code 0) with OID 0 (server infers). This preserves backward compatibility — users can still pass `"42"` as a string for an int4 column and the server will parse it. This also handles PostgreSQL types that don't have Pony equivalents (dates, timestamps, UUIDs, etc.) — send the text representation as a `String`.
### Wire Format Changes

**Bind message**: Currently uses the shorthand `Int16(0)` for parameter format codes (meaning "all text"). Changes to per-parameter format codes.

**Parse message**: Currently sends an empty OID array. Changes to send OIDs derived from parameter types.
Sending explicit OIDs in Parse eliminates the need for `::type` casts in SQL for typed parameters.

`_FrontendMessage.bind()` — signature change, to accept typed parameters and per-parameter format codes.

`_FrontendMessage.parse()` — no signature change needed. The caller derives OIDs from the params and passes them in the existing `param_type_oids` parameter, which is currently always empty.
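The Bind-message sections affected by this change can be sketched byte-for-byte. This is a Python sketch of the PostgreSQL extended-protocol layout (format-code list, parameter values, result format codes); the helper name is invented:

```python
# Sketch of the three Bind-message sections this design changes:
# per-parameter format codes, parameter values, and result format codes.
import struct

def bind_param_sections(params, all_binary_results=False):
    """params: list of (format_code, payload-or-None) pairs."""
    out = bytearray()
    out += struct.pack(">h", len(params))       # one format code per parameter
    for fmt, _ in params:
        out += struct.pack(">h", fmt)
    out += struct.pack(">h", len(params))       # parameter values
    for _, payload in params:
        if payload is None:
            out += struct.pack(">i", -1)        # NULL: length -1, no bytes
        else:
            out += struct.pack(">i", len(payload)) + payload
    if all_binary_results:
        out += struct.pack(">hh", 1, 1)         # shorthand: all columns binary
    else:
        out += struct.pack(">h", 0)             # zero codes: all text
    return bytes(out)

# one binary int4 parameter (42) and one text parameter ("hello")
wire = bind_param_sections([(1, struct.pack(">i", 42)), (0, b"hello")])
```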
### Type Safety Note

Binary parameters require the Pony type to match the PostgreSQL column type exactly. Sending `I32(42)` to an `int8` column produces a server error (4 bytes received, 8 expected). `String` parameters are more forgiving — the server parses the text representation for whatever type the column expects. When in doubt about the PostgreSQL type, use `String`.

## Binary Results
### DataRow Message Change
Currently, `_ResponseParser._data_row()` converts each column to `String`. For binary results, columns contain raw bytes that may not be valid UTF-8. The parser changes to store raw bytes.

The parser becomes format-agnostic — it reads raw bytes without conversion. All format-aware decoding moves to `_RowsBuilder`.

### RowDescription Change
Currently, `_RowDescriptionMessage` stores `(column_name, type_oid)` per column. The format code field is parsed but discarded (`reader.skip(8)?` skips type_length, type_modifier, and format_code together). The change: store the format code per column.

The format code tells the decoder whether each column is text (0) or binary (1), regardless of what the client requested.
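For reference, one RowDescription column descriptor can be parsed like this Python sketch (the function name is invented; the field layout is the PostgreSQL protocol's, and the last three fields are the 8 bytes the current `reader.skip(8)?` discards):

```python
# Sketch of parsing one RowDescription column descriptor:
# name (C string), table OID (i32), attribute number (i16), type OID (i32),
# type length (i16), type modifier (i32), format code (i16).
import struct

def parse_column_descriptor(buf: bytes, pos: int):
    end = buf.index(b"\x00", pos)             # NUL-terminated column name
    name = buf[pos:end].decode("utf-8")
    table_oid, attnum, type_oid, type_len, type_mod, format_code = \
        struct.unpack_from(">ihihih", buf, end + 1)
    next_pos = end + 1 + 18                   # 18 bytes of fixed fields
    # keep the format code instead of skipping it
    return (name, type_oid, format_code), next_pos

# an int4 column "id" reported in binary format (format code 1)
buf = b"id\x00" + struct.pack(">ihihih", 16384, 1, 23, 4, -1, 1)
(col, _) = parse_column_descriptor(buf, 0)
```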
### Decoding Strategy
`_RowsBuilder._field_to_type()` changes to use the codec registry and format code. `_RowsBuilder.apply()` also changes to accept the new column types and pass the registry/format through.

The decoding logic in `_field_to_type`:

- `None` → `None` (NULL, format-independent)
- empty `Array[U8] val` (zero-length column) → pass to codec (e.g., empty string for text types, zero-length bytea for bytea)
- text format with no registered codec → `String.from_array(data)`
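The dispatch described above can be sketched with a registry keyed by (OID, format code). The registry shape, names, and the raw-bytes fallback for unknown binary OIDs (Option A under Decision 2) are illustrative, not the driver's API:

```python
# Sketch of codec-registry dispatch keyed by (type OID, format code).
import struct

TEXT, BINARY = 0, 1

registry = {
    (23, TEXT): lambda data: int(data.decode()),              # int4, text
    (23, BINARY): lambda data: struct.unpack(">i", data)[0],  # int4, binary
    (16, BINARY): lambda data: data != b"\x00",               # bool, binary
}

def decode_field(oid, fmt, data):
    if data is None:
        return None                    # NULL is format-independent
    codec = registry.get((oid, fmt))
    if codec is not None:
        return codec(data)
    if fmt == TEXT:
        return data.decode("utf-8")    # text fallback: plain String
    return data                        # binary fallback: raw bytes (Option A)

print(decode_field(23, BINARY, struct.pack(">i", 7)))  # 7
```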
### Result Format Strategy

This is the key design decision. The Bind message's result format codes tell the server how to encode result columns. The PostgreSQL protocol allows:

- `Int16(0)` — zero codes, all text (current behavior)
- `Int16(1), Int16(0)` — one code applied to all columns: text
- `Int16(1), Int16(1)` — one code applied to all columns: binary
- `Int16(N), Int16(f1), ..., Int16(fN)` — per-column codes (N must equal column count)

The shorthand (0 or 1 format code) works without knowing the number of result columns. Per-column codes require knowing the exact column count at Bind time.
For unnamed statements (PreparedQuery), the driver currently sends Parse+Bind+Describe+Execute+Sync in one shot. Column metadata isn't known until the Describe response, which arrives after Bind has already been sent. This means per-column format codes require either an extra Describe round trip before Bind, or column metadata stored from an earlier Prepare.
For named statements (NamedPreparedQuery), column metadata is available from the Prepare step's Describe response — but the driver doesn't currently store it.
This leaves three viable approaches:
#### Approach A: All binary (shorthand `Int16(1), Int16(1)`)

Request all columns in binary format. Simple, no extra roundtrips.
Con: types without binary codecs come back as `Array[U8] val` instead of `String` (e.g., a date column that today decodes to `"2024-01-15"`) — with all-binary and no codec for those OIDs, they'd return raw binary bytes.

The con is significant. To avoid regression, this approach requires binary codecs for all common PostgreSQL types, not just the 7 we currently decode to Pony primitives. The text passthrough codecs (text, varchar, json, etc.) handle some, but types with non-text binary representations (date, timestamp, numeric, uuid, jsonb, interval) need purpose-built binary-to-String decoders.
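As one concrete example of such a purpose-built decoder, PostgreSQL's binary `date` is a big-endian int32 counting days since 2000-01-01. A Python sketch (the function name is invented):

```python
# Sketch of a binary-to-String decoder for PostgreSQL `date`:
# payload is a big-endian int32 of days since the PostgreSQL epoch 2000-01-01.
import struct
from datetime import date, timedelta

PG_EPOCH = date(2000, 1, 1)

def decode_date_binary(data: bytes) -> str:
    if len(data) != 4:                  # fixed-width type: verify payload length
        raise ValueError("date payload must be 4 bytes")
    (days,) = struct.unpack(">i", data)
    return (PG_EPOCH + timedelta(days=days)).isoformat()

payload = struct.pack(">i", (date(2024, 1, 15) - PG_EPOCH).days)
print(decode_date_binary(payload))  # 2024-01-15
```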
#### Approach B: All text for results, binary for parameters only

Keep `Int16(0)` (all text) for result format codes. Binary encoding only on the parameter side.

#### Approach C: Configurable, text by default
Default to all text for results. Users opt into all binary per-session (psycopg3 approach). When binary is enabled, types without binary codecs return `Array[U8] val`.

## Impact on Existing Features
**SimpleQuery**: Always uses the simple query protocol, which has no Bind message. Results always arrive in text format. No change — text codecs handle decoding.

**Streaming queries**: Use the extended query protocol. Parameter encoding changes apply. Result format changes apply. `_StreamingQueryInFlight` accumulates `_DataRowMessage` columns and calls `_RowsBuilder` — the same pipeline, just with raw bytes and codec-based decoding.

**Pipeline queries**: Same as streaming — each query in the pipeline uses the extended query protocol. Per-query result delivery through `_RowsBuilder` with codecs.

**COPY IN/OUT**: Uses its own CopyData message format, not DataRow. Unaffected by this change.

## Breaking Changes
1. `PreparedQuery.params` type: `Array[(String | None)] val` → `Array[FieldDataTypes] val`
2. `NamedPreparedQuery.params` type: same change
3. Results may contain `Array[U8] val` instead of `String`

At v0.2.2 alpha, breaking changes are expected. The migration for items 1–3 is mechanical: replace `"42"` with `I32(42)` for typed params, keep `"hello"` as-is for text params.
Decision 1: Result format strategy (primary design question)
See the three approaches above (A: all binary, B: text results only, C: configurable). This is the primary design question.
My recommendation is Approach A (all binary) implemented in phases. Phase 1 delivers binary parameters with text results (effectively Approach B). Phase 2 adds binary result decoding with comprehensive codec coverage, then switches to all binary. This gets binary parameters shipped quickly while building toward the industry-standard all-binary default.
Decision 2: Fallback for unknown OIDs in binary mode
When results are in binary format and no codec is registered for a column's OID, what should the decoder return?
Option A —
Array[U8] val: Raw bytes. The user gets the data but must decode it themselves. Ambiguous with bytea (OID 17) in theFieldDataTypesunion.Option B —
DataError: Fail the entire query. Strict — forces users to register codecs for all types they query. Prevents silent data corruption but may be too aggressive.Option C — Distinct wrapper type: Add a type like
class val RawBinaryDatawrappingArray[U8] valtoFieldDataTypes, distinguishing it from bytea. Adds a new variant to the union.My recommendation is Option A for simplicity. The bytea ambiguity rarely matters in practice — users know their schema. If it becomes a problem, Option C can be added later.
Decision 3: Codec interface — encode on Codec or separate?
The
encodemethod onCodecis used for parameters. But for parameter encoding, the Pony type determines the codec (I32 → int4 codec), whereas for result decoding, the OID determines the codec (OID 23 → int4 codec). The lookup direction is reversed.Option A — encode on Codec: Single interface, conceptually clean. Parameter encoder does a match on
FieldDataTypes, selects the appropriate codec, callsencode. Slight indirection but keeps encode/decode together.Option B — separate
_ParamEncoderprimitive: Hardcoded match onFieldDataTypesvariants, encoding inline. No codec lookup for parameters. Simpler for the built-in types but doesn't extend to custom types.Option C — encode on Codec, but parameter encoding uses direct match for built-in types and falls back to codec lookup for custom types: Pragmatic hybrid.
My recommendation is Option A. The codec is the natural home for both operations. For custom codecs (#19), users will need encode capability. Better to design it in from the start.
Decision 4: Where does the CodecRegistry live?
The registry needs to be accessible during:
Option A — On Session: The Session actor holds a
CodecRegistry val. Passed to query states and_RowsBuilder. Users configure it at session creation viaSessionConfigor similar.Option B — Global default + per-session override: A default registry (all built-in codecs) is used unless the user provides a custom one at session creation.
My recommendation is Option B. The default registry with built-in codecs requires zero configuration for the common case. Users who need custom codecs create a new registry with
with_codec(base, oid, codec)and pass it to the session.Decision 5: When to store RowDescription from Prepare
For named prepared statements, the Describe response during Prepare includes
ParameterDescription(parameter OIDs) andRowDescription(result column metadata). Currently neither is stored.If we need per-column result format codes (not needed for the shorthand all-binary approach), we'd need to store
RowDescriptionfrom Prepare and look it up at Bind time forNamedPreparedQuery.Even without per-column codes, storing
ParameterDescriptioncould enable parameter type validation at Bind time (verify that the Pony types match the expected PostgreSQL types before sending).This decision depends on the result format strategy (Decision 1). With all-binary shorthand, storing is optional but useful for validation. With per-column codes, storing is required for named statements.
## Implementation Phases
### Phase 1: Binary Parameters + Codec Infrastructure
- Define the `Codec` interface
- Implement the text codecs (preserving the existing `_field_to_type` logic)
- Implement the `CodecRegistry`
- Change the parameter type from `(String | None)` to `FieldDataTypes`
- Update `_FrontendMessage.bind()` for per-parameter format codes and binary encoding
- Update `_FrontendMessage.parse()` to send OIDs derived from parameter types
- Update every example using `PreparedQuery` or `NamedPreparedQuery` (prepared-query, named-prepared-query, crud, bytea, streaming, pipeline, cancel, etc.) for the new param type
- CHANGELOG: `changed` (breaking param type change)

### Phase 2: Binary Results
- Change the `_DataRowMessage` column type to `(Array[U8] val | None)`
- Change `_RowDescriptionMessage` to store the format code per column
- Change `_RowsBuilder` to use the codec registry and format code for decoding
- CHANGELOG: `changed` (result format change; types without codecs return `Array[U8] val`)

### Phase 3: Custom Codec Registry (Roadmap #19)
- Make the `Codec` interface public
- Make the `CodecRegistry` configurable at session creation
- Expose `with_codec()` in the user-facing API
- CHANGELOG: `added` (new public codec registry API)