Skip to content

operations table: Store the operation_xdr as BYTEA#497

Merged
aditya1702 merged 26 commits intohash-byteafrom
opxdr-bytea-2
Feb 26, 2026
Merged

operations table: Store the operation_xdr as BYTEA#497
aditya1702 merged 26 commits intohash-byteafrom
opxdr-bytea-2

Conversation

@aditya1702
Copy link
Contributor

@aditya1702 aditya1702 commented Feb 5, 2026

What

Updates the operations.operation_xdr storage to use PostgreSQL BYTEA (raw XDR bytes) while preserving the GraphQL API contract by continuing to expose operationXdr as a base64-encoded String.

Changes:

Change operations.operation_xdr from TEXT to BYTEA in the schema/migration and update DB write paths accordingly.
Introduce types.XDRBytea to scan/store raw bytes and encode to base64 when presented as a string.
Update GraphQL schema/resolvers and tests to reflect the new storage/encoding behavior.

Why

Storage optimization for our DB

Known limitations

N/A

Issue that this PR addresses

Closes #492

Part of the plan to store XDR data as raw bytes for efficiency.
Similar pattern to HashBytea but uses base64 encoding (standard for XDR)
and supports variable-length data.
Changed OperationXDR field from string to XDRBytea for automatic
base64/BYTEA conversion.
Cast the base64 XDR string to XDRBytea for proper database storage.
- Change UNNEST cast from text[] to bytea[]
- Convert XDRBytea to raw bytes in BatchInsert
- Update BatchCopy to pass raw bytes instead of pgtype.Text
Forces GraphQL to use a resolver for operationXdr to convert
XDRBytea to base64 string for API consumers.
- Update test_utils.go to use types.XDRBytea
- Regenerate GraphQL code with OperationXdr resolver
Returns the XDRBytea as a base64-encoded string for API consumers.
- Use parameterized queries with XDRBytea for SQL inserts
- Update assertions to use .String() for comparison
- Fix type casting in test data creation
- Use base64-encoded test XDR data in test_utils.go
- Add testOpXDR helper functions in test files
- Update all assertions to use .String() method
- Update createTestOperation to use proper base64 XDR
- Update generateTestOperations to encode test data as base64
- Update operations_test.go with base64 encoding for all test XDR data
- Fix assertions to compare against the stored XDRBytea values
This simplifies the XDR storage flow by storing raw bytes directly
instead of encoding to base64 and then decoding. The String() method
now handles base64 encoding for external representation.
Skip the intermediate base64 encoding step by using MarshalBinary()
instead of MarshalBase64(). The raw bytes are now stored directly
in XDRBytea.
Remove unnecessary Value() calls since XDRBytea is now []byte.
Access raw bytes directly via type conversion.
Decode expected base64 XDR string to raw bytes for comparison
since XDRBytea now uses []byte underlying type.
Use raw bytes directly instead of base64-encoded strings when
creating test data for XDRBytea fields.
Use raw bytes directly for test XDR data instead of base64-encoding.
The String() method will handle base64 encoding for assertions.
Use raw bytes directly instead of pre-encoded base64 string.
Use parameterized queries instead of raw SQL string literals
for BYTEA operation_xdr column. Fix .String() assertion to
compare base64 values via opXdr1.String().
Use parameterized queries instead of raw SQL string literals
for BYTEA operation_xdr column in BatchGetByOperationIDs and
BatchGetByStateChangeIDs tests.
Use parameterized queries instead of raw SQL string literals
for BYTEA operation_xdr column in BatchGetByOperationIDs test.
Copy the byte slice from the database driver instead of
referencing it directly. The pgx driver reuses its internal
buffer across rows, so without copying, all scanned XDRBytea
values end up pointing to the same (overwritten) buffer.
@aditya1702 aditya1702 changed the title Opxdr bytea 2 operations table: Store the operation_xdr as BYTEA Feb 5, 2026
@aditya1702 aditya1702 requested a review from Copilot February 11, 2026 13:54
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the operations.operation_xdr storage to use PostgreSQL BYTEA (raw XDR bytes) while preserving the GraphQL API contract by continuing to expose operationXdr as a base64-encoded String.

Changes:

  • Change operations.operation_xdr from TEXT to BYTEA in the schema/migration and update DB write paths accordingly.
  • Introduce types.XDRBytea to scan/store raw bytes and encode to base64 when presented as a string.
  • Update GraphQL schema/resolvers and tests to reflect the new storage/encoding behavior.

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
internal/services/ingest_test.go Updates ingest test fixtures to use types.XDRBytea.
internal/serve/graphql/schema/operation.graphqls Forces resolver for operationXdr so GraphQL continues returning a base64 string.
internal/serve/graphql/resolvers/transaction_resolvers_test.go Updates expectations for base64-encoded XDR output.
internal/serve/graphql/resolvers/test_utils.go Stores operation XDR as raw bytes in test DB setup.
internal/serve/graphql/resolvers/queries_resolvers_test.go Updates query resolver tests to compare base64 encoding of raw bytes.
internal/serve/graphql/resolvers/operation.resolvers.go Adds OperationXdr field resolver to base64-encode XDRBytea.
internal/serve/graphql/resolvers/account_resolvers_test.go Updates account resolver tests for base64-encoded XDR output.
internal/serve/graphql/generated/generated.go Regenerates gqlgen output to wire the forced resolver.
internal/indexer/types/types.go Adds XDRBytea type and updates types.Operation.OperationXDR to use it.
internal/indexer/processors/utils_test.go Updates ConvertOperation test to compare raw bytes vs base64 string.
internal/indexer/processors/utils.go Stores raw XDR bytes via MarshalBinary() instead of base64 strings.
internal/db/migrations/2025-06-10.3-operations.sql Changes operation_xdr column type to BYTEA (but see migration concern).
internal/data/transactions_test.go Updates test inserts to pass XDRBytea for operation_xdr.
internal/data/operations_test.go Updates operation model tests for bytea storage and base64 string comparisons.
internal/data/operations.go Updates batch insert/copy paths to send operation_xdr as bytea.
internal/data/accounts_test.go Updates account model tests inserting operations to use XDRBytea.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@aditya1702 aditya1702 marked this pull request as ready for review February 20, 2026 19:44
@aristidesstaffieri
Copy link
Contributor

Code Review — Unsafe type assertions on .Value() results

Severity: High (latent panic)

In 4 locations, .Value() results are type-asserted with .([]byte) without a nil guard. Both AddressBytea.Value() and HashBytea.Value() return (nil, nil) for empty strings, causing a panic on nil.([]byte).

Locations:

The safe pattern already exists in query_utils.go:77:

if val == nil {
    return nil, nil
}
return val.([]byte), nil

While current production paths filter empty values upstream, BatchInsert is a public API callable from tests and backfill paths.

@aristidesstaffieri
Copy link
Contributor

Code Review — Thresholds and SignerWeights resolvers silently map NULL to 0

Severity: Medium (incorrect data for GraphQL clients)

The Thresholds resolver never checks .Valid on sql.NullInt16 fields:

// Formats the old/new threshold values as a JSON object for backward compatibility.
func (r *signerThresholdsChangeResolver) Thresholds(ctx context.Context, obj *types.SignerThresholdsStateChangeModel) (string, error) {
// Format as {"old": "X", "new": "Y"} for backward compatibility with JSONB format
// Values are stored as ints but returned as quoted strings in JSON (0-255 range)
return fmt.Sprintf(`{"old": "%d", "new": "%d"}`, obj.ThresholdOld.Int16, obj.ThresholdNew.Int16), nil
}

When a threshold is first set (no prior state), ThresholdOld is NULL (Valid=false, Int16=0). Output: {"old": "0", "new": "5"} — clients cannot distinguish "was 0" from "did not exist."

The SignerWeights resolver has a similar issue — it returns nil only when both fields are NULL, but when one is NULL and the other valid, the NULL one renders as 0:

// SignerWeights is the resolver for the signerWeights field.
// Formats the old/new signer weight values as a JSON object for backward compatibility.
func (r *signerChangeResolver) SignerWeights(ctx context.Context, obj *types.SignerStateChangeModel) (*string, error) {
if !obj.SignerWeightOld.Valid && !obj.SignerWeightNew.Valid {
return nil, nil
}
// Format as {"old": X, "new": Y} for backward compatibility with JSONB format
result := fmt.Sprintf(`{"old": %d, "new": %d}`, obj.SignerWeightOld.Int16, obj.SignerWeightNew.Int16)
return &result, nil
}

The correct pattern exists in TrustlineLimit (same file), which checks .Valid per-field and uses "null" for absent values:

func (r *trustlineChangeResolver) Limit(ctx context.Context, obj *types.TrustlineStateChangeModel) (*string, error) {
if !obj.TrustlineLimitOld.Valid && !obj.TrustlineLimitNew.Valid {
return nil, nil
}
// Format old/new as JSON for backward compatibility with JSONB format
oldVal := "null"
newVal := "null"
if obj.TrustlineLimitOld.Valid {
oldVal = fmt.Sprintf(`"%s"`, obj.TrustlineLimitOld.String)
}
if obj.TrustlineLimitNew.Valid {
newVal = fmt.Sprintf(`"%s"`, obj.TrustlineLimitNew.String)
}
result := fmt.Sprintf(`{"old": %s, "new": %s}`, oldVal, newVal)
return &result, nil
}

@aristidesstaffieri
Copy link
Contributor

Code Review — Getter methods return internal maps without cloning (data race risk)

Severity: Medium (latent data race)

GetTransactionsParticipants() and GetOperationsParticipants() return direct references to internal maps while holding only a read lock. Once the caller has the reference and the lock is released, concurrent Push* calls that modify these maps can trigger Go's fatal error: concurrent map read and map write.

// GetTransactionsParticipants returns a map of transaction ToIDs to its participants.
func (b *IndexerBuffer) GetTransactionsParticipants() map[int64]set.Set[string] {
b.mu.RLock()
defer b.mu.RUnlock()
return b.participantsByToID
}

// GetOperationsParticipants returns a map of operation IDs to its participants.
func (b *IndexerBuffer) GetOperationsParticipants() map[int64]set.Set[string] {
b.mu.RLock()
defer b.mu.RUnlock()
return b.participantsByOpID
}

The safe pattern is already used by GetUniqueSEP41ContractTokensByID() in the same file, which returns maps.Clone(...):

// GetUniqueSEP41ContractTokensByID returns a map of unique SEP-41 contract IDs to their types.
// Thread-safe: uses read lock.
func (b *IndexerBuffer) GetUniqueSEP41ContractTokensByID() map[string]types.ContractType {
b.mu.RLock()
defer b.mu.RUnlock()
return maps.Clone(b.uniqueSEP41ContractTokensByID)
}

@aristidesstaffieri
Copy link
Contributor

Code Review — XDRBytea.Value() returns aliased slice (latent correctness hazard)

Severity: Low (defensive programming)

XDRBytea.Value() returns []byte(x) which shares the same backing array as the receiver. The companion Scan() method was already fixed in b84442d9 with make + copy to avoid pgx driver buffer reuse — Value() has the analogous risk in the write direction.

// Value implements driver.Valuer - returns raw bytes for BYTEA storage
func (x XDRBytea) Value() (driver.Value, error) {
if len(x) == 0 {
return nil, nil
}
return []byte(x), nil
}

The old string-based XDRBytea always produced an independent buffer via base64.DecodeString(). Current drivers (pgx, lib/pq) don't mutate the slice they receive, so this is not an active bug — but a defensive make + copy in Value() would match the Scan() fix and guard against future driver changes.

For comparison, the fixed Scan():

// Scan implements sql.Scanner - reads raw bytes from BYTEA column
func (x *XDRBytea) Scan(value any) error {
if value == nil {
*x = nil
return nil
}
bytes, ok := value.([]byte)
if !ok {
return fmt.Errorf("expected []byte, got %T", value)
}
*x = make([]byte, len(bytes))
copy(*x, bytes)
return nil
}

@aristidesstaffieri
Copy link
Contributor

Code Review — Cursor values interpolated directly into SQL instead of parameterized

Severity: Low (code consistency)

BatchGetByToID and BatchGetByOperationID use fmt.Sprintf with %d to embed cursor int64 values directly into SQL, while BatchGetByAccountAddress in the same file uses parameterized $N placeholders. No SQL injection risk since these are strongly-typed int64, but the inconsistency is worth noting.

if sortOrder == DESC {
queryBuilder.WriteString(fmt.Sprintf(`
AND (to_id, operation_id, state_change_order) < (%d, %d, %d)
`, cursor.ToID, cursor.OperationID, cursor.StateChangeOrder))
} else {
queryBuilder.WriteString(fmt.Sprintf(`
AND (to_id, operation_id, state_change_order) > (%d, %d, %d)
`, cursor.ToID, cursor.OperationID, cursor.StateChangeOrder))
}
}

Compare with the parameterized approach in BatchGetByAccountAddress:

queryBuilder.WriteString(fmt.Sprintf(`
AND (to_id, operation_id, state_change_order) < ($%d, $%d, $%d)
`, argIndex, argIndex+1, argIndex+2))
args = append(args, cursor.ToID, cursor.OperationID, cursor.StateChangeOrder)
argIndex += 3
} else {
queryBuilder.WriteString(fmt.Sprintf(`

@aditya1702 aditya1702 merged commit 02ca3dd into hash-bytea Feb 26, 2026
7 checks passed
@aditya1702 aditya1702 deleted the opxdr-bytea-2 branch February 26, 2026 22:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants