Skip to content

fix: Fix default value support for all type of properties#63

Merged
zhanglei1949 merged 34 commits intoalibaba:mainfrom
zhanglei1949:fix-str-defaul-val
Mar 19, 2026
Merged

fix: Fix default value support for all type of properties#63
zhanglei1949 merged 34 commits intoalibaba:mainfrom
zhanglei1949:fix-str-defaul-val

Conversation

@zhanglei1949
Copy link
Collaborator

@zhanglei1949 zhanglei1949 commented Mar 16, 2026

Fix #62

Greptile Summary

This PR fixes default value support for all property types (including VARCHAR) by moving the responsibility for applying defaults from column construction time to resize time. Instead of storing a default_value_ member in each TypedColumn, the default is now passed explicitly through a new resize(size_t, const Property&) overload on ColumnBase and Table. For string columns, a single physical copy of the default string is written and all new slots are made to share the same {offset, length} pointer via the new set_string_item helper; compaction is updated with offset-deduplication logic to avoid amplifying this shared data on disk.

Key changes:

  • Column API: Removes default_value_ from TypedColumn<T> and TypedColumn<std::string_view> constructors; adds resize(size_t, const Property&) to ColumnBase, all TypedColumn specialisations, and Table.
  • Compaction: prepare_compaction_plan now tracks reused_size (bytes from shared offsets); compact() and stream_compact_and_dump() deduplicate entries via unordered_map<offset → new_offset>, correctly updating all items including duplicates while writing each unique string only once.
  • Call sites: EdgeTable::EnsureCapacity, VertexTable::EnsureCapacity, and dropAndCreateNewUnbundledCSR all now pass schema default_property_values to the resize call, replacing a manual per-column fill loop.
  • Bug fixes: multiple fclose-before-throw leaks in stream_compact_and_dump addressed; compact() made private; reset() added after the non-streaming dump path; is_writable_ guard added to set_string_item.
  • Tests: new C++ unit test covers dump/reload round-trip of string defaults; two new Python integration tests cover node and edge DDL default scenarios.

Confidence Score: 3/5

  • Core logic is sound and well-tested, but a non-deterministic Python test assertion and a public API behaviour change without a deprecation warning should be addressed before merging.
  • The compaction deduplication logic is correct (traced through compact() and stream_compact_and_dump()), shared-offset reads are sound, file-handle leaks are fixed, and call sites are properly updated. The score is reduced because: (1) the final edge-date assertion in the Python test has no ORDER BY and will fail non-deterministically when traversal order changes; (2) the public resize(size_t) no-default overload silently zero-initialises new rows instead of applying schema defaults, with no deprecation signal for downstream callers; (3) the VLOG in stream_compact_and_dump reports an inflated size.
  • tools/python_bind/tests/test_ddl.py (flaky assertion), include/neug/utils/property/column.h (undeprecated no-default resize), include/neug/utils/mmap_array.h (VLOG reports wrong size)

Important Files Changed

Filename Overview
include/neug/utils/mmap_array.h Core low-level string storage: adds offset-deduplication to compaction (unordered_map), makes compact() private, adds set_string_item/get_string_item friends, fixes multiple fclose-before-throw leaks, adds reset() after non-streaming dump. The VLOG in stream_compact_and_dump incorrectly reports plan.total_size (inflated by reused bytes) instead of the effective written size.
include/neug/utils/property/column.h Removes per-column default_value_ storage; adds new resize(size_t, const Property&) overload to all column types. String specialisation efficiently shares one written copy of the default string across all new items via set_string_item. The no-default resize(size_t) still exists without a deprecation marker despite semantics change.
src/utils/property/table.cc Implements the new Table::resize(size_t, defaults) and adds a bounds-check on default_property_values in add_columns. Removes default value propagation from initColumns. All changes are consistent with header.
src/storages/graph/edge_table.cc Updates Open/OpenInMemory/OpenWithHugepages signatures; changes EnsureCapacity and dropAndCreateNewUnbundledCSR to call resize(capacity, defaults) and removes the manual per-column default-fill loop. Correct use of the new API.
tools/python_bind/tests/test_ddl.py Adds two end-to-end tests for VARCHAR default values. Uses distinct /tmp paths (previous review issue resolved). The final edge-date assertion in test_get_varchar_default_value_2 lacks ORDER BY, making it non-deterministic.

Sequence Diagram

sequenceDiagram
    participant Caller as EnsureCapacity
    participant Table as Table
    participant Column as StringColumn
    participant Buffer as mmap_array_string

    Caller->>Table: "resize(capacity, defaults)"
    Table->>Column: "resize(size, default_value)"
    Note over Column: Acquire rw_mutex_
    Column->>Buffer: "buffer_.resize(size_, data_capacity)"
    Note over Buffer: New items zero-initialised to offset=0 length=0
    Column->>Column: "set_value(old_size, default_str)"
    Note over Column: Writes default string at pos_ then advances pos_
    Column->>Buffer: "get_string_item(old_size)"
    Buffer-->>Column: "{offset, length}"
    loop "i = old_size+1 to size-1"
        Column->>Buffer: "set_string_item(i, string_item)"
        Note over Buffer: All new items share same offset+length pointer
    end

    Note over Column: On dump()
    Column->>Buffer: "prepare_compaction_plan()"
    Note over Buffer: Tracks seen_offsets and computes reused_size
    Buffer->>Buffer: "stream_compact_and_dump()"
    Note over Buffer: Deduplicates via old_offset_to_new map, writes each unique string once
Loading

Last reviewed commit: "fix"

Greptile also left 1 inline comment on this PR.

@zhanglei1949
Copy link
Collaborator Author

@greptile

@zhanglei1949
Copy link
Collaborator Author

@greptile

zhanglei1949 and others added 3 commits March 17, 2026 08:53
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@zhanglei1949
Copy link
Collaborator Author

@greptile

@zhanglei1949
Copy link
Collaborator Author

@greptile

@zhanglei1949
Copy link
Collaborator Author

@greptile

@zhanglei1949
Copy link
Collaborator Author

@greptile

@zhanglei1949
Copy link
Collaborator Author

@greptile

items_.dump(items_filename);
}

// Should only be used internally when we are sure the idx is valid
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private methods, only called by friend class TypedColumn<std::string_view>

@zhanglei1949
Copy link
Collaborator Author

@greptile

@zhanglei1949
Copy link
Collaborator Author

@greptile

@zhanglei1949
Copy link
Collaborator Author

@greptile

@zhanglei1949 zhanglei1949 changed the title fix: Fix default value support for string type. fix: Fix default value support for all type of properties Mar 18, 2026
@zhanglei1949
Copy link
Collaborator Author

@greptile

@zhanglei1949
Copy link
Collaborator Author

@greptile

@zhanglei1949
Copy link
Collaborator Author

@greptile

@zhanglei1949 zhanglei1949 merged commit f71a09f into alibaba:main Mar 19, 2026
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Currently, the specified default string value doesn't take effect

2 participants