fix: Fix default value support for all type of properties #63
Merged
zhanglei1949 merged 34 commits into alibaba:main from … on Mar 19, 2026
Collaborator
Author
@greptile
…neug into fix-str-defaul-val
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
zhanglei1949 commented Mar 18, 2026
    items_.dump(items_filename);
  }

  // Should only be used internally when we are sure the idx is valid
Collaborator
Author
These are private methods, called only by the friend class TypedColumn<std::string_view>.
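The access pattern described in this comment can be sketched as follows. This is a minimal illustration, not the project's code: `StringBuffer`, `StringItem`, and `share_item` are hypothetical names, and only `set_string_item`, `get_string_item`, and `TypedColumn<std::string_view>` come from the PR. The unchecked accessors stay private, and the single friend class is responsible for validating indices before calling them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical descriptor: where a string lives in some byte pool.
struct StringItem {
  std::uint64_t offset;
  std::uint32_t length;
};

template <typename T>
class TypedColumn;  // primary template, declared only

// Hypothetical stand-in for the string buffer discussed in the review.
class StringBuffer {
 public:
  explicit StringBuffer(std::size_t n) : items_(n, StringItem{0, 0}) {}
  std::size_t size() const { return items_.size(); }

 private:
  // No runtime bounds check in release builds: callers must pass a valid idx.
  void set_string_item(std::size_t idx, StringItem item) {
    assert(idx < items_.size());
    items_[idx] = item;
  }
  StringItem get_string_item(std::size_t idx) const {
    assert(idx < items_.size());
    return items_[idx];
  }

  // Only this specialisation may reach the private accessors.
  friend class TypedColumn<std::string_view>;
  std::vector<StringItem> items_;
};

template <>
class TypedColumn<std::string_view> {
 public:
  explicit TypedColumn(std::size_t n) : buffer_(n) {}

  // Validates both indices, then copies the descriptor from one row to another.
  bool share_item(std::size_t from, std::size_t to) {
    if (from >= buffer_.size() || to >= buffer_.size()) return false;
    buffer_.set_string_item(to, buffer_.get_string_item(from));
    return true;
  }

 private:
  StringBuffer buffer_;
};
```

Keeping the accessors private means any other caller fails to compile rather than silently bypassing the index check.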
luoxiaojian approved these changes Mar 18, 2026
Fix #62
Greptile Summary
This PR fixes default value support for all property types (including VARCHAR) by moving the responsibility for applying defaults from column construction time to resize time. Instead of storing a default_value_ member in each TypedColumn, the default is now passed explicitly through a new resize(size_t, const Property&) overload on ColumnBase and Table. For string columns, a single physical copy of the default string is written and all new slots share the same {offset, length} pointer via the new set_string_item helper; compaction is updated with offset-deduplication logic so that this shared data is not amplified on disk.

Key changes:
- Removes default_value_ from the TypedColumn<T> and TypedColumn<std::string_view> constructors; adds resize(size_t, const Property&) to ColumnBase, all TypedColumn specialisations, and Table.
- prepare_compaction_plan now tracks reused_size (bytes from shared offsets); compact() and stream_compact_and_dump() deduplicate entries via unordered_map<offset → new_offset>, correctly updating all items, including duplicates, while writing each unique string only once.
- EdgeTable::EnsureCapacity, VertexTable::EnsureCapacity, and dropAndCreateNewUnbundledCSR now all pass the schema default_property_values to the resize call, replacing a manual per-column fill loop.
- fclose-before-throw leaks in stream_compact_and_dump addressed; compact() made private; reset() added after the non-streaming dump path; is_writable_ guard added to set_string_item.

Confidence Score: 3/5
… resize(size_t) no-default overload silently zero-initialises new rows instead of applying schema defaults, with no deprecation signal for downstream callers; (3) the VLOG in stream_compact_and_dump reports an inflated size.
tools/python_bind/tests/test_ddl.py (flaky assertion), include/neug/utils/property/column.h (undeprecated no-default resize), include/neug/utils/mmap_array.h (VLOG reports wrong size)

Important Files Changed
- … Makes compact() private, adds set_string_item/get_string_item friends, fixes multiple fclose-before-throw leaks, adds reset() after the non-streaming dump. The VLOG in stream_compact_and_dump incorrectly reports plan.total_size (inflated by reused bytes) instead of the effective written size.
- … Removes default_value_ storage; adds a new resize(size_t, const Property&) overload to all column types. The string specialisation efficiently shares one written copy of the default string across all new items via set_string_item. The no-default resize(size_t) still exists without a deprecation marker despite the change in semantics.
- … Adds Table::resize(size_t, defaults) and a bounds check on default_property_values in add_columns. Removes default value propagation from initColumns. All changes are consistent with the header.
- … Open/OpenInMemory/OpenWithHugepages signatures; changes EnsureCapacity and dropAndCreateNewUnbundledCSR to call resize(capacity, defaults) and removes the manual per-column default-fill loop. Correct use of the new API.
- … /tmp paths (previous review issue resolved). The final edge-date assertion in test_get_varchar_default_value_2 lacks ORDER BY, making it non-deterministic.

Sequence Diagram
sequenceDiagram
    participant Caller as EnsureCapacity
    participant Table as Table
    participant Column as StringColumn
    participant Buffer as mmap_array_string
    Caller->>Table: "resize(capacity, defaults)"
    Table->>Column: "resize(size, default_value)"
    Note over Column: Acquire rw_mutex_
    Column->>Buffer: "buffer_.resize(size_, data_capacity)"
    Note over Buffer: New items zero-initialised to offset=0 length=0
    Column->>Column: "set_value(old_size, default_str)"
    Note over Column: Writes default string at pos_ then advances pos_
    Column->>Buffer: "get_string_item(old_size)"
    Buffer-->>Column: "{offset, length}"
    loop "i = old_size+1 to size-1"
        Column->>Buffer: "set_string_item(i, string_item)"
        Note over Buffer: All new items share same offset+length pointer
    end
    Note over Column: On dump()
    Column->>Buffer: "prepare_compaction_plan()"
    Note over Buffer: Tracks seen_offsets and computes reused_size
    Buffer->>Buffer: "stream_compact_and_dump()"
    Note over Buffer: Deduplicates via old_offset_to_new map, writes each unique string once

Last reviewed commit: "fix"
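The resize-then-compact flow in the diagram can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the PR's implementation: `ToyStringColumn` and its members are hypothetical, a `std::string` stands in for the mmap-backed byte pool, and rows that share an offset are assumed to share a length as well. It requires C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical descriptor, mirroring the {offset, length} pair in the summary.
struct StringItem {
  std::uint64_t offset = 0;
  std::uint32_t length = 0;
};

// Hypothetical toy column; not the project's mmap_array or TypedColumn.
struct ToyStringColumn {
  std::vector<StringItem> items;  // one descriptor per row
  std::string data;               // append-only byte pool

  // Appends v to the pool and points row idx at it.
  void set_value(std::size_t idx, std::string_view v) {
    assert(idx < items.size());
    items[idx] = StringItem{static_cast<std::uint64_t>(data.size()),
                            static_cast<std::uint32_t>(v.size())};
    data.append(v);
  }

  std::string_view get(std::size_t idx) const {
    assert(idx < items.size());
    return std::string_view(data).substr(items[idx].offset, items[idx].length);
  }

  // Grows to new_size, writing the default once and sharing its descriptor.
  void resize(std::size_t new_size, std::string_view default_value) {
    const std::size_t old_size = items.size();
    items.resize(new_size);  // new descriptors start as {0, 0}
    if (new_size <= old_size || default_value.empty()) return;
    set_value(old_size, default_value);
    const StringItem shared = items[old_size];
    for (std::size_t i = old_size + 1; i < new_size; ++i) items[i] = shared;
  }

  // Rewrites the pool, keeping one copy per distinct old offset.
  void compact() {
    std::string packed;
    std::unordered_map<std::uint64_t, std::uint64_t> old_to_new;
    for (StringItem& item : items) {
      if (item.length == 0) continue;  // empty rows need no bytes
      auto [it, inserted] = old_to_new.try_emplace(item.offset, packed.size());
      if (inserted) packed.append(data, item.offset, item.length);
      item.offset = it->second;  // duplicates are redirected, not rewritten
    }
    data.swap(packed);
  }
};
```

In this model, growing a column by N rows adds only one copy of the default to the pool, and compaction preserves that sharing while dropping bytes that no row references any more.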