Release dev to main: recall refinements, stability fixes, and CI/security updates by GoZumie · Pull Request #49 · BigInformatics/wagl

GoZumie · 2026-03-06T18:01:44Z

Summary

Promote dev to main so we can proceed with the next integration and release steps.

This includes recent recall and stability work, notably:

deterministic recall fixture testing for regression safety
sqlite-vec loading/threading safety fix on DB connect
multi-pass recall strategy improvements
recall output budget controls
keyword fallback + temporal intent/recency ranking improvements
security/CI permission and dependency alert fixes

Notes

Branch: dev → main
This PR is intended as the next-step release gate for downstream work (including openclaw-wagl update/release follow-up).

PR Review by Greptile

Greptile Summary

This PR promotes dev → main, bringing a substantial recall-quality improvement batch that refines how wagl recall retrieves, ranks, and budgets its output. The core changes introduce configurable scoring weights (--salience-weight, --dscore-weight), a keyword-fallback expansion step for multilingual queries, temporal-intent detection (yesterday, last week, etc.) with per-window score boosts, multi-pass context packs (high-valence and open-todo passes), and output budget controls (--max-canon, --min-relevant). On the infrastructure side, sqlite-vec loading moves from sqlite3_auto_extension (registered process-wide before any connection is opened, which triggered a libsql threading panic) to a per-connection load_extension call, and the CI workflow gains a top-level permissions: contents: read restriction.
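The weight resolution order shown in the flowchart (CLI > env > default) can be sketched as follows. This is a minimal illustration, not wagl's code: the environment variable names and the "finite and within [0, 1]" validation rule are assumptions, while the defaults (salience 0.2, dscore 0.1) are the ones quoted in the review.

```rust
use std::env;

// Defaults as quoted in the review thread (salience 0.2, dscore 0.1).
const DEFAULT_SALIENCE_WEIGHT: f64 = 0.2;
const DEFAULT_DSCORE_WEIGHT: f64 = 0.1;

/// Resolve a single score weight with CLI > env > default precedence.
/// `env_key` is a hypothetical variable name; wagl's actual names may differ.
fn resolve_weight(cli: Option<f64>, env_key: &str, default: f64) -> Result<f64, String> {
    let value = match cli {
        // 1. An explicit CLI flag always wins.
        Some(v) => v,
        None => match env::var(env_key) {
            // 2. Otherwise parse the environment variable if it is set.
            Ok(raw) => raw
                .trim()
                .parse::<f64>()
                .map_err(|e| format!("{env_key}={raw:?} is not a number: {e}"))?,
            // 3. Unset (or non-UTF-8) falls back to the built-in default.
            Err(_) => default,
        },
    };
    // Assumed validation rule: the weight must be finite and within [0, 1].
    if !value.is_finite() || !(0.0..=1.0).contains(&value) {
        return Err(format!("weight {value} must be a finite number in [0, 1]"));
    }
    Ok(value)
}

fn main() {
    let w_salience =
        resolve_weight(Some(0.3), "WAGL_SALIENCE_WEIGHT", DEFAULT_SALIENCE_WEIGHT).unwrap();
    let w_dscore = resolve_weight(None, "WAGL_DSCORE_WEIGHT", DEFAULT_DSCORE_WEIGHT);
    println!("w_salience={w_salience} w_dscore={w_dscore:?}");
}
```

Rejecting out-of-range input, rather than silently clamping it, keeps a typo such as `--salience-weight 2` from quietly skewing every ranking. Whether wagl rejects or normalizes such values is not stated here.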

Key points:

Temporal scoring comment is inaccurate: the comment claims salience=0.15 in temporal mode (it uses w_salience, default 0.2) and states out-of-window items cap at 0.70 (actual max with defaults is 0.85).
last_week temporal window starts at 0 h: unlike yesterday (24–48 h) and last_night (8–32 h), the last_week window includes items created moments ago (window_start_hours: 0.0), which is semantically inconsistent.
RECALL_FIXTURES.md scoring formula is stale: the new norm_abs_dscore × w_dscore term is omitted, and the table does not reflect that salience and dscore weights are now configurable; the example JSON in docs/cli/recall.md also omits the new dscore key from meta.weights.
Redundant double-clamp on composite_score (already clamped inside the match arm, then shadowed by an identical clamp).
The fixture-driven regression harness (recall_quality.rs) and the comprehensive smoke tests are a strong addition for ongoing regression safety.
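On the double-clamp point, the commit history includes "rank by raw composite score before output clamp". That commit suggests sorting on the unclamped sum and clamping once, only for output. Clamping before sorting would tie every item whose raw score exceeds 1.0. With default temporal weights, an in-window item can reach 0.85 + 0.30 = 1.15. The following is a hedged sketch of the idea. `Candidate` and its fields are illustrative names.

```rust
#[derive(Debug, Clone)]
struct Candidate {
    id: &'static str,
    raw_score: f64, // weighted sum; may exceed 1.0 once a temporal boost is added
}

/// Sort on the unclamped score, then clamp exactly once when producing output.
fn rank_then_clamp(mut items: Vec<Candidate>) -> Vec<(&'static str, f64)> {
    // total_cmp gives f64 a total order, so a stray NaN cannot panic the sort.
    items.sort_by(|a, b| b.raw_score.total_cmp(&a.raw_score));
    items
        .into_iter()
        .map(|c| (c.id, c.raw_score.clamp(0.0, 1.0)))
        .collect()
}

fn main() {
    let ranked = rank_then_clamp(vec![
        Candidate { id: "b", raw_score: 1.05 },
        Candidate { id: "a", raw_score: 1.15 },
        Candidate { id: "c", raw_score: 0.40 },
    ]);
    println!("{ranked:?}");
}
```

Here "a" (raw 1.15) still ranks above "b" (raw 1.05), even though both are reported as 1.0.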

Confidence Score: 3/5

Mergeable with minor fixes recommended — no data loss or security issues, but two logic-level inaccuracies (temporal comment and last_week window) and stale documentation should be addressed before or shortly after merge.
The sqlite-vec threading fix and the recall improvements are well-tested with extensive new fixture/smoke tests. However, the last_week temporal window starting at 0 h is a semantic bug that will silently boost items created seconds ago for "last week" queries, and the inaccurate comment about the temporal scoring ceiling could mislead future contributors tuning weights. The stale scoring formulas in newly introduced docs compound the confusion. These are non-trivial correctness concerns in the core ranking logic, though they do not affect data integrity or security.
Pay close attention to crates/core/src/temporal_intent.rs (last_week window bounds) and crates/cli/src/main.rs (temporal scoring comment and double-clamp). Also review docs/RECALL_FIXTURES.md and docs/cli/recall.md for the stale scoring formula.

Important Files Changed

Filename	Overview
crates/cli/src/main.rs	Major expansion of the `recall` command: adds configurable score weights, keyword fallback, temporal intent, multi-pass context, and output budget controls. Two issues found: an inaccurate comment about temporal-mode max score ceiling (salience weight stated as 0.15 in comment but 0.2 in code), and a redundant double-clamp on `composite_score`.
crates/core/src/temporal_intent.rs	New module providing `parse_temporal_intent` for detecting time-window keywords in recall queries. The `last_week` window is inconsistently defined (starts at 0 h, meaning items created right now qualify) compared to the analogous `yesterday` and `last_night` patterns which exclude the most-recent period.
crates/db/src/vector_ext.rs	Replaces the `sqlite3_auto_extension`-based approach with a per-connection `load_extension` call to fix the libsql threading assertion panic (issue #34). The `load_extension_disable` is always invoked before the load result is propagated, correctly ensuring extensions are never left enabled on error.
crates/db/src/lib.rs	Adds `query_recent_high_valence` and `query_open_todos` DB methods along with a shared `collect_memory_rows` helper. The `query_open_todos` SQL uses `LOWER(tags) LIKE '%...'` patterns which perform full table scans, but this is acceptable given the existing schema has no tag index.
docs/RECALL_FIXTURES.md	New contributor guide for the recall-quality regression harness. The scoring formula shown is stale — it omits the `norm_abs_dscore × w_dscore` term added in this PR and the quick-reference table does not mention that salience and dscore weights are now configurable.
.github/workflows/ci.yml	Adds a top-level `permissions: contents: read` declaration to limit the workflow's default token scope — a straightforward, correct security hardening change.
crates/cli/tests/recall_quality.rs	New fixture-driven recall-quality regression harness with 8 deterministic test scenarios covering salience ordering, EV ranking, recency, multilingual queries, tag/type matching, and empty-result safety. Tests correctly force text-only mode for determinism.
docs/cli/recall.md	Updated CLI reference documentation for `recall`. The example JSON output is missing the new `dscore` key in `meta.weights` that the code now always emits.
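The vector_ext.rs row describes an ordering: extension loading is disabled before the load result is propagated. A stand-in connection type can illustrate that ordering. `FakeConn` and its methods are hypothetical. They only mimic the enable/load/disable sequence and are not libsql's API.

```rust
/// Hypothetical stand-in for a libsql/SQLite connection that tracks whether
/// extension loading is currently enabled.
struct FakeConn {
    extensions_enabled: bool,
    fail_load: bool,
}

impl FakeConn {
    fn load_extension_enable(&mut self) -> Result<(), String> {
        self.extensions_enabled = true;
        Ok(())
    }
    fn load_extension_disable(&mut self) -> Result<(), String> {
        self.extensions_enabled = false;
        Ok(())
    }
    fn load_extension(&mut self, path: &str) -> Result<(), String> {
        if self.fail_load {
            Err(format!("could not load extension at {path}"))
        } else {
            Ok(())
        }
    }
}

/// Per-connection load: enable, attempt the load, always disable, then report.
fn load_vec_extension(conn: &mut FakeConn, path: &str) -> Result<(), String> {
    conn.load_extension_enable()?;
    let loaded = conn.load_extension(path); // keep the result; do not `?` yet
    conn.load_extension_disable()?; // runs on success and on failure
    loaded
}

fn main() {
    let mut conn = FakeConn { extensions_enabled: false, fail_load: true };
    let result = load_vec_extension(&mut conn, "./vec0");
    println!("result={result:?} enabled_after={}", conn.extensions_enabled);
}
```

Using `?` directly on the load call would return early and leave extension loading enabled on that connection. Binding the result first avoids that.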

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[wagl recall query] --> B[Resolve weights\nw_salience, w_dscore\nCLI > env > default]
    B --> C[Fetch canon items\nall canon:* tags]
    C --> D[Dedup canon\n1 per tag, text-dedup\ncap at max_canon]
    D --> E{Semantic search\navailable?}
    E -- Yes --> F[Vector similarity search\nget semantic_scores]
    E -- No --> G[Text LIKE query]
    F --> H{max_score >=\nfallback_threshold?}
    G --> H
    H -- No / text-only --> I[Keyword fallback\ntokenize query\nper-token LIKE searches]
    H -- Yes --> J[Filter out all canon: items]
    I --> J
    J --> K[parse_temporal_intent\ndetect time-window keywords]
    K --> L{Temporal hint\npresent?}
    L -- Yes --> M[Temporal scoring\nsemantic×0.35 + salience×w_s\n+ recency×0.10 + ev×0.10\n+ dscore×w_d + temporal_boost]
    L -- No --> N[Base scoring\nsemantic×0.5 + salience×w_s\n+ recency×0.15 + ev×0.15\n+ dscore×w_d]
    M --> O[Sort by rank_score DESC\nclamp composite_score to 0–1]
    N --> O
    O --> P[Truncate to\nmax limit, min_relevant]
    P --> Q{--multi-pass?}
    Q -- Yes --> R[Pass 2: high-valence\nquery_recent_high_valence]
    Q -- Yes --> S[Pass 3: open todos\nquery_open_todos]
    Q -- No --> T[Output JSON]
    R --> T
    S --> T
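The "Dedup canon" step in the flowchart (one item per tag, text dedup, cap at max_canon) could look roughly like the sketch below. Several details are illustrative choices, not taken from the wagl source: the field names, the whitespace/case normalization, and the assumption that input arrives in priority order.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq)]
struct CanonItem {
    canon_tag: String, // e.g. "canon:identity" (illustrative tag name)
    text: String,
}

impl CanonItem {
    fn new(tag: &str, text: &str) -> Self {
        Self { canon_tag: tag.to_string(), text: text.to_string() }
    }
}

/// Keep the first item per canon tag, skip repeated texts, and stop at `max_canon`.
/// Assumes `items` already arrive in priority order (for example newest first).
fn dedup_canon(items: &[CanonItem], max_canon: usize) -> Vec<CanonItem> {
    let mut seen_tags: HashSet<&str> = HashSet::new();
    let mut seen_texts: HashSet<String> = HashSet::new();
    let mut kept = Vec::new();
    for item in items {
        if kept.len() >= max_canon {
            break;
        }
        // Assumed normalization: collapse whitespace and lowercase.
        let norm = item.text.split_whitespace().collect::<Vec<_>>().join(" ").to_lowercase();
        // Check both conditions before recording either, so a skipped item
        // does not "use up" its tag.
        if seen_tags.contains(item.canon_tag.as_str()) || seen_texts.contains(&norm) {
            continue;
        }
        seen_tags.insert(item.canon_tag.as_str());
        seen_texts.insert(norm);
        kept.push(item.clone());
    }
    kept
}

fn main() {
    let items = vec![
        CanonItem::new("canon:identity", "I am wagl"),
        CanonItem::new("canon:identity", "Older identity note"),
        CanonItem::new("canon:style", "I  AM   wagl"),
        CanonItem::new("canon:style", "Be concise"),
        CanonItem::new("canon:goals", "Ship v1"),
    ];
    for item in dedup_canon(&items, 2) {
        println!("{} -> {}", item.canon_tag, item.text);
    }
}
```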

Comments Outside Diff (1)

crates/cli/src/main.rs, lines 1319-1323

Inaccurate comment: temporal weights and max-score ceiling are both wrong

The comment states that salience weight is 0.15 in temporal mode and that out-of-window items score at most 0.70, but neither is correct.
1. The code uses salience * w_salience where w_salience defaults to 0.2 (same as the non-temporal path), not 0.15 as the comment implies.
2. With the actual default weights in temporal mode (semantic=0.35, salience=0.2, recency=0.10, ev=0.10, dscore=0.1), the maximum score for an out-of-window item is 0.35 + 0.2 + 0.10 + 0.10 + 0.1 = 0.85, not 0.70.
The 0.70 ceiling only holds when w_salience=0.15 and w_dscore=0.0, which are non-default values. The comment appears to pre-date the addition of the configurable dscore weight and was never updated.
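The arithmetic in this comment can be checked directly. The sketch below uses the default weights quoted above and assumes every signal is normalized to [0, 1]. It also assumes the same w_dscore default of 0.1 applies in base mode. Under those assumptions:

- The out-of-window maximum is 0.35 + 0.2 + 0.10 + 0.10 + 0.1 = 0.85.
- The 0.70 figure only appears with salience 0.15 and dscore 0.0.
- With the dscore term, both an in-window item (boost 0.30) and a base-mode item (0.5 + 0.2 + 0.15 + 0.15 + 0.1 = 1.10) can exceed 1.0 before clamping.

`Weights`, `Signals`, and `raw_score` are illustrative names.

```rust
/// Score weights; the constants below follow the numbers quoted in this review.
struct Weights {
    semantic: f64,
    salience: f64,
    recency: f64,
    ev: f64,
    dscore: f64,
}

/// Per-item signals, each assumed pre-normalized to [0, 1].
struct Signals {
    semantic: f64,
    salience: f64,
    recency: f64,
    ev: f64,
    abs_dscore: f64,
}

const BASE: Weights = Weights { semantic: 0.5, salience: 0.2, recency: 0.15, ev: 0.15, dscore: 0.1 };
const TEMPORAL: Weights = Weights { semantic: 0.35, salience: 0.2, recency: 0.10, ev: 0.10, dscore: 0.1 };

/// Weighted sum plus the temporal boost (0.0 for out-of-window items).
fn raw_score(w: &Weights, s: &Signals, temporal_boost: f64) -> f64 {
    s.semantic * w.semantic
        + s.salience * w.salience
        + s.recency * w.recency
        + s.ev * w.ev
        + s.abs_dscore * w.dscore
        + temporal_boost
}

fn main() {
    let best = Signals { semantic: 1.0, salience: 1.0, recency: 1.0, ev: 1.0, abs_dscore: 1.0 };
    println!("temporal, out of window: {:.2}", raw_score(&TEMPORAL, &best, 0.0));
    println!("temporal, in window: {:.2}", raw_score(&TEMPORAL, &best, 0.30));
    println!("base: {:.2}", raw_score(&BASE, &best, 0.0));
}
```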

Last reviewed commit: 547085a

Greptile also left 4 inline comments on this PR.

Bumps the npm_and_yarn group with 1 update in the /docs directory: [svgo](https://github.com/svg/svgo).

Updates `svgo` from 4.0.0 to 4.0.1
- [Release notes](https://github.com/svg/svgo/releases)
- [Commits](svg/svgo@v4.0.0...v4.0.1)

---
updated-dependencies:
- dependency-name: svgo
  dependency-version: 4.0.1
  dependency-type: indirect
  dependency-group: npm_and_yarn
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: GoZumie <258471731+GoZumie@users.noreply.github.com>

- Add `temporal_intent` module to `wagl-core` with:
  - `TemporalHint` struct (window_start_hours, window_end_hours, boost)
  - `parse_temporal_intent()` recognizing: yesterday, today, recent/recently/lately, last night, this morning/afternoon/evening, last week (most-specific first, case-insensitive)
  - 14 unit tests covering all keywords, boundary values, and specificity ordering
- Update `wagl recall` composite scoring:
  - Without temporal hint: unchanged (semantic*0.5 + salience*0.2 + recency*0.15 + ev*0.15)
  - With temporal hint: semantic*0.35 + salience*0.15 + recency*0.10 + ev*0.10 + boost (up to 0.30 for in-window items, 0.0 for out-of-window items, capped at 1.0)
  - Items outside the temporal window score at most 0.70; in-window items score up to 1.00
  - Emit `meta.temporal_intent` (null when no hint) and update `meta.weights`
- Add CLI integration tests:
  - yesterday hint makes 36h-old item rank above 30-day-old identical item
  - non-temporal query emits null temporal_intent and uses standard weights

Co-authored-by: GoZumie <258471731+GoZumie@users.noreply.github.com>
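The following sketches the `TemporalHint` shape this commit describes. It is limited to the windows actually quoted in this thread:

- last night: 8 to 32 h
- yesterday: 24 to 48 h
- last week: its post-review 168 to 336 h

The half-open boundary convention and a uniform 0.30 boost are assumptions. The real module also handles today, recent, and this morning/afternoon/evening, which are omitted because their windows are not stated here.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct TemporalHint {
    window_start_hours: f64,
    window_end_hours: f64,
    boost: f64,
}

impl TemporalHint {
    /// Assumed half-open window: start <= age < end.
    fn contains(&self, age_hours: f64) -> bool {
        age_hours >= self.window_start_hours && age_hours < self.window_end_hours
    }

    /// Boost applied to an item of the given age; 0.0 outside the window.
    fn boost_for(&self, age_hours: f64) -> f64 {
        if self.contains(age_hours) { self.boost } else { 0.0 }
    }
}

// Only windows quoted in this PR thread; other keywords are omitted.
// "last week" uses the post-review 168-336 h window.
const PATTERNS: &[(&str, f64, f64)] = &[
    ("last night", 8.0, 32.0),
    ("last week", 168.0, 336.0),
    ("yesterday", 24.0, 48.0),
];
// Assumed: every window uses the maximum boost mentioned in the commit (0.30).
const ASSUMED_BOOST: f64 = 0.30;

/// Case-insensitive; the first matching entry in PATTERNS wins.
fn parse_temporal_intent(query: &str) -> Option<TemporalHint> {
    let q = query.to_lowercase();
    PATTERNS
        .iter()
        .find(|entry| q.contains(entry.0))
        .map(|&(_, start, end)| TemporalHint {
            window_start_hours: start,
            window_end_hours: end,
            boost: ASSUMED_BOOST,
        })
}

fn main() {
    let hint = parse_temporal_intent("What did we decide LAST WEEK?");
    println!("{hint:?}");
}
```

With the post-review window, an item created half an hour ago gets no "last week" boost, which is the behavior the Greptile finding asked for.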

Co-authored-by: GoZumie <258471731+GoZumie@users.noreply.github.com>

crates/cli/src/main.rs

docs/RECALL_FIXTURES.md

crates/core/src/temporal_intent.rs

docs/cli/recall.md

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 547085acaf

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

crates/cli/src/main.rs

GoZumie · 2026-03-06T19:56:37Z

Addressed the review findings in commit 75538d4.

Fixed

Canonical retrieval now filters by tag prefix in SQL before LIMIT (query_by_tag_prefix), so newer non-canon rows cannot crowd out canon candidates.
Recall candidate fetch budgets now scale with effective_limit = max(limit, min_relevant) for semantic and text paths, so --min-relevant is satisfiable for larger values.
Removed redundant double clamp on composite_score.
Updated temporal parsing for last week to the previous-week window (168–336h), plus matching unit test updates.
Updated docs to include the dscore scoring term, configurable weight behavior, and clamping note.
Updated recall docs JSON example to include meta.weights.dscore.
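The canon-starvation fix is about the order of filtering and LIMIT. The in-memory analogue below shows why filtering by tag prefix before LIMIT keeps canon rows reachable. It also includes the effective_limit = max(limit, min_relevant) budget from the same list. The row shape and tag values are hypothetical, and this mirrors the described ordering rather than the actual SQL in `query_by_tag_prefix`.

```rust
#[derive(Debug)]
struct Row {
    id: u32,
    tags: Vec<&'static str>,
}

fn is_canon(row: &Row) -> bool {
    row.tags.iter().any(|t| t.starts_with("canon:"))
}

/// Old shape: LIMIT first, then filter, so recent noise can crowd out canon rows.
fn limit_then_filter(rows_newest_first: &[Row], limit: usize) -> Vec<u32> {
    rows_newest_first.iter().take(limit).filter(|r| is_canon(r)).map(|r| r.id).collect()
}

/// Fixed shape: filter by tag prefix first, then apply LIMIT.
fn filter_then_limit(rows_newest_first: &[Row], limit: usize) -> Vec<u32> {
    rows_newest_first.iter().filter(|r| is_canon(r)).take(limit).map(|r| r.id).collect()
}

/// Candidate budget large enough for --min-relevant to be satisfiable.
fn effective_limit(limit: usize, min_relevant: usize) -> usize {
    limit.max(min_relevant)
}

fn main() {
    // Five recent non-canon rows, then one older canon row.
    let mut rows: Vec<Row> = (1..=5).map(|id| Row { id, tags: vec!["note"] }).collect();
    rows.push(Row { id: 6, tags: vec!["canon:identity"] });
    println!("{:?} vs {:?}", limit_then_filter(&rows, 3), filter_then_limit(&rows, 3));
    println!("budget = {}", effective_limit(5, 20));
}
```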

Added regression tests

recall_canon_not_starved_by_text_noise
recall_min_relevant_large_budget_is_satisfiable

Local validation

cargo test -p wagl --test recall_budget_smoke
cargo test -p wagl --test temporal_recall_smoke
cargo test -p wagl-core temporal_intent_tests

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 75538d4a56

crates/cli/src/main.rs

GoZumie · 2026-03-06T20:08:27Z

Follow-up pass complete for newly surfaced conversations.

Implemented and pushed cd2208e:

Added semantic undersize backfill: when semantic candidates are present but below budget, recall now backfills from text query and de-duplicates IDs. This covers partial-embedding databases and keeps min_relevant satisfiable.
Added strict --created-at validation/normalization for put (RFC3339 required; normalized to UTC RFC3339 string) to avoid malformed timestamps degrading ordering/recency logic.
Added regression test: put_rejects_invalid_created_at_override.
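The undersize backfill can be expressed as a merge with ID de-duplication. This sketch assumes IDs are integers and both lists are already ranked. `backfill_candidates` is an illustrative name.

```rust
use std::collections::HashSet;

/// Start from semantic hits; if there are fewer than `budget`, top up from the
/// text-query hits, skipping IDs that were already taken.
fn backfill_candidates(semantic_ids: &[u64], text_ids: &[u64], budget: usize) -> Vec<u64> {
    let mut seen = HashSet::new();
    let mut out = Vec::with_capacity(budget);
    for &id in semantic_ids.iter().chain(text_ids.iter()) {
        if out.len() >= budget {
            break; // a full semantic list never touches the text results
        }
        if seen.insert(id) {
            out.push(id);
        }
    }
    out
}

fn main() {
    // Only two items have embeddings, so two more come from the text query.
    println!("{:?}", backfill_candidates(&[3, 1], &[1, 2, 5, 6], 4));
}
```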

Local validation:

cargo test -p wagl --test put_area_smoke
cargo test -p wagl --test recall_budget_smoke
cargo test -p wagl --test temporal_recall_smoke

I reviewed all current PR review conversations and resolved the addressed threads.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cd2208e627

crates/cli/src/main.rs

GoZumie · 2026-03-06T20:32:19Z

I reviewed the two newly surfaced P2 findings.

They are valid refinement items but not release blockers for this PR, so I tracked them in backlog issues:

Recall: enforce min-relevant after canon filtering in canon-heavy datasets (#50)
Recall: keyword fallback should support short Unicode/CJK tokens (#51)

Given current behavior and risk profile, I recommend proceeding with PR approval and handling these in the next refinement pass.
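Issue #51 is still open, so the following is only one possible approach, not the planned fix. It counts token length in characters rather than bytes, and it keeps tokens containing CJK characters regardless of the minimum length. `fallback_tokens`, `min_chars`, and the Unicode ranges chosen are assumptions for illustration.

```rust
/// Hypothetical keyword-fallback tokenizer.
fn fallback_tokens(query: &str, min_chars: usize) -> Vec<String> {
    query
        // Split on anything that is not a letter or digit (Unicode-aware).
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        // Length is counted in chars, not bytes; CJK tokens bypass the minimum.
        .filter(|t| t.chars().count() >= min_chars || t.chars().any(is_cjk))
        .map(|t| t.to_lowercase())
        .collect()
}

/// Rough CJK check covering the main ideograph, kana, and Hangul blocks.
fn is_cjk(c: char) -> bool {
    matches!(
        c as u32,
        0x3040..=0x30FF // Hiragana and Katakana
            | 0x3400..=0x4DBF // CJK Extension A
            | 0x4E00..=0x9FFF // CJK Unified Ideographs
            | 0xAC00..=0xD7AF // Hangul syllables
    )
}

fn main() {
    println!("{:?}", fallback_tokens("Recall 记忆 of a db", 3));
}
```

A two-character word such as 记忆 survives the filter, while short Latin tokens like "of" and "db" are still dropped as noise.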

GoZumie and others added 30 commits March 4, 2026 04:05

9d49d73 ci: add explicit permissions block (contents: read)
3cb1b07 ci: add explicit GITHUB_TOKEN permissions (contents: read)
b52efb7 fix: dependabot security updates + CI token permissions (#35)
c133e5c Initial plan
5dc9c28 Initial plan
899bf31 Initial plan
8e248fd Initial plan
40f42ae Initial plan
e0a52ac Initial plan
9e8a7dd Add d_score boost to recall ranking with configurable weights and tests
  Co-authored-by: GoZumie <258471731+GoZumie@users.noreply.github.com>
c7c5f64 feat: keyword fallback query expansion for multilingual recall
  Co-authored-by: GoZumie <258471731+GoZumie@users.noreply.github.com>
f9c5063 feat: multi-pass startup recall (identity + valence + todos)
  Co-authored-by: GoZumie <258471731+GoZumie@users.noreply.github.com>
882494c feat: recall QA fixture-driven regression harness
  Co-authored-by: GoZumie <258471731+GoZumie@users.noreply.github.com>
77b75a0 feat: add recall output budget controls (--max-canon, --min-relevant)
  Co-authored-by: GoZumie <258471731+GoZumie@users.noreply.github.com>
8747b49 fix(recall): validate and normalize weighted score inputs
a7c1821 test: format recall ranking smoke test
412ca45 fix(recall): rank by raw composite score before output clamp
6671784 security: fix open code/dependabot alerts
f6a6e71 recall: parse weight flags as f64 end-to-end
ba77aca [WIP] Incorporate salience and d_score into recall ranking (#40)
fefda2b [WIP] Add fixture-driven regression harness for ranking quality (#47)
596b43d Fix open security alerts (CI permissions + svgo lockfile) (#48)
e3d1825 Merge PR #41 into dev: temporal intent + recency ranking
de56d06 Merge PR #43 into dev: keyword fallback for recall
37d4ff6 Merge PR #45 into dev: recall output budget controls
21dd802 Merge PR #46 into dev: multi-pass recall strategy
38c2eca fix(db): load sqlite-vec after connect to avoid libsql threading panic
547085a test(recall): make fixtures deterministic in text-only mode

GoZumie requested a review from ChrisCompton as a code owner March 6, 2026 18:01

greptile-apps bot reviewed Mar 6, 2026

crates/cli/src/main.rs (outdated)

docs/RECALL_FIXTURES.md

crates/core/src/temporal_intent.rs (outdated)

docs/cli/recall.md

chatgpt-codex-connector bot reviewed Mar 6, 2026

crates/cli/src/main.rs (outdated)

crates/cli/src/main.rs (outdated)

75538d4 fix(recall): address PR findings for canon, budgets, temporal window, docs

chatgpt-codex-connector bot reviewed Mar 6, 2026

crates/cli/src/main.rs

crates/cli/src/main.rs (outdated)

cd2208e fix(recall): backfill undersized semantic results and validate --created-at

chatgpt-codex-connector bot reviewed Mar 6, 2026

crates/cli/src/main.rs

crates/cli/src/main.rs

This was referenced Mar 6, 2026

Recall: enforce min-relevant after canon filtering in canon-heavy datasets #50

Open

Recall: keyword fallback should support short Unicode/CJK tokens #51

Open

ChrisCompton approved these changes Mar 6, 2026

GoZumie merged commit f79d655 into main Mar 6, 2026
6 checks passed

This was referenced Mar 6, 2026

fix: compact wagl recall context + bump 0.1.5 BigInformatics/openclaw-wagl#13

Merged

Vulnerability Audit: Update transitive dependencies #61

Closed

GoZumie merged 32 commits into main from dev

GoZumie commented Mar 6, 2026 (edited by greptile-apps bot)

4 participants
