Bumps the npm_and_yarn group with 1 update in the /docs directory: [svgo](https://github.com/svg/svgo).
Updates `svgo` from 4.0.0 to 4.0.1
- [Release notes](https://github.com/svg/svgo/releases)
- [Commits](svg/svgo@v4.0.0...v4.0.1)
---
updated-dependencies:
- dependency-name: svgo
  dependency-version: 4.0.1
  dependency-type: indirect
  dependency-group: npm_and_yarn
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: GoZumie <258471731+GoZumie@users.noreply.github.com>
- Add `temporal_intent` module to `wagl-core` with:
  - `TemporalHint` struct (window_start_hours, window_end_hours, boost)
  - `parse_temporal_intent()` recognizing: yesterday, today, recent/recently/lately,
    last night, this morning/afternoon/evening, last week (most-specific first,
    case-insensitive)
  - 14 unit tests covering all keywords, boundary values, and specificity ordering
- Update `wagl recall` composite scoring:
  - Without temporal hint: unchanged (semantic*0.5 + salience*0.2 + recency*0.15 + ev*0.15)
  - With temporal hint: semantic*0.35 + salience*0.15 + recency*0.10 + ev*0.10 + boost
    (up to 0.30 for in-window items, 0.0 for out-of-window items, capped at 1.0)
  - Items outside the temporal window score at most 0.70; in-window items score up to 1.00
  - Emit `meta.temporal_intent` (null when no hint) and update `meta.weights`
- Add CLI integration tests:
  - yesterday hint makes 36h-old item rank above 30-day-old identical item
  - non-temporal query emits null temporal_intent and uses standard weights
Co-authored-by: GoZumie <258471731+GoZumie@users.noreply.github.com>
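The keyword matching described in this commit message could look roughly like the sketch below. It is not the code from `wagl-core`: the signature of `parse_temporal_intent`, the `contains` helper, the `PATTERNS` table, and the flat `0.30` boost are all assumptions made for illustration. Only the field names, the most-specific-first ordering, case-insensitivity, and the `yesterday` (24–48 h), `last night` (8–32 h), and `last week` (starts at 0 h) bounds come from this PR conversation; the remaining window values are guesses, and only a subset of the keywords is shown.

```rust
/// A time window, in hours before "now", plus the boost for items inside it.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TemporalHint {
    pub window_start_hours: f64,
    pub window_end_hours: f64,
    pub boost: f64,
}

impl TemporalHint {
    /// Hypothetical helper: true when an item of this age falls in the window.
    pub fn contains(&self, age_hours: f64) -> bool {
        age_hours >= self.window_start_hours && age_hours <= self.window_end_hours
    }
}

/// Keyword table ordered most-specific first, so multi-word phrases are
/// tried before single words. Entries marked "assumed" are illustrative.
const PATTERNS: &[(&str, f64, f64)] = &[
    ("last night", 8.0, 32.0),
    ("last week", 0.0, 168.0), // 0 h start per the review; 168 h end assumed
    ("yesterday", 24.0, 48.0),
    ("today", 0.0, 24.0), // assumed
];

/// Returns the first matching hint, or None for non-temporal queries.
pub fn parse_temporal_intent(query: &str) -> Option<TemporalHint> {
    // Lowercase once so matching is case-insensitive.
    let q = query.to_lowercase();
    PATTERNS
        .iter()
        .find(|(kw, _, _)| q.contains(*kw))
        .map(|&(_, start, end)| TemporalHint {
            window_start_hours: start,
            window_end_hours: end,
            boost: 0.30, // assumed flat value; the commit says "up to 0.30"
        })
}
```

Under these assumptions, a `yesterday` query yields a 24–48 h window, so a 36 h-old item is in-window and a 30-day-old (720 h) item is not, which matches the integration test described in the commit message.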
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 547085acaf
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Addressed the review findings in commit
Fixed
Added regression tests
Local validation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 75538d4a56
Follow-up pass complete for newly surfaced conversations. Implemented and pushed
Local validation:
I reviewed all current PR review conversations and resolved the addressed threads.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cd2208e627
I reviewed the two newly surfaced P2 findings. They are valid refinement items but not release blockers for this PR, so I tracked them in backlog issues:
Given current behavior and risk profile, I recommend proceeding with PR approval and handling these in the next refinement pass.
Summary
Promote `dev` to `main` so we can proceed with next integration/release steps. This includes recent recall and stability work, notably:
Notes
dev→main
PR Review by Greptile
Greptile Summary
This PR promotes dev→main, bringing a substantial recall-quality improvement batch that refines how `wagl recall` retrieves, ranks, and budgets its output. The core changes introduce configurable scoring weights (`--salience-weight`, `--dscore-weight`), a keyword-fallback expansion step for multilingual queries, temporal-intent detection (`yesterday`, `last week`, etc.) with per-window score boosts, multi-pass context packs (high-valence and open-todo passes), and output budget controls (`--max-canon`, `--min-relevant`). On the infrastructure side, the sqlite-vec loading is moved from `sqlite3_auto_extension` (pre-connection, caused a threading panic) to a per-connection `load_extension` call, and the CI workflow gains a `permissions: contents: read` scope restriction.
Key points:
- The temporal scoring comment states `salience=0.15` in temporal mode (it uses `w_salience`, default 0.2) and states out-of-window items cap at `0.70` (actual max with defaults is `0.85`).
- `last_week` temporal window starts at 0 h: unlike `yesterday` (24–48 h) and `last_night` (8–32 h), the `last_week` window includes items created moments ago (`window_start_hours: 0.0`), which is semantically inconsistent.
- `RECALL_FIXTURES.md` scoring formula is stale: the new `norm_abs_dscore × w_dscore` term is omitted, and the table does not reflect that salience and dscore weights are now configurable; the example JSON in `docs/cli/recall.md` also omits the new `dscore` key from `meta.weights`.
- Redundant double-clamp on `composite_score` (already clamped inside the match arm, then shadowed by an identical clamp).
- `recall_quality.rs` and the comprehensive smoke tests are a strong addition for ongoing regression safety.
Confidence Score: 3/5
- The `last_week` temporal window starting at 0 h is a semantic bug that will silently boost items created seconds ago for "last week" queries, and the inaccurate comment about the temporal scoring ceiling could mislead future contributors tuning weights. The stale scoring formulas in newly introduced docs compound the confusion. These are non-trivial correctness concerns in the core ranking logic, though they do not affect data integrity or security.
- Review `crates/core/src/temporal_intent.rs` (`last_week` window bounds) and `crates/cli/src/main.rs` (temporal scoring comment and double-clamp). Also review `docs/RECALL_FIXTURES.md` and `docs/cli/recall.md` for the stale scoring formula.
Important Files Changed
- `recall` command: adds configurable score weights, keyword fallback, temporal intent, multi-pass context, and output budget controls. Two issues found: an inaccurate comment about temporal-mode max score ceiling (salience weight stated as 0.15 in comment but 0.2 in code), and a redundant double-clamp on `composite_score`.
- Adds `parse_temporal_intent` for detecting time-window keywords in recall queries. The `last_week` window is inconsistently defined (starts at 0 h, meaning items created right now qualify) compared to the analogous `yesterday` and `last_night` patterns, which exclude the most-recent period.
- Replaces the `sqlite3_auto_extension`-based approach with a per-connection `load_extension` call to fix the libsql threading assertion panic (issue #34). `load_extension_disable` is always invoked before the load result is propagated, correctly ensuring extensions are never left enabled on error.
- Adds `query_recent_high_valence` and `query_open_todos` DB methods along with a shared `collect_memory_rows` helper. The `query_open_todos` SQL uses `LOWER(tags) LIKE '%...'` patterns, which perform full table scans, but this is acceptable given the existing schema has no tag index.
- The scoring formula omits the `norm_abs_dscore × w_dscore` term added in this PR, and the quick-reference table does not mention that salience and dscore weights are now configurable.
- Adds a `permissions: contents: read` declaration to limit the workflow's default token scope, a straightforward, correct security hardening change.
- Documentation for `recall`. The example JSON output is missing the new `dscore` key in `meta.weights` that the code now always emits.
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[wagl recall query] --> B[Resolve weights\nw_salience, w_dscore\nCLI > env > default]
B --> C[Fetch canon items\nall canon:* tags]
C --> D[Dedup canon\n1 per tag, text-dedup\ncap at max_canon]
D --> E{Semantic search\navailable?}
E -- Yes --> F[Vector similarity search\nget semantic_scores]
E -- No --> G[Text LIKE query]
F --> H{max_score >=\nfallback_threshold?}
G --> H
H -- No / text-only --> I[Keyword fallback\ntokenize query\nper-token LIKE searches]
H -- Yes --> J[Filter out all canon: items]
I --> J
J --> K[parse_temporal_intent\ndetect time-window keywords]
K --> L{Temporal hint\npresent?}
L -- Yes --> M[Temporal scoring\nsemantic×0.35 + salience×w_s\n+ recency×0.10 + ev×0.10\n+ dscore×w_d + temporal_boost]
L -- No --> N[Base scoring\nsemantic×0.5 + salience×w_s\n+ recency×0.15 + ev×0.15\n+ dscore×w_d]
M --> O[Sort by rank_score DESC\nclamp composite_score to 0–1]
N --> O
O --> P[Truncate to\nmax limit, min_relevant]
P --> Q{--multi-pass?}
Q -- Yes --> R[Pass 2: high-valence\nquery_recent_high_valence]
Q -- Yes --> S[Pass 3: open todos\nquery_open_todos]
Q -- No --> T[Output JSON]
R --> T
S --> T
Comments Outside Diff (1)
`crates/cli/src/main.rs`, line 1319-1323
Inaccurate comment: temporal weights and max-score ceiling are both wrong
The comment states that salience weight is `0.15` in temporal mode and that out-of-window items score at most `0.70`, but neither is correct.
- The code computes `salience * w_salience`, where `w_salience` defaults to `0.2` (same as the non-temporal path), not `0.15` as the comment implies.
- With the default weights (`semantic=0.35, salience=0.2, recency=0.10, ev=0.10, dscore=0.1`), the maximum score for an out-of-window item is `0.35 + 0.2 + 0.10 + 0.10 + 0.1 = 0.85`, not `0.70`.
The `0.70` ceiling only holds when `w_salience=0.15` and `w_dscore=0.0`, which are non-default values. The comment appears to pre-date the addition of the configurable dscore weight and was never updated.
Last reviewed commit: 547085a
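The ceiling arithmetic in this review comment can be checked with a small sketch. The `Weights` struct and its methods are hypothetical names introduced for illustration and are not taken from `crates/cli/src/main.rs`. The sketch also assumes every signal (semantic, salience, recency, ev, dscore) is normalized to the range 0 to 1, so the maximum score is simply the sum of the weights plus any temporal boost, capped at 1.0.

```rust
/// Temporal-mode weights as quoted in the review comment.
#[derive(Debug, Clone, Copy)]
pub struct Weights {
    pub semantic: f64,
    pub salience: f64,
    pub recency: f64,
    pub ev: f64,
    pub dscore: f64,
}

impl Weights {
    /// Defaults listed in the review: salience 0.2 and dscore 0.1.
    pub fn temporal_defaults() -> Self {
        Weights { semantic: 0.35, salience: 0.2, recency: 0.10, ev: 0.10, dscore: 0.1 }
    }

    /// Highest reachable score when every normalized signal equals 1.0.
    /// `boost` is 0.0 for out-of-window items; the result is capped at 1.0.
    pub fn max_score(&self, boost: f64) -> f64 {
        let sum = self.semantic + self.salience + self.recency + self.ev + self.dscore + boost;
        sum.min(1.0)
    }
}
```

With `temporal_defaults()` and a boost of `0.0`, `max_score` comes out at roughly 0.85 (up to floating-point rounding). Setting `salience` to `0.15` and `dscore` to `0.0` gives roughly 0.70, the figure in the stale comment. An in-window boost of `0.30` pushes the default sum past 1.0, so the cap applies.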