fix: extend envelope stripping to broader channel/system markers by jlin53882 · Pull Request #481 · CortexReach/memory-lancedb-pro

jlin53882 · 2026-04-03T06:57:21Z

Summary

Extends envelope metadata stripping (from PR #444) to cover 8 additional channel/system envelope patterns that were still leaking into memory extraction.

Changes

1. src/smart-extractor.ts — stripEnvelopeMetadata()

Added per-line stripping for 8 new patterns:
- <<<EXTERNAL_UNTRUSTED_CONTENT / <<<END_EXTERNAL_UNTRUSTED_CONTENT (Discord channel forwarded message wrappers)
- Sender (untrusted metadata): + JSON block
- Conversation info (untrusted metadata): + JSON block
- Thread starter (untrusted, for context): block
- Forwarded message context (untrusted metadata): block
- [Queued messages while agent was busy]

Also fixed a pre-existing bug from PR #444 (fix: filter subagent runtime wrappers from auto-capture): subagent wrapper boilerplate on the same line as the wrapper prefix was not stripped. The pattern now matches the entire line for reliability.
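For readers who have not opened the diff, per-line stripping of these markers could look roughly like the sketch below. It is not the code in src/smart-extractor.ts: the function name stripEnvelopeLines, the two pattern arrays, and the assumption that the "+ JSON block" headers are followed by a flat JSON object on the next lines are all illustrative.

```typescript
// Illustrative marker patterns (assumed, not copied from the PR); each matches a whole line.
const MARKER_LINES: RegExp[] = [
  /^<<<(?:END_)?EXTERNAL_UNTRUSTED_CONTENT.*$/,
  /^\[Queued messages while agent was busy\].*$/,
];

// Headers assumed to be followed by a JSON object block.
const JSON_BLOCK_HEADERS: RegExp[] = [
  /^Sender \(untrusted metadata\):\s*$/,
  /^Conversation info \(untrusted metadata\):\s*$/,
];

// Hypothetical helper: drops envelope lines and keeps everything else.
function stripEnvelopeLines(text: string): string {
  const kept: string[] = [];
  let expectJson = false; // previous line was a JSON-block header
  let inJsonBlock = false; // currently inside a multi-line JSON object

  for (const line of text.split("\n")) {
    const trimmed = line.trim();

    if (inJsonBlock) {
      // Assumes a flat object: the first line that is just "}" closes it.
      if (trimmed === "}") inJsonBlock = false;
      continue;
    }
    if (expectJson) {
      expectJson = false;
      if (trimmed === "{") {
        inJsonBlock = true;
        continue;
      }
      if (/^\{.*\}$/.test(trimmed)) continue; // single-line JSON object
      // Not JSON after all: fall through and treat the line normally.
    }
    if (JSON_BLOCK_HEADERS.some((p) => p.test(trimmed))) {
      expectJson = true;
      continue;
    }
    if (MARKER_LINES.some((p) => p.test(trimmed))) continue;

    kept.push(line);
  }
  return kept.join("\n").trim();
}

const sampleMessage = [
  "<<<EXTERNAL_UNTRUSTED_CONTENT",
  "Sender (untrusted metadata):",
  "{",
  '  "name": "alice"',
  "}",
  "I prefer dark roast coffee.",
  "<<<END_EXTERNAL_UNTRUSTED_CONTENT",
].join("\n");

const strippedMessage = stripEnvelopeLines(sampleMessage);
console.log(strippedMessage);
```

Only the user-authored line survives; the wrappers, the header and its JSON body are dropped line by line, so the surrounding content is never rejected wholesale.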

2. src/noise-filter.ts — ENVELOPE_NOISE_PATTERNS

- New exported array of 8 RegExp patterns
- Pre-filter in filterNoiseByEmbedding() skips envelope-pattern texts before embedding-based noise check

3. src/tools.ts — registerMemoryStoreTool

- Added envelope metadata guard: rejects input matching any ENVELOPE_NOISE_PATTERNS with a clear error

4. test/strip-envelope-metadata.test.mjs

- 8 new regression test cases (one per new pattern)
- 1 mixed integration test covering all Phase 2 patterns together

Testing

node --test test/strip-envelope-metadata.test.mjs
22/22 pass

References

Closes #446 ("#394 — Envelope Metadata Leak into Memory")
Builds on PR #444 (fix: filter subagent runtime wrappers from auto-capture), merged 2026-04-03

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aa7f6d8f4b


chatgpt-codex-connector · 2026-04-03T07:00:33Z

src/smart-extractor.ts

+
+    // Pre-filter: skip known envelope patterns before embedding check
+    if (texts.length > 0) {
+      const nonEnvelopeTexts = texts.filter(t => !ENVELOPE_NOISE_PATTERNS.some(p => p.test(t)));


Do not discard mixed-content messages as envelope noise

SmartExtractor.filterNoiseByEmbedding now drops any text where ENVELOPE_NOISE_PATTERNS matches, but this runs before stripEnvelopeMetadata in the auto-capture path, so messages that contain both envelope lines and real user content are removed entirely instead of cleaned. For example, a single text containing System: [...] plus a user fact will be filtered out, which can reduce cleanTexts below minMessages and skip extraction, causing valid memories to be lost.


P1 addressed — pushed. The pre-filter has been removed from filterNoiseByEmbedding. Reasoning: in the extraction path, stripEnvelopeMetadata runs before filterNoiseByEmbedding, so the pre-filter would fire on already-cleaned text. The ENVELOPE_NOISE_PATTERNS guard is now only used in the memory_store tool path (see below).
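The failure mode Codex flagged can be reproduced in isolation. The sketch below is not the repository code: ENVELOPE_LINE is a single stand-in pattern and minMessages = 1 is an assumed threshold. It only contrasts dropping whole messages on a match with stripping the matching lines and keeping what remains.

```typescript
// Stand-in for one entry of ENVELOPE_NOISE_PATTERNS (assumed shape: a line-level marker).
const ENVELOPE_LINE = /^\[Queued messages while agent was busy\]\s*$/m;

const texts: string[] = [
  "[Queued messages while agent was busy]\nMy deploy window is Friday 18:00 UTC.",
  "[Queued messages while agent was busy]",
];

// Approach A: whole-message pre-filter. Any message containing a marker line is dropped.
const preFiltered = texts.filter((t) => !ENVELOPE_LINE.test(t));

// Approach B: strip marker lines first, then drop only messages that end up empty.
const stripAll = new RegExp(ENVELOPE_LINE.source, "gm");
const cleaned = texts
  .map((t) => t.replace(stripAll, "").trim())
  .filter((t) => t.length > 0);

const minMessages = 1; // assumed threshold, for illustration only
console.log("pre-filter keeps", preFiltered.length, "-> extraction skipped:", preFiltered.length < minMessages);
console.log("strip-first keeps", cleaned.length, "->", cleaned);
```

With approach A both messages are discarded, including the one carrying a real fact; with approach B the fact survives and only the pure-envelope message disappears.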

chatgpt-codex-connector · 2026-04-03T07:00:34Z

src/tools.ts


        try {
+          // Guard: reject envelope metadata before any processing
+          if (ENVELOPE_NOISE_PATTERNS.some(p => p.test(text))) {


Avoid rejecting memory_store input on partial envelope match

The new memory_store guard rejects the whole write if any envelope regex matches (ENVELOPE_NOISE_PATTERNS.some(...)), but these patterns are line-level markers and can appear alongside legitimate memory content in the same payload. In that mixed-content case, the tool returns envelope_metadata_rejected and stores nothing, which drops explicit user-provided facts instead of preserving content after metadata stripping.


P2 addressed — pushed. The memory_store guard now calls stripEnvelopeMetadata first, then checks if the result is empty. Mixed-content messages (envelope markers + real facts) will have their envelope stripped and the facts preserved. Only purely-envelope text gets rejected.
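A strip-then-check guard of the kind described here might be shaped like the sketch below. The stripEnvelopeMetadata body is a simplified stand-in covering one marker family, and guardMemoryStoreInput and StoreResult are hypothetical names; only the envelope_metadata_rejected error code is taken from the review comment.

```typescript
type StoreResult =
  | { ok: true; stored: string }
  | { ok: false; error: string };

// Simplified stand-in for the real stripEnvelopeMetadata (assumption: one marker family only).
function stripEnvelopeMetadata(text: string): string {
  return text
    .split("\n")
    .filter((line) => !/^<<<(?:END_)?EXTERNAL_UNTRUSTED_CONTENT/.test(line.trim()))
    .join("\n")
    .trim();
}

// Hypothetical guard: strip first, reject only when nothing is left.
function guardMemoryStoreInput(text: string): StoreResult {
  const cleaned = stripEnvelopeMetadata(text);
  if (cleaned.length === 0) {
    return { ok: false, error: "envelope_metadata_rejected" };
  }
  return { ok: true, stored: cleaned };
}

const pureEnvelope = guardMemoryStoreInput(
  "<<<EXTERNAL_UNTRUSTED_CONTENT\n<<<END_EXTERNAL_UNTRUSTED_CONTENT",
);
const mixedContent = guardMemoryStoreInput(
  "<<<EXTERNAL_UNTRUSTED_CONTENT\nUser is allergic to peanuts.\n<<<END_EXTERNAL_UNTRUSTED_CONTENT",
);
console.log(pureEnvelope, mixedContent);
```

Storing the cleaned text rather than the raw input is a design choice of this sketch; the thread only states that mixed content is preserved after stripping.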

…se 2)

- Extend stripEnvelopeMetadata() with 8 new patterns: <<<EXTERNAL_UNTRUSTED_CONTENT, <<<END_EXTERNAL_UNTRUSTED_CONTENT, Sender/Conversation info (untrusted metadata), Thread starter, Forwarded message context, [Queued messages while agent was busy]
- Add ENVELOPE_NOISE_PATTERNS to noise-filter.ts for pre-embedding guard
- Add memory_store tool guard in tools.ts (strip-then-check approach)
- Add 8 regression test cases in strip-envelope-metadata.test.mjs
- Fix PR CortexReach#444 regex bug: subagent wrapper lines now stripped via entire-line matching (/^\[Subagent Context|Subagent Task\].*$/gm)
- P1 fix: remove pre-filter from filterNoiseByEmbedding (runs before stripEnvelopeMetadata in extraction path, would cause false positives)
- P2 fix: memory_store guard now strips first then checks if empty, preserving mixed-content messages

Fixes CortexReach#446
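One note on the pattern as quoted in the commit message: in JavaScript regular expressions `|` has the lowest precedence, so the quoted text reads as "`^\[Subagent Context`" or "`Subagent Task\].*$`". Whether the committed code is identical to the quoted text is not visible in this thread. The sketch below compares the quoted form with a grouped variant; the grouped variant and the sample wrapper line are assumptions made for illustration.

```typescript
// Pattern exactly as quoted in the commit message (alternation is not grouped).
const quoted = /^\[Subagent Context|Subagent Task\].*$/gm;

// Grouped variant (an assumption about the intent: match the entire wrapper line).
const grouped = /^\[(?:Subagent Context|Subagent Task)\].*$/gm;

// Hypothetical wrapper line followed by real user content.
const sample = "[Subagent Context] You are running as a subagent.\nUser likes green tea.";

const viaQuoted = sample.replace(quoted, "");
const viaGrouped = sample.replace(grouped, "");

console.log(JSON.stringify(viaQuoted));
console.log(JSON.stringify(viaGrouped.trim()));
```

With the quoted form only the prefix "[Subagent Context" is removed and the rest of the wrapper line stays; with the grouped form the whole wrapper line is removed and the user content is left intact.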

app3apps · 2026-04-03T10:27:38Z

Current head cb636e2 is still broken: src/smart-extractor.ts has a stray } around filterNoiseByEmbedding(), which makes the module fail to parse.

I can reproduce it with:

jiti("./src/smart-extractor.ts")
node --test test/strip-envelope-metadata.test.mjs

This needs to be fixed before merge.

AliceLJY

Review: PR #481 — fix: extend envelope stripping to broader channel/system markers

The envelope stripping patterns and test coverage are good, but there are two code-level issues that need to be fixed before merge.

Codex P1 — "Do not discard mixed-content messages as envelope noise"

Valid concern, and the author has addressed it (confirmed by jlin53882's comment). The pre-filter was removed from filterNoiseByEmbedding, and the memory_store guard now strips first and only rejects if nothing remains. The current diff reflects the fix.

Codex P2 — "Avoid rejecting memory_store input on partial envelope match"

Valid concern, and the author has addressed it. The memory_store guard in tools.ts now calls stripEnvelopeMetadata(text) first, then checks if the stripped result is empty. Mixed-content messages are preserved after stripping. Correct.

Blocking issues

1. Stray closing brace breaks filterNoiseByEmbedding

In src/smart-extractor.ts, the diff shows:

  async filterNoiseByEmbedding(texts: string[]): Promise<string[]> {
    const noiseBank = this.config.noiseBank;
+
+    }
+
    if (!noiseBank || !noiseBank.initialized) return texts;

There is a stray } on its own line that prematurely closes the filterNoiseByEmbedding method. The if (!noiseBank ...) line and everything after it would then be outside the method body, causing a compile error. This appears to be a leftover from removing the envelope pre-filter block — the block's closing brace was not removed along with its opening.

2. Character encoding corruption throughout smart-extractor.ts

Multiple comment lines have their em-dash characters (—) replaced with ??. Examples:

"Smart Memory Extractor ??LLM-powered extraction pipeline" (was — LLM-powered)
"Pipeline: conversation ??LLM extract ??candidates ??dedup ??persist" (was →)
"LLM returned zero candidates ??strongest noise signal" (was —)
"Same brand, different item ??should not be deduped" (was →)
Chinese characters also corrupted: "?�欢麦�??��??�烧鸡腿??" (was 喜欢麦当劳的板烧鸡腿堡)

This is a systematic encoding issue — likely the file was saved or committed with an incorrect encoding. All corrupted characters need to be restored to their originals.
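A cheap pre-commit check can catch part of this class of damage. The sketch below is an illustration rather than tooling from this repository: findCorruptedLines is a hypothetical helper, and it only detects the Unicode replacement character (U+FFFD). It deliberately does not flag a literal ?? sequence, because ?? is also the nullish-coalescing operator in TypeScript.

```typescript
// U+FFFD is what decoders emit for bytes they cannot interpret.
const REPLACEMENT_CHAR = "\uFFFD";

// Hypothetical helper: returns 1-based line numbers containing U+FFFD.
function findCorruptedLines(source: string): number[] {
  const hits: number[] = [];
  source.split("\n").forEach((line, index) => {
    if (line.includes(REPLACEMENT_CHAR)) hits.push(index + 1);
  });
  return hits;
}

const sampleSource = [
  "// Pipeline: conversation to candidates",
  "// preference example: ?\uFFFD\u6B22\u9EA6\uFFFD??",
  "const limit = configured ?? 10;",
].join("\n");

const corruptedLines = findCorruptedLines(sampleSource);
console.log(corruptedLines);
```

Lines where a character was replaced by plain question marks still need a manual diff against the last known-good revision, since that substitution is indistinguishable from valid source text.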

Non-blocking notes

ENVELOPE_NOISE_PATTERNS export in noise-filter.ts is well-structured and the patterns are specific enough to avoid false positives.
The 8 new test cases in strip-envelope-metadata.test.mjs provide good regression coverage.
tools.ts lost its trailing newline (\ No newline at end of file). Minor.

Please fix the stray brace and the encoding corruption, then this is ready for another look.

…ractor.ts

P1: Remove stray closing brace in filterNoiseByEmbedding that caused compile error
P2: Restore em-dashes (—) and arrows (→) that were corrupted to ?? throughout comments
P3: Fix garbled Chinese text in preference-slot guard comment
T1: Add missing trailing newline to tools.ts

jlin53882 · 2026-04-03T19:14:27Z

All blocking issues have been fixed:

P1: Removed stray } in filterNoiseByEmbedding that caused the compile error
P2: Restored corrupted characters throughout smart-extractor.ts — em-dashes (—), arrows (→), and Chinese text all fixed
T1: Added trailing newline to tools.ts

Please re-review when you have a chance 🙏

AliceLJY

Re-reviewed after latest commit. Both blocking issues from my previous review are fixed:

Stray closing brace — removed, filterNoiseByEmbedding method body is clean ✅
Encoding corruption — CJK characters (喜欢麦当劳的板烧鸡腿堡) and em-dashes (—) are now correct ✅

Code quality:

ENVELOPE_NOISE_PATTERNS export is well-structured
stripEnvelopeMetadata extensions handle Discord/channel markers properly (per-line, no blanket rejection)
tools.ts guard correctly strips first → rejects only if empty (P2 fix)
8 new test cases provide solid regression coverage

LGTM. @rwmjhb, please review and merge.


jlin53882 mentioned this pull request Apr 3, 2026

#394 — Envelope Metadata Leak into Memory #446

Open

chatgpt-codex-connector bot reviewed Apr 3, 2026


jlin53882 force-pushed the fix/envelope-stripping-phase2 branch from aa7f6d8 to cb636e2 on April 3, 2026 07:10

AliceLJY requested changes Apr 3, 2026


AliceLJY approved these changes Apr 4, 2026


AliceLJY assigned rwmjhb Apr 4, 2026

jlin53882 added 4 commits April 5, 2026 20:27

learn: PR466 conflict-resolution workflow + gh comment encoding + PR head branch lookup (3 items)

97c519d

active_state: PR466 done, update status of PRs pending merge

7912f02

memory: append PR466 autoRecallTimeoutMs session to 2026-04-05

998fc8d

learn: PR466 self-review — rebase strategy, JSON conflict, Select-Str…

0fa5d7c

…ing false positive, commit rules

jlin53882 wants to merge 6 commits into CortexReach:master from jlin53882:fix/envelope-stripping-phase2


4 participants
