fix(self-reflection): robust JSON extraction with markdown fallback #78

Merged
jorparad merged 2 commits into master from fix/self-reflection-context-accumulation on Dec 9, 2025

anabelle (Owner) commented Dec 9, 2025

Summary

Fixes #28 - Self-Reflection Context Accumulation Issues

Problem

The self-reflection system suffered from context accumulation issues:

  1. Weak JSON Extraction: Only the first JSON block was parsed, and malformed responses failed silently
  2. No Markdown Fallback: Unlike the already-fixed timeline lore system, there was no fallback for markdown-formatted LLM responses
  3. Missing Validation: Reflection data was not structurally validated before being included in context
  4. Insufficient Prompt Emphasis: The prompt did not state the JSON output requirements strictly enough

Solution

1. Enhanced JSON Extraction with Validation

  • _extractJson() now validates that parsed JSON has required fields (strengths, weaknesses, recommendations)
  • Falls back to markdown extraction if JSON is malformed or missing fields
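The JSON-first flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in plugin-nostr/lib/selfReflection.js: the function names, the "outermost braces" heuristic, and the injected markdownFallback callback are all assumptions made for the example.

```javascript
// Hypothetical stand-ins for _extractJson() and _isValidReflection().
const REQUIRED_FIELDS = ['strengths', 'weaknesses', 'recommendations'];

// Strict check: every required field must be present as an array.
function isValidReflection(analysis) {
  return Boolean(analysis) && typeof analysis === 'object' &&
    REQUIRED_FIELDS.every((field) => Array.isArray(analysis[field]));
}

// Try JSON first; hand the raw text to the fallback when JSON is
// missing, malformed, or lacks a required field.
function extractReflection(response, markdownFallback) {
  if (typeof response !== 'string' || response.trim() === '') return null;

  // Assumed heuristic: take the span from the first "{" to the last "}"
  // so JSON embedded in surrounding prose is still found.
  const start = response.indexOf('{');
  const end = response.lastIndexOf('}');
  if (start !== -1 && end > start) {
    try {
      const parsed = JSON.parse(response.slice(start, end + 1));
      if (isValidReflection(parsed)) return parsed;
    } catch (err) {
      // Malformed JSON: fall through to the markdown path.
    }
  }
  return markdownFallback(response);
}
```

Passing the fallback in as a callback keeps the sketch self-contained; in the PR the fallback is the _extractFieldsFromMarkdown() method on the same object.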

2. Markdown Fallback Extraction

  • New _extractFieldsFromMarkdown() method parses common markdown patterns:
    • Headers with bullet points (### Strengths:, - item)
    • Alternative labels (What you're doing well:, Actionable changes:)
    • Quoted examples (Best reply: "text")
    • Unicode bullets (•), asterisks (*), hyphens (-)
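To make the listed patterns concrete, here is a small sketch of how a heading line might be mapped to a reflection field and how a bullet marker might be stripped. The alias table is an assumption: the PR text names the labels but does not say which field each one feeds, so the mapping below is a guess for illustration only.

```javascript
// Assumed alias table; the label-to-field mapping is hypothetical.
const LABEL_ALIASES = {
  strengths: ['strengths', "what you're doing well"],
  weaknesses: ['weaknesses'],
  recommendations: ['recommendations', 'actionable changes'],
};

// Strip markdown header markers (###), bold markers, and a trailing colon.
function normalizeHeading(line) {
  return line
    .trim()
    .replace(/^#{1,6}\s*/, '')
    .replace(/\*\*/g, '')
    .replace(/:\s*$/, '')
    .trim()
    .toLowerCase();
}

// Return the field a heading refers to, or null if it is not a known label.
function fieldForHeading(line) {
  const heading = normalizeHeading(line);
  for (const [field, labels] of Object.entries(LABEL_ALIASES)) {
    if (labels.includes(heading)) return field;
  }
  return null;
}

// Remove a leading hyphen, asterisk, or Unicode bullet.
// Returns null when the line is not a bullet item.
function stripBullet(line) {
  const match = /^\s*[-*•]\s+(.*)$/.exec(line);
  return match ? match[1].trim() : null;
}
```

For example, fieldForHeading('### Strengths:') and fieldForHeading("What you're doing well:") both resolve to the same field under this table, while stripBullet('• item two') yields the bare item text.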

3. Validation Methods

  • _isValidReflection(): Strict validation requiring all three arrays
  • _hasMinimalReflectionData(): Lenient validation for markdown (requires 2+ populated fields)
  • getReflectionHistory() now filters using these validators
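The lenient validator and the history filter could look something like the sketch below. The list of fields that count toward the "2+ populated" threshold and the entry.analysis shape are assumptions; the PR text only states the threshold.

```javascript
// Assumed field list; the source only says "2+ populated fields".
const COUNTED_FIELDS = [
  'strengths', 'weaknesses', 'recommendations', 'improvements', 'regressions',
];

// Lenient check: at least minFields of the counted fields are non-empty arrays.
function hasMinimalReflectionData(analysis, minFields = 2) {
  if (!analysis || typeof analysis !== 'object') return false;
  const populated = COUNTED_FIELDS.filter(
    (field) => Array.isArray(analysis[field]) && analysis[field].length > 0
  );
  return populated.length >= minFields;
}

// Hypothetical history filter: drop stored entries that fail the lenient check.
function filterHistory(reflections) {
  return reflections.filter(
    (entry) => hasMinimalReflectionData(entry && entry.analysis)
  );
}
```

Filtering at read time, rather than rejecting at write time, means entries stored before the fix are also screened out of the context.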

4. Strengthened Prompt

  • Replaced the terse OUTPUT JSON ONLY: instruction with explicit requirements:
    IMPORTANT: OUTPUT VALID JSON ONLY - NO MARKDOWN, NO EXPLANATIONS, NO CODE BLOCKS.
    Your entire response must be a single valid JSON object with this exact structure:
    

Testing

  • Added 21 new tests in selfReflection.extraction.test.js
  • Tests cover:
    • Valid JSON extraction
    • JSON embedded in text
    • JSON validation failures
    • Markdown header/bullet extraction
    • Alternative label formats
    • Quoted example extraction
    • Improvements/regressions extraction
    • Edge cases (null, undefined, empty)
    • Integration scenarios (mixed JSON/markdown)
  • All 343 tests pass

Impact

  • Prevents context loss when LLM returns non-JSON responses
  • Ensures longitudinal self-reflection works correctly
  • Aligns with the timeline lore system's robust extraction approach

Summary by CodeRabbit

  • New Features
    • Add markdown-based extraction so reflections can be parsed from free-form markdown when JSON is missing or incomplete.
    • Switch to a JSON-first extraction flow with graceful markdown fallback when needed.
  • Bug Fixes
    • Tighten filtering to exclude reflections lacking minimal valid analysis.
    • Improve handling and logging for malformed or incomplete reflection data.
  • Tests
    • Add extensive tests covering JSON and markdown extraction, validation, and fallback scenarios.

Fixes #28 - Self-Reflection Context Accumulation Issues

Changes:
- Enhanced _extractJson() with markdown fallback when JSON parsing fails
- Added _extractFieldsFromMarkdown() to parse markdown-formatted LLM responses
- Added _isValidReflection() for strict JSON structure validation
- Added _hasMinimalReflectionData() for lenient markdown validation
- Updated getReflectionHistory() to filter out malformed reflections
- Strengthened prompt output requirements to emphasize JSON-only output

Testing:
- Added 21 new tests for extraction methods
- Tests cover: valid JSON, embedded JSON, markdown headers/bullets,
  quoted examples, edge cases, and mixed format scenarios
- All 343 tests pass

coderabbitai bot (Contributor) commented Dec 9, 2025

Walkthrough

Switch reflection extraction to JSON-first parsing with a markdown fallback, add markdown parsing to build structured reflections, and introduce validators to filter reflections missing required or minimal data. Tests for JSON and markdown extraction and validators were added.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Extraction & Validation (core): plugin-nostr/lib/selfReflection.js | Add _extractFieldsFromMarkdown(text), _isValidReflection(analysis), and _hasMinimalReflectionData(analysis); change extraction flow to try JSON parse first, then fall back to markdown parsing when JSON is missing/invalid; skip stored reflections that fail minimal-data checks. |
| Tests: plugin-nostr/test/selfReflection.extraction.test.js | Add comprehensive unit tests covering JSON extraction (valid/invalid), markdown fallback (headers, bullets, semicolons, quoted examples), validator edge cases, and integration scenarios (JSON preferred, markdown fallback). |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Inspect regexes and parsing logic in _extractFieldsFromMarkdown for edge cases (bullet types, quoted blocks, short items).
  • Verify _isValidReflection and _hasMinimalReflectionData correctly enforce required arrays and minimal-field acceptance.
  • Confirm integration points where extraction is used (history processing) now skip invalid reflections without breaking flow.
  • Run the new test suite and check for flaky cases around whitespace/Unicode bullets.

Possibly related issues

  • #28: Self-Reflection Context Accumulation Issues. This PR implements JSON-first extraction with a markdown fallback and stronger validation, addressing the core causes described there (weak JSON extraction, no markdown fallback, insufficient filtering).
  • #53: Test coverage for selfReflection.js (39.89% → 100%). It concerns the same module and the methods added here (_extractFieldsFromMarkdown, validation helpers), and this PR increases test coverage for extraction/fallback behavior.

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: adding robust JSON extraction with markdown fallback to the self-reflection system. |
| Linked Issues check | ✅ Passed | All primary objectives from issue #28 are met: JSON extraction enhanced with validation, markdown fallback implemented with field extraction, reflection history filtered with validators, and prompt strengthened. |
| Out of Scope Changes check | ✅ Passed | All changes directly address issue #28 requirements; no unrelated modifications detected in JSON/markdown extraction, validation, filtering, or test coverage. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
plugin-nostr/lib/selfReflection.js (1)

1359-1384: Regex pattern may cause ReDoS with crafted input.

The extractListItems function builds a regex from a pattern and applies it to potentially large, untrusted LLM output. The nested quantifiers ([^\\n]+(?:\\n[-*•]\\s*[^\\n]+)*) combined with the gi flags could cause catastrophic backtracking on adversarial input.

Consider adding input length limits or using a simpler, iterative parsing approach:

   const extractListItems = (pattern) => {
     const matches = [];
+    // Limit input to prevent ReDoS on large responses
+    const limitedText = text.slice(0, 10000);
     const regex = new RegExp(pattern + '[:\\s]*([^\\n]+(?:\\n[-*•]\\s*[^\\n]+)*)', 'gi');
-    const match = text.match(regex);
+    const match = limitedText.match(regex);
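As an illustration of the "simpler, iterative parsing approach" mentioned in this comment, the sketch below walks the text line by line instead of matching one multi-line regex. It is not the code that was merged; the function signature, the label argument, and the stop-at-first-non-bullet rule are assumptions. It keeps the 10,000-character cap from the suggested diff and the length > 3 filter discussed in this review.

```javascript
const MAX_INPUT_CHARS = 10000; // same cap as in the suggested diff

// Collect bullet items that follow a heading starting with `label`.
// Each line is inspected once, so the cost grows linearly with input size.
function extractListItems(text, label) {
  if (typeof text !== 'string') return [];
  const lines = text.slice(0, MAX_INPUT_CHARS).split('\n');
  const wanted = label.toLowerCase();
  const items = [];
  let collecting = false;

  for (const rawLine of lines) {
    const line = rawLine.trim();
    if (!collecting) {
      // Start collecting once a line (minus any leading #) begins with the label.
      const heading = line.replace(/^#+\s*/, '').toLowerCase();
      if (heading.startsWith(wanted)) collecting = true;
      continue;
    }
    const bullet = /^[-*•]\s*(.+)$/.exec(line);
    if (bullet) {
      const item = bullet[1].trim();
      if (item.length > 3) items.push(item); // drop very short noise items
    } else if (line !== '') {
      break; // the first non-bullet, non-blank line ends the list
    }
  }
  return items;
}
```

One limitation of this sketch: content written on the same line as the heading (for example "Strengths: concise replies") is ignored, whereas the regex in the diff captures it.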
plugin-nostr/test/selfReflection.extraction.test.js (1)

207-251: Consider adding test for very short items being filtered.

The implementation filters items with length > 3. A test verifying this behavior would document the intentional filtering of short/noise items.

+    it('filters out very short items (length <= 3)', () => {
+      const text = `Strengths:
+- OK
+- A valid strength item
+- Yes`;
+
+      const result = engine._extractFieldsFromMarkdown(text);
+      expect(result.strengths).toEqual(['A valid strength item']);
+      expect(result.strengths).not.toContain('OK');
+      expect(result.strengths).not.toContain('Yes');
+    });
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between b85ac19 and 8b5fe6b.

📒 Files selected for processing (2)
  • plugin-nostr/lib/selfReflection.js (3 hunks)
  • plugin-nostr/test/selfReflection.extraction.test.js (1 hunks)
🔇 Additional comments (11)
plugin-nostr/lib/selfReflection.js (6)

489-492: LGTM! Good validation filtering.

The reflection history now properly filters out invalid reflections using _hasMinimalReflectionData, which aligns with the PR objective to ensure only valid reflections are included in context.


1295-1296: LGTM! Prompt strengthening addresses root cause.

The explicit "NO MARKDOWN, NO EXPLANATIONS, NO CODE BLOCKS" instruction directly addresses the issue of LLMs returning non-JSON responses.


1312-1340: LGTM! Solid JSON-first extraction with fallback.

The extraction flow correctly:

  1. Attempts JSON parsing first
  2. Validates parsed JSON has required fields
  3. Falls back to markdown extraction when JSON fails or is incomplete
  4. Returns null only when both paths fail

The debug logging at each step aids troubleshooting.


1395-1403: Good handling of curly/straight quotes for example extraction.

The regex correctly handles both straight quotes (") and Unicode curly quotes (\u201c, \u201d), which is important since LLMs may output either format.
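A pattern of the kind described here might look like the following sketch. The exact regex in selfReflection.js is not shown in this conversation, so the "Best reply" label (taken from the PR description's example) and the function name are assumptions.

```javascript
// Accepts a straight quote or a left curly quote to open, and a straight
// quote or a right curly quote to close. The label is hypothetical.
const QUOTED_EXAMPLE = /best reply\s*:\s*["\u201c]([^"\u201c\u201d\n]+)["\u201d]/i;

// Return the quoted text after the label, or null when nothing matches.
function extractQuotedExample(text) {
  const match = QUOTED_EXAMPLE.exec(text);
  return match ? match[1].trim() : null;
}
```

Excluding all three quote characters and newlines from the captured group keeps the match confined to a single quoted span on one line.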


1408-1418: LGTM! Clear strict validation logic.

_isValidReflection enforces the required contract: strengths, weaknesses, and recommendations must all be arrays. This matches the PR objective for strict JSON validation.


1420-1428: LGTM! Lenient validation for markdown fallback.

_hasMinimalReflectionData provides appropriate flexibility for markdown-extracted data by requiring only 2+ populated fields, acknowledging that markdown responses may not have all fields.

plugin-nostr/test/selfReflection.extraction.test.js (5)

1-11: LGTM! Clean test setup.

Minimal mock runtime is appropriate for unit testing the extraction logic. Using console as the logger is acceptable for tests.


13-60: LGTM! Good coverage of JSON extraction scenarios.

Tests cover the happy path, embedded JSON in text, and validation failure for missing required fields. The assertions verify the correct fields are extracted.


62-162: LGTM! Comprehensive markdown fallback tests.

Good coverage of:

  • Headers with bullet points
  • Alternative label formats ("What you're doing well")
  • Quoted example extraction
  • Improvements and regressions
  • Insufficient data returns null

These tests validate the PR's core objective of handling non-JSON LLM responses.


164-205: LGTM! Validation helper tests cover edge cases.

Both _isValidReflection and _hasMinimalReflectionData are tested with valid inputs, missing arrays, null, empty objects, and boundary conditions (exactly 2 fields vs. 1 field).


253-314: LGTM! Integration tests verify prioritization logic.

Tests correctly verify that:

  1. Valid JSON is preferred over markdown
  2. Malformed JSON triggers markdown fallback
  3. JSON with missing required fields triggers markdown fallback

This matches the intended behavior from the PR objectives.

- Add input length limit (10KB) to prevent ReDoS on large responses
- Fix regex pattern to prevent 'your' from matching 'you're/youre'
- Add tests for short item filtering (length > 3)
- Add test for 'your' vs 'you're' distinction

All 345 tests pass.

anabelle (Owner, Author) commented Dec 9, 2025

Addressed CodeRabbit Feedback

Fixed all issues from the review:

1. ⚠️ Critical: Regex Pattern Bug (Fixed)

Changed from you'?re? to explicit (?:you're|youre) to prevent 'your' from matching. Added test to verify.
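The difference between the two patterns can be checked in isolation. In you'?re? both the apostrophe and the trailing e are optional, so the pattern also matches the letters y-o-u-r. The surrounding phrase "what ... doing well" below is an assumption used to give the fragment some context; only the two fragments themselves come from this comment.

```javascript
// Loose form: optional apostrophe and optional "e", so "your" also matches.
const LOOSE = /what you'?re? doing well/i;

// Strict form: only the two intended spellings are accepted.
const STRICT = /what (?:you're|youre) doing well/i;

const samples = [
  "What you're doing well",
  'what youre doing well',
  'what your doing well',
];

for (const sample of samples) {
  console.log(sample, '| loose:', LOOSE.test(sample), '| strict:', STRICT.test(sample));
}
```

Only the third sample differs between the two: the loose pattern accepts it and the strict pattern rejects it.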

2. 🧹 Nitpick: ReDoS Prevention (Fixed)

Added input length limit: const limitedText = text.length > 10000 ? text.slice(0, 10000) : text;

3. 🧹 Nitpick: Short Item Filtering Test (Added)

Added test verifying items with length ≤ 3 are filtered out.

All 345 tests pass ✅

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
plugin-nostr/lib/selfReflection.js (1)

1342-1409: LGTM! Well-implemented markdown extraction with proper safeguards.

The markdown extraction logic is solid:

  • ReDoS protection via 10,000-char limit (line 1360)
  • Comprehensive regex patterns covering various markdown formats and alternative labels
  • Extraction of both bullet lists and quoted examples
  • Proper handling of fancy quotes in addition to standard quotes

The regex pattern at line 1390 correctly requires the complete "you're" or "youre" string, preventing false matches with "your". The inline comment correctly documents this fix.

Optional: Consider edge case in comma/semicolon splitting.

Lines 1380-1381 split non-bullet content on , or ;, which could incorrectly split items containing these punctuation marks (e.g., "Be concise, clear, and direct" → ["Be concise", "clear", "and direct"]). This is likely an acceptable tradeoff for the fallback extraction, but you may want to add a minimum length check or only split if multiple substantial phrases are detected.
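One way to apply the "only split if multiple substantial phrases are detected" idea is sketched below. The threshold of 12 characters and the function name are assumptions chosen for illustration, not values from the PR.

```javascript
const MIN_PHRASE_LENGTH = 12; // assumed threshold, tune against real responses

// Split on commas/semicolons only when every resulting part looks like a
// phrase of its own; otherwise keep the content as a single item.
function splitInlineItems(content) {
  const trimmed = content.trim();
  if (trimmed === '') return [];
  const parts = trimmed.split(/[;,]/).map((part) => part.trim()).filter(Boolean);
  const allSubstantial =
    parts.length > 1 && parts.every((part) => part.length >= MIN_PHRASE_LENGTH);
  return allSubstantial ? parts : [trimmed];
}
```

With this rule, "Be concise, clear, and direct" stays a single item because "clear" is too short to stand alone, while "Reply faster to mentions; ask more follow-up questions" is still split into two items.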

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 8b5fe6b and 87343c5.

📒 Files selected for processing (2)
  • plugin-nostr/lib/selfReflection.js (3 hunks)
  • plugin-nostr/test/selfReflection.extraction.test.js (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • plugin-nostr/test/selfReflection.extraction.test.js
🔇 Additional comments (5)
plugin-nostr/lib/selfReflection.js (5)

489-490: LGTM! Improved reflection history filtering.

The addition of _hasMinimalReflectionData validation ensures only reflections with meaningful content are included in the context, preventing accumulation of malformed entries. This directly addresses the context accumulation issues described in the linked issue.


1295-1296: LGTM! Strengthened prompt instructions.

The explicit requirement for a single valid JSON object with no markdown, explanations, or code blocks directly addresses the root cause identified in issue #28. This defensive prompt engineering should reduce the frequency of markdown responses.


1317-1340: LGTM! Robust extraction flow with proper fallback.

The enhanced extraction logic correctly implements the JSON-first approach with markdown fallback:

  1. Attempts JSON extraction and validation
  2. Falls back to markdown parsing on failure
  3. Uses appropriate validators for each format (_isValidReflection for JSON, _hasMinimalReflectionData for markdown)
  4. Logs each step for debugging

This addresses the core issue of silent failures when LLM returns non-JSON responses.


1411-1421: LGTM! Appropriate strict validation for JSON reflections.

The strict validation requiring strengths, weaknesses, and recommendations arrays ensures JSON-parsed reflections have the complete expected structure. This validator is appropriately strict for JSON input while allowing the more lenient _hasMinimalReflectionData for markdown-extracted content.


1423-1431: LGTM! Appropriately lenient validation for markdown extraction.

The lenient validation requiring at least 2 populated fields strikes a good balance for markdown-extracted content. This allows reflections with partial data to be preserved while filtering out truly empty or malformed entries. The threshold of 2 fields is reasonable and prevents loss of valuable insights when the LLM returns markdown instead of JSON.

@jorparad jorparad merged commit 91681a3 into master Dec 9, 2025
2 checks passed

Development

Successfully merging this pull request may close these issues.

Self-Reflection Context Accumulation Issues

2 participants