fix: enforce JSON-only resource processor output #73

Jasper-Zhang-A · 2025-11-14T13:47:29Z

Description

This PR hardens the ResourceProcessor contract so that the paper downloader always returns strict JSON, and the orchestration layer validates it before continuing. This prevents the CLI pipeline from crashing with:

Input is neither a valid file path nor JSON

when the model responds with explanations or Markdown instead of a raw JSON object.

Related Issues

No linked GitHub issue. This PR addresses the recurring CLI failure in which a response that mixes JSON with commentary can be parsed neither as a file path nor as JSON.

Changes Made

- Updated the downloader prompt (prompts/code_prompts.py) to:
  - Explicitly forbid Markdown fences (```json) and any explanatory text.
  - Require that the final assistant message be a raw JSON object matching the expected schema.
  - Clarify that any reasoning should be handled via tool calls, not in the final message.
- Updated run_resource_processor (workflows/agent_orchestration_engine.py) to:
  - Log the raw LLM response for easier debugging.
  - Use the existing extract_clean_json(...) helper to strip any accidental leading/trailing text.
  - Run json.loads(...) on the cleaned string to guarantee the output is valid JSON.
  - Raise a clear error if JSON parsing fails, pointing to a contract violation instead of the generic “Input is neither a valid file path nor JSON”.
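
For reviewers who want the shape of the validation step without opening the diff, here is a minimal sketch of the log, clean, parse, raise sequence described above. It is not the code in this PR: `extract_clean_json_sketch` is a hypothetical stand-in for the repo's `extract_clean_json(...)` helper (whose real behaviour may differ), and `validate_resource_processor_output` and the `paper_path` key are illustrative names only.

```python
import json
import logging
import re

logger = logging.getLogger(__name__)

# Built programmatically so this example contains no literal Markdown fence.
FENCE = "`" * 3


def extract_clean_json_sketch(text: str) -> str:
    """Hypothetical stand-in for the repo's extract_clean_json helper."""
    # Drop Markdown fence markers (with an optional "json" tag) if present.
    text = re.sub(re.escape(FENCE) + r"(?:json)?", "", text)
    # Keep only the span from the first "{" to the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        return text.strip()
    return text[start : end + 1]


def validate_resource_processor_output(raw_response: str) -> dict:
    """Clean and parse the model output, or fail with a contract-violation error."""
    # Log the raw response first so a failure can be debugged from the logs.
    logger.info("Raw ResourceProcessor response: %r", raw_response)
    cleaned = extract_clean_json_sketch(raw_response)
    try:
        parsed = json.loads(cleaned)
    except json.JSONDecodeError as exc:
        raise ValueError(
            "ResourceProcessor contract violation: final message is not valid JSON"
        ) from exc
    if not isinstance(parsed, dict):
        raise ValueError(
            "ResourceProcessor contract violation: expected a JSON object"
        )
    return parsed
```

Under these assumptions, a response such as `Here is the result:` followed by a fenced `{"paper_path": "a.pdf"}` parses to a plain dict, while a response with no JSON at all raises the contract-violation error rather than the generic message.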

Checklist

Changes tested locally (CLI URL and file modes on a fresh Linux/AutoDL setup)
Code reviewed
Documentation updated (if necessary)
Unit tests added (if applicable)

Additional Notes

Reproduction on main:

Run the CLI on a paper URL or local PDF.
When the ResourceProcessor model returns JSON wrapped in Markdown fences or with extra commentary, the orchestration layer fails with:
- Input is neither a valid file path nor JSON
The pipeline stops even though the underlying JSON content is essentially correct.
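
The failure mode can be reproduced in isolation. The sketch below assumes the downstream check tries the input as a file path and then as JSON, which is inferred from the error message rather than taken from the codebase; `classify_input` and the `paper_path` key are hypothetical.

```python
import json
import os

# Built programmatically so this example contains no literal Markdown fence.
FENCE = "`" * 3

# A response whose JSON payload is fine but is wrapped in commentary and fences.
fenced_response = (
    "Here is the result:\n"
    + FENCE + "json\n"
    + '{"paper_path": "paper.pdf"}\n'
    + FENCE
)


def classify_input(text: str) -> str:
    """Hypothetical check: is the input a file path, JSON, or neither?"""
    if os.path.isfile(text):
        return "file"
    try:
        json.loads(text)
        return "json"
    except json.JSONDecodeError:
        return "neither"
```

With this check, `classify_input(fenced_response)` falls through to `"neither"` because `json.loads` rejects the leading commentary, even though the embedded object on its own would be classified as `"json"`.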

With this PR:

The prompt enforces a JSON-only final message.
The orchestration layer cleans and validates the JSON before proceeding.
The multi-agent pipeline can continue past the resource processing stage on the same inputs.

fix: enforce JSON-only resource processor output

61d2060

2 participants