@Jasper-Zhang-A

Description

This PR hardens the ResourceProcessor contract: the paper downloader prompt now requires strict JSON, and the orchestration layer validates the response before continuing. This prevents the CLI pipeline from crashing with:

Input is neither a valid file path nor JSON

when the model responds with explanations or Markdown instead of a raw JSON object.

Related Issues

  • No linked GitHub issue. This addresses the recurring CLI failure in which a response mixing JSON with commentary cannot be parsed as either a file path or JSON.

Changes Made

  • Updated the downloader prompt (prompts/code_prompts.py) to:

    • Explicitly forbid Markdown fences (```json) and any explanatory text.
    • Require that the final assistant message be a raw JSON object matching the expected schema.
    • Clarify that any reasoning should be handled via tool calls, not in the final message.
  • Updated run_resource_processor (workflows/agent_orchestration_engine.py) to:

    • Log the raw LLM response for easier debugging.
    • Use the existing extract_clean_json(...) helper to strip any accidental leading/trailing text.
    • Run json.loads(...) on the cleaned string to confirm the output is valid JSON before it is passed on.
    • Raise a clear error if JSON parsing fails, pointing to a contract violation instead of the generic “Input is neither a valid file path nor JSON”.
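
The clean-then-validate steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in workflows/agent_orchestration_engine.py: strip_to_json_object is a hypothetical stand-in for the existing extract_clean_json(...) helper (whose real behavior may differ), and ContractViolationError, validate_resource_response, and the paper_path key are names invented for the example.

```python
import json
import logging

logger = logging.getLogger(__name__)


class ContractViolationError(ValueError):
    """Hypothetical error: the model's final message was not a raw JSON object."""


def strip_to_json_object(raw: str) -> str:
    """Illustrative stand-in for extract_clean_json(...).

    Keeps the span from the first '{' to the last '}', which drops
    Markdown fences and leading/trailing commentary.
    """
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end < start:
        return raw.strip()
    return raw[start : end + 1]


def validate_resource_response(raw: str) -> dict:
    """Log, clean, and parse the response; raise a contract error on failure."""
    logger.debug("Raw ResourceProcessor response: %r", raw)
    cleaned = strip_to_json_object(raw)
    try:
        parsed = json.loads(cleaned)
    except json.JSONDecodeError as exc:
        raise ContractViolationError(
            "ResourceProcessor contract violation: final message is not valid JSON"
        ) from exc
    if not isinstance(parsed, dict):
        raise ContractViolationError(
            "ResourceProcessor contract violation: expected a JSON object"
        )
    return parsed


# Example: a fenced response with commentary (built without literal fences).
fence = "`" * 3
messy = f'Here is the result:\n{fence}json\n{{"paper_path": "paper.pdf"}}\n{fence}\nDone.'
result = validate_resource_response(messy)
```

Raising a dedicated error type at this stage makes the failure point at the model's response rather than at a later stage that merely receives an unparseable string.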

Checklist

  • Changes tested locally (CLI URL and file modes on a fresh Linux/AutoDL setup)
  • Code reviewed
  • Documentation updated (if necessary)
  • Unit tests added (if applicable)

Additional Notes

Reproduction on main:

  1. Run the CLI on a paper URL or local PDF.
  2. When the ResourceProcessor model returns JSON wrapped in Markdown fences or with extra commentary, the orchestration layer fails with:
    • Input is neither a valid file path nor JSON
  3. The pipeline stops even though the JSON inside the response is usable once the fences or commentary are removed.
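
The failure mode can be illustrated with a small sketch. The load_input function below is a hypothetical stand-in for whatever downstream check emits this message (the project's actual check is not shown in this PR), and the paper_path key is invented for the example. It only demonstrates why a fenced response is neither an existing file path nor parseable JSON.

```python
import json
import os


def load_input(value: str) -> dict:
    """Hypothetical downstream check: accept a JSON file path or a JSON string."""
    if os.path.isfile(value):
        with open(value, encoding="utf-8") as fh:
            return json.load(fh)
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        raise ValueError("Input is neither a valid file path nor JSON") from None


# A raw JSON object parses; the same object wrapped in Markdown fences does not.
fence = "`" * 3
raw_ok = '{"paper_path": "paper.pdf"}'
fenced = f"{fence}json\n{raw_ok}\n{fence}"

parsed_ok = load_input(raw_ok)
try:
    load_input(fenced)
    fenced_error = None
except ValueError as exc:
    fenced_error = str(exc)
```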

With this PR:

  • The prompt enforces a JSON-only final message.
  • The orchestration layer cleans and validates the JSON before proceeding.
  • The multi-agent pipeline can continue past the resource processing stage on the same inputs.
