feat: add MinerU cloud API support with local/cloud toggle UI #191

Open
technoadnan wants to merge 5 commits into THU-MAIC:main from technoadnan:feat/mineru-cloud

Conversation

@technoadnan technoadnan commented Mar 21, 2026

Summary

When https://mineru.net was configured as the MinerU base URL, requests were incorrectly routed to the self-hosted code path (POST /file_parse), which caused 413 errors. The PDF then silently fell back to unpdf instead of using MinerU cloud. This happened because the cloud API uses a different flow from the self-hosted one: upload, asynchronous polling, then ZIP download.
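For readers unfamiliar with the asynchronous part of that flow, here is a minimal sketch of a poll-until-done helper. It is not the code in lib/pdf/mineru-cloud.ts: `TaskStatus`, `PollOptions`, and `pollUntilDone` are hypothetical names, and the status shape is an assumption rather than MinerU's actual response schema. The status check is injected so the helper makes no claim about the real endpoints.

```typescript
// Hypothetical status shape; MinerU's real response fields may differ.
type TaskStatus<T> =
  | { state: "pending" }
  | { state: "done"; result: T }
  | { state: "failed"; message: string };

interface PollOptions {
  intervalMs: number; // delay between status checks
  maxAttempts: number; // give up after this many checks
}

// Calls checkStatus repeatedly until the task finishes, fails, or times out.
async function pollUntilDone<T>(
  checkStatus: () => Promise<TaskStatus<T>>,
  { intervalMs, maxAttempts }: PollOptions,
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await checkStatus();
    if (status.state === "done") return status.result;
    if (status.state === "failed") {
      throw new Error(`Task failed: ${status.message}`);
    }
    // Still pending: wait before the next check, unless this was the last one.
    if (attempt < maxAttempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Task did not finish after ${maxAttempts} attempts`);
}
```

Bounding the number of attempts means a stuck task surfaces as an explicit error instead of a request that never returns, which also makes any fallback to unpdf visible rather than silent.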

Changes

  • lib/pdf/pdf-providers.ts — detect mineru.net base URL and route to cloud v4 path
  • lib/pdf/mineru-cloud.ts — new file handling the full cloud v4 flow (upload → poll → ZIP parse)
  • lib/pdf/mineru-parser.ts — new file normalizing MinerU output into ParsedPdfContent, shared by both self-hosted and cloud paths
  • lib/pdf/types.ts — added mineruModelVersion field to PDFParserConfig to support switching between vlm and pipeline model versions
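As an illustration of the base URL detection described in the first item, a minimal sketch follows. `MinerUMode` and `detectMinerUMode` are hypothetical names, not the identifiers used in lib/pdf/pdf-providers.ts. Treating subdomains of mineru.net as cloud and treating unparsable URLs as self-hosted are both assumptions made for this example.

```typescript
type MinerUMode = "cloud" | "self-hosted";

// Decide which code path a configured base URL should use.
function detectMinerUMode(baseUrl: string): MinerUMode {
  let hostname: string;
  try {
    hostname = new URL(baseUrl).hostname.toLowerCase();
  } catch {
    // Assumption: an unparsable URL keeps the original self-hosted behavior.
    return "self-hosted";
  }
  // Compare the parsed hostname rather than using a substring match, so that
  // a host such as "mineru.net.example.com" is not mistaken for the cloud.
  const isCloud = hostname === "mineru.net" || hostname.endsWith(".mineru.net");
  return isCloud ? "cloud" : "self-hosted";
}
```

With this sketch, both `https://mineru.net` and `https://mineru.net/api/v4/extract/task` resolve to `"cloud"`, while `http://localhost:8000` resolves to `"self-hosted"`.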

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)

Verification

Steps to reproduce the tests:

  1. Set MinerU base URL to https://mineru.net with a valid API token
  2. Upload a PDF
  3. Observe logs show cloud v4 routing, presigned upload, and successful parse
  4. Verify self-hosted deployments with other base URLs work unchanged

Proof

Before

image image

After

image

What you personally verified:

  • https://mineru.net correctly routes to cloud v4
  • Self-hosted URLs fall through to the original code path unchanged
  • tsc --noEmit passes cleanly

Checklist

  • I have performed a self-review of my code
  • My changes do not introduce new warnings

@fubaobao2023

Do you mean that https://mineru.net should be the MinerU base URL? But
image
image
I configured it this way, and uploading a PDF from the front end still fails to parse. I get this error:
ddc6f0598c2054e7b43ca5de793e52a1

@fubaobao2023

Can't the MinerU setting guide users by showing a standard base URL to fill in? Making users guess isn't great. Other API integrations all state what the API address is.

@fubaobao2023

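Also, the official MinerU documentation says:
image
That is, https://mineru.net/api/v4/extract/task or https://mineru.net/api/v4/file-urls/batch. But even if I enter either of these official endpoints, enter the key, and click Test in OpenMAIC, the test still reports success, yet uploading a PDF still fails to parse and returns an error. This part of the interface really needs to be improved.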

@fubaobao2023

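Do you mean that https://mineru.net should be the MinerU base URL? I configured it this way, and uploading a PDF from the front end still fails to parse, with this error: image image ddc6f0598c2054e7b43ca5de793e52a1

I just tested the failure shown in that message: whether I use unpdf or MinerU, the same failure is returned. This needs further investigation on your side.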

@fubaobao2023

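image Does the front end limit the PDF file size? Could the API settings let users choose an option depending on which interface is being called, so that the setup is foolproof? Here is my curl: [新建文本文档.txt](https://github.com/user-attachments/files/26228793/default.txt)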

@fubaobao2023

Feedback from MinerU's official technical support: there is a problem with the interface code.
image

@fubaobao2023

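Core issue confirmed:
OpenMAIC calls the self-hosted MinerU API (/file_parse, file upload), whereas the official MinerU cloud API requires a url parameter.

Solution: the OpenMAIC code needs to be modified to support the official API.
So the MinerU settings need a new option to choose between the cloud API and self-hosting.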

@fubaobao2023

@wyuc @claude @technoadnan

@technoadnan technoadnan changed the title from "add MinerU cloud API support" to "feat: add MinerU cloud API support with local/cloud toggle UI" Apr 1, 2026
@technoadnan
Author

@fubaobao2023

The core issue has been confirmed: OpenMAIC calls the self-hosted MinerU API (/file_parse, file upload), whereas the official MinerU cloud API requires a url parameter.
Solution: The OpenMAIC code needs to be modified to support the official API. Therefore, the MinerU settings need an option to choose between the cloud API and self-hosting.

Solution added
I have added support for both the official MinerU cloud API and local (self-hosted) deployments. The settings UI now has separate Cloud and Local configuration sections, each with its own Base URL, API Key, and test connection button. The routing logic automatically detects whether the provided Base URL points to mineru.net (cloud v4) or a self-hosted instance and calls the correct API accordingly.

Proof

image
test.mp4

Contributor

@wyuc wyuc left a comment

A few things to address:

  1. Cloud and self-hosted should be two separate providers (parallel tabs), not merged into one. They have different protocols, different config fields, and different auth. This matches how other provider categories work in the project (e.g. TTS providers are parallel tabs).

  2. extractMinerUResult in pdf-providers.ts still exists alongside the new one in mineru-parser.ts. Remove the old one and import from the shared module.

  3. sourceFileName is never passed to the config in parse-pdf/route.ts, so cloud uploads always use document.pdf. Pass pdfFile.name through.

  4. Test connection for cloud URLs just does a GET on the base URL — invalid API keys still show "success". Either validate credentials with a real API call, or show a disclaimer.

  5. Cloud request URL hint in pdf-settings.tsx shows /file_parse but cloud v4 uses /file-urls/batch.
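To make points 1, 3, and 5 concrete, here is a minimal sketch of how two separate providers could be modeled. It is not the project's actual types: `MinerUProviderConfig`, `requestUrlHint`, `uploadFileName`, and the `kind` values are hypothetical names chosen for this example. The paths come from this thread (/file_parse for self-hosted, /api/v4/file-urls/batch for cloud v4), and the optional API key for self-hosted is an assumption.

```typescript
// Two providers as a discriminated union, so each carries only its own fields.
type MinerUProviderConfig =
  | { kind: "mineru-cloud"; baseUrl: string; apiKey: string; sourceFileName?: string }
  | { kind: "mineru-self-hosted"; baseUrl: string; apiKey?: string };

// Request URL hint for the settings UI, derived from the provider kind.
function requestUrlHint(config: MinerUProviderConfig): string {
  const base = config.baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  switch (config.kind) {
    case "mineru-cloud":
      return `${base}/api/v4/file-urls/batch`;
    case "mineru-self-hosted":
      return `${base}/file_parse`;
  }
}

// Name to use for a cloud upload: the original file name when provided,
// otherwise the document.pdf default mentioned in point 3.
function uploadFileName(sourceFileName?: string): string {
  const trimmed = sourceFileName?.trim();
  return trimmed ? trimmed : "document.pdf";
}
```

Because the hint is derived from `kind` rather than from the base URL, the cloud tab cannot show the self-hosted path, and passing `pdfFile.name` into `sourceFileName` keeps the original file name on cloud uploads.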

3 participants