Convert any URL to clean Markdown, with built-in support for login-required pages (X/Twitter, WeChat, Feishu/Lark docs, etc.)
Send any URL to Claude, and it automatically fetches the full content as Markdown. Five content types have dedicated extraction:
| URL Type | Method | Why |
|---|---|---|
| WeChat Articles (mp.weixin.qq.com) | Built-in Playwright script | Anti-scraping protection requires a headless browser |
| Feishu/Lark Docs (feishu.cn, larksuite.com) | Built-in Feishu API script | Requires API authentication, auto-converts to Markdown |
| YouTube | Dedicated YouTube skill | Video content has its own toolchain |
| PDF (remote or local) | Built-in PDF extraction (extract_pdf.sh) | Three-method cascade: marker-pdf → pdftotext → pypdf |
| All other URLs | Proxy cascade via fetch.sh: r.jina.ai → defuddle.md → agent-fetch | Free, no API key, content validation built-in |
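The routing in the table above can be sketched as a small classifier. This is an illustration only, not the skill's actual dispatch code: the function name `classify_url`, the returned labels, and the YouTube host names are assumptions, while the WeChat and Feishu/Lark hosts come from the table.

```python
from urllib.parse import urlparse


def classify_url(target: str) -> str:
    """Return a hypothetical route label: wechat, feishu, youtube, pdf, or generic."""
    parsed = urlparse(target)
    host = (parsed.hostname or "").lower()
    path = parsed.path.lower()

    def host_is(domain: str) -> bool:
        # Match the domain itself or any subdomain of it.
        return host == domain or host.endswith("." + domain)

    if host_is("mp.weixin.qq.com"):
        return "wechat"
    if host_is("feishu.cn") or host_is("larksuite.com"):
        return "feishu"
    # Assumed YouTube hosts; the table only says "YouTube".
    if host_is("youtube.com") or host_is("youtu.be"):
        return "youtube"
    # Local file paths have no hostname, so the suffix check covers them too.
    if path.endswith(".pdf"):
        return "pdf"
    return "generic"
```

Everything that does not match a dedicated type, including X/Twitter links, falls through to `generic`, which corresponds to the proxy cascade row.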
- Claude Code installed
- curl (built-in on macOS/Linux)
- (Optional - WeChat scraping) Python 3.8+ with playwright: `pip install playwright beautifulsoup4 lxml`, then `playwright install chromium`
- (Optional - PDF extraction) One of:
  - marker-pdf (best quality): `pip install marker-pdf`
  - pdftotext (fast): `brew install poppler`
  - pypdf (fallback): `pip install pypdf`
- (Optional - Proxy fallback) agent-fetch: `npx agent-fetch --help` (no pre-install needed, npx auto-downloads)
- (Optional - Feishu docs) Environment variables `FEISHU_APP_ID` and `FEISHU_APP_SECRET`: `echo $FEISHU_APP_ID` (verify configured)
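To see which optional PDF tool would be picked first on a given machine, a preflight check along these lines may help. It is a sketch under assumptions, not the logic of extract_pdf.sh: the function name `pick_pdf_backend` is hypothetical, `marker_single` is assumed to be the command installed by marker-pdf, and the check only tests availability, whereas the real cascade presumably also falls back when a tool fails on a particular file.

```python
import importlib.util
import shutil
from typing import Callable, Optional


def _has_command(name: str) -> bool:
    # True when an executable with this name is on PATH.
    return shutil.which(name) is not None


def _has_module(name: str) -> bool:
    # True when a top-level Python module with this name is importable.
    return importlib.util.find_spec(name) is not None


def pick_pdf_backend(
    has_command: Callable[[str], bool] = _has_command,
    has_module: Callable[[str], bool] = _has_module,
) -> Optional[str]:
    """Return the first available backend in the order marker-pdf, pdftotext, pypdf."""
    if has_command("marker_single"):  # assumed CLI name for marker-pdf
        return "marker-pdf"
    if has_command("pdftotext"):  # shipped with poppler
        return "pdftotext"
    if has_module("pypdf"):
        return "pypdf"
    return None
```

Calling `pick_pdf_backend()` with no arguments probes the local environment; a result of `None` means none of the three optional tools is installed.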
`npx skills add joeseesun/qiaomu-markdown-proxy`

Verify:

`ls ~/.claude/skills/qiaomu-markdown-proxy/SKILL.md`

Just send Claude a URL:
- "Read this article: https://example.com/post"
- "Fetch this tweet: https://x.com/user/status/123456"
- "Read this WeChat article: https://mp.weixin.qq.com/s/abc123"
- "Convert this Feishu doc to Markdown: https://xxx.feishu.cn/docx/xxxxxxxx"
- r.jina.ai — Most complete content, preserves image links
- defuddle.md — Cleaner output with YAML frontmatter
- agent-fetch — Local tool, no network proxy needed
- defuddle CLI — Local CLI, good for standard web pages
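The fallback behavior across these services can be sketched as a loop that tries each fetcher in order and accepts the first result that passes a content check. This is not the code in fetch.sh: the names `fetch_with_cascade`, `looks_valid`, and `fetch_via_jina` are hypothetical, and the 200-character threshold is an arbitrary stand-in for the built-in content validation. The only service-specific detail used is that r.jina.ai takes the target URL appended to its own address.

```python
from typing import Callable, Iterable, Optional, Tuple
from urllib.request import urlopen

Fetcher = Callable[[str], str]


def fetch_via_jina(target: str, timeout: float = 30.0) -> str:
    # r.jina.ai returns Markdown for the URL appended after its host.
    with urlopen("https://r.jina.ai/" + target, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def looks_valid(markdown: str, min_chars: int = 200) -> bool:
    # Assumed validation rule: reject empty or near-empty responses.
    return len(markdown.strip()) >= min_chars


def fetch_with_cascade(
    target: str,
    fetchers: Iterable[Tuple[str, Fetcher]],
    validate: Callable[[str], bool] = looks_valid,
) -> Optional[Tuple[str, str]]:
    """Try each (name, fetcher) in order; return the first valid (name, content)."""
    for name, fetch in fetchers:
        try:
            content = fetch(target)
        except Exception:
            continue  # network error or blocked request: move to the next service
        if validate(content):
            return name, content
    return None
```

A caller would pass something like `[("r.jina.ai", fetch_via_jina), ...]`, adding fetchers for the other services in the order listed above.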
Built-in fetch_feishu.py script fetches documents via Feishu Open API and auto-converts to Markdown:
- Supports new docs (docx), legacy docs (doc), and wiki pages
- Auto-parses document blocks into Markdown format
- Supports headings, lists, code blocks, quotes, todos, equations, images, etc.
- Requires `FEISHU_APP_ID` and `FEISHU_APP_SECRET` environment variables
- App needs `docx:document:readonly` permission
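As a rough illustration of the authentication step, the snippet below builds (but does not send) the request that exchanges the app credentials for a tenant access token through the Feishu Open Platform, and pulls the document type and token out of a Feishu URL. It is a sketch, not the contents of fetch_feishu.py: the helper names `build_token_request` and `parse_feishu_url` are hypothetical, and the URL pattern is an assumption based on the /docx/, /docs/, and /wiki/ link shapes this README mentions.

```python
import json
import re
from typing import Optional, Tuple
from urllib.request import Request

TOKEN_URL = "https://open.feishu.cn/open-apis/auth/v3/tenant_access_token/internal"


def build_token_request(app_id: str, app_secret: str) -> Request:
    """Build the POST request that trades app credentials for a tenant access token."""
    body = json.dumps({"app_id": app_id, "app_secret": app_secret}).encode("utf-8")
    return Request(
        TOKEN_URL,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def parse_feishu_url(target: str) -> Optional[Tuple[str, str]]:
    """Return (kind, token) for docx, docs, or wiki links, or None if nothing matches."""
    match = re.search(r"/(docx|docs|wiki)/([A-Za-z0-9_-]+)", target)
    return (match.group(1), match.group(2)) if match else None
```

In practice the credentials would come from `os.environ["FEISHU_APP_ID"]` and `os.environ["FEISHU_APP_SECRET"]`, and the request would be sent with `urllib.request.urlopen`.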
| Issue | Solution |
|---|---|
| WeChat scraping fails | Run playwright install chromium to install browser |
| Feishu returns permission error | Check FEISHU_APP_ID and FEISHU_APP_SECRET env vars, confirm app has document read permission |
| Feishu wiki page fails | Confirm app has wiki:wiki:readonly permission |
| r.jina.ai returns empty | Auto-falls back to defuddle.md (no action needed) |
| All proxies fail | URL may have strict auth restrictions, try npx agent-fetch |
- r.jina.ai — Free URL-to-Markdown proxy by Jina AI
- defuddle.md — Clean article extraction service
- agent-fetch — Local URL content extraction tool
- Playwright — Browser automation for WeChat scraping
- Feishu Open Platform — Feishu Document API
Send Claude a URL and it automatically fetches the full content and converts it to Markdown. Five content types have dedicated extraction:
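| URL Type | Method | Why |
|---|---|---|
| WeChat Official Account articles (mp.weixin.qq.com) | Built-in Playwright script | Official Accounts use anti-scraping protection, so a headless browser is required |
| Feishu docs (feishu.cn/docx/, /wiki/, /docs/) | Built-in Feishu API script | Requires API authentication, auto-converts to Markdown |
| YouTube | Dedicated YouTube skill | Video content has its own toolchain |
| PDF (remote URL or local file) | Built-in PDF extraction (extract_pdf.sh) | Three-level fallback: marker-pdf → pdftotext → pypdf |
| All other URLs | Proxy cascade via fetch.sh: r.jina.ai → defuddle.md → agent-fetch | Free, no API key, content validation built-in |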
- Claude Code installed
- curl (built-in on macOS/Linux)
- (Optional - WeChat scraping) Python 3.8+ with playwright: `pip install playwright beautifulsoup4 lxml`, then `playwright install chromium`
- (Optional - PDF extraction) One of:
  - marker-pdf (best quality): `pip install marker-pdf`
  - pdftotext (fast): `brew install poppler`
  - pypdf (fallback): `pip install pypdf`
- (Optional - Proxy fallback) agent-fetch: `npx agent-fetch --help` (no pre-install needed, npx auto-downloads)
- (Optional - Feishu docs) Environment variables `FEISHU_APP_ID` and `FEISHU_APP_SECRET`: `echo $FEISHU_APP_ID` (verify configured)
`npx skills add joeseesun/qiaomu-markdown-proxy`

Verify:

`ls ~/.claude/skills/qiaomu-markdown-proxy/SKILL.md`

Just send Claude a URL:
- "Read this article for me: https://example.com/post"
- "Fetch this tweet: https://x.com/user/status/123456"
- "Read this WeChat Official Account article: https://mp.weixin.qq.com/s/abc123"
- "Convert this Feishu doc to Markdown: https://xxx.feishu.cn/docx/xxxxxxxx"
- "Read this Feishu wiki page: https://xxx.feishu.cn/wiki/xxxxxxxx"
- "Extract this PDF: https://example.com/paper.pdf"
- "Convert a local PDF: /path/to/document.pdf"
- r.jina.ai: Most complete content, preserves image links
- defuddle.md: Cleaner output with YAML frontmatter
- agent-fetch: Local tool, no network proxy needed
- defuddle CLI: Local CLI, good for standard web pages
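The built-in fetch_feishu.py script fetches document content via the Feishu Open API and auto-converts it to Markdown:
- Supports new docs (docx), legacy docs (doc), and wiki pages (wiki)
- Auto-parses document blocks and converts them to Markdown
- Supports headings, lists, code blocks, quotes, todos, equations, images, etc.
- Requires the Feishu app's `FEISHU_APP_ID` and `FEISHU_APP_SECRET` environment variables
- App needs `docx:document:readonly` permission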
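| Issue | Solution |
|---|---|
| WeChat scraping fails | Run `playwright install chromium` to install the browser |
| Feishu doc returns permission error | Check the `FEISHU_APP_ID` and `FEISHU_APP_SECRET` env vars, confirm the app has document read permission |
| Feishu wiki page fails | Confirm the app has `wiki:wiki:readonly` permission |
| PDF extraction fails | Install any one tool: `pip install marker-pdf`, `brew install poppler`, or `pip install pypdf` |
| r.jina.ai returns empty content | Auto-falls back to defuddle.md (no action needed) |
| All proxies fail | URL may have strict auth restrictions, try `npx agent-fetch` |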
- r.jina.ai: Free URL-to-Markdown proxy by Jina AI
- defuddle.md: Clean article extraction service
- agent-fetch: Local URL content extraction tool
- Playwright: Browser automation for WeChat scraping
- Feishu Open Platform: Feishu Document API
- X (Twitter): @vista8
- WeChat Official Account: 向阳乔木推荐看
