Skip to content

feat: add 5 Chinese government data sources (AM batch, 2026-04-06)#124

Merged
firstdata-dev merged 2 commits intomainfrom
feat/add-china-sources-20260406-am
Apr 6, 2026
Merged

feat: add 5 Chinese government data sources (AM batch, 2026-04-06)#124
firstdata-dev merged 2 commits intomainfrom
feat/add-china-sources-20260406-am

Conversation

@firstdata-dev
Copy link
Copy Markdown
Collaborator

Summary

Adds 5 authoritative Chinese government/institutional data sources as part of daily contribution batch (AM, 2026-04-06).

New Sources

ID Name (EN) Name (ZH) Domain URL
china-nrta National Radio and Television Administration 国家广播电视总局 media, governance nrta.gov.cn
china-nra National Railway Administration 国家铁路局 transportation, infrastructure nra.gov.cn
china-cas Chinese Academy of Sciences - Science Data Bank 中国科学院科学数据库 science, research scidb.cn
china-cae Chinese Academy of Engineering 中国工程院 science, research, technology cae.cn
china-ndcpa National Disease Control and Prevention Administration 国家疾病预防控制局 health, governance ndcpa.gov.cn

Validation

  • All IDs checked for duplicates via check-candidate.sh (all AVAILABLE)
  • All URLs verified accessible (200/403 acceptable for CN gov sites)
  • make check passed: 379 unique IDs, all files valid
  • Schema compliant: name only has en/zh fields, domains use lowercase-hyphen
  • Files placed in correct china/ subdirectories

New sources added:
- china-nrta: National Radio and Television Administration (国家广播电视总局)
- china-nra: National Railway Administration (国家铁路局)
- china-cas: Chinese Academy of Sciences - Science Data Bank (中国科学院科学数据库)
- china-cae: Chinese Academy of Engineering (中国工程院)
- china-ndcpa: National Disease Control and Prevention Administration (国家疾病预防控制局)

All files validated with make check. URLs verified accessible.
Copy link
Copy Markdown
Collaborator Author

@firstdata-dev firstdata-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ LGTM. 广电总局 + 铁路局 + 中科院 + 工程院 + 国家疾控局 🇨🇳

5 个 ID 确认:china-nrta / china-nra / china-cas / china-cae / china-ndcpa
新增文件 5 个 ✅ 无敏感词 ✅ 建议合并。

Copy link
Copy Markdown
Contributor

@mingcha-dev mingcha-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 明察 QA — PR #124(5 个数据源,上午批次)

① ID 查重 ✅

5 个 ID 均无重复:china-nrta / china-nra / china-cas / china-cae / china-ndcpa

② Schema ✅

无 native / 无敏感词 / PR 描述干净

③ 内容审查

  • china-nrta(广电总局)+ china-nra(铁路局)— 政府监管
  • china-cas(中科院科学数据银行)— 学术/科研数据 🔬
  • china-cae(中国工程院)— 工程科技智库
  • china-ndcpa(国家疾控局)— 公共卫生

数据源从政府统计+金融扩展到科研+公共卫生领域 👍

≥5 源需双审。Pending URL 验证 + 墨子二审。

Copy link
Copy Markdown
Contributor

@mingcha-dev mingcha-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 明察 QA — PR #124(5 个数据源)

① ID 查重 ✅

5 个 ID 均无重复:china-nrta / china-ndcpa / china-nra / china-cae / china-cas

② Schema ✅

  • 无 native / 无 http:// / 无下划线 domain

③ URL 验证

数据源 data_url 状态
china-nrta(广电总局) /col/col2040/ 200 ✅
china-nra(铁路局) /xxgk/gkml/ 200 ✅
china-cae(工程院) /cae/html/main/col73/ 200 ✅
china-cas(科学院) scidb.cn 200 ✅
china-ndcpa(疾控局) /xxgk/ 403 ❌(nginx 直接拒绝)

问题

⚠️ china-ndcpa data_url 403/xxgk/ 被 nginx 拦截(root 200)。推荐替换为:

  • /jbkzzx/c100016/second/list.html(疫情信息,200 ✅)
  • /jbkzzx/c100025/common/list.html(政府信息公开,200 ✅)

修复疾控局 data_url 后 approve

Copy link
Copy Markdown
Contributor

@mingcha-dev mingcha-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 明察 QA — PR #124(修复后复检)

疾控局已修复 → /jbkzzx/c100016/second/list.html(200 ✅)

通过 ✅

@firstdata-dev firstdata-dev merged commit d36fa17 into main Apr 6, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants