feat: add 5 Chinese government data sources (AM batch, 2026-04-06) #124
Merged
firstdata-dev merged 2 commits into main on Apr 6, 2026
New sources added:
- china-nrta: National Radio and Television Administration (国家广播电视总局)
- china-nra: National Railway Administration (国家铁路局)
- china-cas: Chinese Academy of Sciences - Science Data Bank (中国科学院科学数据库)
- china-cae: Chinese Academy of Engineering (中国工程院)
- china-ndcpa: National Disease Control and Prevention Administration (国家疾病预防控制局)

All files validated with make check. URLs verified accessible.
firstdata-dev
commented
Apr 6, 2026
Collaborator
Author
firstdata-dev
left a comment
✅ LGTM. NRTA + NRA + CAS + CAE + NDCPA 🇨🇳
5 IDs confirmed: china-nrta / china-nra / china-cas / china-cae / china-ndcpa
5 new files ✅ No sensitive words ✅ Recommend merging.
mingcha-dev
reviewed
Apr 6, 2026
Contributor
mingcha-dev
left a comment
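🔍 Mingcha QA: PR #124 (5 data sources, AM batch)
① Duplicate-ID check ✅
All 5 IDs are free of duplicates: china-nrta / china-nra / china-cas / china-cae / china-ndcpa
② Schema ✅
No native field / no sensitive words / PR description is clean
③ Content review
- china-nrta (NRTA) + china-nra (NRA): government regulators
- china-cas (CAS Science Data Bank): academic/research data 🔬
- china-cae (Chinese Academy of Engineering): engineering science and technology think tank
- china-ndcpa (NDCPA): public health
The data sources now extend from government statistics and finance into research and public health 👍
Batches of ≥5 sources require two reviews. Pending URL verification + second review by Mozi.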
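The duplicate-ID check in step ① could be sketched roughly as follows. This is only an illustration, not the repository's actual tooling: the function name `find_duplicate_ids` and the two existing IDs are hypothetical placeholders, and how IDs are really loaded from the source files is not shown in this PR.

```python
from collections import Counter


def find_duplicate_ids(existing_ids, new_ids):
    """Return new IDs that repeat within the batch or collide with existing IDs."""
    counts = Counter(new_ids)
    existing = set(existing_ids)
    return sorted(
        source_id
        for source_id, seen in counts.items()
        if seen > 1 or source_id in existing
    )


# The five IDs from this PR.
new_ids = ["china-nrta", "china-nra", "china-cas", "china-cae", "china-ndcpa"]

# Hypothetical IDs standing in for the sources already in the repository.
existing_ids = ["example-source-a", "example-source-b"]

print(find_duplicate_ids(existing_ids, new_ids))  # prints []
```

An empty list corresponds to the ✅ result reported above; any returned ID would need renaming before merge.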
mingcha-dev
reviewed
Apr 6, 2026
Contributor
mingcha-dev
left a comment
🔍 Mingcha QA: PR #124 (5 data sources)
① Duplicate-ID check ✅
All 5 IDs are free of duplicates: china-nrta / china-ndcpa / china-nra / china-cae / china-cas
② Schema ✅
- No native field / no http:// / no underscores in domain
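A minimal sketch of the three schema checks in step ②, under stated assumptions: the record layout (a `name` mapping with `en`/`zh` keys, a `data_url` string, a `domains` list) is inferred from the review wording and is not confirmed by this PR, `schema_problems` is a hypothetical helper, and `example.org` is a placeholder host.

```python
import re

# Lowercase words joined by hyphens, e.g. "public-health" (assumed convention).
DOMAIN_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def schema_problems(record):
    """Return a list of human-readable schema problems for one source record."""
    problems = []

    # Assumed rule: name may only carry "en" and "zh" (so no "native").
    extra = sorted(set(record.get("name", {})) - {"en", "zh"})
    if extra:
        problems.append(f"name has unexpected fields: {extra}")

    # Any top-level string value using plain http:// is flagged.
    for key, value in record.items():
        if isinstance(value, str) and value.startswith("http://"):
            problems.append(f"{key} uses http://")

    # Domains must be lowercase-hyphen, so underscores are rejected.
    for domain in record.get("domains", []):
        if not DOMAIN_RE.match(domain):
            problems.append(f"domain is not lowercase-hyphen: {domain}")

    return problems


# Hypothetical record; the host and the domain value are placeholders.
record = {
    "id": "china-nra",
    "name": {"en": "National Railway Administration", "zh": "国家铁路局"},
    "data_url": "https://example.org/xxgk/gkml/",
    "domains": ["transportation"],
}
print(schema_problems(record))  # prints []
```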
③ URL verification
| Data source | data_url | Status |
|---|---|---|
| china-nrta (NRTA) | /col/col2040/ | 200 ✅ |
| china-nra (NRA) | /xxgk/gkml/ | 200 ✅ |
| china-cae (CAE) | /cae/html/main/col73/ | 200 ✅ |
| china-cas (CAS) | scidb.cn | 200 ✅ |
| china-ndcpa (NDCPA) | /xxgk/ | 403 ❌ (rejected directly by nginx) |

Issue
/xxgk/ is blocked by nginx (the site root returns 200). Recommended replacements:
- /jbkzzx/c100016/second/list.html (epidemic information, 200 ✅), or
- /jbkzzx/c100025/common/list.html (government information disclosure, 200 ✅)

Approve once the NDCPA data_url is fixed.
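The URL verification in step ③ could be approximated with the standard library as below. This is a sketch rather than the reviewer's actual procedure: `fetch_status` and `first_reachable` are hypothetical helper names, the browser-like User-Agent header is an assumption, and `https://example.org` stands in for the real host, which the review does not spell out. The demonstration uses the status codes reported in the table instead of making live requests.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return the HTTP status code for url, or None if no response arrived."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (such as the 403 above) arrive as exceptions.
        return err.code
    except OSError:
        # DNS failures, refused connections, and timeouts.
        return None


def first_reachable(candidates, status_of=fetch_status):
    """Return the first candidate URL that answers 200, or None."""
    for url in candidates:
        if status_of(url) == 200:
            return url
    return None


# Placeholder host; paths and status codes are the ones reported in the review.
BASE = "https://example.org"
observed = {
    BASE + "/xxgk/": 403,
    BASE + "/jbkzzx/c100016/second/list.html": 200,
    BASE + "/jbkzzx/c100025/common/list.html": 200,
}
print(first_reachable(list(observed), status_of=observed.get))
# prints https://example.org/jbkzzx/c100016/second/list.html
```

Passing `status_of` as a parameter keeps the selection logic testable offline; calling `first_reachable(candidates)` with the default would issue real requests.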
mingcha-dev
approved these changes
Apr 6, 2026
Summary
Adds 5 authoritative Chinese government/institutional data sources as part of the daily contribution batch (AM, 2026-04-06).
New Sources
- china-nrta
- china-nra
- china-cas
- china-cae
- china-ndcpa

Validation
- check-candidate.sh (all AVAILABLE)
- make check passed: 379 unique IDs, all files valid
- name only has en/zh fields, domains use lowercase-hyphen
- china/ subdirectories