


@jh-RLI commented Dec 4, 2025

Fixes #1240.

This PR implements the update to Zensus 2022. To keep code changes minimal, the Zensus 2022 data was converted to the 2011 format. The conversion was done outside egon-data using several scripts. The conversion results were analyzed and look reasonable: the analysis compared relative and absolute changes and used scatter plots to get an overview of how the data values are distributed across attributes. These results are available but are not yet part of egon-data.
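The analysis scripts are not part of egon-data, so the following is only a minimal sketch of the kind of comparison described above. It assumes that per-attribute totals for both census years are available as pandas Series indexed by attribute name; the helper `compare_totals` and its column names are hypothetical.

```python
import pandas as pd


def compare_totals(totals_2011: pd.Series, totals_2022: pd.Series) -> pd.DataFrame:
    """Compare per-attribute totals of two census years (hypothetical helper)."""
    # Align both years on the attribute index; an attribute missing in one
    # year shows up as NaN instead of being silently dropped.
    df = pd.DataFrame({"zensus_2011": totals_2011, "zensus_2022": totals_2022})
    df["abs_change"] = df["zensus_2022"] - df["zensus_2011"]
    # Relative change with respect to the 2011 baseline (0.1 means +10 %).
    df["rel_change"] = df["abs_change"] / df["zensus_2011"]
    return df


if __name__ == "__main__":
    # Made-up example inputs, only to show the output layout.
    t2011 = pd.Series({"population": 100, "households": 50})
    t2022 = pd.Series({"population": 110, "households": 45})
    print(compare_totals(t2011, t2022))
```

A table like this is a natural input for the scatter plots mentioned above, e.g. plotting `zensus_2011` against `zensus_2022` per attribute.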

Not all attributes from 2011 are available in Zensus 2022, but all attributes currently in use are. See the overview in the linked issue. If we ever need data for more attributes, we can request a "Sonderauswertung" (custom evaluation) from Destatis via their contact form.

Since the Zensus 2022 data is currently not downloaded during the pipeline run, one must first download the data bundle and manually replace the Zensus 2011 data with the 2022 data, at least until the bundle is updated. Then start another run that does not download the data bundle again. To do that, I added an empty function called "skip" to the databundle class and replaced the download task with it.
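A minimal sketch of this workaround, using a simplified stand-in for the databundle dataset class. Only the name `skip` comes from this PR; `DataBundle`, `download`, `databundle_tasks` and the `use_local_data` flag are hypothetical and not the actual egon-data API.

```python
from typing import Callable, Tuple


class DataBundle:
    """Hypothetical, simplified stand-in for the databundle dataset class."""

    @staticmethod
    def download() -> str:
        # In the real pipeline this step would fetch and unpack the bundle.
        return "downloaded"

    @staticmethod
    def skip() -> None:
        """Do nothing, so the manually replaced Zensus 2022 files stay in place."""
        return None


def databundle_tasks(use_local_data: bool) -> Tuple[Callable[[], object], ...]:
    """Return the task(s) to run for the databundle step (hypothetical helper)."""
    # While the published bundle still contains Zensus 2011 data, run the
    # no-op instead of downloading, which could overwrite the manual changes.
    if use_local_data:
        return (DataBundle.skip,)
    return (DataBundle.download,)
```

Keeping `skip` as a real (empty) task rather than deleting the step leaves the task graph shape unchanged, so downstream tasks that depend on the databundle step still have something to wait for.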

The databundle will be updated once a full test run is done.

📋 Pull Request Guidelines

Please read the Pull Request Guidelines carefully before creating your PR.


🧑‍💻 Contributor Checklist

Before requesting a review, make sure you've completed all of the following:

  • All tests pass locally or via CI
    (for more information on local tests, check tox in the Contributing section)
    (CI tests are automatically executed when creating a PR, you can see the results of the checks below)
  • Workflow has run at least once in Test mode
    (optional if no dataset changes are involved)
  • Relevant documentation is updated (API, new features, etc.)
  • Dataset versions are updated when existing datasets are adjusted.
  • Added a note to CHANGELOG.rst about the changes
  • Added yourself to AUTHORS.rst

Optional:

  • Changes have been tested in Everything mode
  • Extend the checklist for reviewers: Which aspects should be reviewed in particular?

🔍 Reviewer Checklist

During your review, please check the following:

  • Is the code clean, readable, and efficient? Are there any oddities or obvious inefficiencies?
  • Does the code work as expected? (should already be verified by contributor)
  • Do all tests pass? (see CI results)
  • Is the documentation complete and up to date?
  • Is CHANGELOG.rst updated accordingly?
  • Is all necessary metadata complete and correct?
    • If metadata is pending: Is there an appropriate issue filed?

📝 Additional Notes (optional)


💡 Tip: If you add multiple reviewers, clarify who should check what — this saves time and avoids duplicated efforts.

- Fix linter errors (mostly "line too long" and "unused import")
@jh-RLI requested a review from nesnoj December 4, 2025 11:08
@jh-RLI self-assigned this Dec 4, 2025

jh-RLI commented Dec 4, 2025

I have to update some docstrings before this PR is ready for review.

…adapting the get census functionality:

- The original files are downloaded from the Zensus database and will be added to the databundle
- The get function is adapted to first transform the data into the expected 2011-like shape
- The pandas DataFrame is returned; all other functionality is unchanged and works with the same data structure as in 2011, just with 2022 data
- Update docstrings that mention Zensus 2011 to refer to Zensus 2022
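One way the reshape into the 2011-like shape could look, assuming (this is not confirmed by the PR) that the 2022 export is wide, with one column per attribute value, while the 2011-based code expects a long layout with one row per grid cell and attribute value. The function `to_2011_long_format` and all column names (`grid_id`, `Merkmal`, `Auspraegung_Code`, `Anzahl`) are illustrative assumptions, not the real schema.

```python
import pandas as pd


def to_2011_long_format(
    df_2022: pd.DataFrame, merkmal: str, id_col: str = "grid_id"
) -> pd.DataFrame:
    """Reshape a wide per-cell table into a long, 2011-like layout (hypothetical)."""
    # One row per (cell, attribute value); the former column names become codes.
    long_df = df_2022.melt(
        id_vars=[id_col],
        var_name="Auspraegung_Code",
        value_name="Anzahl",
    )
    # Drop cells without a value for a given attribute value (e.g. suppressed).
    long_df = long_df.dropna(subset=["Anzahl"])
    # Tag every row with the attribute (Merkmal) this table describes.
    long_df.insert(1, "Merkmal", merkmal)
    long_df["Anzahl"] = long_df["Anzahl"].astype(int)
    return long_df.sort_values([id_col, "Auspraegung_Code"]).reset_index(drop=True)
```

Because the function returns a plain pandas DataFrame, code that consumed the 2011 layout can keep working unchanged, which is the intent stated in the commit message above.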
@jh-RLI marked this pull request as ready for review December 15, 2025 14:34


Development

Successfully merging this pull request may close these issues.

[FEATURE] Update census data to 2022

2 participants