Feature/#1240 update to zensus2022 #1374
Open
Fixes #1240.
This PR implements the update to Zensus 2022. To keep the code changes minimal, the Zensus 2022 data was converted to the 2011 format. This conversion was done outside egon-data using several scripts. The conversion result was analyzed, and the results are reasonable. The analysis compared relative and absolute changes and used scatter plots to get an overview of how the data values are distributed across attributes. These results are available but are not part of egon-data so far.
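For reviewers who want to reproduce this kind of check, here is a minimal sketch of an absolute/relative change comparison. It is not one of the conversion or analysis scripts mentioned above. The function name `compare_counts`, the cell keys, and the numbers are all hypothetical and only illustrate the idea.

```python
def compare_counts(old, new):
    """Return (absolute, relative) change for every key present in both mappings.

    The relative change is None where the old value is 0, because it is undefined there.
    """
    result = {}
    for key in old.keys() & new.keys():
        absolute = new[key] - old[key]
        relative = absolute / old[key] if old[key] else None
        result[key] = (absolute, relative)
    return result


# Made-up example values, NOT real Zensus data.
zensus_2011 = {"cell_a": 100, "cell_b": 40, "cell_c": 0}
zensus_2022 = {"cell_a": 110, "cell_b": 30, "cell_c": 5}

changes = compare_counts(zensus_2011, zensus_2022)
for cell, (absolute, relative) in sorted(changes.items()):
    print(cell, absolute, relative)
```

Keys that exist in only one of the two mappings are ignored here; a real comparison would probably want to report them separately.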
Not all attributes from 2011 are available in Zensus 2022, but all attributes that are currently in use are available. See the overview in the linked issue. If we ever need more attributes, we can request the data from Destatis via the contact form as a "Sonderauswertung" (special evaluation).
Because the data is currently not downloaded during the pipeline run, one must first download the data bundle and manually replace the Zensus 2011 data with the 2022 data, at least until the bundle is updated. Then start another run that does not download the data bundle again. To do that, I added an empty function called "skip" to the databundle class and replaced the download task with that function.
The data bundle will be updated once a full test run is done.
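The workaround of swapping the download task for an empty function can be sketched roughly as follows. This is a simplified stand-in, not the actual egon-data code: `DataBundleSketch`, its `tasks` tuple, and the placeholder `download` function are assumptions made for illustration, and only the name `skip` comes from this PR.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


def download():
    """Placeholder for the real download task (hypothetical; performs no I/O)."""
    raise RuntimeError("would download and overwrite the data bundle")


def skip():
    """Empty task: leaves the manually prepared data bundle untouched."""
    return None


@dataclass
class DataBundleSketch:
    """Hypothetical stand-in for a dataset that runs its tasks in order."""

    name: str
    tasks: Tuple[Callable[[], None], ...]

    def run(self):
        for task in self.tasks:
            task()


# The download task is replaced by the no-op, so the run keeps the local data.
bundle = DataBundleSketch(name="DataBundle", tasks=(skip,))
bundle.run()
```

Once the updated bundle is published, the no-op would be swapped back for the real download task.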
📋 Pull Request Guidelines
🧑‍💻 Contributor Checklist
Before requesting a review, make sure you've completed all of the following:
(for more information on local tests, check `tox` in the Contributing section)
(CI tests are automatically executed when creating a PR; you can see the results of the checks below)
(optional if no dataset changes are involved)
`CHANGELOG.rst` about the changes
`AUTHORS.rst`
Optional:
🔍 Reviewer Checklist
During your review, please check the following:
`CHANGELOG.rst` updated accordingly?
📝 Additional Notes (optional)
💡 Tip: If you add multiple reviewers, clarify who should check what — this saves time and avoids duplicated efforts.