Add Wikipedia processing and report #235
base: main
@TimidRobot I am considering classifying the languages by region. Would that be meaningful?
Rather, the data file doesn't have any region information, and the processing phase shouldn't fetch any additional information. If region data is needed, the fetch script would need to be updated. That said, given the various diasporas, I'm skeptical that region information is helpful.
Ohh okay.
I also think you should update and reorder the plots:
We use sentence case for titles and headings because we generally follow the Google developer documentation style guide.
Sentence case improves readability and keeps capitalization consistent (otherwise, remembering which words to capitalize becomes hard). See also: Documentation Guidelines (Creative Commons Open Source).
import sys
import traceback

# Third-party
Please update this script to match established scripts:
from pygments import highlight
from pygments.formatters import TerminalFormatter
from pygments.lexers import PythonTracebackLexer
...
except Exception:
    LOGGER.exception(f"(1) Unhandled exception: {traceback.format_exc()}")
    sys.exit(1)
Please update to match established scripts:
except Exception:
    traceback_formatted = textwrap.indent(
        highlight(
            traceback.format_exc(),
            PythonTracebackLexer(),
            TerminalFormatter(),
        ),
        "    ",
    )
    LOGGER.critical(f"(1) Unhandled exception:\n{traceback_formatted}")
    sys.exit(1)

except SystemExit as e:
    LOGGER.error(f"System exit with code: {e.code}")
    sys.exit(e.code)
Please update to match established scripts:
except SystemExit as e:
    if e.code != 0:
        LOGGER.error(f"System exit with code: {e.code}")
    sys.exit(e.code)

    Processing count data: language representation
    """
    LOGGER.info(process_language_representation.__doc__.strip())
In the other log messages, the first word after the colon is capitalized, so I think "language" should be capitalized here ("Language").
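Because the script logs the function's own docstring via `__doc__.strip()`, the capitalization fix belongs in the docstring itself. A minimal sketch of that pattern (the logger name is a hypothetical placeholder, not the script's actual configuration):

```python
import logging

# Hypothetical logger name; the real script configures its own LOGGER
LOGGER = logging.getLogger("process_wikipedia_example")


def process_language_representation():
    """
    Processing count data: Language representation
    """
    # The docstring doubles as the progress message, so its capitalization
    # is exactly what appears in the log
    LOGGER.info(process_language_representation.__doc__.strip())


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(levelname)s - %(message)s")
    process_language_representation()
    # Logs: INFO - Processing count data: Language representation
```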
Ohh okay.
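The two exception-handling suggestions in this review (log a `SystemExit` only when its code is nonzero, and log unhandled exceptions at critical level with an indented traceback) can be tried in isolation with a standard-library-only sketch. `run` and `failing_main` are hypothetical names; the sketch indents the plain traceback with `textwrap.indent` and skips the pygments `highlight` step so it runs without third-party packages.

```python
import logging
import sys
import textwrap
import traceback

# Hypothetical logger name for this sketch
LOGGER = logging.getLogger("exit_handling_example")


def failing_main():
    """Hypothetical entry point that fails with an unexpected error."""
    raise ValueError("example failure")


def run(entry_point):
    """Call entry_point, mirroring the exit handling suggested in review."""
    try:
        entry_point()
    except SystemExit as e:
        # sys.exit(0) is a normal, successful exit, so it is not logged
        if e.code != 0:
            LOGGER.error(f"System exit with code: {e.code}")
        sys.exit(e.code)
    except Exception:
        # Indent the traceback so it reads as one block under the log
        # message (the established scripts also colorize it with pygments)
        traceback_formatted = textwrap.indent(traceback.format_exc(), "    ")
        LOGGER.critical(f"(1) Unhandled exception:\n{traceback_formatted}")
        sys.exit(1)
```

In a script, the wrapper would be called as `run(main)` under `if __name__ == "__main__":`. Because `SystemExit` does not inherit from `Exception`, the two `except` clauses never compete for the same error.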
Fixes
Description
Added Python scripts for processing and reporting Wikipedia data.
The analysis covers the 10 languages with the most articles, a classification of represented and underrepresented languages, the average number of articles per language, and the percentage of all Wikipedia articles that belong to the top 10 languages.
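The four analyses above can be sketched over a simple mapping of language to article count. This is an illustration under stated assumptions, not the PR's implementation: the counts are made-up toy numbers (not real Wikipedia statistics), the function names are hypothetical, and "underrepresented" is assumed to mean below the average count, since the description does not state the criterion.

```python
# Illustrative toy counts, NOT real Wikipedia statistics. The actual
# processing script reads per-language article counts from the fetched data.
ARTICLE_COUNTS = {
    "lang_a": 300, "lang_b": 200, "lang_c": 150, "lang_d": 100,
    "lang_e": 90, "lang_f": 80, "lang_g": 70, "lang_h": 60,
    "lang_i": 50, "lang_j": 40, "lang_k": 35, "lang_l": 25,
}


def top_languages(counts, n=10):
    """Return the n (language, count) pairs with the most articles."""
    # Sort by count descending, then by name so ties are deterministic
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


def average_articles(counts):
    """Mean number of articles per language."""
    return sum(counts.values()) / len(counts)


def top_share_percent(counts, n=10):
    """Percentage of all articles that belong to the top n languages."""
    top_total = sum(count for _, count in top_languages(counts, n))
    return 100 * top_total / sum(counts.values())


def classify(counts):
    """Label each language against the average (assumed threshold)."""
    threshold = average_articles(counts)
    return {
        lang: "represented" if count >= threshold else "underrepresented"
        for lang, count in counts.items()
    }


if __name__ == "__main__":
    print(top_languages(ARTICLE_COUNTS)[:3])
    # [('lang_a', 300), ('lang_b', 200), ('lang_c', 150)]
    print(average_articles(ARTICLE_COUNTS))  # 100.0
    print(top_share_percent(ARTICLE_COUNTS))  # 95.0
    labels = classify(ARTICLE_COUNTS)
    print(labels["lang_d"], labels["lang_e"])  # represented underrepresented
```

With these toy numbers the top 10 hold 1140 of 1200 articles (95%). Using the mean as the cut-off is only one option; a median or a fixed share of total articles would change which languages count as underrepresented.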
Checklist
… Update index.md).
… main or master).
… visible errors.
Developer Certificate of Origin
For the purposes of this DCO, "license" is equivalent to "license or public domain dedication," and "open source license" is equivalent to "open content license or public domain dedication."