Reports localization#2452
Reports localization#2452emilyanndavis wants to merge 19 commits intonatcap:feature/reports-localizationfrom
Conversation
…ges in HTML templates.
… model spec is localized before it is passed to the report.
… are localized on demand.
…endtrans tags where needed.
…essage extraction.
| @@ -0,0 +1,9 @@ | |||
| # Extract messages from Python source files | |||
There was a problem hiding this comment.
I named this file babel_config.ini because the Babel docs on mapping files indicate "[t]he configuration file syntax is based on the format commonly found in .INI files on Windows systems […]." Since I didn't find any requirements specifying how this file should be named, I suspect we could call it whatever we want. Open to suggestions if anyone has a preference for some other name or file type.
| <h2 class="section-header">{% trans %}Results{% endtrans %}</h2> | ||
|
|
||
| {{ accordion_section( | ||
| 'Aggregate Results', | ||
| gettext('Aggregate Results'), |
There was a problem hiding this comment.
Note that while we prefer trans/endtrans tags for readability wherever possible (especially where text contains a variable), gettext() is appropriate for string literals, such as those passed as args to macros.
| @@ -8,18 +8,14 @@ | |||
| import json | |||
There was a problem hiding this comment.
Removed some unused imports.
| f'{preprocessed_args.get("results_suffix", "")}.html')) | ||
|
|
||
| with natcap.invest.reports.configure_libraries(): | ||
| reporter_module = importlib.import_module(self.reporter) |
There was a problem hiding this comment.
I had moved this line down because I was experimenting with a second context manager that would handle reloading the modules a particular reporter depends on, before importing the reporter module—until I realized there was a better way (see changes to report_constants.py).
I left this line here because it seems like a readability improvement to import the module immediately before invoking its report method, but otherwise I don't think it makes any real difference.
| def get_locale(): | ||
| """Get the current locale code.""" | ||
| return LOCALE_CODE |
There was a problem hiding this comment.
This getter is needed so we can pass the current locale code to the base HTML template. If we import and reference LOCALE_CODE directly, it will always be 'en' because that was its initial value—it won't change even if it has been modified by set_locale.
There was a problem hiding this comment.
I should have done this originally, but may I suggest renaming the variable to _LOCALE_CODE to emphasize that it's not meant to be used outside this module?
…orts-localization
… they'll still pass if/when CBC has a reporter.
| target = ( | ||
| 'natcap.invest.coastal_blue_carbon.coastal_blue_carbon.execute') | ||
|
|
||
| with (unittest.mock.patch(target, return_value=None) as patched_model, | ||
| unittest.mock.patch('importlib.reload')): |
There was a problem hiding this comment.
Each CLI test that mocks the CBC execute method was failing, because invoking cli.main with the run command would reload the model module, effectively overwriting the mock and forcing the real execute method to, well, execute. (And then the test would fail because there seemed to be some problem with the test datastack and/or the test workspace setup I didn't quite get to the bottom of, but I think it was merely an artifact of this particular test suite's environment and that the model itself is OK since I see no CBC test failures elsewhere.)
I updated cli.main so that it reloads the model module only if the model has a reporter, which makes this issue moot as long as CBC doesn't have a reporter. I also updated the affected tests so they mock importlib.reload, avoiding a true reload (and overwrite of the mocked execute method) so these tests will still pass if/when CBC has a reporter. (I tested and confirmed this locally.)
davemfish
left a comment
There was a problem hiding this comment.
Thanks, @emilyanndavis , this looks great to me. I reviewed all the changes but don't have too much experience with our translations infrastructure.
| if generate_report: | ||
| # Reload model module to ensure text defined in model spec is | ||
| # localized before it is passed to the report. | ||
| importlib.reload(model_module) |
There was a problem hiding this comment.
So in the past the model_module would call execute without localizing first, and that was okay because localization was not necessary for any reason during/after execute? But now it is necessary because the report uses localized strings from model_module?
There was a problem hiding this comment.
Correct. The report uses strings from model_module—specifically, from model_module.MODEL_SPEC—and although those strings are wrapped in gettext, those calls to gettext are executed only when the model_module loads. When model_module first loads, the locale is en, so all those strings are in their default English. When the locale changes, we need a mechanism to localize those strings on demand, i.e., by forcing all those calls to gettext to run again. I noticed that, when cli.main is invoked with the validate command, we are reloading natcap.invest.validation_messages to ensure the validation messages are localized, so I figured an analogous approach would make sense here. Hypothetically, if we were to move model specs out to their own modules (something we've entertained at least once or twice), we could get away with reloading only the (hypothetical) model spec module. As long as the MODEL_SPEC is defined on the model_module, however, reloading model_module seems to be the way to go.
But also … your question got me wondering how we manage to localize model_spec-defined strings in the Workbench, and it looks like we are actually missing a piece of the puzzle, since I found some text that is appearing in English even though we have a Spanish translation for it. I'll be digging into that separate from this PR.
There was a problem hiding this comment.
The same thing happens in ui_server.py - we reload the model module before returning the spec or validation messages.
tests/reports/test_base_template.py
Outdated
|
|
||
| for arg in ['report_script', 'invest_version', | ||
| 'report_filepath', 'timestamp']: | ||
| # self.assertIn(render_args[arg], soup.footer.string) |
|
As we discussed in our call earlier today, I've updated this PR to point to a feature branch instead of the main branch, so we can hold off on releasing these changes until after we have complete translations for the reports we have so far. If we were to release this now, "translated" reports would be mostly in English, with the exception of the model name and perhaps a few stray words or phrases we happen to already have translations for. I worry this would look more like a bug than a feature, especially since reports are new and therefore potentially under outsized scrutiny. We should plan to use the feature branch to generate the message catalogs we will send to translators, then merge the feature branch into the appropriate release branch after we've received the updated translations. |
claire-simpson
left a comment
There was a problem hiding this comment.
Thanks @emilyanndavis!
Co-authored-by: Claire Simpson <112011324+claire-simpson@users.noreply.github.com>
megannissel
left a comment
There was a problem hiding this comment.
Thanks, @emilyanndavis!
| output_raster_stats_table=output_raster_stats_table, | ||
| input_raster_stats_table=input_raster_stats_table, | ||
| stats_table_note=report_constants.STATS_TABLE_NOTE, | ||
| stats_table_note=report_constants.stats_table_note(), |
There was a problem hiding this comment.
IMO, it would be a bit more intuitive to keep the strings as constants and wrap them directly in gettext here:
| stats_table_note=report_constants.stats_table_note(), | |
| stats_table_note=gettext(report_constants.STATS_TABLE_NOTE), |
There was a problem hiding this comment.
I agree—it doesn't seem quite right to define "constants" as functions—but I think the reason I ended up doing that is because I wanted the constants module itself to handle the localization. While your suggestion makes sense (and works), it shifts the responsibility to each reporter module, and as more reports are added I worry that it would be too easy to forget to wrap those constants in gettext, and we don't currently have a mechanism to catch that mistake.
I was trying to avoid reloading the report_constants module, but since I'm not thrilled with these alternatives, I'm now thinking it might be best to follow the same pattern we use for localizing validation messages, and just go ahead and reload the report_constants module when we need it (in this case, that would be just before reloading the model module when we're about to run a model that has a reporter). Do you see any reason not to do that?
There was a problem hiding this comment.
Hmm, well it is nice to avoid needing to reload the module. Reloading is really a kind of hacky antipattern, and I have wondered before if we should refactor other parts of invest to wrap translated objects in a function, rather than reloading. Since the reference to stats_table_note here is already wrapped in render(), it is evaluated in the correct language at runtime, so I think I prefer your current solution vs. introducing a need to reload.
I'd be curious what others think about the value of the wrapper functions as a "reminder" mechanism vs. directly calling gettext. If we keep the current design, I'd suggest renaming report_constants to report_messages or report_strings.
| 'in GIS to assess its accuracy.' | ||
| ) | ||
| # All `gettext`-wrapped strings defined in this module should be returned from | ||
| # a function, _not_ defined on the module object itself, since text defined at |
There was a problem hiding this comment.
I'm not sure that the wrapper functions are adding anything here. It should be sufficient if we just make sure that natcap.invest.gettext is being called after the locale has been set.
src/natcap/invest/spec.py
Outdated
| from . import gettext | ||
| from .unit_registry import u | ||
| from . import validation_messages | ||
| import natcap.invest.reports |
There was a problem hiding this comment.
Since natcap.invest is already imported, is this unnecessary?
There was a problem hiding this comment.
Maybe. I only did this to appease my code editor's static analysis tools that seemed unconvinced of the existence of natcap.invest.reports. On further inspection, it looks like natcap.invest is imported only in order to access the __version__, so I could probably streamline these imports a bit.
| if na_count: | ||
| # There is some redundancy here to ensure the checkbox label is | ||
| # localized properly. Note that passing an f-string to gettext will not | ||
| # work; we must explicitly use .format() to populate variable values. |
There was a problem hiding this comment.
Is the issue that "point(s)" might not translate well into other languages?
There was a problem hiding this comment.
Yep! Some languages use different adjective forms for singular vs. plural nouns; verbs often must change as well (that's true in English, but sometimes we can sidestep that with implied verbs, e.g., "1 point [is] missing" vs. "2 points [are] missing"); and changes in word order are not out of the question. It's typically best to provide the full context for both the singular option and the plural option.
(Shorthand forms that rely on the ability to perceive punctuation, like "point(s)" or "is/are", aren't great for screenreader users either, but that's a bit of an aside—the primary motivation here is support for translations.)
| Message catalog template file. This contains all the strings ("messages") that are translated, without any translations. All the PO files are derived from this. | ||
|
|
||
| ### `babel_config.ini` | ||
| Mappings file that tells pybabel where to look when extracting messages into the message catalog. By default, pybabel will extract messages from Python source files; we need this mappings file to ensure it also extracts messages from the Jinja templates that are used for HTML reports. |
There was a problem hiding this comment.
Thanks for adding this explanation!
| * `MODEL_SPEC` `name` and `about` text | ||
| * Validation messages | ||
| * Strings that appear in the UI, such as button labels and tooltip text | ||
| * Strings that appear in the Workbench UI, such as button labels and tooltip text |
There was a problem hiding this comment.
It's true that workbench UI strings are eventually localized, but there is a different process for that, which is documented in workbench/readme.md. The process in this doc doesn't apply to those strings, so I suggest not mentioning them here.
There was a problem hiding this comment.
This line was already here—I just added the word Workbench—but you raise a good point. I think it's still worth mentioning these items here (since they are translated), but I'll clarify that they're handled separately.
Description
Resolves #2319. Specifically:
gettext()(for string literals passed to macros) ortrans/endtranstags (everything else), to ensure messages are extracted.langattribute and UG URL.Checklist
- [ ] Updated HISTORY.rst and link to any relevant issue (if these changes are user-facing)- [ ] Updated the user's guide (if needed)