Skip to content

Reports localization#2452

Open
emilyanndavis wants to merge 19 commits intonatcap:feature/reports-localizationfrom
emilyanndavis:feature/2319-reports-localization
Open

Reports localization#2452
emilyanndavis wants to merge 19 commits intonatcap:feature/reports-localizationfrom
emilyanndavis:feature/2319-reports-localization

Conversation

@emilyanndavis
Copy link
Copy Markdown
Member

Description

Resolves #2319. Specifically:

  • Configures Babel and Jinja to extract translatable messages from report templates.
  • Wraps all translatable text in HTML templates in gettext() (for string literals passed to macros) or trans/endtrans tags (everything else), to ensure messages are extracted.
  • Uses current locale code in base template to populate lang attribute and UG URL.
  • Miscellaneous updates to Python modules to ensure localized text is rendered as expected.

Checklist

- [ ] Updated HISTORY.rst and link to any relevant issue (if these changes are user-facing)
- [ ] Updated the user's guide (if needed)

  • Tested the Workbench UI (if relevant)

@@ -0,0 +1,9 @@
# Extract messages from Python source files
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I named this file babel_config.ini because the Babel docs on mapping files indicate "[t]he configuration file syntax is based on the format commonly found in .INI files on Windows systems […]." Since I didn't find any requirements specifying how this file should be named, I suspect we could call it whatever we want. Open to suggestions if anyone has a preference for some other name or file type.

Comment on lines +19 to +22
<h2 class="section-header">{% trans %}Results{% endtrans %}</h2>

{{ accordion_section(
'Aggregate Results',
gettext('Aggregate Results'),
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that while we prefer trans/endtrans tags for readability wherever possible (especially where text contains a variable), gettext() is appropriate for string literals, such as those passed as args to macros.

@@ -8,18 +8,14 @@
import json
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed some unused imports.

f'{preprocessed_args.get("results_suffix", "")}.html'))

with natcap.invest.reports.configure_libraries():
reporter_module = importlib.import_module(self.reporter)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had moved this line down because I was experimenting with a second context manager that would handle reloading the modules a particular reporter depends on, before importing the reporter module—until I realized there was a better way (see changes to report_constants.py).

I left this line here because it seems like a readability improvement to import the module immediately before invoking its report method, but otherwise I don't think it makes any real difference.

Comment on lines +46 to +48
def get_locale():
"""Get the current locale code."""
return LOCALE_CODE
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This getter is needed so we can pass the current locale code to the base HTML template. If we import and reference LOCALE_CODE directly, it will always be 'en' because that was its initial value—it won't change even if it has been modified by set_locale.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have done this originally, but may I suggest renaming the variable to _LOCALE_CODE to emphasize that it's not meant to be used outside this module?

Comment on lines +58 to +62
target = (
'natcap.invest.coastal_blue_carbon.coastal_blue_carbon.execute')

with (unittest.mock.patch(target, return_value=None) as patched_model,
unittest.mock.patch('importlib.reload')):
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each CLI test that mocks the CBC execute method was failing, because invoking cli.main with the run command would reload the model module, effectively overwriting the mock and forcing the real execute method to, well, execute. (And then the test would fail because there seemed to be some problem with the test datastack and/or the test workspace setup I didn't quite get to the bottom of, but I think it was merely an artifact of this particular test suite's environment and that the model itself is OK since I see no CBC test failures elsewhere.)

I updated cli.main so that it reloads the model module only if the model has a reporter, which makes this issue moot as long as CBC doesn't have a reporter. I also updated the affected tests so they mock importlib.reload, avoiding a true reload (and overwrite of the mocked execute method) so these tests will still pass if/when CBC has a reporter. (I tested and confirmed this locally.)

@emilyanndavis emilyanndavis marked this pull request as ready for review March 24, 2026 15:35
Copy link
Copy Markdown
Contributor

@davemfish davemfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @emilyanndavis , this looks great to me. I reviewed all the changes but don't have too much experience with our translations infrastructure.

if generate_report:
# Reload model module to ensure text defined in model spec is
# localized before it is passed to the report.
importlib.reload(model_module)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in the past the model_module would call execute without localizing first, and that was okay because localization was not necessary for any reason during/after execute? But now it is necessary because the report uses localized strings from model_module?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. The report uses strings from model_module—specifically, from model_module.MODEL_SPEC—and although those strings are wrapped in gettext, those calls to gettext are executed only when the model_module loads. When model_module first loads, the locale is en, so all those strings are in their default English. When the locale changes, we need a mechanism to localize those strings on demand, i.e., by forcing all those calls to gettext to run again. I noticed that, when cli.main is invoked with the validate command, we are reloading natcap.invest.validation_messages to ensure the validation messages are localized, so I figured an analogous approach would make sense here. Hypothetically, if we were to move model specs out to their own modules (something we've entertained at least once or twice), we could get away with reloading only the (hypothetical) model spec module. As long as the MODEL_SPEC is defined on the model_module, however, reloading model_module seems to be the way to go.

But also … your question got me wondering how we manage to localize model_spec-defined strings in the Workbench, and it looks like we are actually missing a piece of the puzzle, since I found some text that is appearing in English even though we have a Spanish translation for it. I'll be digging into that separate from this PR.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same thing happens in ui_server.py - we reload the model module before returning the spec or validation messages.


for arg in ['report_script', 'invest_version',
'report_filepath', 'timestamp']:
# self.assertIn(render_args[arg], soup.footer.string)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover comment here

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, thanks!

@emilyanndavis emilyanndavis changed the base branch from main to feature/reports-localization March 25, 2026 17:40
@emilyanndavis
Copy link
Copy Markdown
Member Author

As we discussed in our call earlier today, I've updated this PR to point to a feature branch instead of the main branch, so we can hold off on releasing these changes until after we have complete translations for the reports we have so far. If we were to release this now, "translated" reports would be mostly in English, with the exception of the model name and perhaps a few stray words or phrases we happen to already have translations for. I worry this would look more like a bug than a feature, especially since reports are new and therefore potentially under outsized scrutiny.

We should plan to use the feature branch to generate the message catalogs we will send to translators, then merge the feature branch into the appropriate release branch after we've received the updated translations.

Copy link
Copy Markdown
Contributor

@claire-simpson claire-simpson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @emilyanndavis!

Co-authored-by: Claire Simpson <112011324+claire-simpson@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@megannissel megannissel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @emilyanndavis!

output_raster_stats_table=output_raster_stats_table,
input_raster_stats_table=input_raster_stats_table,
stats_table_note=report_constants.STATS_TABLE_NOTE,
stats_table_note=report_constants.stats_table_note(),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, it would be a bit more intuitive to keep the strings as constants and wrap them directly in gettext here:

Suggested change
stats_table_note=report_constants.stats_table_note(),
stats_table_note=gettext(report_constants.STATS_TABLE_NOTE),

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree—it doesn't seem quite right to define "constants" as functions—but I think the reason I ended up doing that is because I wanted the constants module itself to handle the localization. While your suggestion makes sense (and works), it shifts the responsibility to each reporter module, and as more reports are added I worry that it would be too easy to forget to wrap those constants in gettext, and we don't currently have a mechanism to catch that mistake.

I was trying to avoid reloading the report_constants module, but since I'm not thrilled with these alternatives, I'm now thinking it might be best to follow the same pattern we use for localizing validation messages, and just go ahead and reload the report_constants module when we need it (in this case, that would be just before reloading the model module when we're about to run a model that has a reporter). Do you see any reason not to do that?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, well it is nice to avoid needing to reload the module. Reloading is really a kind of hacky antipattern, and I have wondered before if we should refactor other parts of invest to wrap translated objects in a function, rather than reloading. Since the reference to stats_table_note here is already wrapped in render(), it is evaluated in the correct language at runtime, so I think I prefer your current solution vs. introducing a need to reload.

I'd be curious what others think about the value of the wrapper functions as a "reminder" mechanism vs. directly calling gettext. If we keep the current design, I'd suggest renaming report_constants to report_messages or report_strings.

'in GIS to assess its accuracy.'
)
# All `gettext`-wrapped strings defined in this module should be returned from
# a function, _not_ defined on the module object itself, since text defined at
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that the wrapper functions are adding anything here. It should be sufficient if we just make sure that natcap.invest.gettext is being called after the locale has been set.

from . import gettext
from .unit_registry import u
from . import validation_messages
import natcap.invest.reports
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since natcap.invest is already imported, is this unnecessary?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. I only did this to appease my code editor's static analysis tools that seemed unconvinced of the existence of natcap.invest.reports. On further inspection, it looks like natcap.invest is imported only in order to access the __version__, so I could probably streamline these imports a bit.

if na_count:
# There is some redundancy here to ensure the checkbox label is
# localized properly. Note that passing an f-string to gettext will not
# work; we must explicitly use .format() to populate variable values.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the issue that "point(s)" might not translate well into other languages?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! Some languages use different adjective forms for singular vs. plural nouns; verbs often must change as well (that's true in English, but sometimes we can sidestep that with implied verbs, e.g., "1 point [is] missing" vs. "2 points [are] missing"); and changes in word order are not out of the question. It's typically best to provide the full context for both the singular option and the plural option.

(Shorthand forms that rely on the ability to perceive punctuation, like "point(s)" or "is/are", aren't great for screenreader users either, but that's a bit of an aside—the primary motivation here is support for translations.)

Message catalog template file. This contains all the strings ("messages") that are translated, without any translations. All the PO files are derived from this.

### `babel_config.ini`
Mappings file that tells pybabel where to look when extracting messages into the message catalog. By default, pybabel will extract messages from Python source files; we need this mappings file to ensure it also extracts messages from the Jinja templates that are used for HTML reports.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this explanation!

* `MODEL_SPEC` `name` and `about` text
* Validation messages
* Strings that appear in the UI, such as button labels and tooltip text
* Strings that appear in the Workbench UI, such as button labels and tooltip text
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true that workbench UI strings are eventually localized, but there is a different process for that, which is documented in workbench/readme.md. The process in this doc doesn't apply to those strings, so I suggest not mentioning them here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line was already here—I just added the word Workbench—but you raise a good point. I think it's still worth mentioning these items here (since they are translated), but I'll clarify that they're handled separately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Reports: localization of strings defined in report templates

5 participants