Clarify LANGUAGES and ALL_LANGUAGES settings

`openedx-platform` has two very similar language settings, and this is my attempt to define and document the difference. I think we need to clarify the differences and perhaps make them more consistent. Also, it's unclear if `ALL_LANGUAGES` needs to be a setting at all.

### LANGUAGES:

_Should_ represent the languages supported by the Open edX platform (i.e. available localizations of the Open edX platform user interface), but it seems to list too many languages.

https://github.com/openedx/openedx-platform/blob/e5ebde83f25bfc51eb4fd62a80846a28cd316ba7/openedx/envs/common.py#L230-L253

* This is a [standard Django setting](https://docs.djangoproject.com/en/6.0/ref/settings/#languages)
* Full language names are currently specified using the local name (endonym) like "Deutsch" for German; this seems worse than the Django [recommendation](https://docs.djangoproject.com/en/6.0/ref/settings/#languages) of naming them in English and marking them for translation, e.g. `("de", _("German")),`
* All lowercase, separated by hyphens.
* Has non-standard "languages" used for development and testing purposes:
    - `rtl` Right-to-Left Test Language
    - `eo` Dummy Language for coverage testing ([docs](https://docs.openedx.org/en/latest/developers/references/developer_guide/internationalization/i18n.html#coverage-testing))
    - `en@lolcat` LOLCAT English 😸  (why do we have this 🤔)
    - `en@pirate` Pirate English 🏴‍☠️ (why do we have this 🤔)
* Weirdly uses an `@` sign (`ca@valencia`) as the code for "Catalan (Valencia)", which is an old GNU libc / gettext practice and not usually used for internet localization purposes. `ca-es-valencia` or `ca-valencia` would be more common.
* Is pretty inconsistent with locale vs. language codes.
    - Uses `it-it`, `ja-jp`, `tr-tr`, `fi-fi` instead of just `it`, `ja`, `tr`, `fi` for Italian, Japanese, Turkish, Finnish, etc., where one dominant country uses each language
    - But uses just `en` for English, `fr` for French, and `ru` for Russian, all of which are spoken in a very wide variety of countries with many regional differences; for example, the distinction between `fr-fr` (France French) and `fr-ca` (Canadian French) is important for a lot of Open edX users.
* Three Chinese language codes: `zh-cn` (Mandarin/Mainland China/simplified), `zh-hk` (Cantonese/Hong Kong/traditional), and `zh-tw` (Chinese-Taiwan). This seems correct, but as you'll see, other parts of the platform use totally different codes. (Django upstream uses two, `zh-hans` for simplified and `zh-hant` for traditional, which some argue is technically more correct but seems to not really be used in practice; as I understand it, browsers typically send/expect `zh-cn` etc.)
* Is copied into a dict as `settings.LANGUAGE_DICT`
* **Use case**: a _subset_ of `LANGUAGES` [is used](https://github.com/openedx/openedx-platform/blob/e5ebde83f25bfc51eb4fd62a80846a28cd316ba7/openedx/core/djangoapps/lang_pref/api.py#L65-L69) by the `lang_pref` API and powers the "Site Language" setting of the Accounts MFE:
   <img width="799" height="222" alt="Image" src="https://github.com/user-attachments/assets/24545f5a-cb3a-405c-b1ae-6a53da245dba" />
* **Use case**: Used for the "Languages" taxonomy that's managed by the system and can be used to apply language tags to content. (This seems to be an oversight - `ALL_LANGUAGES` is likely a better fit, see below)
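
To make the locale-vs-language inconsistency above easier to see, here is a minimal sketch that groups codes by their shape. The sample list only contains codes mentioned in this issue, and the helper names (`classify_code`, `group_by_shape`) are hypothetical, not anything that exists in the platform.

```python
from collections import defaultdict

# A small sample of codes mentioned in this issue, NOT the full setting.
SAMPLE_CODES = [
    "en", "fr", "ru",
    "it-it", "tr-tr", "fi-fi", "de-de", "es-419", "zh-cn",
    "ca@valencia", "en@lolcat",
]


def classify_code(code: str) -> str:
    """Return a rough label for the shape of a language code (hypothetical helper)."""
    if "@" in code:
        return "gettext-modifier"  # e.g. ca@valencia
    if "-" in code:
        return "language-region"  # e.g. it-it, es-419
    return "language-only"  # e.g. en, fr


def group_by_shape(codes):
    """Group codes by shape, preserving the order in which they were given."""
    groups = defaultdict(list)
    for code in codes:
        groups[classify_code(code)].append(code)
    return dict(groups)


if __name__ == "__main__":
    for shape, codes in group_by_shape(SAMPLE_CODES).items():
        print(f"{shape}: {', '.join(codes)}")
```

Running something like this against the real `LANGUAGES` value would give a quick inventory of which entries would need to change if we picked one convention.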

### ALL_LANGUAGES:

Intended to represent "all" languages, regardless of whether or not you can use the Open edX platform in this language.

https://github.com/openedx/openedx-platform/blob/e5ebde83f25bfc51eb4fd62a80846a28cd316ba7/openedx/envs/common.py#L554-L564

* Not a standard Django setting, and the code doesn't explain why it exists other than to be used for `LanguageProficiency`
* Full language names are specified in English only and not translated
* No fake/development/weird languages, but does have `eo` "Esperanto", which is the language code we use as a dummy language for coverage testing (see above)
* Pretty consistently has only language codes without locale suffix; only one entry `es` "Spanish" for example, whereas `LANGUAGES` has six different Spanish locales.
* Two Chinese language codes: `zh_HANS` (Simplified Chinese / Mandarin) and `zh_HANT` (Traditional Chinese / Cantonese). These introduce UPPERCASE and `_` (underscore), inconsistent with the lowercase-hyphenated format of `LANGUAGES` and inconsistent with the ISO 639-1 standard and all the other codes in the list, which have only two letters.
* **Use case**: Used for the "Course language" setting on Studio's "Schedule & Details" page
   <img width="1082" height="190" alt="Image" src="https://github.com/user-attachments/assets/92e2324a-d88b-4413-9be0-9abf30545371" />
* **Use case**: Used to define the choices of the [`LanguageProficiency` model](https://github.com/openedx/openedx-platform/blob/e5ebde83f25bfc51eb4fd62a80846a28cd316ba7/common/djangoapps/student/models/user.py#L1571), part of the user's public profile (different from their platform language setting). Because it's used in a model's `choices` field, changing this setting will result in a new migration needing to be created. I guess the thought here was that users may want to list languages on their profile even if those languages are not supported by the system, hence `ALL_LANGUAGES` needed to be different from `LANGUAGES`?
* **Use case**: Used as the list of languages for picking a transcript language [in the legacy video editor](https://github.com/openedx/openedx-platform/blob/f70063d3d65493f68bc370c00fe698ca8c355848/xmodule/video_block/video_block.py#L654)
* **Use case**: Used to define the choices of [the `language` field of `CourseTeam`](https://github.com/openedx/openedx-platform/blob/e5ebde83f25bfc51eb4fd62a80846a28cd316ba7/lms/djangoapps/teams/models.py#L138)
* **Use case**: [Used](https://github.com/openedx/openedx-platform/blob/e5ebde83f25bfc51eb4fd62a80846a28cd316ba7/openedx/core/djangoapps/video_config/transcripts_utils.py#L1147-L1157) in `transcripts_utils` to get the name of a language, if it can't find the name in `LANGUAGES`/`LANGUAGE_DICT`.
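
The last use case above (look in `LANGUAGES` first, then fall back to `ALL_LANGUAGES`) can be sketched roughly as follows. This is not the actual `transcripts_utils` code: the two lists are tiny excerpts, and `ALL_LANGUAGES_DICT`, `get_language_name`, and the final fallback to the raw code are assumptions made for illustration.

```python
# Tiny excerpts for illustration only; the real settings are much longer.
LANGUAGES = [
    ("en", "English"),
    ("de-de", "Deutsch (Deutschland)"),
]
ALL_LANGUAGES = [
    ["aa", "Afar"],
    ["am", "Amharic"],
]

LANGUAGE_DICT = dict(LANGUAGES)
# Hypothetical name: a dict view of ALL_LANGUAGES for quick lookups.
ALL_LANGUAGES_DICT = {code: name for code, name in ALL_LANGUAGES}


def get_language_name(code: str) -> str:
    """Prefer the name from LANGUAGES, then fall back to ALL_LANGUAGES.

    Returning the code itself when both lookups fail is an assumption
    made for this sketch, not documented platform behavior.
    """
    if code in LANGUAGE_DICT:
        return LANGUAGE_DICT[code]
    return ALL_LANGUAGES_DICT.get(code, code)
```

Note how the two sources disagree on naming style: a code found in `LANGUAGES` comes back as an endonym, while one found only in `ALL_LANGUAGES` comes back in English.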


### Other notes

* There is a management command, [`migrate_user_profile_langs`](https://github.com/openedx/openedx-platform/blob/e5ebde83f25bfc51eb4fd62a80846a28cd316ba7/openedx/core/djangoapps/user_api/management/commands/migrate_user_profile_langs.py) that can migrate users' language preferences to help with cleaning this sort of thing up, with the example given being to go from `zh-cn` (old) to `zh-hans` (new), despite the `zh-cn` one being correct according to the current `LANGUAGES` setting values.
* There is a setting called `EXTENDED_VIDEO_TRANSCRIPT_LANGUAGES`, described as "Additional languages that should be supported for video transcripts, not included in ALL_LANGUAGES", which seems to go against the spirit of "all" languages already being included in `ALL_LANGUAGES` 😛. There [wasn't really any explanation for this](https://github.com/openedx/openedx-platform/pull/36964).
* The standard browser `Intl` API prefers mixed capitalization with hyphens for locale codes, but understands the lowercase version that matches our backend `settings.LANGUAGES` (It doesn't recognize `zh_HANS` with underscores as seen in the `ALL_LANGUAGES` setting):
    <img width="501" height="46" alt="Image" src="https://github.com/user-attachments/assets/634bf4fd-b7f3-4c91-a2a4-b8d6b11bcdcf" />
    <img width="348" height="39" alt="Image" src="https://github.com/user-attachments/assets/c16caad7-e94a-4c1a-af4e-a7ce61a3beab" />
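
If we wanted a single code format across both settings, a small normalization step could bridge the differences noted in this issue (uppercase, underscores, `@` modifiers). A minimal sketch, assuming lowercase-hyphenated is the target format; `normalize_language_code` and `find_nonstandard` are hypothetical helpers, not existing platform functions.

```python
def normalize_language_code(code: str) -> str:
    """Lowercase a code and use hyphens as the only separator.

    Examples of the intended mapping: zh_HANS -> zh-hans,
    ca@valencia -> ca-valencia. Codes already in that form are unchanged.
    """
    return code.strip().replace("_", "-").replace("@", "-").lower()


def find_nonstandard(codes):
    """Return (original, normalized) pairs for codes that would change."""
    return [
        (code, normalize_language_code(code))
        for code in codes
        if normalize_language_code(code) != code
    ]


if __name__ == "__main__":
    for original, normalized in find_nonstandard(["zh_HANS", "zh-cn", "ca@valencia"]):
        print(f"{original} -> {normalized}")
```

This only fixes the syntax of a code. Deciding whether `zh-cn` and `zh-hans` should be treated as the same entry is a separate mapping question, like the one `migrate_user_profile_langs` deals with.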

### Thoughts

The `LANGUAGES` setting is not really useful on its own - it has too many languages, so you have to use [this API](https://github.com/openedx/openedx-platform/blob/e5ebde83f25bfc51eb4fd62a80846a28cd316ba7/openedx/core/djangoapps/lang_pref/api.py#L32) to merge it with `DarkLangConfig` to get the _actual_ list of supported languages. So it seems to me that `LANGUAGES` is already somewhat playing the role of `ALL_LANGUAGES`, and with a bit more cleanup we could merge them, such that `LANGUAGES` is "all languages", and the subset of `LANGUAGES`+`DarkLangConfig` represents the available languages of the UI. Cleaning this up will be a fair amount of work, but it would be good to resolve the many inconsistencies between the two lists.
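
To illustrate the "subset of `LANGUAGES`" idea, here is a rough sketch of filtering the full list down to the codes that have been released. It is not the real `lang_pref` API: `supported_ui_languages` is a hypothetical name, and the plain list of released codes stands in for whatever `DarkLangConfig` actually provides.

```python
def supported_ui_languages(languages, released_codes):
    """Return the (code, name) pairs whose code has been released.

    `languages` plays the role of settings.LANGUAGES; `released_codes`
    is a stand-in for the released-language list from DarkLangConfig.
    The order of `languages` is preserved.
    """
    released = set(released_codes)
    return [(code, name) for code, name in languages if code in released]


if __name__ == "__main__":
    languages = [
        ("en", "English"),
        ("eo", "Dummy Language (Esperanto)"),
        ("de-de", "Deutsch (Deutschland)"),
    ]
    print(supported_ui_languages(languages, ["en", "de-de"]))
```

With a merged setting, `LANGUAGES` would be the "all languages" list and a filter like this would give the UI languages.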

Or, if we need to keep `ALL_LANGUAGES`, can it be moved out of `settings`? Does anyone ever override it?

	# Sourced from http://www.localeplanet.com/icu/ and wikipedia
	LANGUAGES = [
	('en', 'English'),
	('rtl', 'Right-to-Left Test Language'),
	('eo', 'Dummy Language (Esperanto)'), # Dummy languaged used for testing

	('am', 'አማርኛ'), # Amharic
	('ar', 'العربية'), # Arabic
	('az', 'azərbaycanca'), # Azerbaijani
	('bg-bg', 'български (България)'), # Bulgarian (Bulgaria)
	('bn-bd', 'বাংলা (বাংলাদেশ)'), # Bengali (Bangladesh)
	('bn-in', 'বাংলা (ভারত)'), # Bengali (India)
	('bs', 'bosanski'), # Bosnian
	('ca', 'Català'), # Catalan
	('ca@valencia', 'Català (València)'), # Catalan (Valencia)
	('cs', 'Čeština'), # Czech
	('cy', 'Cymraeg'), # Welsh
	('da', 'dansk'), # Danish
	('de-de', 'Deutsch (Deutschland)'), # German (Germany)
	('el', 'Ελληνικά'), # Greek
	('en-uk', 'English (United Kingdom)'), # English (United Kingdom)
	('en@lolcat', 'LOLCAT English'), # LOLCAT English
	('en@pirate', 'Pirate English'), # Pirate English
	('es-419', 'Español (Latinoamérica)'), # Spanish (Latin America)

	# Source:
	# http://loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt according to http://en.wikipedia.org/wiki/ISO_639-1
	# Note that this is used as the set of choices to the `code` field of the `LanguageProficiency` model.
	ALL_LANGUAGES = [
	["aa", "Afar"],
	["ab", "Abkhazian"],
	["af", "Afrikaans"],
	["ak", "Akan"],
	["sq", "Albanian"],
	["am", "Amharic"],
	["ar", "Arabic"],

Clarify LANGUAGES and ALL_LANGUAGES settings #38036
