Description
openedx-platform has two very similar language settings, and this is my attempt to define and document the difference. I think we need to clarify the differences and perhaps make them more consistent. Also, it's unclear if ALL_LANGUAGES needs to be a setting at all.
LANGUAGES:
Should represent the languages supported by the Open edX platform (i.e. available localizations of the Open edX platform user interface), but it seems to list too many languages.
openedx-platform/openedx/envs/common.py
Lines 230 to 253 in e5ebde8
```python
# Sourced from http://www.localeplanet.com/icu/ and wikipedia
LANGUAGES = [
    ('en', 'English'),
    ('rtl', 'Right-to-Left Test Language'),
    ('eo', 'Dummy Language (Esperanto)'),  # Dummy languaged used for testing
    ('am', 'አማርኛ'),  # Amharic
    ('ar', 'العربية'),  # Arabic
    ('az', 'azərbaycanca'),  # Azerbaijani
    ('bg-bg', 'български (България)'),  # Bulgarian (Bulgaria)
    ('bn-bd', 'বাংলা (বাংলাদেশ)'),  # Bengali (Bangladesh)
    ('bn-in', 'বাংলা (ভারত)'),  # Bengali (India)
    ('bs', 'bosanski'),  # Bosnian
    ('ca', 'Català'),  # Catalan
    ('ca@valencia', 'Català (València)'),  # Catalan (Valencia)
    ('cs', 'Čeština'),  # Czech
    ('cy', 'Cymraeg'),  # Welsh
    ('da', 'dansk'),  # Danish
    ('de-de', 'Deutsch (Deutschland)'),  # German (Germany)
    ('el', 'Ελληνικά'),  # Greek
    ('en-uk', 'English (United Kingdom)'),  # English (United Kingdom)
    ('en@lolcat', 'LOLCAT English'),  # LOLCAT English
    ('en@pirate', 'Pirate English'),  # Pirate English
    ('es-419', 'Español (Latinoamérica)'),  # Spanish (Latin America)
    # ... (list continues in the source file)
]
```
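For contrast with the endonym style above, Django's own default `LANGUAGES` (in `django/conf/global_settings.py`) names languages in English and wraps each name in a no-op `gettext_noop` marker, so the names can be extracted for translation without importing Django's translation machinery while settings load. A minimal sketch of that style follows; the entries are an illustrative subset, not the real Open edX list, and the `dict(...)` copy is assumed to mirror how `LANGUAGE_DICT` is built.

```python
def gettext_noop(s):
    """Mark a string for translation without translating it (the global_settings approach)."""
    return s


# English names, marked for extraction; they would be translated at display time,
# not at import time. Illustrative subset only.
LANGUAGES = [
    ("en", gettext_noop("English")),
    ("de", gettext_noop("German")),
    ("fr", gettext_noop("French")),
    ("zh-hans", gettext_noop("Simplified Chinese")),
]

# Plain dict copy for code -> name lookups.
LANGUAGE_DICT = dict(LANGUAGES)

print(LANGUAGE_DICT["de"])
```

At render time the stored English name would be passed through Django's `gettext()`, so each user sees the language names in their own UI language instead of a fixed endonym.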
- This is a standard Django setting
- Full language names are currently specified using the local name (endonym), like "Deutsch" for German; this seems worse than the Django recommendation of naming them in English and marking them for translation, e.g. `("de", _("German"))`.
- Codes are all lowercase, separated by hyphens.
- Has non-standard "languages" used for development and testing purposes:
  - `rtl`: Right-to-Left Test Language
  - `eo`: Dummy Language for coverage testing (docs)
  - `en@lolcat`: LOLCAT English 😸 (why do we have this 🤔)
  - `en@pirate`: Pirate English 🏴‍☠️ (why do we have this 🤔)
- Weirdly uses an `@` sign (`ca@valencia`) as the code for "Catalan (Valencia)", which is an old GNU libc / gettext practice and not usually used for internet localization purposes; `ca-es-valencia` or `ca-valencia` would be more common.
- Is pretty inconsistent about locale vs. language codes:
  - Uses `it-it`, `ja-jp`, `tr-tr`, `fi-fi` instead of just `it`, `ja`, `tr`, `fi` for Italian, Japanese, Turkish, Finnish, etc., where there is one dominant country that uses each language.
  - But uses just `en` for English, `fr` for French, and `ru` for Russian, all of which are spoken in a very wide variety of countries with many regional differences; for example, the distinction between `fr-fr` (France French) and `fr-ca` (Canadian French) is important for a lot of Open edX users.
- Uses three Chinese language codes: `zh-cn` (Mandarin/Mainland China/simplified), `zh-hk` (Cantonese/Hong Kong/traditional), and `zh-tw` (Chinese-Taiwan). This seems correct, but as you'll see, other parts of the platform use totally different codes. (Django upstream uses two, `zh-hans` for simplified and `zh-hant` for traditional, which some argue is technically more correct but which don't seem to be used much in practice; as I understand it, browsers typically send/expect `zh-cn` etc.)
- Is copied into a dict as `settings.LANGUAGE_DICT`.
- Use case: a subset of `LANGUAGES` is used by the `lang_pref` API and powers the "Site Language" setting of the Accounts MFE.
- Use case: used for the "Languages" taxonomy that's managed by the system and can be used to apply language tags to content. (This seems to be an oversight; `ALL_LANGUAGES` is likely a better fit, see below.)
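To make the inconsistencies above concrete, here is a rough sketch that sorts `LANGUAGES` codes into bare language codes, language-region locales, and gettext-style `@` modifiers, and flags the test-only pseudo-languages. The `classify` helper and the `PSEUDO_LANGUAGES` grouping are hypothetical, written only for this issue; the sample codes come from the excerpt and bullets above.

```python
from collections import defaultdict

# Test/development entries called out above (hypothetical grouping for this sketch).
PSEUDO_LANGUAGES = {"rtl", "eo", "en@lolcat", "en@pirate"}


def classify(code: str) -> str:
    """Return a rough category for a LANGUAGES code."""
    if code in PSEUDO_LANGUAGES:
        return "pseudo"
    if "@" in code:
        return "gettext-modifier"  # e.g. ca@valencia
    if "-" in code:
        return "locale"  # language plus region, e.g. de-de, es-419
    return "language"  # bare language code, e.g. en, fr


# Codes mentioned in the excerpt and bullet points above.
SAMPLE_CODES = [
    "en", "rtl", "eo", "am", "ar", "bg-bg", "bn-bd", "bn-in",
    "ca", "ca@valencia", "de-de", "en@lolcat", "en@pirate",
    "es-419", "it-it", "fr", "ru", "zh-cn", "zh-hk", "zh-tw",
]

groups = defaultdict(list)
for code in SAMPLE_CODES:
    groups[classify(code)].append(code)

for category in sorted(groups):
    print(f"{category}: {', '.join(groups[category])}")
```

Seeing `en`, `fr`, and `ru` land in the "language" bucket next to `de-de` and `it-it` in the "locale" bucket is the inconsistency described above.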
ALL_LANGUAGES:
Intended to represent "all" languages, regardless of whether or not you can use the Open edX platform in this language.
openedx-platform/openedx/envs/common.py
Lines 554 to 564 in e5ebde8
```python
# Source:
# http://loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt according to http://en.wikipedia.org/wiki/ISO_639-1
# Note that this is used as the set of choices to the `code` field of the `LanguageProficiency` model.
ALL_LANGUAGES = [
    ["aa", "Afar"],
    ["ab", "Abkhazian"],
    ["af", "Afrikaans"],
    ["ak", "Akan"],
    ["sq", "Albanian"],
    ["am", "Amharic"],
    ["ar", "Arabic"],
    # ... (list continues in the source file)
]
```
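Most entries in the excerpt above are two-letter ISO 639-1 codes; per the notes below, the full list also contains `zh_HANS` and `zh_HANT`. A quick way to spot entries that break the two-lowercase-letter pattern, sketched against a hypothetical subset of the list (the display names given for the two Chinese entries are placeholders, not necessarily the exact strings in the setting):

```python
import re

# Illustrative subset: the first entries from the excerpt, plus the two
# Chinese codes the full ALL_LANGUAGES list is described as containing.
ALL_LANGUAGES = [
    ["aa", "Afar"],
    ["ab", "Abkhazian"],
    ["af", "Afrikaans"],
    ["zh_HANS", "Simplified Chinese"],  # placeholder name
    ["zh_HANT", "Traditional Chinese"],  # placeholder name
]

ISO_639_1 = re.compile(r"[a-z]{2}")  # exactly two lowercase ASCII letters

nonconforming = [code for code, _name in ALL_LANGUAGES if not ISO_639_1.fullmatch(code)]
print(nonconforming)  # ['zh_HANS', 'zh_HANT']
```

Running the same check over the real setting would list every entry that would need attention if the two lists were ever merged.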
- Not a standard Django setting, and the code doesn't explain why it exists other than to be used for `LanguageProficiency`.
- Full language names are specified in English only and are not translated.
- No fake/development/weird languages, but it does have `eo` "Esperanto", which is the language code we use as a dummy language for coverage testing (see above).
- Pretty consistently has only language codes without a locale suffix; for example, there is only one entry, `es` "Spanish", whereas `LANGUAGES` has six different Spanish locales.
- Two Chinese language codes: `zh_HANS` (Simplified Chinese / Mandarin) and `zh_HANT` (Traditional Chinese / Cantonese). These introduce UPPERCASE and `_` (underscore), which is inconsistent with the lowercase-hyphenated format of `LANGUAGES`, with the ISO 639-1 standard, and with all the other codes in the list, which have only two letters.
- Use case: used for the "Course language" setting on Studio's "Schedule & Details" page.
- Use case: used to define the choices of the `LanguageProficiency` model, part of the user's public profile (different from their platform language setting). Because it's used in a model's `choices` field, changing this setting will require creating a new migration. I guess the thought here was that users may want to list languages on their profile even if those languages are not supported by the platform, hence `ALL_LANGUAGES` needed to be different from `LANGUAGES`?
- Use case: used as the list of languages for picking a transcript language in the legacy video editor.
- Use case: used to define the choices of the `language` field of `CourseTeam`.
- Use case: used in `transcript_utils` to get the name of a language if it can't find the name in `LANGUAGES` / `LANGUAGE_DICT`.
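The last use case, looking a name up in the `LANGUAGES`-derived dict first and falling back to `ALL_LANGUAGES`, amounts to a two-level lookup. A minimal sketch, assuming both settings are passed in as plain data; `language_name` and the sample data are hypothetical and are not the actual `transcript_utils` code:

```python
from typing import Iterable, Mapping, Optional, Sequence


def language_name(
    code: str,
    language_dict: Mapping[str, str],
    all_languages: Iterable[Sequence[str]],
) -> Optional[str]:
    """Return a display name for `code`, preferring LANGUAGE_DICT over ALL_LANGUAGES."""
    if code in language_dict:
        return language_dict[code]
    # ALL_LANGUAGES is a list of [code, name] pairs, so build a lookup from it.
    fallback = {c: n for c, n in all_languages}
    return fallback.get(code)


LANGUAGE_DICT = {"en": "English", "de-de": "Deutsch (Deutschland)"}
ALL_LANGUAGES = [["de", "German"], ["sq", "Albanian"]]

print(language_name("de-de", LANGUAGE_DICT, ALL_LANGUAGES))  # Deutsch (Deutschland)
print(language_name("de", LANGUAGE_DICT, ALL_LANGUAGES))  # German
print(language_name("xx", LANGUAGE_DICT, ALL_LANGUAGES))  # None
```

Note that the same language can come back as an endonym or as an English name depending on which list happens to match, which is one visible symptom of the two lists disagreeing.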
Other notes
- There is a management command, `migrate_user_profile_langs`, that can migrate users' language preferences to help with cleaning this sort of thing up; the example given is going from `zh-cn` (old) to `zh-hans` (new), even though `zh-cn` is the correct code according to the current `LANGUAGES` setting values.
- There is a setting called `EXTENDED_VIDEO_TRANSCRIPT_LANGUAGES`: "Additional languages that should be supported for video transcripts, not included in ALL_LANGUAGES". This seems to go against the spirit of "all" languages already being included in `ALL_LANGUAGES` 😛. There wasn't really any explanation for this.
- The standard browser `Intl` API prefers mixed capitalization with hyphens for locale codes (e.g. `zh-CN`), but understands the lowercase version that matches our backend `settings.LANGUAGES`. (It doesn't recognize `zh_HANS` with underscores, as seen in the `ALL_LANGUAGES` setting.)
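For reference, BCP 47 (RFC 5646) tags are case-insensitive, but the conventional casing is a lowercase language subtag, a title-case four-letter script subtag, and an uppercase two-letter region subtag; that is the `zh-CN` / `zh-Hans` form `Intl` produces. A rough Python sketch of that casing rule follows. `canonicalize_tag` is hypothetical; unlike `Intl` it deliberately accepts underscores, so it could map `ALL_LANGUAGES` codes like `zh_HANS`, and it only handles simple language[-script][-region][-variant] tags, rejecting gettext-style `@` modifiers.

```python
def canonicalize_tag(code: str) -> str:
    """Apply BCP 47 conventional casing to a simple language tag.

    Handles language[-script][-region][-variant] only; '_' is treated like '-'.
    """
    if "@" in code:
        raise ValueError(f"gettext-style modifier not supported: {code!r}")
    subtags = code.replace("_", "-").split("-")
    result = [subtags[0].lower()]  # language subtag: lowercase
    for subtag in subtags[1:]:
        if len(subtag) == 4 and subtag.isalpha():
            result.append(subtag.title())  # script subtag, e.g. Hans
        elif len(subtag) == 2 and subtag.isalpha():
            result.append(subtag.upper())  # region subtag, e.g. CN
        else:
            result.append(subtag.lower())  # numeric region (419) or variant
    return "-".join(result)


for code in ["zh-cn", "zh_HANS", "es-419", "en", "ca-es-valencia"]:
    print(code, "->", canonicalize_tag(code))
```

With a rule like this, `zh-cn` and `zh-CN` (or `zh_HANS` and `zh-Hans`) would compare equal after normalization, which matters if stored preferences are ever migrated between the two lists.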


Thoughts
The `LANGUAGES` setting is not really useful on its own: it has too many languages, so you have to use a separate API to merge it with `DarkLangConfig` to get the actual list of supported languages. So it seems to me that `LANGUAGES` is already somewhat playing the role of `ALL_LANGUAGES`, and with a bit more cleanup we could merge them, such that `LANGUAGES` is "all languages" and the subset of `LANGUAGES` + `DarkLangConfig` represents the available languages of the UI. Cleaning this up will be a fair amount of work, but it would be good to resolve the many inconsistencies between the two lists.
Or, if we need to keep ALL_LANGUAGES, can it be moved out of settings? Does anyone ever override it?
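A rough sketch of the merged model proposed above, where `LANGUAGES` is the single "all languages" list and the UI languages are the subset released via `DarkLangConfig`. The released codes are passed in as a plain set because this sketch does not rely on `DarkLangConfig`'s actual fields; `ui_languages`, its `default` parameter, and the sample data are all hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Language = Tuple[str, str]  # (code, English name)


def ui_languages(
    all_languages: Iterable[Language],
    released: Set[str],
    default: str = "en",
) -> List[Language]:
    """Return the subset of all_languages offered in the UI, in the original order.

    The default language is always included so the picker is never empty.
    """
    wanted = set(released) | {default}
    return [(code, name) for code, name in all_languages if code in wanted]


# Hypothetical "all languages" list in the merged design.
LANGUAGES = [
    ("en", "English"),
    ("de", "German"),
    ("fr-ca", "Canadian French"),
    ("sq", "Albanian"),
]

print(ui_languages(LANGUAGES, released={"fr-ca", "de"}))
```

In this design, profile-style uses such as `LanguageProficiency` or `CourseTeam` could draw on the full list, while the "Site Language" picker would use only the released subset.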