
How to do language coverage/Google Fonts char set coverage reports right. #20

@graphicore


@davelab6, could you please read this and tell me if I got it right?

Our goal is basically that we can take a font and

a) state which languages are supported by the font
b) tell the user which Google Fonts char sets are supported by the font
c) write next to each Google Fonts char set which languages are supported by it.
d) count the languages supported by the font and count the languages supported by each charset and get the same number for both.

Using the CLDR, we can take any Unicode char set and report the language coverage for that char set.
A font is a char set, and a Google Fonts char set is of course one as well (for this purpose, a language is also a char set).
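
To make the "everything is a char set" view concrete, here is a minimal sketch in Python. The language data is a hand-written toy, not real CLDR exemplar data, and `LANGUAGES` and `supported_languages` are hypothetical names used only for illustration. A language counts as supported when its char set is a subset of the given char set.

```python
import string

# Toy stand-in for CLDR data (an assumption, NOT the real exemplar sets):
# each language is just a set of Unicode code points.
LANGUAGES = {
    "en": set(map(ord, string.ascii_lowercase)),
    "de": set(map(ord, string.ascii_lowercase + "äöüß")),
    "ru": set(map(ord, "абвгдеёжзийклмнопрстуфхцчшщъыьэюя")),
}


def supported_languages(charset, languages=LANGUAGES):
    """Return the languages whose chars are all contained in `charset`."""
    return {lang for lang, chars in languages.items() if chars <= charset}


# A hypothetical font that has the umlauts but lacks U+00DF (ß).
font = set(map(ord, string.ascii_lowercase + "äöü"))
print(sorted(supported_languages(font)))
```

The same function works unchanged whether `charset` is a font's cmap or a Google Fonts char set, which is the point of treating both as plain sets.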

b) and c) are important because the user has to choose the char subset when embedding a font from Google Fonts. The Google Fonts API subsets the fonts using the Google Fonts char sets, so a served font won't include more chars than the chosen Google Fonts char set contains. The exception is when the subsetter decides that a glyph with a Unicode code point outside the char set is needed, e.g. for OT features, but we can safely ignore this case.

Right now, I get the list of languages supported by each char set, then determine which char sets the font supports. If a font supports a char set (fully), all languages of that char set are reported as supported as well. Otherwise, neither the languages nor the char set are reported as supported.
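
The current all-or-nothing approach can be sketched roughly as follows. This is not the actual code; the char set names, the language data, and the function names are toy assumptions for illustration.

```python
import string

LATIN = set(map(ord, string.ascii_lowercase))

# Toy language data (an assumption, not real CLDR data).
LANGUAGES = {
    "en": LATIN,
    "de": LATIN | set(map(ord, "äöüß")),
}

# Toy Google Fonts char sets (hypothetical names and contents).
GF_CHARSETS = {
    "latin": LATIN,
    "latin-plus": LATIN | set(map(ord, "äöüß")) | {0xFB00},  # incl. ligature ff
}


def supported_languages(charset, languages):
    """Languages whose chars are all contained in `charset`."""
    return {lang for lang, chars in languages.items() if chars <= charset}


def report_all_or_nothing(font, gf_charsets, languages):
    """Report a char set, with all of its languages, only if the font covers it fully."""
    report = {}
    for name, charset in gf_charsets.items():
        if charset <= font:  # the font must contain every char of the char set
            report[name] = supported_languages(charset, languages)
    return report


# A hypothetical font with everything German needs, but without U+FB00.
font = LATIN | set(map(ord, "äöüß"))
print(report_all_or_nothing(font, GF_CHARSETS, LANGUAGES))
```

With this toy data only "latin" is reported, so "de" never shows up in the report even though the font contains every char that German needs.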

I think this logic is flawed. It should rather be like this:

In the end, a char set is just an instruction for the subsetter. It means the font won't contain more chars than the char set does. A font that contains fewer chars than the char set can still be subset using that char set, and it can still contain enough chars to support some or all of the languages supported by the char set.
I'd like to get the list of languages supported by the font, then take the intersection of the languages supported by the char set and the languages supported by the font. If one or more languages are in the intersection, I'd like to report those languages as supported and the char set as supported. The latter is reported only with the actually supported languages, of course.

The semantics of this would be: if you get the font using this Google Fonts char set, you'll get support for the following languages.
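
A minimal sketch of the proposed intersection logic, again with toy data: the char set names, language data, and function names are assumptions for illustration, not real Google Fonts or CLDR data.

```python
import string

LATIN = set(map(ord, string.ascii_lowercase))
CYRILLIC = set(map(ord, "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"))

# Toy language data (an assumption, not real CLDR data).
LANGUAGES = {
    "en": LATIN,
    "de": LATIN | set(map(ord, "äöüß")),
    "ru": CYRILLIC,
}

# Toy Google Fonts char sets (hypothetical names and contents).
GF_CHARSETS = {
    "latin": LATIN,
    "latin-plus": LATIN | set(map(ord, "äöüß")) | {0xFB00},
    "cyrillic": CYRILLIC,
}


def supported_languages(charset, languages):
    """Languages whose chars are all contained in `charset`."""
    return {lang for lang, chars in languages.items() if chars <= charset}


def report_by_intersection(font, gf_charsets, languages):
    """Per char set, report the languages supported by both the char set and the font."""
    font_langs = supported_languages(font, languages)
    report = {}
    for name, charset in gf_charsets.items():
        langs = supported_languages(charset, languages) & font_langs
        if langs:  # a char set is reported only if at least one language survives
            report[name] = langs
    return report


# A hypothetical font without U+FB00 and without any Cyrillic.
font = LATIN | set(map(ord, "äöüß"))
report = report_by_intersection(font, GF_CHARSETS, LANGUAGES)
for name in sorted(report):
    print(name, sorted(report[name]))
```

Because a language is supported by the subset font exactly when its chars are in both the font and the char set, intersecting the two language lists gives the same result as checking the languages against the intersection of the two char sets.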

There are two upsides of this method:

  1. we can report more accurately which languages are supported
  2. we don't need the fonts to support all of the chars in a char set. This is important because we have some fonts that don't include certain ligatures that are in GF-Latin-Plus, e.g. "Muli" is missing:
0xFB00 // LATIN SMALL LIGATURE FF
0xFB03 // LATIN SMALL LIGATURE FFI
0xFB04 // LATIN SMALL LIGATURE FFL

None of these chars is so essential that we should discard the whole char set and, with it, a large number of supported languages. In fact, none of these chars is in any of the languages' char sets.
