Need a Unicode font?

Every so often I get a request (either from within or outside Adobe) for a “Unicode font.” Unfortunately, that term is not very meaningful to me. The obvious interpretations are:

1) To me as a font geek, the phrase “a Unicode font” “logically” means “a font with a Unicode encoding (cmap table).” That would be pretty much every one of the 2400+ OpenType fonts Adobe has in our type library. So that interpretation doesn’t really narrow things much.

2) They could mean “a font that covers all of Unicode.” However, Unicode today has over 100,000 defined code points, and as there is no font format that can include more than 65,535 glyphs, such a font is not technically possible. (There’s a separate question as to whether it would be desirable – see below.)

3) They could also mean “a font that covers some useful subset of Unicode that is more than just the basic WinANSI or MacRoman 8-bit (256-character) set.” However, for that to be meaningful, they’d have to define exactly what writing systems or languages are important to them.

In practice, people usually mean either #2 or #3, and if they meant #2, they’re willing to fall back to #3 once they discover #2 is impossible. So then they sort out more of what writing systems or languages they care about. But even then, things tend to remain complicated. (Wikipedia is a bit vague about their definition – they seem to want to say #2, but of course that’s impossible, and none of their examples fit that definition. I’m working on that.)
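To make the arithmetic behind #2 concrete, here is a rough sketch that counts the assigned code points known to whatever Unicode version ships with your Python and compares the total against the 65,535-glyph limit mentioned above. It assumes one glyph per code point, which is a simplification (real fonts often need more glyphs than that, and some characters need none), and the function name and the set of skipped categories are my own choices for illustration.

```python
import sys
import unicodedata

MAX_GLYPHS = 65_535  # per-font glyph limit cited in the post

# General categories left out of the count:
# Cn = unassigned, Cs = surrogates, Co = private use.
SKIPPED = {"Cn", "Cs", "Co"}


def count_assigned_code_points() -> int:
    """Count code points that have an assigned, non-private-use character."""
    return sum(
        1
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) not in SKIPPED
    )


assigned = count_assigned_code_points()
print(f"Unicode {unicodedata.unidata_version}: {assigned} assigned code points")
print(f"Fits in a single font: {assigned <= MAX_GLYPHS}")
```

The exact count depends on the Unicode version bundled with the interpreter, so no figure is quoted here.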

These days, there are a fair number of typefaces that have decent Latin, Greek and Cyrillic in a single font. This is reasonable from a design standpoint: the three writing systems share a fair number of character designs and have related origins.

However, there are plenty of other writing systems that are quite dissimilar from the Latin/Greek/Cyrillic triad, such as Arabic, Hebrew, Thai, the various systems of India, or the various Han-derived ideographic systems (Chinese, and the ideographic parts of Japanese and Korean). A typeface such as “Myriad” can meaningfully combine Latin, Greek and Cyrillic. But if somebody wants a “Japanese version of Myriad” or a Hindi version or whatever, that isn’t such a meaningful concept any more. For Japanese, one can reasonably make the serif/sans distinction, and talk about the weight of the strokes, but it’s more a matter of the Japanese glyphs seeming to be reasonably compatible with Myriad than being a Japanese version of Myriad, if you get the distinction.

In such cases, it may make more sense to simply keep separate fonts and let the user or the operating system do some kind of composite font or font fallback mechanism, where a series of physically distinct fonts are used in combination. Such a mechanism allows the user to specify a single “logical font” and be reasonably sure that they’ll get the font they named for the writing systems it covers well, and something reasonable (or at least better than nothing) for just about any other language. But that’s another long story.
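The general idea of a “logical font” can be sketched in a few lines: walk an ordered list of fonts and, for each character, use the first one that covers it. This is only an illustration of the concept, not how any particular operating system or application implements fallback. The font names, the coverage ranges and the “LastResortFont” placeholder are all hypothetical.

```python
from itertools import groupby
from typing import List, Tuple

# Hypothetical fallback chain: (font name, inclusive code point ranges it covers).
FALLBACK_CHAIN = [
    ("LatinGreekCyrillicSans", [(0x0020, 0x024F), (0x0370, 0x03FF), (0x0400, 0x04FF)]),
    ("JapaneseGothic", [(0x3040, 0x30FF), (0x4E00, 0x9FFF)]),
]
LAST_RESORT = "LastResortFont"  # hypothetical "better than nothing" font


def pick_font(ch: str) -> str:
    """Return the first font in the chain that covers this character."""
    cp = ord(ch)
    for name, ranges in FALLBACK_CHAIN:
        if any(lo <= cp <= hi for lo, hi in ranges):
            return name
    return LAST_RESORT


def split_into_runs(text: str) -> List[Tuple[str, str]]:
    """Group consecutive characters that resolve to the same physical font."""
    return [(font, "".join(chars)) for font, chars in groupby(text, key=pick_font)]


runs = split_into_runs("Myriad 文字 ไทย")
for font, run in runs:
    print(f"{font}: {run!r}")
```

The user names one logical font, and the chain decides which physical font draws each run. Real implementations also have to deal with matching sizes, baselines and styles across the fonts, which this sketch ignores.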

Perhaps the biggest problem in making extensive multilingual fonts is Unicode’s Han unification for the ideographic East Asian languages. If you want a font to support more than one of Simplified Chinese, Traditional Chinese, Japanese and Korean in one font, you have a problem. There are slightly (or sometimes very) different designs for certain characters for all these different languages. Currently, the only functional way to distinguish them is to build an OpenType font, pick one of the languages to use for the default forms in the font, and use the OpenType ’locl’ (localized forms) feature to access the other forms as variants. This requires using applications and/or operating systems which process that feature correctly, for those languages.
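As a toy model of that approach: the font maps each code point to a default glyph (here, the Japanese form), and a per-language substitution table swaps in a localized glyph when the text is tagged with another language and the application actually applies the feature. This mimics the shape of the mechanism only; it does not read a real font. The glyph names, the table contents and the choice of U+76F4 as the example character are assumptions for illustration, while JAN, ZHS, ZHT and KOR are the OpenType language system tags for the four languages.

```python
from typing import Dict, List

# Hypothetical cmap: code point -> default glyph (Japanese forms as the default).
CMAP: Dict[int, str] = {0x76F4: "uni76F4", 0x4E00: "uni4E00"}

# Hypothetical locl-style single substitutions, keyed by language system tag.
LOCL: Dict[str, Dict[str, str]] = {
    "ZHS": {"uni76F4": "uni76F4.zhs"},
    "ZHT": {"uni76F4": "uni76F4.zht"},
    "KOR": {"uni76F4": "uni76F4.kor"},
}


def shape(text: str, lang: str, locl_supported: bool = True) -> List[str]:
    """Map text to glyph names, applying localized forms when supported."""
    glyphs = [CMAP[ord(ch)] for ch in text]
    if not locl_supported:
        # An app or OS that ignores 'locl' always shows the default forms.
        return glyphs
    subs = LOCL.get(lang, {})
    return [subs.get(g, g) for g in glyphs]


print(shape("直一", "JAN"))
print(shape("直一", "ZHS"))
print(shape("直一", "ZHS", locl_supported=False))
```

The last call shows why application support matters: without it, text tagged as Simplified Chinese still gets the default (Japanese) forms.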

I’m not sure how widespread such app/OS support is for the ’locl’ feature with those languages, outside of InDesign CS3, but I know that such fonts are pretty much non-existent in the wild. AFAIK, thus far such fonts have only been built by mad scientists in labs (pace Adobe’s own Dr Lunde & Mr Meyer). [update 2008 08 19: It turns out that Arial Unicode MS has variant forms to deal with this problem. I don’t know of any other shipping/available fonts that do this. The main limitation of Arial Unicode is that it only covers characters encoded in Unicode 2.1, but that’s still interesting and potentially useful.]

Because of the potential for much more compact font files, I am sure we will see more such fonts in the wild; I just don’t know whether it will be next year or four years from now.

[updated later the same day of posting: corrected glyph count to 65,535, and added Wikipedia reference. Updated 21 Aug 2008 to tweak wording around writing systems.]

9 Responses

  1. Angela says:

    I have heard people talk about this, and they usually mean option, #2; however at times they are saying what you have said in #3. Thank you for explaining the difficulty of putting something like this together.

  2. 5566 says:

    Does “Arial Unicode MS” have all glyphs?

    [Well, it has all of Unicode 2.1 covered. However, Unicode is now up to version 5.1. And of course Arial Unicode MS does not resolve the Chinese/Japanese/Korean issue, either. – T]

  3. Michael Rowley says:

    I should have thought the inquirer was looking for a font like Lucida Sans Unicode and Arial Unicode, which were probably intended as stop-gaps, because they’re enormous files, better replaced by a selection of the few scripts that most people need.

  4. Josh says:

    At the time we built Arial Unicode for Microsoft, option #2 was available, as Unicode 2.1 was the latest version of the standard and there were fewer than 65536 code points (excluding PUA). I suppose that font and its name are “to blame” when people say they want “a Unicode font”.

    But even then, there was confusion over what “a Unicode font” meant. I’ve learned never to assume what people mean when they say that…*always* qualify it.

  5. James says:

    Why are there no operating system implementations for “logical fonts” comprised of several user-selected fonts? I would love to have this available on an everyday basis, since I use a lot of obscure Unicode characters and mixed writing systems in my work.

    [I think you’ll get plenty of agreement that there is a problem here, and in fact there have been various attempts at a solution, but they have been too inconsistently or narrowly applied or too inflexible. In recognition of this, there is a current and ongoing discussion between Microsoft, Apple, Adobe and others on this very topic and how to address it, spun off from the folks involved with the OpenType/OFF spec. If you want info on font fallback and font linking on the Windows side today, do a search for “Kaplan font fallback” for several informative blog posts and link sets. – T]

  6. Tom Gewecke says:

    I think you may be able to get a sort of equivalent to the “logical font” at the app level. For example, the OS X app Nisus Writer lets you set both the font and the keyboard to be used for every language, and lets you define custom “languages” as well. You can switch “languages” via a drop down menu.

  7. Josh says:

    Tom wrote: “And of course Arial Unicode MS does not resolve the Chinese/Japanese/Korean issue, either.”

    If you examine Arial Unicode MS, you’ll find that it contains upwards of 50,000 glyphs. Unicode 2.1 encodes something like 38,000 excluding PUA and Controls, all of which are included in Arial Unicode MS. The balance of unencoded glyphs in Arial Unicode MS are Simplified Chinese, Traditional Chinese, Japanese, and Korean ideographic glyph variants, intended specifically for addressing the “Han De-Unification” issue (as Dirk Meyer refers to it). The font also contains OpenType GSUB lookups to substitute these variant glyphs based on locale (‘locl’ feature). The glyph set was rather carefully chosen to stay within the 65536 glyph limit; as such, not every single ideograph has 4 separate variants, but it is possible to get a reasonably correct locale-specific ideograph using OpenType + Arial Unicode MS.

    Whether it fully “resolves” the problem is certainly debatable, considering the additions to Unicode that have occurred since then, and especially since there were never any (and still aren’t any) applications that make use of it. But at the time this was considered to have addressed the issue (I am not sure anyone else has attempted to duplicate this approach since then).

    [I had no idea that Arial Unicode MS has those variants included! I will have to amend my blog post appropriately. I think those ‘locl’ based alternates ought to work in InDesign CS3; we’ll have to test it to see. Thanks! – T]

  8. Erik Swanson says:

    #2 is what the phrase means to me, and given that a single font is no longer possible, I’d re-interpret it as: a collection of fonts, sold/distributed as one, that (collectively) cover 100% of Unicode and are completely consistent in visual style.

  9. What channels would one use to ask for such expansions of font “x” into specified character set ranges?

    [You’d want to ask the company that develops the font (or font family) in question. If it’s an Adobe font, the “feature request” item on the support section of Adobe’s web site would be the likely channel. Note that we have only a small team, and this is a big undertaking. We’re expanding one of our families to have much broader coverage, and it’s a three-year project. – David L]


Thomas Phinney

Adobe type alumnus (1997–2008), now VP at FontLab, also helped create WebINK at Extensis. Lives in Portland (OR), enjoys board games, movies, and loves spicy food.
