Announcing Speakeasy: A new open-source language tool from Typekit

At Typekit, we’re working to deliver fonts in as many languages as possible, while continuing to satisfy our core requirement: delivering fonts as fast as possible. There is one significant challenge with that goal: As you support more and more languages, your character set (and the size of your font files) gets larger and larger. Consequently, the speed at which you can deliver fonts decreases. So, the obvious aim would be to find a way to reduce the size of those font files. But how?

One method is called dynamic subsetting. Dynamic subsetting, invented by Steve Lee at Monotype, scans a web page to determine the actual characters in use. It then creates—on the fly—a subsetted version of the requested fonts using only those characters, and stores this version in a cache. Dynamic subsetting is a promising future direction, but there are questions about the time it takes to scan the page, return the font, and cache the results. And it’s difficult to tell how well this approach will handle content that is dynamically created.
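To make the idea concrete, here is a minimal Python sketch of the scanning and caching steps. It is not Monotype’s implementation: the names `characters_in_use`, `subset_for_page`, and the in-memory cache are hypothetical, and the "subset" is just a sorted string of characters standing in for a real font file.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the distinct non-whitespace characters in a page's visible text."""

    def __init__(self):
        super().__init__()
        self.chars = set()
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chars.update(ch for ch in data if not ch.isspace())


def characters_in_use(html):
    """Return the set of characters that appear in the page's text."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chars


# Hypothetical cache: (font name, characters) -> subset.
_subset_cache = {}


def subset_for_page(font_name, html):
    """Return a cached stand-in for the subsetted font, building it on first use."""
    needed = frozenset(characters_in_use(html))
    key = (font_name, needed)
    if key not in _subset_cache:
        # Assumption: a real service would generate a font file here.
        _subset_cache[key] = "".join(sorted(needed))
    return _subset_cache[key]
```

A scan like this only sees the text present when it runs, which is why content created later by scripts is the open question.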

Another method is to deliver fonts by script, e.g., Latin, Arabic, or Cyrillic. But few fonts support an entire script. If a font only partially supports the Latin-1 character set, how do you know if it will deliver all the characters for the particular languages that your site supports?

Enter Speakeasy: Speakeasy is an open-source language tool that maps a set of Unicode characters to a particular language. So, for example, if you are publishing a site in both German and English, Speakeasy can provide a map of all the Unicode characters required to represent those two languages. This map can then be used to determine if a particular font includes all of the characters required to support these languages. In fact, we’ve used Speakeasy to release just such a feature on Typekit today.
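The check described above comes down to set arithmetic. The Python sketch below illustrates it under stated assumptions: the `LANGUAGE_CHARS` table and the function names are hypothetical, the character lists are abbreviated to letters only, and none of this is Speakeasy’s actual data or API.

```python
import string

# Hypothetical, abbreviated character map (letters only, no punctuation).
LANGUAGE_CHARS = {
    "en": set(string.ascii_letters),
    "de": set(string.ascii_letters) | set("äöüßÄÖÜ"),
}


def required_characters(languages):
    """Union of the characters required by every requested language."""
    required = set()
    for language in languages:
        required |= LANGUAGE_CHARS[language]
    return required


def missing_characters(font_chars, languages):
    """Characters the languages need that the font does not provide."""
    return required_characters(languages) - set(font_chars)


def supports(font_chars, languages):
    """True if the font covers every required character."""
    return not missing_characters(font_chars, languages)
```

For a font that contains only the unaccented Latin letters, `supports(font, ["en"])` would be true, while `missing_characters(font, ["en", "de"])` would report the umlauts and ß.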

You can now browse fonts by language support; in the Typekit sidebar, you’ll see a list of languages currently available. (That list includes English, Spanish, Italian, Portuguese, French, German, Polish, Swedish, Czech, Dutch, Catalan, and Russian; more languages are coming soon.) Need a font that supports Spanish, Catalan, and Italian? Simply select all three and you’ll only see fonts that include characters required by all three languages.

Screenshot of the new language tools on Typekit.com
New: browse by language support on Typekit.
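Filtering by several languages at once can be sketched the same way: a font qualifies only if it covers the union of every selected language’s characters. The font names and the abbreviated character sets below are made up for illustration and do not reflect Typekit’s catalog or Speakeasy’s data.

```python
# Hypothetical required characters, abbreviated to accented lowercase letters.
REQUIRED = {
    "es": set("áéíóúüñ"),
    "ca": set("àçéèíïóòúü"),
    "it": set("àèéìòù"),
}

# Hypothetical fonts and the accented characters each one contains.
FONTS = {
    "Example Sans": set("áéíóúüñàçèïòìù"),
    "Example Serif": set("áéíóúüñ"),
}


def fonts_supporting(languages, fonts=FONTS, required=REQUIRED):
    """Return the names of fonts that cover every selected language."""
    needed = set().union(*(required[lang] for lang in languages))
    return sorted(name for name, chars in fonts.items() if needed <= chars)
```

Here `fonts_supporting(["es"])` would return both fonts, while selecting all three languages would narrow the list to "Example Sans".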

We’re really excited to release this feature, but we know it can be better. We’ve done our best to identify the absolute minimum set of required characters for each of these languages (you can see which characters are included in each language here). But we’re not native speakers of many of these languages, so we can’t be sure we’ve gotten it right: that’s where you come in. We’re releasing Speakeasy as an open-source project, so that developers from all over the world can contribute to the character mappings and/or add new languages to the list. You can retrieve the code from GitHub to start working right away. With your contribution, Speakeasy can become a more useful tool for everyone.

This is just the first, small step in our plans to provide more support for international users of Typekit. Next, we’re going to take what we’re learning from these character maps and apply it to how we subset fonts. Our goal is to eventually include only the characters that a given site needs. That’s a win in any language.

8 Responses

  1. To set the record straight, Dynamic Subsetting is not a “future direction”, but a current reality and a feature offered within The Fonts.com Web Fonts solution. It’s very fast. Depending on the page content, Dynamic Subsetting pages serving East Asian scripts can load even faster than Latin-only pages. We are currently serving Simplified and Traditional Chinese, Japanese and Korean fonts, and have seen no difficulty supporting dynamically created content. Try it for yourself for free at http://webfonts.fonts.com

    Thanks,

    Chris Roberts

  2. This is BIG news for us in Eastern Europe. Thanks guys. Loading the full charset was not optimal.

  3. Christoph says:

    Great news!
    For German › is missing.

  4. James Bonham says:

    Nice. I like it a lot, very useful. The Danish character set is basically a subset of the Swedish one. So you might as well copy it and add Danish support! 🙂

  5. Michaël says:

    Many characters are missing for French, such as “æ, ù, Ç, ç, à, Ä, ä, ë, Ï, ï, Ö, ö, Ü, ü, Ÿ, ÿ, â, é, ê, è, Î, î, Ô, ô, Û, û and ñ”. As you can see, almost all of them use diacritical marks.

    The most important in this list are “Ç, ç, â, à, é, ê, ë, è, Î, î, Ô, ô, û and ù”. Without them, you cannot write French properly.

  6. Have you guys considered using fontconfig’s orth database? That provides the exact same data you are building here and supports a large number of languages already. Feel free to contact me if you need any help making it work for you.

    Cheers,
    behdad

    1. Matt Colyer says:

      Thanks behdad, we weren’t aware that fontconfig had already gathered this information. We’ll take a closer look.

  7. Peter Kahoun says:

    Similarly to Michaël’s comment, you also have problems with recognizing fonts that support Czech. A list of Czech letters is available here, for example: http://en.wikipedia.org/wiki/Czech_alphabet. Thanks!
