This is the second in a series of articles from Typekit’s resident type designer, Tim Ahrens, on how we optimize fonts for the web. Read the first article.

Unicode values

Whenever text is stored or transmitted digitally, it is encoded in the computer’s favorite language: numbers. If you type an “A”, for example, it is actually stored as “65.” This system only works if all computers agree on which number corresponds to which character. Fortunately, Unicode provides this standard, and practically all computers today support it. Think of a font as a cabinet of drawers, each labeled with a number and holding one glyph. When a web page is rendered, the browser reads the text (in other words, the numbers) from the website; then, for each character, it opens the drawer in the font that carries the corresponding number and takes out the glyph.
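To make the “drawer” lookup concrete, here is a minimal sketch. It models a font’s character map as a plain Python dictionary from Unicode code point to glyph name; the dictionary, its glyph names, and the function `lookup_glyphs` are illustrative assumptions, not data or code from a real font tool.

```python
from __future__ import annotations

# Toy character map: Unicode code point -> glyph name.
# Real fonts store this in the binary 'cmap' table; a dict is a simplification.
CMAP = {
    0x41: "A",      # U+0041 LATIN CAPITAL LETTER A, i.e. the number 65
    0x61: "a",
    0x20: "space",
}

def lookup_glyphs(text: str, cmap: dict[int, str]) -> list[str | None]:
    """Return the glyph name for each character, or None if the font lacks it.

    A browser that finds no glyph would typically try a fallback font.
    """
    return [cmap.get(ord(ch)) for ch in text]

print(ord("A"))                    # → 65
print(lookup_glyphs("A a!", CMAP))  # → ['A', 'space', 'a', None]
```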

Not all glyphs in a font have a Unicode value attached, however. Glyphs without a number label can only be used if they are made accessible through OpenType features: substitution rules stored in the font that “redirect” to a different drawer when activated. Not all browsers support OpenType features yet, but support is growing.
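OpenType substitution rules can be pictured as a mapping from an input glyph sequence to an output glyph. The sketch below is a deliberately simplified model: real GSUB lookups also support contextual rules, alternates, and lookup ordering. The feature contents, the glyph names “f_i” and “a.sc”, and the function `apply_feature` are hypothetical.

```python
from __future__ import annotations

# Hypothetical features: tag -> {input glyph sequence: output glyph}.
# "f_i" and "a.sc" have no Unicode value; only these rules reach them.
FEATURES = {
    "liga": {("f", "i"): "f_i"},   # ligature substitution
    "smcp": {("a",): "a.sc"},      # single substitution (small caps)
}

def apply_feature(glyphs: list[str], rules: dict[tuple[str, ...], str]) -> list[str]:
    """Replace matching glyph sequences, preferring the longest match at each position."""
    longest = max((len(key) for key in rules), default=0)
    out: list[str] = []
    i = 0
    while i < len(glyphs):
        for n in range(min(longest, len(glyphs) - i), 0, -1):
            key = tuple(glyphs[i:i + n])
            if key in rules:
                out.append(rules[key])  # "redirect" to the unencoded glyph
                i += n
                break
        else:  # no rule matched: keep the glyph unchanged
            out.append(glyphs[i])
            i += 1
    return out

print(apply_feature(["f", "i", "n", "a", "l"], FEATURES["liga"]))  # → ['f_i', 'n', 'a', 'l']
```

If the browser never activates the feature, it never asks for “f_i”, which is why unencoded glyphs depend entirely on feature support.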

Some desktop fonts contain additional glyphs (such as a foundry logo) that have neither a Unicode value nor an OpenType feature that reaches them. Since these glyphs can never be used on the web, they should be deleted to save file size.

Left: symbols from Allumi and their Unicode values. Right: a symbol from Freight Sans that has no Unicode value and cannot be used safely on the web.
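Finding such dead glyphs amounts to a reachability check: start from every glyph the character map points to, then keep adding glyphs that substitution rules can produce from glyphs already reached. This is a minimal sketch under the same toy model as above (glyph names, the features dict, and the function names are hypothetical). It ignores glyphs that are referenced only as components of composite glyphs, which a real tool would also have to keep.

```python
from __future__ import annotations

def reachable_glyphs(cmap: dict[int, str],
                     features: dict[str, dict[tuple[str, ...], str]]) -> set[str]:
    """Glyphs reachable via the cmap or by repeatedly applying substitution rules."""
    reachable = set(cmap.values())
    reachable.add(".notdef")  # glyph 0 is required by the format; never delete it
    changed = True
    while changed:  # iterate to a fixed point, because rules can chain
        changed = False
        for rules in features.values():
            for inputs, output in rules.items():
                if output not in reachable and all(g in reachable for g in inputs):
                    reachable.add(output)
                    changed = True
    return reachable

def deletable_glyphs(all_glyphs: list[str], cmap: dict[int, str],
                     features: dict[str, dict[tuple[str, ...], str]]) -> list[str]:
    # Simplification: glyphs used only as composite components are not considered.
    return sorted(set(all_glyphs) - reachable_glyphs(cmap, features))

glyph_order = [".notdef", "A", "a", "f", "i", "f_i", "a.sc", "foundry_logo"]
cmap = {0x41: "A", 0x61: "a", 0x66: "f", 0x69: "i"}
features = {"liga": {("f", "i"): "f_i"}, "smcp": {("a",): "a.sc"}}
print(deletable_glyphs(glyph_order, cmap, features))  # → ['foundry_logo']
```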

Additionally, some fonts deliberately attach glyphs to the wrong code points and come with usage instructions that tell you which key produces which image. This can be a sensible strategy if it lets you type certain glyphs (even logotypes) directly on your keyboard. If the document is then printed, the hack is invisible on paper, where all that matters is what you can see.

On the web, however, things are different. Not only should the website look the way we want it to; the underlying code must also be standards-compliant so that search engines, screen readers, translation tools, and web archives can make sense of the text. This is why we make sure that each “drawer” contains what its label says, so as not to encourage font use that looks right but contradicts the principles of the web. (This is also why we do not accept symbol fonts in our library: since no Unicode values exist yet for most of those characters, the only way to use them is to map the images to letters and numbers. The result is gibberish for anyone reading the site by other means.)

Adding the non-breaking space and soft hyphen

The text we put on a web page can contain two very useful characters. The non-breaking space (&nbsp;) is essentially a regular space that never allows a line break at its position. The soft hyphen (&shy;) marks a place where a word may be hyphenated: rendering engines display it as a normal hyphen if the line actually breaks there, and omit it otherwise.

Strictly speaking, fonts do not need glyphs for these two characters, since by definition they look the same as their standard counterparts. In fact, desktop applications simply display the standard space and hyphen glyphs for them whenever necessary. Many foundries don’t even include separate glyphs for the non-breaking space and soft hyphen in their fonts, which is, in some sense, the technically cleaner solution.

However, some browsers switch to fallback fonts when these characters are not present in the font, so at Typekit we add them to the fonts we receive if necessary.
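One simple way to add the two characters, sketched here under the toy dictionary model of a character map, is to point the new code points at the existing space and hyphen glyphs; a cmap is allowed to map several code points to the same glyph. This is an illustration under that assumption, not necessarily how Typekit’s own tools do it, and `add_missing_aliases` is a hypothetical name.

```python
from __future__ import annotations

NBSP, SPACE = 0x00A0, 0x0020          # no-break space, regular space
SOFT_HYPHEN, HYPHEN = 0x00AD, 0x002D  # soft hyphen, hyphen-minus

def add_missing_aliases(cmap: dict[int, str]) -> dict[int, str]:
    """Return a copy of cmap in which U+00A0 and U+00AD, if absent,
    map to the same glyphs as the regular space and hyphen."""
    patched = dict(cmap)
    for alias, base in ((NBSP, SPACE), (SOFT_HYPHEN, HYPHEN)):
        if alias not in patched and base in patched:
            patched[alias] = patched[base]  # reuse the existing glyph
    return patched

print(add_missing_aliases({0x20: "space", 0x2D: "hyphen"}))
# → {32: 'space', 45: 'hyphen', 160: 'space', 173: 'hyphen'}
```

Existing mappings are left untouched, so a font that already has its own non-breaking space glyph keeps it.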

Vertical metrics

On the desktop, vertical metrics values in a font are generally ignored, but on the web they become incredibly important. When determining the line spacing in a paragraph of text, the browser either follows the line-height specified via CSS or calculates a default value from the ascender, descender, and line gap values stored in the font. While the line height should ideally be chosen and set consciously by the web designer, not everyone does this, so as a web font provider we have to make sure the default line height is consistent across all browsers. For historical reasons this is not a trivial task: a font contains as many as three values each for the ascender and descender (in the hhea table, and as the “typo” and “win” values in the OS/2 table), plus two for the line gap (hhea and OS/2 “typo”). Applications apply these values in different ways, and by no means always in compliance with the specification.
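The sketch below computes the default line height, in ems, that each of the three metric sets implies; if they disagree, different browsers may space the same paragraph differently. Field names follow the OpenType hhea and OS/2 tables, but the `VerticalMetrics` dataclass and the helper functions are hypothetical, and the “win” formula ignores the extra external leading Windows may add on top.

```python
from dataclasses import dataclass

@dataclass
class VerticalMetrics:
    units_per_em: int
    hhea_ascender: int
    hhea_descender: int   # negative: below the baseline
    hhea_line_gap: int
    typo_ascender: int    # OS/2 sTypoAscender
    typo_descender: int   # OS/2 sTypoDescender, negative
    typo_line_gap: int    # OS/2 sTypoLineGap
    win_ascent: int       # OS/2 usWinAscent
    win_descent: int      # OS/2 usWinDescent, positive: below the baseline

def default_line_heights(m: VerticalMetrics) -> dict:
    """Default line height in ems implied by each metric set (simplified)."""
    em = m.units_per_em
    return {
        "hhea": (m.hhea_ascender - m.hhea_descender + m.hhea_line_gap) / em,
        "typo": (m.typo_ascender - m.typo_descender + m.typo_line_gap) / em,
        "win": (m.win_ascent + m.win_descent) / em,  # this set has no line gap
    }

def is_consistent(m: VerticalMetrics, tolerance: float = 1e-9) -> bool:
    """True if all three metric sets imply the same default line height."""
    heights = default_line_heights(m).values()
    return max(heights) - min(heights) <= tolerance

metrics = VerticalMetrics(1000, 800, -200, 200, 800, -200, 200, 950, 250)
print(default_line_heights(metrics))  # → {'hhea': 1.2, 'typo': 1.2, 'win': 1.2}
print(is_consistent(metrics))         # → True
```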

The key vertical metrics: ascender, cap height, x-height, and descender.

In the example above, some letters protrude beyond the ascender and descender, either because of optical compensation or because they carry diacritics. From a technical point of view, these extents must be included in the font’s ascender and descender values to avoid clipping.
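A minimal sketch of that check, assuming glyph bounding boxes are available as a dictionary (the glyph names and numbers are made up): the clipping extents must reach at least the tallest glyph top and the deepest glyph bottom. Windows, for example, clips glyph parts outside usWinAscent and usWinDescent, so these are natural targets for the result; `clipping_safe_extents` is a hypothetical helper.

```python
from __future__ import annotations

def clipping_safe_extents(bboxes: dict[str, tuple[int, int, int, int]]) -> tuple[int, int]:
    """Smallest (ascent, descent) that enclose every glyph's bounding box.

    bboxes maps glyph name -> (x_min, y_min, x_max, y_max) in font units.
    The descent is returned as a positive number, as in usWinDescent.
    """
    y_max = max(box[3] for box in bboxes.values())
    y_min = min(box[1] for box in bboxes.values())
    return max(y_max, 0), max(-y_min, 0)

bboxes = {
    "A": (10, 0, 590, 700),
    "Aring": (10, 0, 590, 920),   # ring rises above the cap height
    "g": (30, -230, 510, 510),    # descender goes below the baseline
}
print(clipping_safe_extents(bboxes))  # → (920, 230)
```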

Underline and strike-through

Each font contains information on the ideal position and thickness of the underline and strike-through line. However, these values are completely ignored by all desktop applications, which typically draw a hairline at a default position, so foundries have so far had no reason to set them carefully in their desktop fonts. Many browsers, though, do adhere to these values, so they need to be evaluated and adjusted for web fonts. Neither underlines nor strike-throughs can be set on a purely objective basis; just like hinting, they are partly a design decision. For many fonts, we set these values manually, based on our own assessment of what looks best.

For heavier fonts (here: FacitWeb bold), the underline weight is chosen to match the design. The crossbar of the e is a good visual reference. The position is chosen to strike roughly through the middle of the descenders and to leave an appropriate gap on the baseline.
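The rules of thumb from the caption can be written down as a small heuristic. This is a hypothetical sketch, not Typekit’s actual procedure: thickness follows the crossbar of the “e”, the underline is centered halfway down the descenders, and, as an extra assumption, the strike-through is centered at half the x-height. In the OpenType post and OS/2 tables, underlinePosition and yStrikeoutPosition describe the top of the stroke, hence the half-thickness shift.

```python
from __future__ import annotations

def underline_metrics(descender_depth: int, crossbar_thickness: int) -> tuple[int, int]:
    """Hypothetical heuristic for (post.underlinePosition, post.underlineThickness).

    The stroke is centred halfway down the descenders; the position is the
    top of the stroke (negative = below the baseline), so the gap between
    the baseline and the underline equals -position.
    """
    center = -descender_depth / 2
    top = center + crossbar_thickness / 2
    return round(top), crossbar_thickness

def strikeout_metrics(x_height: int, crossbar_thickness: int) -> tuple[int, int]:
    """Hypothetical heuristic for (OS/2 yStrikeoutPosition, yStrikeoutSize),
    centring the stroke at half the x-height; the position is the stroke's top."""
    top = x_height / 2 + crossbar_thickness / 2
    return round(top), crossbar_thickness

print(underline_metrics(descender_depth=240, crossbar_thickness=60))  # → (-90, 60)
print(strikeout_metrics(x_height=500, crossbar_thickness=60))         # → (280, 60)
```

Because the thickness is a parameter, a bold weight such as the one in the caption automatically gets a heavier underline.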

In our next post, we’ll talk about both TrueType and PostScript hinting; the latter, while not as difficult or critical as TrueType hinting, is still important and, of course, has its own set of unique constraints.

This is the first in a series of articles from Typekit’s resident type designer, Tim Ahrens, on how we optimize fonts for the web.

In our recent blog posts, we’ve talked about some of our efforts, in both research and font optimization, to improve the quality of the fonts we serve. Hinting, vertical metrics, and the choice between TrueType and PostScript outlines are among the main aspects we focused on.

There are, however, many other technical issues that make up a good web font, all of which need to be taken care of when fonts are prepared for use on the web. In this post, we’ll talk about two elements of that process: outlines and components.

Converting outlines

Web fonts can contain either TrueType or PostScript outlines, which use quadratic and cubic Bézier curves respectively. Today, nearly all type designers draw glyphs in PostScript format and many foundries sell only PostScript-based desktop fonts.

Left to right: the “g” in Omnes in the original PostScript outline; converted to TrueType with FontForge; and converted with FontLab at regular and at increased resolution.
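The difference between the two outline types is the degree of the curve segments. A quadratic segment (TrueType) has one off-curve control point; a cubic segment (PostScript) has two. Every quadratic can be rewritten exactly as a cubic (degree elevation), but most cubics cannot be expressed as a single quadratic, which is why converting PostScript outlines to TrueType is an approximation. The sketch below illustrates the exact direction; the function names are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]

def lerp(a: Point, b: Point, t: float) -> Point:
    """Linear interpolation between two points."""
    return (a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t)

def quadratic(p0: Point, p1: Point, p2: Point, t: float) -> Point:
    """TrueType-style segment: two on-curve points, one off-curve control."""
    return lerp(lerp(p0, p1, t), lerp(p1, p2, t), t)

def cubic(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """PostScript-style segment: two on-curve points, two off-curve controls."""
    a, b, c = lerp(p0, p1, t), lerp(p1, p2, t), lerp(p2, p3, t)
    return lerp(lerp(a, b, t), lerp(b, c, t), t)

def quadratic_as_cubic(p0: Point, q: Point, p2: Point) -> Tuple[Point, Point, Point, Point]:
    """Exact degree elevation: controls sit 2/3 of the way towards q."""
    return p0, lerp(p0, q, 2 / 3), lerp(p2, q, 2 / 3), p2

p0, q, p2 = (0.0, 0.0), (50.0, 100.0), (100.0, 0.0)
as_cubic = quadratic_as_cubic(p0, q, p2)
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    qx, qy = quadratic(p0, q, p2, t)
    cx, cy = cubic(*as_cubic, t)
    assert abs(qx - cx) < 1e-9 and abs(qy - cy) < 1e-9  # identical curves
```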

For display fonts (fonts intended to be used at large sizes, such as in a headline), PostScript outlines are preferred, because they render more smoothly across many Windows browsers. But for text fonts (intended for use at paragraph sizes) TrueType is often a better web font format because of its superior hinting control.

That means the original OpenType fonts that Typekit receives from the foundries often need to be converted. And since we’re dealing with fonts on the web, we have to consider not only the accuracy of the outlines (their fidelity to the designer’s intention) but also the file size, which grows roughly in proportion to the number of points.

In practice, the conversion itself is far from trivial; different font editors, such as FontLab, Fontographer, and FontForge, give different results. For example, FontForge is generally adept at producing an outline with few points, but it struggles with certain types of curves: in the figure above, the two curve segments in the lower loop of the lowercase “g” have an unnecessarily large number of control points. FontLab, meanwhile, treats this curve more elegantly but fails to recognize that some of the simpler curve segments can be expressed with fewer points. Producing the best conversion can mean choosing software based on the design of the font (or even of a particular glyph); often, the best results require combining glyphs from different editors, or even cleaning up the outlines manually to remove superfluous control points.
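The trade-off between fidelity and point count can be illustrated with a deliberately simple conversion technique, swapped in here for illustration: it is not what FontLab or FontForge do, and production tools such as the cu2qu module in fontTools use more careful algorithms. Each cubic is split into n equal pieces, each piece is replaced by the quadratic that shares its end points and its midpoint, and n grows until every piece stays within a tolerance. Each extra piece adds at least one off-curve point, so a tighter tolerance costs file size. All names below are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Cubic = Tuple[Point, Point, Point, Point]
Quad = Tuple[Point, Point, Point]

def lerp(a: Point, b: Point, t: float) -> Point:
    return (a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t)

def eval_cubic(c: Cubic, t: float) -> Point:
    a, b, d = lerp(c[0], c[1], t), lerp(c[1], c[2], t), lerp(c[2], c[3], t)
    return lerp(lerp(a, b, t), lerp(b, d, t), t)

def eval_quad(q: Quad, t: float) -> Point:
    return lerp(lerp(q[0], q[1], t), lerp(q[1], q[2], t), t)

def split_cubic(c: Cubic, t: float) -> Tuple[Cubic, Cubic]:
    """de Casteljau subdivision at parameter t."""
    ab, bc, cd = lerp(c[0], c[1], t), lerp(c[1], c[2], t), lerp(c[2], c[3], t)
    abc, bcd = lerp(ab, bc, t), lerp(bc, cd, t)
    mid = lerp(abc, bcd, t)
    return (c[0], ab, abc, mid), (mid, bcd, cd, c[3])

def approximate(c: Cubic) -> Quad:
    """Quadratic with the same end points that passes through the cubic's midpoint."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = c
    control = ((3 * (x1 + x2) - x0 - x3) / 4, (3 * (y1 + y2) - y0 - y3) / 4)
    return c[0], control, c[3]

def max_error(c: Cubic, q: Quad, samples: int = 32) -> float:
    """Largest sampled distance between the cubic and its quadratic stand-in."""
    return max(math.dist(eval_cubic(c, i / samples), eval_quad(q, i / samples))
               for i in range(1, samples))

def cubic_to_quadratics(c: Cubic, tolerance: float, max_segments: int = 64) -> List[Quad]:
    """Use the smallest number of equal-parameter pieces that meets the tolerance."""
    for n in range(1, max_segments + 1):
        pieces, rest = [], c
        for k in range(n, 1, -1):  # peel off 1/k of what remains
            head, rest = split_cubic(rest, 1 / k)
            pieces.append(head)
        pieces.append(rest)
        quads = [approximate(p) for p in pieces]
        if all(max_error(p, q) <= tolerance for p, q in zip(pieces, quads)):
            return quads
    raise ValueError("tolerance not reached; raise max_segments")

quarter = ((0.0, 0.0), (0.0, 55.0), (45.0, 100.0), (100.0, 100.0))
for tol in (5.0, 1.0, 0.1):
    print(tol, len(cubic_to_quadratics(quarter, tol)))  # segments needed per tolerance
```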

“Recomponentization”

Once we have solid TrueType outlines for a font, what next? There’s more we can do to minimize the file size and ensure a fast browsing experience.

Omnes’ “ä” as stored in a PostScript-based font (left) and as components (right).

To avoid duplicate data, TrueType lets us store accented characters as components, building each glyph out of references to the base letter and the accent. Since accented letters make up the bulk of the characters in a font, this can reduce the file size considerably. PostScript-based OpenType does not support components, so we need to “recomponentize” our source fonts as they are converted for the web, replacing parts of the contours with the equivalent components. This modification does not affect the appearance of the font, but it can reduce the file size by up to 40%.
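A minimal sketch of recomponentization, assuming outlines are lists of contours and each contour is a tuple of integer points: for each candidate glyph, look for a single offset at which all of its contours appear in the decomposed glyph, and replace them with a (glyph name, offset) reference, much like a TrueType composite stores a component plus an offset. The simplification is loud: real outlines may differ in start point, contour direction, or rounding, and real composites reference glyph indices rather than names. All names and coordinates are hypothetical.

```python
from __future__ import annotations
from typing import Dict, List, Optional, Tuple

Point = Tuple[int, int]
Contour = Tuple[Point, ...]
Outline = List[Contour]

def shift(contour: Contour, dx: int, dy: int) -> Contour:
    return tuple((x + dx, y + dy) for x, y in contour)

def find_offset(target: Outline, candidate: Outline) -> Optional[Tuple[int, int]]:
    """Offset at which every contour of candidate appears in target, if any."""
    first = candidate[0]
    for contour in target:
        if len(contour) != len(first):
            continue
        dx, dy = contour[0][0] - first[0][0], contour[0][1] - first[0][1]
        if all(shift(c, dx, dy) in target for c in candidate):
            return dx, dy
    return None

def recomponentize(target: Outline, library: Dict[str, Outline]):
    """Return (component references, leftover contours) for a decomposed glyph.

    Larger candidates are tried first, so a two-dot dieresis wins over a period.
    """
    remaining = list(target)
    components = []
    by_size = sorted(library.items(), key=lambda item: -sum(len(c) for c in item[1]))
    for name, outline in by_size:
        if not remaining:
            break
        offset = find_offset(remaining, outline)
        if offset is not None:
            components.append((name, offset))
            for c in outline:
                remaining.remove(shift(c, *offset))
    return components, remaining

# Toy glyphs: an "a" with an outer contour and a counter, and square dots.
a_outline = [((0, 0), (400, 0), (400, 500), (0, 500), (0, 250)),
             ((100, 100), (300, 100), (300, 200), (100, 200))]
dot = ((0, 0), (60, 0), (60, 60), (0, 60))
dieresis = [dot, shift(dot, 150, 0)]
library = {"a": a_outline, "dieresis": dieresis, "period": [dot]}
adieresis = a_outline + [shift(c, 95, 600) for c in dieresis]
print(recomponentize(adieresis, library))
# → ([('a', (0, 0)), ('dieresis', (95, 600))], [])
```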

Converting the outlines and creating components are but the first steps in optimizing fonts for the web; in our next post, we’ll talk about various ways in which we clean up and augment a font’s glyphs, and how attention to vertical metrics and underlines (long neglected in desktop fonts) becomes a priority for web fonts. Stay tuned!