Optimizing Fonts for the Web: Unicode values, glyph set, underlines, and strike-through
November 3, 2011
Whenever text is stored or transmitted digitally, it is done so using the computer’s favorite language: numbers. If you type an “A”, for example, it is actually stored as “65.” This system only works if all computers agree upon which number corresponds to which character. Fortunately, Unicode provides this standard, and is supported by practically all computers today. When a web page is rendered, the browser reads the text – in other words, the numbers – from the website; then for each character, it looks to the font for the corresponding glyph — the drawer with that number on it, figuratively speaking.
Not all glyphs in a font have a Unicode value attached, however. Those glyphs without a number label can only be used if they are made accessible through OpenType features. These are substitution rules stored in the font, which “redirect” to a different drawer when activated. Not all browsers support OpenType features yet but their support is growing.
Some desktop fonts will have additional glyphs (such as a foundry logo) that have no Unicode value nor are they available via OpenType; since these glyphs can never be used on the web, they should be deleted in order to save file size.
Additionally, some fonts may even have glyphs consciously attached to the wrong code points, and come with usage instructions so you know which character contains which image. This can be a sensible strategy if it allows you to simply type in certain glyphs (even logotypes) through your keyboard. If the document is subsequently printed, this hack is invisible on paper, and all that matters is what you can see.
On the web, however, things are different. Not only should the website look like we want it to, we also need to make sure the underlying code is standards-compliant so that search engines, screen readers, translation tools, or web archives can make sense of the text. This is why we make sure that each “drawer” contains what it says on the label, so as not encourage font use that may look like it should but contradicts the principles of the web. (This is also why we do not accept symbol fonts in our library: since no Unicode values yet exist for most of those characters, the only way to use them is to map the images to letters and numbers. The result is gibberish for anyone using an alternate means of reading the site.)
Adding the non-breaking space and soft-hyphen
The text we put on a web page can contain two very useful characters. The non-breaking space (
&#nbsp;) is essentially a regular space with the additional quality that it does not break across lines. The soft hyphen (
­) looks the same as a normal hyphen, but rendering engines know that it should be omitted unless it is actually used in a line break.
These two characters should not be necessary in any font since their visual appearance is by definition the same as their standard counterparts. In fact, desktop applications use the latter whenever necessary. Many foundries don’t even include the additional glyphs for non-breaking space and soft-hyphen in their fonts, which is, in some sense, the technically cleaner solution.
However, some browsers switch to fallback fonts when these characters are not present in the font, so at Typekit we add them to the fonts we receive if necessary.
On the desktop, vertical metrics values in a font are generally ignored, but on the web, they become incredibly important. When determining the line spacing in a paragraph of text, the browser either follows the line-height specified via CSS, or calculates a default value from the ascender, descender, and line gap values stored in the font. While the line height is something that should ideally be chosen and set consciously by the web designer, not everyone does this; as a web font provider, we have to make sure the default line height is consistent across all browsers. This is not a trivial task for historical reasons: a font contains as many as three values each for the ascender and descender, plus two for the line gap. Applications apply these values in different ways – and by no means in compliance with the specification.
For example, above, some of the letters protrude above the ascender and descender values, either because of optical compensation or because they contain diacritics. From a technical point of view, these must be included in the ascender and descender values in order to avoid clipping.
Underline and strike-through
Each font contains information on the ideal position and thickness of the underline and strike-through line. However, these values are completely ignored by all desktop applications, which typically use a hairline at a default position. So far, foundries have had no reason to set these values in their desktop fonts carefully. But many browsers do adhere to these instructions, so they need to be evaluated and adjusted for web fonts. Neither underlines nor strike-throughs can be set on a purely objective basis; just like hinting, they are partly a design decision. For many fonts, we manually set these instructions based on our own assessment of what looks best.
In our next post, we’ll talk about both TrueType and PostScript hinting; the latter, while not as difficult or critical as TrueType hinting, is still important and, of course, has its own set of unique constraints.