Well, that was just awful: Details on yesterday’s font serving outage
Yesterday was a long day here at Typekit as we dealt with rolling outages from our storage service provider, and we know it was extremely frustrating for many of you. We’d like to apologize for the uncharacteristic hit to our service, and the subsequent effect on your websites.
As some of you may already know, we use Amazon S3 to store fonts and the kit configuration files. According to Amazon, early on the morning of August 10 there was a “misconfiguration,” resulting in an outage on S3 and other services. They reported that they mistakenly pursued the wrong problem at first, which is why the downtime was longer than usual.
Because of this outage, our servers were unable to retrieve fonts and kit configuration files from S3. We first noticed this problem at 12:20 AM PDT (about 10 minutes after the outage started), when we were notified by our automated systems.
We thought the system had stabilized by 4:00 AM PDT, but were unpleasantly surprised by a second outage around 12:00 PM PDT and then a third one shortly after that. Amazon acknowledged the second outage but provided no further insight into what had happened on their end.
What you may have noticed on your website was a “Waiting for use.typekit.net” message, which eventually timed out and then loaded your default fonts instead. Lovely way to start your week, we know.
Outages at the larger service providers like Amazon affect huge portions of the internet at once, which we had the misfortune of witnessing firsthand yesterday. We’re looking into using multiple edge locations for our S3 buckets so we can distribute load over multiple locations. In plain terms, this means that if an outage like this happens again, we’ll be able to switch to different servers that are not affected.
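As a rough illustration of the "switch to different servers" idea, here is a minimal TypeScript sketch that picks the first storage location reported as healthy. Everything in it is an assumption made for illustration: the `FontOrigin` type, the `resolveFontUrl` helper, and the example.com hostnames are hypothetical and do not describe Typekit's actual infrastructure.

```typescript
// Hypothetical storage locations; the hostnames are placeholders only.
type FontOrigin = { name: string; baseUrl: string };

const FONT_ORIGINS: FontOrigin[] = [
  { name: "primary", baseUrl: "https://fonts-primary.example.com" },
  { name: "secondary", baseUrl: "https://fonts-secondary.example.com" },
];

// Returns the URL for `path` on the first origin that `isHealthy` accepts,
// or null when every origin is reported down.
function resolveFontUrl(
  path: string,
  origins: FontOrigin[],
  isHealthy: (origin: FontOrigin) => boolean
): string | null {
  for (const origin of origins) {
    if (isHealthy(origin)) {
      // Strip leading slashes so the joined URL has exactly one separator.
      return origin.baseUrl + "/" + path.replace(/^\/+/, "");
    }
  }
  return null;
}
```

The health check is passed in as a function so the same selection logic works whether health comes from monitoring data, recent request failures, or a manual switch.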
There are a few things you can do to help protect your website from becoming collateral damage during larger service outages.
Last week, we made an update so that new kits will load asynchronously by default. Your current kits may not do this, but it’s easy to get the new version from the kit editor. The result is that if there’s a service outage on our side (or our service provider’s), your fallback fonts will load right away — instead of having to go through the server timeout song and dance.
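To make the asynchronous behavior concrete, here is a simplified TypeScript sketch of the general pattern: start loading the kit script without blocking, and fall back after a timeout instead of waiting on the network. It is not the actual embed code. The `LoaderDeps` interface, `loadKitAsync`, and `withFontState` are hypothetical names, and the sketch marks the page `wf-active` as soon as the script loads, whereas a real kit does so only once the fonts themselves are ready. The `wf-loading`, `wf-active`, and `wf-inactive` class names follow the Font Events convention.

```typescript
type FontState = "loading" | "active" | "inactive";

// Replaces any existing wf-* state class with the one for `state`.
function withFontState(className: string, state: FontState): string {
  const kept = className
    .split(/\s+/)
    .filter((c) => c !== "" && !/^wf-(loading|active|inactive)$/.test(c));
  return kept.concat(["wf-" + state]).join(" ");
}

// Browser side effects are injected so the logic can run outside a browser.
interface LoaderDeps {
  setClass(update: (current: string) => string): void;
  injectScript(src: string, onLoad: () => void): void; // should not block rendering
  setTimer(fn: () => void, ms: number): void;
}

function loadKitAsync(kitUrl: string, timeoutMs: number, deps: LoaderDeps): void {
  let settled = false;
  deps.setClass((c) => withFontState(c, "loading"));

  // If the script has not arrived in time, show fallback fonts immediately.
  deps.setTimer(() => {
    if (settled) return;
    settled = true;
    deps.setClass((c) => withFontState(c, "inactive"));
  }, timeoutMs);

  deps.injectScript(kitUrl, () => {
    if (settled) return; // simplification: a late load is ignored after timeout
    settled = true;
    deps.setClass((c) => withFontState(c, "active"));
  });
}
```

In a browser, `injectScript` would typically create a script element with its `async` property set, `setClass` would update the class on the root element, and `setTimer` would wrap `setTimeout`. Fallback fonts can then be styled in CSS against the `wf-inactive` class.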
Speaking of fallback fonts, it’s never a bad idea to put a little thought into what you specify for fallbacks. They’re not just for outages, after all: You never know when someone’s connection speed or custom browser setup might impact the way your site loads or appears. Platform defaults don’t offer quite the typographic palette you might want (that’s probably why you’re using Typekit), but they still give you a degree of design control. Check our Help documentation for a little detail about styling your fallback fonts using Font Events.
Many of our largest customers who use Typekit Enterprise weren’t affected by this outage, because they’re hosting the entire Typekit solution in their own CDN environment. We’re looking into ways to bring this self-hosting capability to more of our customers in the future. In the meantime, if you’re interested in learning more about Typekit Enterprise, you can contact our Enterprise sales team.
We’re always looking for ways to improve Typekit, and this episode was a powerful reminder of the responsibility we have to demand the highest possible uptime from our partners and vendors, and also to communicate in a timely and effective manner when issues do arise.
In general, if you notice performance issues with the fonts on your site, check our status blog for quick updates on the web font network and any degraded service reports. And as always, feel free to send an email to email@example.com if you have any questions.
Matthew, thanks for the in-depth explanation of the issue 🙂 BTW, the “Typekit Enterprise” link is a dead end; it should link to: https://typekit.com/plans/enterprise
Fixed that link — thank you for catching that!
Are the new kits really async? In the linked example, the JS is still loaded in a blocking manner, so if use.typekit.net is unavailable, page rendering will still be blocked until the browser times out the connection.
We’re planning to promote the advanced embed code more and make it the default so nothing is blocking your site.
Compensate customers when a service goes down – it’s a standard and appropriate thing to do to paying members of a service. Don’t just talk about what went wrong.
That’s a reasonable suggestion. Unfortunately the circumstances of this outage were beyond our control. We think that communicating about what went wrong is the best way we can provide transparency and accountability in cases like this. Please write us at firstname.lastname@example.org and we’ll continue the discussion.
I’m on the Portfolio plan, for $50/year. If you do compensate users for outages, please feel free to wait until you’ve been down a total of a week before sending me my dollar. 😉
You could have put a message on your website; it took me ages to figure out who was at fault 🙂