The following blog post is part of our guest author series, inviting commentary from leaders around the telecom industry. Cedexis is an independent, third-party CDN load balancing provider. Their OTT-Video solutions are used by many of the biggest M&E companies worldwide as an industry standard best practice. Cedexis optimizes web performance across data centers, content delivery networks (CDNs) and clouds for companies that want to ensure 100% availability and extend their reach to new global markets.
Over-the-Top (OTT) video is hot and it’s happening now; it’s not some future state. As everyone knows, there are no second chances on the Internet. Consumers who have a less-than-perfect experience Do. Not. Come. Back.
So how do you ensure perfect video performance and 100 percent availability for your business-critical Internet TV platform?
One of the most innovative solutions that has emerged in this space is to use multiple CDNs together to ensure performance and availability. Many of the top OTT providers already do this. In fact, according to Dan Rayburn, nearly half of enterprises with more than $100K in monthly recurring revenue (MRR) are currently using a multi-CDN strategy.
Every multi-CDN solution is unique, due in part to two key factors: demographics and geography/network pairings. The viewing audience’s demographics are extremely important: each demographic skews toward a unique time of day, type of ISP and geography, and each audience tends toward either mobile access or large-screen viewing. Each CDN, meanwhile, has performance strengths and weaknesses in different geographies that can be affected by things like peering, serving capacity and other network elements. These can be represented as an “S-Curve,” as below.
The S-Curve shows, given any two CDNs, the portion of the audience that would have benefited from being on one CDN versus the other. The S-Curve is generated from live RUM data and can be global or restricted to any continent, country, state or region. It can also be restricted to a single or a set of ISPs.
What you see above is actual performance data from two global CDNs. In this example, you can see that CDN1 was significantly better than CDN2 for 35 percent of the audience while CDN2 was preferred (because of better performance) by 15 percent of the audience. That’s 50 percent of your audience that would have had a significantly better viewing experience on one CDN versus the other. The remaining audience fared equally well on either CDN, so we judge them to be a wash from a performance perspective (although from an availability perspective, it is still always better to have two CDNs).
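The classification behind an S-Curve can be sketched in a few lines. This is an illustrative approximation only, assuming paired RUM latency medians per audience bucket (e.g. an ASN/geo pair) and a made-up 10 percent "significantly better" threshold; it is not Cedexis’s actual methodology.

```python
# Hypothetical sketch: classifying audience buckets from paired RUM samples.
# Each sample holds the median latency (ms) one audience bucket saw against
# two CDNs. The 10% threshold is an assumption for illustration.

def s_curve(samples, threshold=0.10):
    """Return the audience fractions where CDN1 wins, CDN2 wins, or it's a wash.

    A bucket counts as a "win" for a CDN when it is more than `threshold`
    faster than the alternative; otherwise the two CDNs are a wash there.
    """
    wins_cdn1 = wins_cdn2 = wash = 0
    for cdn1_ms, cdn2_ms in samples:
        if cdn1_ms < cdn2_ms * (1 - threshold):
            wins_cdn1 += 1
        elif cdn2_ms < cdn1_ms * (1 - threshold):
            wins_cdn2 += 1
        else:
            wash += 1
    total = len(samples)
    return wins_cdn1 / total, wins_cdn2 / total, wash / total

# Four illustrative buckets: one clear win each way, two washes.
shares = s_curve([(80, 120), (95, 100), (140, 90), (100, 101)])
```

Sorting the per-bucket performance deltas produced this way is what yields the S-shaped curve in the chart.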
Here are the same two CDNs, but restricted to the U.S. only. CDN2’s results are quite a bit better: it now wins about 25 percent of the audience outright, while CDN1’s share shrinks to roughly the same level.
CDN2’s performance is much better when measurements are restricted to the U.S. In fact, they win up to 25 percent of the audience outright. These results continue to change as you change the scope of the measurements and the ISPs being measured. In fact, the better performing CDN for video delivery can change based on peering arrangements, PoP placement, network load and other factors.
There are many ways to execute a multi-CDN strategy. Strategy is important, but execution is critical. So what are the five mandatory elements of execution for a multi-CDN strategy for an OTT-Video platform?
1. Make Sure You are Using the CDNs in an Active-Active Configuration
Many video distribution platforms will configure a second CDN in a “Disaster Recovery” configuration. This means that when the primary CDN has availability issues, there are alerts sent and then some type of manual failover. This can lead to a couple of issues, including:
- Users on the system in the window between the primary CDN starting to have issues and the secondary CDN being engaged will endure bad performance, or a complete lack of availability, during that timeframe.
- Cold starting a CDN can cause serious performance issues on your Origin. When you turn up a CDN that has none of your content cached, it will essentially DDoS your origin. This, of course, causes users to experience even more downtime and worse performance as the cache warms.
The better solution is to deploy your multi-CDN configuration as Active-Active, meaning there is active traffic flowing to both CDNs at all times. This overcomes the second issue because a significant percentage of your content is always cached on both CDNs. It overcomes the first issue because there is no window in which users are routed to a poorly performing CDN. That is, if you are using performance-based routing, and that’s a pretty big “if.” If you are using Geo or Round-Robin routing, you can still have significant issues. We talk more about the types of routing algorithms below.
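The contrast with a Disaster Recovery setup can be sketched simply: in Active-Active, both CDNs always hold a share of the traffic pool (keeping their caches warm), and an unhealthy CDN just drops out of the pool rather than requiring a manual failover. The CDN names, weights and health flags below are made-up illustrations, not any vendor’s API.

```python
import random

# Minimal Active-Active sketch (illustrative only): every request picks a
# CDN from the pool of healthy ones, weighted by the traffic share you
# want each to carry. Both CDNs receive traffic continuously, so neither
# cache ever starts cold.

def pick_cdn(cdns, weights, healthy):
    """Pick a CDN for one request, skipping any marked unhealthy."""
    pool = [(c, w) for c, w in zip(cdns, weights) if healthy[c]]
    if not pool:
        raise RuntimeError("no healthy CDN available")
    names, shares = zip(*pool)
    return random.choices(names, weights=shares, k=1)[0]

# Both healthy: traffic splits ~70/30. If cdn-a fails its health check,
# every request flows to cdn-b with no manual intervention.
healthy = {"cdn-a": True, "cdn-b": True}
choice = pick_cdn(["cdn-a", "cdn-b"], [70, 30], healthy)
```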
2. Use a Performance- and Availability-Based Algorithm for Traffic Management
Once you have made the decision to set your multi-CDN configuration to Active-Active, you still have to decide how you are going to distribute the traffic. There are a number of options, but the most common are Round-Robin and its derivatives, Geo and its variations, and Performance-based routing. Let’s look at what these mean.
- Round-Robin load balancing is the most primitive form of traffic management: traffic is distributed to multiple destinations in a balanced manner, so if you have two CDNs, each gets 50 percent of the traffic. You can weight the algorithm so that one CDN gets more traffic than the other, but basically each recipient takes its share in turn. This blog tells you more about Round-Robin and its potential pitfalls.
- Geo-based load balancing is the practice of routing traffic based on the user’s geography. This might take the form of delivering all European traffic from one CDN, all China traffic from another and everybody else from a third. We often see this deployed when companies want to move into a BRIC nation and find that their CDN does not perform well there. The basic idea of Geo routing is to offer better performance by considering the distances that packets have to travel. Unfortunately, this assumption is often wrong, since peering relationships (or the lack thereof) and unreliable location information are persistent issues on the Internet. Routing based on regional geographic location (let alone country), without considering ASN/ISP, paints a very inaccurate picture. We also see that, for CDNs, footprint and peering have a dramatic impact on performance. We provide more detail in this blog.
- Performance-based load balancing uses measurements of each CDN’s or cloud’s performance to determine whether it is performing well. These measurements are fed into the load balancing system as the raw input for its decisions. Performance-based load balancing is the state of the art. It can focus on latency, throughput, availability or any other metric you consider a meaningful indicator of performance. In video, for instance, you may wish to measure rebuffering or video start time, depending on your end goals. You can also use Application Performance Monitoring (APM) tools to measure things like server utilization and available memory/disk in your server cluster. These signals can be fed into a routing algorithm and used to direct traffic. Some type of performance-based algorithm should be part of your traffic management.
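Under stated assumptions (made-up CDN names, an invented geo map, and a simplified RUM data shape), the three routing approaches above can be sketched side by side:

```python
from itertools import cycle

# Illustrative sketches of the three traffic-management approaches
# described above. CDN names, the geo map, the RUM data shape and the
# availability threshold are all assumptions, not any vendor's API.

def weighted_round_robin(cdns, weights):
    """Round-Robin: each CDN takes its share in turn, repeated per weight."""
    return cycle([c for c, w in zip(cdns, weights) for _ in range(w)])

GEO_MAP = {"DE": "cdn-europe", "FR": "cdn-europe", "CN": "cdn-china"}

def geo_route(country_code, default="cdn-global"):
    """Geo: a static country -> CDN map with a catch-all default.
    Note it ignores ASN/ISP entirely, which is exactly its blind spot."""
    return GEO_MAP.get(country_code, default)

def performance_route(rum, asn, min_availability=0.99):
    """Performance-based: pick the CDN with the lowest recent last-mile
    latency for this user's ASN, skipping CDNs whose measured
    availability has dropped below the threshold."""
    candidates = {
        cdn: stats[asn]["latency_ms"]
        for cdn, stats in rum.items()
        if asn in stats and stats[asn]["availability"] >= min_availability
    }
    return min(candidates, key=candidates.get) if candidates else None

# Example RUM snapshot for one ASN: cdn-b is faster but its availability
# is poor, so the performance-based decision falls to cdn-a.
rum = {
    "cdn-a": {7922: {"latency_ms": 42.0, "availability": 0.999}},
    "cdn-b": {7922: {"latency_ms": 35.0, "availability": 0.95}},
}
```

The point of the contrast: only the last function reacts to what users are actually experiencing right now; the first two are blind to it.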
3. Make Sure You are Measuring Performance from the Last Mile
The key to knowing which CDN should deliver which video traffic is monitoring of the CDNs themselves. By monitoring a CDN’s availability and performance from the last mile, you can determine whether traffic should be delivered from that CDN to the users on a given ISP. Synthetic measurements should be eschewed for this purpose: they are typically taken from data centers on a limited number of networks, and while they have some value, they can be gamed by the CDNs and, more importantly, do not reflect the real world. If a major ISP has availability issues reaching a specific CDN in a large market, synthetic monitors will usually not pick that up. And yet that is exactly the moment you may want to route that traffic to a CDN that is not experiencing the issues.
The Internet is complex, constantly growing and ever-changing. Peering relationships can take time to administer. Last-mile ISPs’ routers can get congested, and it may be months before system administrators get around to upgrading the line cards. There are over 48,000 Autonomous System Numbers (ASNs) on the Internet, many of them carrying large numbers of users. The only way to see all these issues is to have a vast number of measurements from the end-user perspective, a.k.a. the last mile.
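Rolling last-mile samples up per ASN is what exposes the ISP-specific problems that data-center synthetic monitors miss. A minimal sketch, assuming each RUM beacon arrives as an `(asn, cdn, latency_ms)` tuple reported from a real user’s player or browser (the data shape and values are invented for illustration):

```python
from collections import defaultdict
from statistics import median

# Hypothetical sketch: aggregate last-mile RUM beacons per (ASN, CDN)
# pair. A spike in one ASN's median against one CDN signals exactly the
# kind of localized issue described above.

def aggregate_by_asn(beacons):
    """Return the median latency per (asn, cdn) pair."""
    buckets = defaultdict(list)
    for asn, cdn, latency_ms in beacons:
        buckets[(asn, cdn)].append(latency_ms)
    return {key: median(vals) for key, vals in buckets.items()}

# Three beacons from one ASN (7018 used purely as an example number).
medians = aggregate_by_asn([
    (7018, "cdn-a", 40.0),
    (7018, "cdn-a", 60.0),
    (7018, "cdn-b", 90.0),
])
```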
4. Choose the Right Origin Configuration
Whether you manage your origin in-house or use a trusted CDN or cloud storage provider, be sure to architect it for success. The tried-and-true method for staying up is to configure everything as N+1, and that includes your origins.
Your origin is also where business logic and competitive features should reside. Smart OTT-Video providers are also starting to do limited delivery from origin, meaning the origin itself delivers content when it is in the best position to do so. In some cases we have seen 5-10 percent of delivery come from origin. This can have a positive impact on your CDN bill while improving overall video start time. This hybrid model is gaining traction for two main reasons: (1) it provides the best-performing delivery to the users who are best served from that origin and (2) it can provide significant cost savings.
Another choice is to use a CDN provider that has its own origin platform. If the CDN provides storage, ensure the platform is inside its network and has geographic redundancy for disaster recovery and high performance egress. The storage platform should be able to deliver your content to third-party CDNs as well as its own to enable proper load balancing across networks. Using a CDN’s internal storage platform also ensures your content is being monitored by the same operations team monitoring your delivery and simplifies your platform management. Time to first byte is important when a CDN calls to the origin, and when delivering video, it’s critical.
For live and linear feeds, things get a little more interesting. If you are delivering live or linear, it’s imperative to have not only multiple live origins, but also the ability to do multiple ingress feeds into two or more CDNs. All CDNs that have significant live traffic provide the ability to stream into primary and secondary ingress points. This should be duplicated across all CDNs involved in the live or linear feed. And this should be done from multiple origins if you want to ensure everyone gets to see every soccer goal.
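The redundant live layout described above can be expressed as a small plan: every CDN in the mix gets a primary and a secondary ingest point, fed from two separate origins. The endpoint and origin names below are invented purely for illustration.

```python
# Hypothetical sketch of a redundant live-ingest layout. Every value is
# a made-up label; real configurations would hold ingest URLs and
# credentials for each CDN's primary and secondary entry points.

INGEST_PLAN = {
    "cdn-a": {"primary": "origin-east -> ingest-a1",
              "secondary": "origin-west -> ingest-a2"},
    "cdn-b": {"primary": "origin-east -> ingest-b1",
              "secondary": "origin-west -> ingest-b2"},
}

def ingest_target(cdn, primary_up=True):
    """Return the feed to push for a CDN, falling back to the secondary
    ingress (fed from the other origin) when the primary path is down."""
    plan = INGEST_PLAN[cdn]
    return plan["primary"] if primary_up else plan["secondary"]
```

The design point is that a failure of either one origin or one ingress point never takes the stream off the air on any CDN.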
5. Regularly Monitor Key Regions and Countries for Better Performance Options
Lastly, recognize that much of what you know about the best performing cloud, CDN or ISP could change in six months. Performance demands will increase. Video buffering is already minimally tolerated. Understanding how to improve your users’ video viewing experience via a multi-CDN strategy starts with understanding the dynamic nature of the CDN market. New Points of Presence get deployed. New CDNs enter the marketplace. New peering relationships get developed. Old ones occasionally dissolve. Most top organizations have a performance team that focuses on just these things. If you want minimal buffering and great startup times, you have to make this investment.
The successful OTT providers will learn to execute a multi-CDN strategy for 100 percent availability and the best performance while, don’t forget, remaining on budget (perhaps even saving a buck). These are interesting times in the OTT market. Hang on!