Stateful Data At The Edge Needs A New Kind Of CDN
(Note: there is an interactive code tutorial at the bottom of the post that shows how a CDN model can be applied to dynamic application data for instant performance.)
CDNs have received much attention in the last few years, with the emergence of newer and richer feature sets. Companies like Fastly and Cloudflare have entered the market with a focus on helping developers leverage edge computing and hyper-locality by caching static content close to where it is used.
That said, developers have not realized much of the promise of CDNs, because static content can only do so much to accelerate an application that consistently backhauls dynamic data requests to a cloud datacenter. Some CDNs have responded with key/value (K/V) stores at the edge, but a rich user experience generally requires more than a K/V store. Why?
Today’s web services, APIs and applications are required to be highly responsive and always online. To achieve low latency and high availability, instances of these applications need to be deployed in data centers that are close to their users. Applications need to elastically respond in real time to large changes in usage at peak hours, store ever increasing volumes of data, and make this data available to users in milliseconds. We need data systems that store data at the edge, at distances that are milliseconds away from users and devices. We should be able to think of data — in the cloud or at the edge — as part of a single global system. In short, our data infrastructure should give us data that can be accessed where, when, and by whom we want with minimal latency and effort. This is a natural CDN model, but current CDN providers have not delivered the ability to exploit dynamic edge interactions.
Static content distribution is simple: content assets are replicated to CDN PoPs. Dynamic content, such as data stored in databases, requires a different kind of replication, because it changes constantly and can be written in more than one place. In a nutshell, this is the edge distributed data problem, and it prevents CDN providers from servicing the most complex applications at the edge.
Performance and Availability are Critical
For computers and networks, distance is a performance killer. If the images of a website are stored locally, but the infrastructure serving the dynamic portions of a user experience is extremely remote, the entire user experience suffers, regardless of how quickly a CDN can serve the static portions of content. This is the fundamental reason we use CDNs and stateless function services at the edge wherever we can.
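To put rough numbers on the distance penalty, here is a back-of-the-envelope sketch in Python. It assumes signals travel through optical fiber at about 200,000 km/s (roughly two thirds of the speed of light in a vacuum) and ignores routing, queuing, and processing time, so real latencies will be higher. The function names and the example distances are illustrative, not measurements.

```python
# Best-case network latency from distance alone.
# Assumption: ~200,000 km/s in optical fiber, i.e. 200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on one request/response round trip, in milliseconds."""
    return 2 * distance_km / FIBER_KM_PER_MS


def page_latency_ms(distance_km: float, sequential_requests: int) -> float:
    """Lower bound when a page needs several round trips one after another."""
    return sequential_requests * min_round_trip_ms(distance_km)


if __name__ == "__main__":
    # Hypothetical distances: a nearby edge PoP vs. a distant cloud region.
    for label, km in [("edge PoP", 100), ("remote datacenter", 8000)]:
        print(f"{label}: {min_round_trip_ms(km)} ms per round trip, "
              f"{page_latency_ms(km, 5)} ms for 5 sequential requests")
```

Even under these idealized assumptions, five sequential dynamic requests to a datacenter 8,000 km away cost at least 400 ms, while the same requests to a PoP 100 km away cost about 5 ms. Caching the static assets does nothing to shrink the first number.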
For a database, availability is also a crucial factor. Traditional cloud approaches using a centralized database create single points of failure and significant risk for the business. A failure means lost revenue, damaged reputation, and of course remediation. Depending on the business, every hour of outage can cost millions.
Creating a Global Footprint
One of the biggest issues in using dynamic application data at the edge is the footprint required to maintain hyper-locality to users. Most modern databases and dynamic data backends work well in a single datacenter, but placing the data closer to users requires massive, global distribution. Maintaining consistency across these regions is one of the most challenging aspects of designing distributed applications. Eventual consistency can be attained, but typically only across a limited number of regions, and data can take as long as 10 seconds to converge. That window creates an enormous exposure to conflicting writes and severely limits performance. This is why there is so much interest in Conflict-free Replicated Data Types (CRDTs), which let replicas accept writes independently and still converge to the same state once they exchange updates.
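For readers new to CRDTs, the following is a minimal sketch of one of the simplest ones, a grow-only counter (G-Counter). It is a textbook illustration written for this post, not Macrometa's implementation; the class name, method names, and replica names are all hypothetical. Each replica increments only its own slot, and merging takes the per-replica maximum, so merges can happen in any order, any number of times, without conflicts.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """Grow-only counter CRDT: one slot per replica, merged by element-wise max."""
    replica_id: str
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # A replica only ever writes to its own slot.
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum over all replicas' slots.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative and idempotent,
        # which is what makes the merge conflict-free.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


if __name__ == "__main__":
    tokyo, london = GCounter("tokyo"), GCounter("london")
    tokyo.increment(3)    # local write at one edge location
    london.increment(2)   # concurrent local write at another
    tokyo.merge(london)
    london.merge(tokyo)
    london.merge(tokyo)   # repeating a merge changes nothing
    print(tokyo.value(), london.value())
```

Both replicas accept writes locally with no coordination and still end up with the same total after exchanging state. Real systems need richer types (registers, sets, maps, documents), which is where the complexity discussed in the next section comes from.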
A CDN for Stateful Data – Macrometa’s Geo Distributed Fast Data Platform
Most current research on CRDTs focuses on individual data types. What a developer needs is not individual data types but a regular database and query language, with the CRDT technology managed transparently underneath.
Macrometa’s Fast Data platform pushes the boundaries on CRDTs and provides developers with a higher-level abstraction: a multi-model (key-value, document, graph, streams) database along with a query layer (C8QL) that handles both data-at-rest and data-in-motion use cases.
The following is a high-level architecture view of Macrometa’s Fast Data platform. If you are interested in learning more about the technology, you can read about it here - Macrometa Technology Overview.
The best way to understand a technology is to just try it. At least that is how I do it. So if you want to go further, I recommend getting a free developer account on the Macrometa Fast Data platform. You can do that by signing up here.
Here's an Interactive Program to Show How a CDN Can Work Like a DB
Macrometa provides a geo-distributed platform that delivers a real-time NoSQL streaming database in 25 global PoPs. The code sample below shows how data can be stored globally across these regions with strong consistency and high performance. Macrometa's free developer account includes access to 4 regions, which should be sufficient because the platform behaves the same whether it runs on 4 regions or on 25. The following small interactive program is a starting point that you can use to play around with it.