We scan new podcasts and send you the top 5 insights daily.
While multi-region availability seems like a best practice, the synchronous writes required introduce significant latency (~60ms in the US) that is unacceptable for most applications. Cowling argues it's better for most companies to accept downtime when a cloud region fails than to bear the cost and complexity of active-active systems.
The physical distance of space-based data centers creates significant latency. This delay renders them impractical for real-time applications like crypto mining, where a block found in space could be orphaned by the time the data reaches Earth. Their best use is for asynchronous, large-scale computations like AI training.
TurboPuffer achieved its massive cost savings by building on slow S3 storage. While this increased write latency by 1000x—unacceptable for transactional systems—it was a perfectly acceptable trade-off for search and AI workloads, which prioritize fast reads over fast writes.
When major infrastructure like AWS or Cloudflare goes down, it affects many companies simultaneously. This creates a collective "mulligan," meaning individual startups aren't heavily penalized by users for the downtime, as the issue is widespread. The exception is for mission-critical services like finance or live events.
Previously, cloud services were built as global instances and partitioned for customers. Now, demands for data sovereignty from countries like Germany require a fundamental architectural shift. Systems must be designed to run entirely within a single country's borders, ending the era of globally-shared cloud infrastructure.
The intense computational demand and latency of AI models are compelling enterprises to use multiple cloud providers. Rather than vendor loyalty, companies now prioritize performance, switching between clouds like AWS and Azure to find the fastest available capacity for their AI workloads, reshaping the cloud market.
To serve Notion on AWS while its core infra was on GCP, Turbopuffer bought dark fiber to reduce cross-cloud latency. This extreme measure was preferable to compromising their core architectural principle of avoiding a stateful consensus layer, showcasing deep product conviction.
An outage at a single dominant cloud provider like AWS can cripple a third of the internet, including competitors' services. This highlights how infrastructure centralization creates systemic vulnerabilities that ripple across the entire digital economy, demanding a new approach to redundancy and regulation.
When splitting jobs across thousands of GPUs, inconsistent communication times (jitter) create bottlenecks, forcing the use of fewer GPUs. A network with predictable, uniform latency enables far greater parallelization and overall cluster efficiency, making it more important than raw 'hero number' bandwidth.
Railway's hybrid strategy uses public clouds like AWS and GCP as a safety valve for demand spikes. This allows them to maintain service availability during hypergrowth while systematically migrating workloads to their own more cost-efficient bare metal infrastructure as they build it out.
MongoDB's CEO highlights a key shift in enterprise priorities. Driven by recent major cloud outages, customers are now more concerned with the high cost of data resiliency (multi-region/multi-cloud setups) than raw storage costs. This makes multi-cloud capabilities a critical competitive differentiator for data platforms.