Ingres, Stonebraker's first database, couldn't handle non-standard data types such as polygons for GIS or custom calendars for financial bonds. Postgres was engineered with an extensible type system to solve this fundamental limitation, making it far more flexible for applications beyond standard business data processing.
Stonebraker clarifies that GPUs excel at parallel processing (SIMD), but database indexing (e.g., traversing a B-tree) is a serial process. Each step involves following a pointer to a new memory location, a sequence of operations that cannot be parallelized effectively, making GPUs unsuitable for accelerating this core database function.
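To make the pointer-chasing argument concrete, here is a minimal Python sketch of a B-tree-style lookup. The `Node` class, the `lookup` function, and the tiny example tree are hypothetical illustrations, not code from any real database engine. The point is only that each loop iteration needs the result of the previous one before it can start.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical B+-tree-style node: inner nodes hold children, leaves hold values."""
    keys: list
    children: list = field(default_factory=list)  # empty for leaf nodes
    values: list = field(default_factory=list)    # populated only for leaf nodes


def lookup(root, key):
    """Return (value, hops). Each hop depends on the node reached by the previous hop."""
    node = root
    hops = 0
    while node.children:
        # The child to visit is known only after this node's keys have been read,
        # so the next memory access cannot be issued ahead of time.
        idx = bisect_right(node.keys, key)
        node = node.children[idx]
        hops += 1
    if key in node.keys:
        return node.values[node.keys.index(key)], hops
    return None, hops


# A two-level example tree; the separator keys 5 and 9 route lookups to the leaves.
leaves = [
    Node(keys=[1, 2], values=["a", "b"]),
    Node(keys=[5, 7], values=["c", "d"]),
    Node(keys=[9, 12], values=["e", "f"]),
]
root = Node(keys=[5, 9], children=leaves)

value, hops = lookup(root, 7)
print(value, hops)
```

A single lookup is a chain of dependent reads, one per tree level. Many independent lookups can still run side by side, but that is a different kind of parallelism from speeding up one traversal.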
Stonebraker asserts that specialized database architectures (e.g., column stores, stream processors) are an order of magnitude faster for their specific use cases than general-purpose row stores like Postgres. While Postgres is a great "lowest common denominator," at the high end, a tailored solution is necessary for optimal performance.
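As a rough illustration of why storage layout matters, the Python sketch below holds the same three records in a row-oriented and a column-oriented form. The table, field names, and helper functions are all made up for this example. It does not reproduce the order-of-magnitude claim; it only shows what each layout has to touch to answer an aggregate query.

```python
from array import array

# Row layout: each record is stored together, as a general-purpose row store would.
rows = [
    {"id": 1, "region": "east", "amount": 10.0},
    {"id": 2, "region": "west", "amount": 20.0},
    {"id": 3, "region": "east", "amount": 30.0},
]

# Column layout: each attribute is stored contiguously, as a column store would.
columns = {
    "id": array("q", [1, 2, 3]),
    "region": ["east", "west", "east"],
    "amount": array("d", [10.0, 20.0, 30.0]),
}


def sum_amount_rows(table):
    """Visits every record, including fields the query never uses."""
    return sum(record["amount"] for record in table)


def sum_amount_columns(table):
    """Scans one contiguous array and ignores the other columns entirely."""
    return sum(table["amount"])


print(sum_amount_rows(rows), sum_amount_columns(columns))
```

Both functions return the same total. The difference is that the columnar version reads only the one attribute the query needs, which is the property analytic engines exploit on wide tables.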
The most difficult engineering tasks aren't flashy UI features, but backend architectural changes. Refactoring a database schema to be more flexible is invisible to users but is crucial for long-term development speed and product scalability. Prioritizing this "boring" work is a key strategic decision.
The long-sought goal of "information at your fingertips," envisioned by Bill Gates, wasn't achieved through structured databases as expected. Instead, large neural networks unexpectedly became the key, capable of finding patterns in messy, unstructured enterprise data where rigid schemas failed.
To build a multi-billion dollar database company, you need two things: a new, widespread workload (like AI needing data) and a fundamentally new storage architecture that incumbents can't easily adopt. This framework helps identify truly disruptive infrastructure opportunities.
Stonebraker's research reveals that on real production data warehouse benchmarks, LLMs achieve 0% accuracy on text-to-SQL tasks. This is due to messy, non-mnemonic schemas, complex 100+ line queries, and domain-specific data not found in training sets, all factors absent from simplified academic benchmarks like Spider and BIRD.
The founder used a "Napkin Math" approach, analyzing fundamental computing metrics (disk speed, memory cost). This revealed a viable architecture using cheap S3 storage that incumbents overlooked, creating a 100x cost advantage for his database.
Stonebraker claims the tech world blindly followed Google's lead on MapReduce, which was "ridiculously inefficient" compared to distributed databases. He also slams eventual consistency for failing to guarantee data integrity (e.g., preventing stock from going below zero), a tradeoff most enterprises cannot make. Google later abandoned both concepts.
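The stock-below-zero example can be sketched in a few lines of Python. Everything here is a hypothetical toy model: the `Replica`, `reconcile`, and `TransactionalStock` names are invented, and no real replication protocol is this simple. It shows two replicas that each accept a sale against a stale local count, next to a single store that checks and decrements atomically.

```python
import threading


class Replica:
    """Toy eventually consistent replica: validates sales against its local copy only."""

    def __init__(self, stock):
        self.stock = stock
        self.pending = []  # local decrements not yet shared with other replicas

    def sell(self, qty):
        if self.stock >= qty:
            self.stock -= qty
            self.pending.append(qty)
            return True
        return False


def reconcile(initial_stock, replicas):
    """Merge all accepted sales after the fact; nothing stops the total going negative."""
    return initial_stock - sum(q for r in replicas for q in r.pending)


class TransactionalStock:
    """Toy single source of truth: the check and the decrement happen under one lock."""

    def __init__(self, stock):
        self.stock = stock
        self._lock = threading.Lock()

    def sell(self, qty):
        with self._lock:
            if self.stock >= qty:
                self.stock -= qty
                return True
            return False


# One unit in stock, two customers, two replicas that have not yet synchronized.
a, b = Replica(1), Replica(1)
both_accepted = a.sell(1) and b.sell(1)
merged_stock = reconcile(1, [a, b])

# The same two sales against a transactional store.
store = TransactionalStock(1)
first, second = store.sell(1), store.sell(1)

print(both_accepted, merged_stock, first, second, store.stock)
```

In the replicated case both sales succeed locally and the merged count ends up below zero. In the transactional case the second sale is rejected, so the invariant holds.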
Truly massive database companies only emerge every ~15 years when three conditions are met: a new ubiquitous workload (like AI), a new underlying storage architecture that predecessors can't adopt (like NVMe SSDs and S3), and a long-term roadmap to handle all possible data queries.
The DBOS project, co-founded by Stonebraker, argues operating systems primarily manage data at scale. Replacing core OS components (like the file system and scheduler) with a database engine can lead to faster performance, built-in high availability, and transactional guarantees for system operations, with "really no downside."