Rethinking and rewriting core systems, like DeepMind's distillation infrastructure, is a prerequisite for advancing research. These large software engineering investments unlock new capabilities, leading to dramatic improvements in model performance and understanding of scaling laws.
Instead of adopting a cynical, "Machiavellian" workplace attitude, focus on being collaborative and helping others shine. This builds a deep sense of trust and support, making people want to contribute to your projects and back your success in the long run.
The most in-demand skill at labs like Google DeepMind is low-level engineering for accelerating LLM runtime. This involves creating efficient, custom software artifacts (kernels) for new neural net architectures and serving techniques at scale.
The distinction between "applied" and "research" roles is blurry at frontier labs. Even product integrations, like using Gemini to improve Google Search, involve fundamental research challenges such as ensuring factuality, citing sources, and assessing source quality.
Roles requiring accountability will persist despite AI's capabilities. An LLM can't be a lawyer because it can't be disbarred; it can't be held responsible. This principle highlights that the need for human validation and liability will protect many professions.
AI research means exploring a dependency graph of ideas in which any node may fail, so progress is stochastic. Software engineering, by contrast, follows a more deterministic path. Success requires "research taste", an intuition for navigating this uncertainty that is often honed in PhD programs.
The best path for an internal transfer to a lab like Google DeepMind is to become an expert at applying its models within your own product area. This makes you a key partner for the research team, creating a natural bridge for a potential transfer.
Google DeepMind made high-capacity Mixture of Experts (MoE) models viable for low-latency products by changing the serving pattern. Instead of sharding experts across chips, which requires heavy cross-chip communication, they pipelined model layers, reducing communication overhead and making MoEs fast enough for production.
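To make the communication argument concrete, here is a back-of-the-envelope sketch in Python. It is a toy cost model, not a description of Google DeepMind's actual serving system: the function names, the uniform-routing assumption, and every number in the example are hypothetical. It assumes expert sharding pays for two all-to-all exchanges per MoE layer (dispatch and combine), while pipelining only passes one activation vector across each stage boundary.

```python
def expert_sharded_bytes_per_token(num_moe_layers, hidden_dim, top_k,
                                   num_chips, bytes_per_value=2):
    """Rough cross-chip traffic per token when experts are sharded across chips.

    Assumption: routing is uniform, so a fraction (num_chips - 1) / num_chips
    of each token's routed copies land on a different chip.
    """
    off_chip_fraction = (num_chips - 1) / num_chips
    # One transfer to dispatch the token to its experts, one to combine results.
    per_layer = 2 * top_k * hidden_dim * bytes_per_value * off_chip_fraction
    return num_moe_layers * per_layer


def pipelined_bytes_per_token(num_stages, hidden_dim, bytes_per_value=2):
    """Rough cross-chip traffic per token when whole layers are pipelined.

    Assumption: each stage holds complete layers (all of their experts), so
    the only traffic is one activation vector per stage boundary.
    """
    return (num_stages - 1) * hidden_dim * bytes_per_value


# Hypothetical configuration, chosen only for illustration.
sharded = expert_sharded_bytes_per_token(
    num_moe_layers=32, hidden_dim=4096, top_k=2, num_chips=8)
pipelined = pipelined_bytes_per_token(num_stages=8, hidden_dim=4096)

print(f"expert-sharded: {sharded:,.0f} bytes per token")
print(f"pipelined:      {pipelined:,.0f} bytes per token")
print(f"ratio:          {sharded / pipelined:.1f}x")
```

Under these assumed numbers the sharded layout moves roughly 16 times more data per token. The model ignores latency per message, pipeline bubbles, and the memory needed to hold all experts of a layer on one stage, all of which matter in a real deployment.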
Unlike traditional ML, where models are repeatedly trained on a fixed dataset, each frontier LLM pre-training run uses more compute than any run before it. This makes it a one-shot endeavor where success hinges on accurately predicting final performance from smaller-scale experiments using scaling laws.
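As a minimal sketch of what predicting from smaller-scale experiments can look like, the Python snippet below fits a simple power law, loss ≈ a · compute^(−b), to a handful of small runs and extrapolates to a much larger budget. The functional form, the helper names, and all data points are assumptions made for illustration. The data is synthetic, generated from a known law, and scaling-law fits in practice often add terms such as an irreducible loss.

```python
import numpy as np


def fit_power_law(compute, loss):
    """Fit loss ≈ a * compute**(-b) by linear regression in log-log space."""
    slope, intercept = np.polyfit(np.log(compute), np.log(loss), deg=1)
    return np.exp(intercept), -slope  # (a, b)


def predict_loss(a, b, compute):
    """Evaluate the fitted power law at a given compute budget."""
    return a * compute ** (-b)


# Synthetic "small-scale runs": compute in FLOPs, loss generated from a known
# power law so the fit can be checked. These are not real measurements.
small_compute = np.array([1e17, 1e18, 1e19, 1e20])
small_loss = 50.0 * small_compute ** (-0.05)

a, b = fit_power_law(small_compute, small_loss)

# Extrapolate four orders of magnitude beyond the largest small run.
target_compute = 1e24
predicted = predict_loss(a, b, target_compute)

print(f"fitted a={a:.3f}, b={b:.4f}")
print(f"predicted loss at {target_compute:.0e} FLOPs: {predicted:.4f}")
```

Because the synthetic data is noise-free, the fit recovers the generating parameters almost exactly. With real runs the measurements are noisy and the extrapolation spans several orders of magnitude, which is why a small error in the fitted exponent can translate into a large error in the predicted final performance.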
