Purely model-based and purely rule-based systems each have flaws. Stripe combines the two for better results. For instance, a transaction with a CVC mismatch (a rule signal) is blocked only if its model-generated risk score is also elevated, so good customers who simply mistype their CVC are not rejected.
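
A minimal sketch of the CVC example, assuming a single rule (CVC match) and a model risk score in [0, 1]. The `Payment` type, the `should_block` function, and the 0.7 threshold are hypothetical illustrations, not Stripe's API or settings. The sketch covers only the rule-plus-model combination and ignores cases where the model alone would block a payment.

```python
from dataclasses import dataclass

# Hypothetical threshold; a real system would tune it against fraud and approval-rate targets.
RISK_THRESHOLD = 0.7


@dataclass
class Payment:
    cvc_matches: bool  # outcome of the rule-based CVC check
    risk_score: float  # model-generated fraud probability in [0, 1]


def should_block(payment: Payment, threshold: float = RISK_THRESHOLD) -> bool:
    """Block only when the rule fires AND the model also considers the payment risky."""
    rule_fires = not payment.cvc_matches
    model_agrees = payment.risk_score >= threshold
    return rule_fires and model_agrees


# A good customer who mistypes their CVC: the rule fires, but the model score is low.
print(should_block(Payment(cvc_matches=False, risk_score=0.05)))  # False
# The same mismatch with an elevated model score is blocked.
print(should_block(Payment(cvc_matches=False, risk_score=0.92)))  # True
```

Requiring both signals trades a little recall on the rule for far fewer false declines, which is the point the paragraph makes.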

Related Insights

Max Levchin claims any single data point that seems to dramatically improve underwriting accuracy is a red herring. He argues these "magic bullets" are brittle and fail when market conditions shift. A robust risk model instead aggregates small lifts from many subtle factors.
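
To make the robustness argument concrete, here is a toy comparison under stated assumptions: a logistic score built from 20 weak signals versus one built on a single "magic bullet" feature carrying the same total lift. The weights, bias, and the way a market shift is modeled (one feature dropping to 0) are all hypothetical and chosen only for illustration; this is not Affirm's or anyone's actual underwriting model.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def score(features: dict[str, float], weights: dict[str, float], bias: float) -> float:
    """Logistic score: each feature adds its own small lift to the log-odds."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return sigmoid(z)


# Hypothetical model A: 20 weak signals, each worth a small lift.
many_weights = {f"signal_{i}": 0.15 for i in range(20)}
applicant = {name: 1.0 for name in many_weights}
many_before = score(applicant, many_weights, bias=-2.0)
# A market shift makes one signal uninformative (modeled here as it dropping to 0).
many_after = score(dict(applicant, signal_0=0.0), many_weights, bias=-2.0)

# Hypothetical model B: one "magic bullet" feature carrying the same total lift.
bullet_weights = {"magic_bullet": 3.0}
bullet_before = score({"magic_bullet": 1.0}, bullet_weights, bias=-2.0)
bullet_after = score({"magic_bullet": 0.0}, bullet_weights, bias=-2.0)

print(f"many weak signals: {many_before:.3f} -> {many_after:.3f}")  # 0.731 -> 0.701
print(f"single magic bullet: {bullet_before:.3f} -> {bullet_after:.3f}")  # 0.731 -> 0.119
```

Both models give the same score before the shift, but losing one input barely moves the aggregated model while it collapses the single-feature one.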

Instead of teams building their own merchant analysis tools, Stripe created a centralized "Merchant Intelligence" service. This AI agent crawls the web, generates merchant embeddings, and serves insights to diverse teams like risk, credit, and sales, eliminating duplicated effort and creating massive internal leverage.
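
The leverage comes from computing each merchant's analysis once and letting every team reuse it. Below is a minimal sketch of that "compute once, serve many" pattern. `MerchantIntelligence`, `fake_analyze`, and the merchant ID are hypothetical; the expensive crawl-and-embed step is replaced by a placeholder function, and a real service would add persistence, refresh policies, and access control.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MerchantIntelligence:
    """Hypothetical shared service: analyze each merchant once, serve every team."""

    analyze: Callable[[str], dict]  # stand-in for web crawling + embedding generation
    _cache: dict[str, dict] = field(default_factory=dict)
    computations: int = 0

    def insights(self, merchant_id: str) -> dict:
        if merchant_id not in self._cache:
            self.computations += 1
            self._cache[merchant_id] = self.analyze(merchant_id)
        return self._cache[merchant_id]


def fake_analyze(merchant_id: str) -> dict:
    # Placeholder for the expensive step (crawl, summarize, embed).
    return {"merchant_id": merchant_id, "category": "unknown", "embedding": [0.0, 0.0, 0.0]}


service = MerchantIntelligence(analyze=fake_analyze)

# Risk, credit, and sales all ask about the same merchant; the analysis runs only once.
results = {team: service.insights("acct_123") for team in ("risk", "credit", "sales")}
print(service.computations)  # 1
```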

Binary approve-or-decline decisions are brittle. For payments that are neither clearly safe nor clearly fraudulent, Stripe uses a "soft block": instead of declining, it triggers a 3D Secure (3DS) authentication step. Legitimate cardholders can authenticate and proceed while fraudsters are stopped, resolving the ambiguity without losing revenue.
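
A soft block turns a two-way decision into a three-way one. This sketch assumes a single model risk score and two hypothetical cut-offs (0.3 and 0.9); the real thresholds and any additional inputs Stripe uses are not described in the source.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    SOFT_BLOCK = "request_3ds"  # step-up authentication instead of a hard decline
    BLOCK = "block"


# Hypothetical thresholds on a model risk score in [0, 1].
ALLOW_BELOW = 0.3
BLOCK_AT_OR_ABOVE = 0.9


def decide(risk_score: float) -> Decision:
    if risk_score < ALLOW_BELOW:
        return Decision.ALLOW
    if risk_score >= BLOCK_AT_OR_ABOVE:
        return Decision.BLOCK
    # Ambiguous middle band: ask the cardholder to authenticate with 3DS.
    return Decision.SOFT_BLOCK


for s in (0.1, 0.5, 0.95):
    print(s, decide(s).name)
```

Widening or narrowing the middle band is the lever: a wider band sends more payments through 3DS (more friction, fewer false declines).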

Stripe avoids costly system rebuilds by treating its new payments foundation model as a modular component. Its powerful embeddings are simply added as new features to many existing ML classifiers, instantly boosting their performance with minimal engineering effort.
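
Treating the embedding as "just more features" can be as simple as concatenating vectors before the existing classifier sees them. The sketch below is hypothetical: the payment IDs, feature meanings, and 3-dimensional embeddings are made up, and zero-filling missing embeddings is one assumed design choice among several (others include a learned default vector or a separate "missing" flag).

```python
def add_embedding_features(
    rows: dict[str, list[float]],
    embeddings: dict[str, list[float]],
    dim: int,
) -> dict[str, list[float]]:
    """Append embedding dimensions to each payment's existing feature vector.

    Payments with no embedding get zeros so every row keeps the same width,
    which is what a downstream classifier expects.
    """
    return {pid: feats + embeddings.get(pid, [0.0] * dim) for pid, feats in rows.items()}


existing = {"py_1": [49.99, 1.0], "py_2": [5.00, 0.0]}  # hypothetical: amount, is_new_card
embeddings = {"py_1": [0.12, -0.48, 0.33]}  # hypothetical 3-dim foundation-model embeddings

augmented = add_embedding_features(existing, embeddings, dim=3)
print(augmented["py_1"])  # [49.99, 1.0, 0.12, -0.48, 0.33]
print(augmented["py_2"])  # [5.0, 0.0, 0.0, 0.0, 0.0]
```

Because the classifier's training code only sees a wider feature vector, no model architecture has to change.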

Stripe's AI model treats payments as a distinct data type, not just text. It analyzes transaction sequences across buyers, cards, devices, and merchants to uncover complex fraud patterns invisible to humans, boosting card-testing detection on large merchants from 59% to 97%.
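
The model's internals are not public, but one plausible preprocessing step for "sequences across buyers, cards, devices, and merchants" is grouping transactions into time-ordered sequences per entity. The `Txn` fields and `build_sequences` function below are hypothetical, a sketch of that grouping under those assumptions rather than Stripe's pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    ts: int  # timestamp in seconds
    buyer: str
    card: str
    device: str
    merchant: str
    amount: float


ENTITY_FIELDS = ("buyer", "card", "device", "merchant")


def build_sequences(txns: list[Txn]) -> dict[tuple[str, str], list[Txn]]:
    """Group transactions into time-ordered sequences keyed by (entity type, entity id)."""
    seqs: dict[tuple[str, str], list[Txn]] = defaultdict(list)
    for txn in sorted(txns, key=lambda x: x.ts):
        for field_name in ENTITY_FIELDS:
            seqs[(field_name, getattr(txn, field_name))].append(txn)
    return dict(seqs)


txns = [
    Txn(30, "b1", "c2", "d1", "m1", 1.00),
    Txn(10, "b1", "c1", "d1", "m1", 1.00),
    Txn(20, "b2", "c1", "d2", "m2", 25.00),
]
seqs = build_sequences(txns)
print([t.ts for t in seqs[("device", "d1")]])  # [10, 30]: one device, two different cards
print([t.ts for t in seqs[("card", "c1")]])  # [10, 20]: one card, two buyers and devices
```

Each transaction appears in several sequences at once, which is what lets a sequence model relate a card's history to the history of the device and merchant it touched.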

This approach to AI security combines insurance (financial protection), standards (best practices), and audits (verification). Insurers fund robust standards, and enterprises comply with them to obtain cheaper insurance. This market mechanism aligns incentives for both rapid AI adoption and robust security, treating them as mutually reinforcing rather than as a trade-off.

For complex cases like "friendly fraud" (a cardholder disputing a charge they actually made), traditional ground-truth labels are often missing. Stripe uses an LLM as a judge to evaluate the quality of AI-generated labels for suspicious payments. The judge's verdicts serve as a proxy for ground truth, enabling faster model iteration.
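
A minimal sketch of how judge verdicts can act as a proxy metric when comparing labeling runs. The `LabeledPayment` type, label names, and `stub_judge` are hypothetical; the stub is a deterministic stand-in for an actual LLM call, which would prompt a model with the evidence and proposed label and parse its verdict.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LabeledPayment:
    payment_id: str
    evidence: str  # summary of signals for the suspicious payment
    proposed_label: str  # AI-generated label, e.g. "friendly_fraud"


# A judge takes (evidence, label) and returns True if it accepts the label.
Judge = Callable[[str, str], bool]


def judge_acceptance_rate(batch: list[LabeledPayment], judge: Judge) -> float:
    """Proxy for label accuracy: fraction of proposed labels the judge accepts."""
    if not batch:
        return 0.0
    accepted = sum(judge(p.evidence, p.proposed_label) for p in batch)
    return accepted / len(batch)


def stub_judge(evidence: str, label: str) -> bool:
    # Hypothetical stand-in for an LLM: accepts "friendly_fraud" only when
    # delivery was confirmed, and accepts other labels only when it was not.
    return ("delivery confirmed" in evidence) == (label == "friendly_fraud")


batch_v1 = [
    LabeledPayment("py_1", "dispute filed; delivery confirmed", "friendly_fraud"),
    LabeledPayment("py_2", "dispute filed; no delivery record", "friendly_fraud"),
]
batch_v2 = [
    LabeledPayment("py_1", "dispute filed; delivery confirmed", "friendly_fraud"),
    LabeledPayment("py_2", "dispute filed; no delivery record", "not_friendly_fraud"),
]
print(judge_acceptance_rate(batch_v1, stub_judge))  # 0.5
print(judge_acceptance_rate(batch_v2, stub_judge))  # 1.0
```

Comparing acceptance rates across labeler versions gives a fast signal without waiting for disputes to resolve, with the caveat that the metric is only as reliable as the judge.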

By creating dense embeddings for every transaction, Stripe's model identifies subtle patterns of card testing (e.g., tiny, repetitive charges) hidden within high-volume merchants' traffic. These attacks are invisible to traditional ML but appear as distinct clusters to the foundation model, boosting detection on large merchants from 59% to 97%.
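
To illustrate what "distinct clusters" means in embedding space, here is a simple density check: flag transactions whose embedding has many near-duplicates. This is a generic technique swapped in for illustration, not Stripe's detection method; the 3-dimensional embeddings, the 0.99 similarity threshold, and the neighbor count are made up. The pairwise loop is O(n²); at real volumes an approximate nearest-neighbor index would be needed.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dense_neighbors(
    embeddings: dict[str, list[float]], sim_threshold: float, min_neighbors: int
) -> set[str]:
    """Flag transactions whose embedding sits in a tight cluster of near-duplicates."""
    flagged = set()
    ids = list(embeddings)
    for i in ids:
        close = sum(
            1 for j in ids if j != i and cosine(embeddings[i], embeddings[j]) >= sim_threshold
        )
        if close >= min_neighbors:
            flagged.add(i)
    return flagged


# Hypothetical embeddings: t1-t4 are near-identical (tiny repetitive charges), t5-t6 are ordinary.
embs = {
    "t1": [1.0, 0.0, 0.1],
    "t2": [0.98, 0.02, 0.1],
    "t3": [0.99, 0.01, 0.12],
    "t4": [1.0, 0.01, 0.09],
    "t5": [0.0, 1.0, 0.0],
    "t6": [0.1, 0.2, 0.9],
}
print(sorted(dense_neighbors(embs, sim_threshold=0.99, min_neighbors=2)))  # ['t1', 't2', 't3', 't4']
```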

Users distrust "talk to your data" tools they don't understand. Stripe's Sigma product overcomes this by generating a natural language explanation alongside every answer. It details assumptions made, like the specific dates used for "Black Friday," allowing non-technical users to verify the logic.
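
A sketch of pairing an answer with the assumptions behind it, using the "Black Friday" example. Sigma's implementation is not described in the source; `black_friday`, `ExplainedAnswer`, and `black_friday_revenue` are hypothetical, and the sketch assumes Black Friday means the day after US Thanksgiving (the fourth Thursday of November) and that payments are bucketed by calendar date without time-zone adjustment.

```python
import datetime as dt
from dataclasses import dataclass, field


def black_friday(year: int) -> dt.date:
    """Day after US Thanksgiving, which falls on the fourth Thursday of November."""
    nov1 = dt.date(year, 11, 1)
    # date.weekday(): Monday=0 ... Thursday=3
    first_thursday = nov1 + dt.timedelta(days=(3 - nov1.weekday()) % 7)
    thanksgiving = first_thursday + dt.timedelta(weeks=3)
    return thanksgiving + dt.timedelta(days=1)


@dataclass
class ExplainedAnswer:
    value: float
    assumptions: list[str] = field(default_factory=list)

    def explanation(self) -> str:
        lines = [f"Answer: {self.value:.2f}", "Assumptions:"]
        lines += [f"- {a}" for a in self.assumptions]
        return "\n".join(lines)


def black_friday_revenue(payments: list[tuple[dt.date, float]], year: int) -> ExplainedAnswer:
    day = black_friday(year)
    total = sum(amount for date, amount in payments if date == day)
    return ExplainedAnswer(
        value=total,
        assumptions=[
            f'"Black Friday {year}" interpreted as {day.isoformat()} (day after US Thanksgiving)',
            "Revenue = sum of payment amounts dated that calendar day (no time-zone adjustment)",
        ],
    )


payments = [
    (dt.date(2024, 11, 29), 120.0),
    (dt.date(2024, 11, 29), 30.5),
    (dt.date(2024, 11, 28), 99.0),
]
answer = black_friday_revenue(payments, 2024)
print(answer.explanation())
```

Surfacing the resolved date lets a non-technical user spot a wrong interpretation (for example, if they meant the whole Black Friday weekend) without reading any query code.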

Most AI "defense in depth" systems fail because their layers are correlated, often because they are built on the same base model. A successful approach requires genuinely independent defensive components. Even if each layer is individually weak, independence makes it combinatorially harder for an attacker to bypass them all, because an input crafted to beat one layer gains no advantage against the others.
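
The arithmetic behind this claim: if layers fail independently, the chance an attack passes every layer is the product of the individual bypass rates; if they share a base model, one success tends to carry through all of them. The sketch below compares the exact independent case with a Monte Carlo simulation of both extremes. The 50% bypass rate per layer is a hypothetical "individually weak" layer, and modeling shared layers as a single draw is a simplifying assumption for perfect correlation.

```python
import math
import random


def bypass_all_independent(bypass_rates: list[float]) -> float:
    """Independent layers: an attack must beat each one, so the rates multiply."""
    return math.prod(bypass_rates)


def simulate(bypass_rate: float, layers: int, shared: bool, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(attack passes every layer).

    shared=True models perfectly correlated layers (same base model):
    a single draw decides the outcome for all of them.
    """
    rng = random.Random(seed)
    passed = 0
    for _ in range(trials):
        if shared:
            ok = rng.random() < bypass_rate
        else:
            ok = all(rng.random() < bypass_rate for _ in range(layers))
        passed += int(ok)
    return passed / trials


print(bypass_all_independent([0.5, 0.5, 0.5]))  # 0.125
print(simulate(0.5, 3, shared=False))  # close to 0.125
print(simulate(0.5, 3, shared=True))  # close to 0.5
```

Three weak but independent layers cut the bypass rate from 50% to 12.5%; three correlated copies leave it at 50%.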