Multimodal LLMs Provide the "Common Sense" Robots Need for Edge Cases

Related Insights

Robotics Data Startups Must Combine Software with Human Operations for a Complete Solution

Unlike LLMs that train on the existing internet, robotics lacks a pre-training dataset for the physical world. This forces companies like Encore to build a full-stack solution combining a software platform for data management with human-led operations for data collection, annotation, and even real-time remote robot piloting for exception handling.

Inside Amazon’s Potential $50B OpenAI Investment, Nvidia’s Impressive Earnings & Stock Fall

The Information's TITV·3 months ago

DeepMind's CEO Believes 'World Models' Are the Missing Link for Real-World Robotics

While language models understand the world through text, Demis Hassabis argues they lack an intuitive grasp of physics and spatial dynamics. He sees 'world models'—simulations that understand cause and effect in the physical world—as the critical technology needed to advance AI from digital tasks to effective robotics.

The Future of Intelligence with Demis Hassabis (Co-founder and CEO of DeepMind)

Google DeepMind: The Podcast·5 months ago

Building a Generalist Robot Brain May Be Easier Than Creating Specialized Ones

The Physical Intelligence thesis is that a foundation model learning from diverse data can achieve a "physical understanding" of the world, making it easier to adapt to new tasks than building single-purpose robots from scratch. Generality leverages broader data, which is ultimately a more scalable approach.

Sergey Levine - Building LLMs for the Physical World - [Invest Like the Best, EP.465]

Invest Like the Best with Patrick O'Shaughnessy·2 months ago

LLMs Fail at Common Sense Because They Are Trained on the 'Maybe Sphere' of Debatable Text

Large Language Models struggle with obvious, real-world facts because their training data (text) over-represents uncertain topics open to debate—the 'maybe sphere.' Bedrock, common-sense knowledge is rarely written down, leaving a significant gap in the AI's world model and creating a need for human oversight on obvious matters.

David Shor and Byrne Hobart on the Politics of a White-Collar Wipeout

Odd Lots·2 months ago

World Models: The Missing Link for Spatial and Embodied AI

Large language models are insufficient for tasks requiring real-world interaction and spatial understanding, such as robotics or disaster response. World models provide this missing piece by generating interactive 3D environments that can be reasoned about. They represent a foundational shift from language-based AI to a more holistic, spatially intelligent AI.

The Godmother of AI on jobs, robots & why world models are next | Dr. Fei-Fei Li

Lenny's Podcast: Product | Career | Growth·6 months ago

Robotics AI Can Dynamically 'Think Longer' on Hard Tasks Without Retraining

A new model architecture allows robots to vary their internal 'thinking' iterations at test time. This lets practitioners trade response speed for decision accuracy on a case-by-case basis, boosting performance on complex tasks without needing to retrain the model.

Test-Time Compute Scaling of VLA Models via Latent Iterative Reasoning: An Overview

Machine Learning Tech Brief By HackerNoon·3 months ago
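The core idea of "thinking longer" without retraining is that the same fixed update step is applied to an internal state a variable number of times, and that count is picked at inference. The sketch below is a toy analogy and not the architecture from the paper. A Jacobi solver for a small linear system stands in for a trained, weight-tied reasoning block. The names `step`, `infer`, and `residual`, along with the matrix `A`, are hypothetical illustrations.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]

# Hypothetical fixed "weights": a stand-in for a trained, weight-tied update block.
# The matrix is strictly diagonally dominant, so the Jacobi update converges.
A: Matrix = [[4.0, 1.0, 0.0],
             [1.0, 5.0, 2.0],
             [0.0, 2.0, 6.0]]

def step(z: Vector, b: Vector) -> Vector:
    """One refinement step of the latent (a Jacobi update toward A z = b)."""
    n = len(z)
    return [
        (b[i] - sum(A[i][j] * z[j] for j in range(n) if j != i)) / A[i][i]
        for i in range(n)
    ]

def infer(b: Vector, num_iters: int) -> Vector:
    """Apply the same step num_iters times; A never changes (no retraining)."""
    z = [0.0] * len(b)  # initial latent state
    for _ in range(num_iters):
        z = step(z, b)
    return z

def residual(z: Vector, b: Vector) -> float:
    """Max absolute error of A z against b: lower means a better answer."""
    n = len(z)
    return max(abs(sum(A[i][j] * z[j] for j in range(n)) - b[i]) for i in range(n))

b = [1.0, 2.0, 3.0]
for k in (1, 4, 16):
    # More iterations cost more compute but shrink the error.
    print(k, residual(infer(b, k), b))
```

The iteration count acts as the speed-versus-accuracy knob. Here, 1 step is cheap but rough, while 16 steps are slower and much closer to the exact answer. The parameters stay the same in every case.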

Wayve Teaches Its AI to Reason Using "World Models" that Simulate Future Scenarios

The AI's ability to handle novel situations is not just an emergent property of scale. Wayve actively trains "world models," which are internal generative simulators. These let the AI reason about what might happen next, leading to sophisticated behaviors such as nudging into intersections or slowing down in fog.

How End-to-End Learning Created Autonomous Driving 2.0: Wayve CEO Alex Kendall

Training Data·6 months ago

Advanced Robot Learning Is Now Bottlenecked By Scene Interpretation, Not Physical Skill

Robots have become so capable at low-level physical tasks that the primary bottleneck has shifted to "mid-level reasoning," meaning interpreting a scene and choosing the correct next action. As a result, improvement can come from high-level, language-based coaching rather than only from more physical demonstration data, which is a major breakthrough.

Sergey Levine - Building LLMs for the Physical World - [Invest Like the Best, EP.465]

Invest Like the Best with Patrick O'Shaughnessy·2 months ago

Force LLMs to Uncover Rare Knowledge With Procedurally Generated Prompts

LLMs are trained to produce high-probability, common information, making it hard to surface rare knowledge. The solution is to programmatically create prompts that combine unlikely concepts. This forces the model into an improbable state, compelling it to search the long tail of its knowledge base rather than relying on common associations.

969: The Laws of Thought: The Math of Minds and Machines, with Prof. Tom Griffiths

Super Data Science: ML & AI Podcast with Jon Krohn·3 months ago
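Here is a minimal sketch of the procedural-prompt idea above. It assumes that "unlikely combinations" can be approximated by pairing concepts from two different domains. The concept pools, `TEMPLATE`, and `make_prompts` are hypothetical illustrations rather than a method from the episode, and the model call itself is left out.

```python
import itertools
import random

# Hypothetical concept pools; a real setup would use larger, curated lists.
DOMAINS = {
    "biology": ["mitochondria", "lichen", "tardigrades"],
    "history": ["the Hanseatic League", "Byzantine tax law", "Meiji-era railways"],
    "engineering": ["suspension bridges", "gear backlash", "heat pipes"],
}

TEMPLATE = "Explain one specific, little-known connection between {a} and {b}."

def make_prompts(n: int, seed: int = 0) -> list:
    """Build up to n prompts, each pairing concepts from two *different* domains
    so the combination rarely co-occurs in ordinary text."""
    rng = random.Random(seed)  # seeded for reproducible prompt sets
    # Enumerate every cross-domain pair (never two concepts from one domain).
    pairs = [
        (a, b)
        for d1, d2 in itertools.combinations(DOMAINS, 2)
        for a in DOMAINS[d1]
        for b in DOMAINS[d2]
    ]
    chosen = rng.sample(pairs, k=min(n, len(pairs)))
    return [TEMPLATE.format(a=a, b=b) for a, b in chosen]

for prompt in make_prompts(3, seed=42):
    print(prompt)
```

Sampling without replacement keeps each generated prompt distinct. The fixed seed makes a prompt set reproducible, so different models can be compared on it.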

Modern Robotics Leaps Forward by Combining LLM Brains with Learned Motion Control

Unlike older robots requiring precise maps and trajectory calculations, new robots use internet-scale common sense and learn motion by mimicking humans or simulations. This combination has “wiped the slate clean” for what is possible in the field.

Uncapped #32 | Kyle Vogt from The Bot Company

Uncapped with Jack Altman·6 months ago

