RiffOn - I Know How to Build Safe Superintelligence | Yoshua Bengio, the most-cited AI researcher

Yoshua Bengio presents his "Scientist AI" approach: building safe superintelligence by training models to be honest predictors of reality.

AIs Should Be Trained With an Integrated Policy and Guardrail to Prevent Exploitation

Bengio argues a separately trained agent could learn to 'jailbreak' its safety guardrail. His solution is to train both the policy (the agent) and the guardrail (the safety monitor) jointly from the same neural network, preventing the agent from being optimized to find loopholes in the guardrail.