The foundational concept for modern LLMs, the attention mechanism, originated from an intern, Dzmitry Bahdanau, in Yoshua Bengio's lab. The idea was so brilliant that its potential for success was immediately apparent upon explanation, before it was even coded.
To pioneer neural machine translation, Prof. Kyunghyun Cho and his team deliberately limited their review of past research. They believed reading too much would impose false constraints from outdated contexts, preventing them from developing a system from scratch with fresh thinking.
While demoing an early attention-based translation system, Prof. Cho's team discovered it could fill in an "unknown" country token. Given "unknown Korea is an enemy of United States," it output "North Korea," and with "friend," it output "South Korea," revealing emergent world knowledge.
Prof. Kyunghyun Cho contrasts the "isolated" research styles in Korea and Finland with North America's, which he describes as an "extremely collective affair." He believes the constant influx of global talent automatically fosters a collaborative environment that accelerates innovation, a model he aims to replicate.
Prof. Kyunghyun Cho recounts that Yoshua Bengio pushed his lab toward machine translation not just for the task itself, but because it exhibited core AI challenges like handling variable-length sequences and vanishing gradients. Solving translation meant solving these deeper, more general problems.
When William Falcon, founder of Lightning AI, wanted to build his company while completing his PhD, his advisor Kyunghyun Cho told him to stop. Cho framed both as "200% jobs," arguing that attempting both would compromise the success of each, and advised taking a leave of absence instead.
Prof. Cho argues that modern models already extract most correlations from passive datasets. The next leap in sample efficiency will come from AI agents that can actively choose what data to collect, intentionally making rare, insightful events ("aha moments") more frequent.
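One established form of "choosing what data to collect" is active learning via uncertainty sampling. The sketch below is a hypothetical, minimal illustration of that general idea, not Prof. Cho's system; all names and the toy model are invented for the example.

```python
import math

def entropy(p):
    """Binary entropy of a predicted probability p; peaks at p = 0.5."""
    if p <= 0 or p >= 1:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def select_queries(unlabeled, predict, budget):
    """Instead of sampling the pool uniformly, pick the `budget`
    examples the model is least certain about -- the ones most likely
    to yield a rare, informative "aha" label."""
    scored = sorted(unlabeled, key=lambda x: entropy(predict(x)), reverse=True)
    return scored[:budget]

# Toy setup: pretend each item *is* the model's predicted probability,
# so items near 0.5 are the most uncertain.
pool = [0.01, 0.5, 0.95, 0.45, 0.99, 0.6]
print(select_queries(pool, predict=lambda x: x, budget=2))  # → [0.5, 0.45]
```

The design point is the selection loop, not the scoring function: swapping entropy for disagreement between ensemble members, or expected information gain, keeps the same shape while changing what counts as "worth collecting."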
The research on re-ranking that influenced Retrieval Augmented Generation (RAG) started with PhD student Rodrigo Nogueira's goal to create an AI researcher. He realized that before an AI could reason, it first needed a scalable way to navigate and retrieve relevant information from vast document sets.
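The pipeline shape behind retrieve-then-rerank can be sketched in a few lines: a cheap first stage narrows the corpus, then a costlier scorer re-orders the survivors. Both scoring functions below are toy stand-ins (term overlap), not the BERT-based re-rankers from Nogueira's actual work.

```python
def retrieve(query, docs, k):
    """First stage: cheap ranking by raw term overlap; keep top-k."""
    q = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def rerank(query, docs):
    """Second stage: a finer score (here, overlap as a fraction of the
    document) applied only to the short list from the first stage."""
    q = set(query.lower().split())

    def score(d):
        words = set(d.lower().split())
        return len(q & words) / len(words)

    return sorted(docs, key=score, reverse=True)

docs = [
    "neural machine translation with attention",
    "attention is all you need",
    "cooking pasta at home",
]
shortlist = retrieve("attention translation", docs, k=2)
print(rerank("attention translation", shortlist))
```

The split matters for scale: the first stage must touch every document, so it has to be cheap; the expensive model only ever sees the shortlist.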
Instead of traditional problem sets, Professor Kyunghyun Cho teaches ML algorithms by building, from scratch, a complete web application that uses each concept. He demonstrates his entire workflow, including his prompts and interactions with coding agents, to show students how to build real-world systems.
When designing his machine learning course around AI coding agents, NYU Professor Kyunghyun Cho found that the vast majority (80%) of his 200 advanced computer science students had never installed one. This highlights a major adoption gap even among the most tech-savvy students.
Prof. Cho outlines two competing visions for world models. One camp believes in high-fidelity, step-by-step prediction (e.g., video generation). The other, which he and Yann LeCun favor, argues for abstract, high-level latent models that can plan without simulating every detail, akin to human thinking.
