Schooling has become a victim of Goodhart's Law. When a measure (grades, test scores) becomes a target, it ceases to be a good measure. Students become experts at 'doing school' — maximizing the signal — which is a separate skill from the actual creative and intellectual capabilities the system is supposed to foster.
When an AI model achieves superhuman performance on a specific benchmark, such as a coding challenge, that success rarely translates into solving real-world problems. Because we implicitly optimize for the benchmark itself, we create "peaky" performance rather than broad, generalizable intelligence.
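This "peaky" failure mode can be illustrated with a toy curve-fitting sketch (my own illustration, not from the source; the task and numbers are invented): two models are fit on a narrow "benchmark" slice of a task, then evaluated on the wider task the benchmark was meant to stand in for.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Benchmark": a narrow slice of the real task (y = sin x, observed with noise on [0, 1]).
x_bench = np.sort(rng.uniform(0.0, 1.0, 12))
y_bench = np.sin(x_bench) + rng.normal(0.0, 0.05, 12)

# "Real world": the same task over a range the benchmark never covers.
x_real = np.linspace(0.0, 3.0, 200)
y_real = np.sin(x_real)

def mse(coeffs, x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# A high-capacity model tuned hard against the benchmark ("peaky")...
peaky = np.polyfit(x_bench, y_bench, deg=11)
# ...versus a simpler model that scores worse on the benchmark.
broad = np.polyfit(x_bench, y_bench, deg=3)

# The peaky model wins the benchmark but loses badly on the wider task.
peaky_bench, broad_bench = mse(peaky, x_bench, y_bench), mse(broad, x_bench, y_bench)
peaky_real, broad_real = mse(peaky, x_real, y_real), mse(broad, x_real, y_real)
```

The ranking flips between the two evaluations: optimizing hard against the narrow slice buys benchmark points at the cost of everything outside it.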
Traditional schools create a zero-sum game by celebrating a single metric: grades. When a school instead celebrates a wide array of accomplishments, such as writing a novella or making a film, its culture shifts from competition to collaboration. One student's success no longer diminishes another's, and the entire group feels empowered.
Just as standardized tests fail to capture a student's full potential, AI benchmarks often don't reflect real-world performance. The true value comes from the 'last mile' ingenuity of productization and workflow integration, not just raw model scores, which can be misleading.
According to Goodhart's Law, when a measure becomes a target, it ceases to be a good measure. If you incentivize employees on AI-driven metrics like 'emails sent,' they will optimize for the number, not quality, corrupting the data and giving false signals of productivity.
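The corruption of the signal can be sketched in a toy simulation (my own illustration; all numbers are invented): before the metric becomes a target, email volume is an honest byproduct of diligence, so it tracks value created; once rewards hinge on it, volume decouples from value entirely.

```python
import numpy as np

rng = np.random.default_rng(1)
diligence = rng.uniform(0.0, 1.0, 200)  # the hidden trait we actually care about

# Before the metric is a target: email volume is a side effect of real work,
# so it honestly correlates with value created.
emails_before = 5 * diligence + rng.normal(0.0, 0.3, 200)
value_before = 10 * diligence + rng.normal(0.0, 0.5, 200)

# After tying rewards to 'emails sent': everyone floods the channel,
# diverting effort away from real work.
emails_after = 12 + rng.normal(0.0, 0.3, 200)   # uniformly maxed out
value_after = 4 * diligence + rng.normal(0.0, 0.5, 200)

corr_before = np.corrcoef(emails_before, value_before)[0, 1]  # strong
corr_after = np.corrcoef(emails_after, value_after)[0, 1]     # near zero
```

Once gaming sets in, the metric still looks great on a dashboard while carrying no information about productivity, which is exactly the false signal Goodhart's Law predicts.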
The idea of a single 'general intelligence' or IQ is misleading because key cognitive abilities exist in a trade-off. For instance, the capacity for broad exploration (finding new solutions) is in tension with the capacity for exploitation (efficiently executing known tasks), which schools and IQ tests primarily measure.
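The exploration/exploitation tension described here is the classic multi-armed-bandit trade-off from reinforcement learning. A minimal epsilon-greedy sketch (hypothetical payoff rates, not from the source) shows how a pure exploiter locks onto the first option that pays and never discovers the better one:

```python
import random

random.seed(0)

# Two hidden options ("arms"); their payoff rates are unknown to the learner.
ARMS = [0.3, 0.8]  # arm 1 is better, but you only learn that by trying it

def run(epsilon, pulls=2000):
    wins, tries, total = [0, 0], [1, 1], 0
    for _ in range(pulls):
        if random.random() < epsilon:
            arm = random.randrange(2)  # explore: sample something unproven
        else:
            # exploit: pick whichever arm looks best so far
            arm = 0 if wins[0] / tries[0] >= wins[1] / tries[1] else 1
        reward = 1 if random.random() < ARMS[arm] else 0
        wins[arm] += reward
        tries[arm] += 1
        total += reward
    return total

exploit_only = run(epsilon=0.0)  # locks onto arm 0 and never looks elsewhere
explorer = run(epsilon=0.1)      # "wastes" 10% of pulls, finds the better arm
```

The agent that only exploits is efficient at a known task but blind to better alternatives; a system that measures and rewards only exploitation, as the source argues schools and IQ tests do, selects against exactly the exploratory behavior that finds new solutions.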
Even as average scores on a consistent exam have dropped by 10 points over 20 years, the share of A grades at Harvard has risen from 25% to 60%. This divergence suggests a significant devaluation of academic credentials: grades no longer accurately reflect student mastery.
AI makes cheating easier, undermining grades as a motivator. More importantly, it enables continuous, nuanced assessment that renders one-off standardized tests obsolete. This forces a necessary shift from a grade-driven to a learning-driven education system.
Generative AI's appeal highlights a systemic issue in education. When grades—impacting financial aid and job prospects—are tied solely to finished products, students rationally use tools that shortcut the learning process to achieve the desired outcome under immense pressure from other life stressors.
When complex entities like universities are judged by simplified rankings (e.g., U.S. News), they learn to manipulate the specific inputs to the ranking formula. This optimizes their score without necessarily making them better institutions, substituting genuine improvement for the appearance of it.
The Gaokao rewards rote memorization and test-taking skills over creativity and boundary-pushing. This educational culture could be a long-term liability for China's ambitions to become a global innovation leader, as it doesn't cultivate the imaginative mindset seen in other tech hubs.