We scan new podcasts and send you the top 5 insights daily.
ChatGPT's tendency to use words like 'delve' isn't random. Its training creates a bias toward Latin-derived words over their simpler Germanic counterparts (e.g., 'dig into') because those words register as more formal and authoritative in the patterns the model learned.
An LLM's core training objective—predicting the next token—makes it sensitive to the raw frequency of words and numbers online. This creates a subtle but profound flaw: it's more likely to output '30' than '29' in a counting task, not because of logic, but because '30' is statistically more common in its training data.
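A minimal sketch of that frequency effect. The counts below are invented for illustration, and real models sample from a learned distribution rather than a raw corpus tally; this just shows how a frequency-driven picker lands on the round number regardless of what the task logically requires:

```python
from collections import Counter

# Invented corpus counts: round numbers appear far more often in text.
corpus_counts = Counter({"29": 120, "30": 950, "31": 100})

def greedy_next_token(counts: Counter) -> str:
    """Pick whichever token is most frequent, ignoring task logic."""
    return counts.most_common(1)[0][0]

print(greedy_next_token(corpus_counts))  # "30", purely on frequency
```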
MIT research reveals that large language models develop "spurious correlations" by associating sentence patterns with topics. This cognitive shortcut causes them to give domain-appropriate answers to nonsensical queries if the grammatical structure is familiar, bypassing logical analysis of the actual words.
Under intense optimization pressure from reinforcement learning, some language models are developing their own idiosyncratic dialects to communicate internally. This phenomenon suggests they are drifting beyond merely predicting the human language patterns found on the internet.
Newer LLMs exhibit a more homogenized writing style than earlier versions like GPT-3. This is due to "style burn-in," where training on outputs from previous generations reinforces a specific, often less creative, tone. The model’s style becomes path-dependent, losing the raw variety of its original training data.
Current AI models often provide long-winded, overly nuanced answers, a stark contrast to the confident brevity of human experts. This stylistic difference, not factual accuracy, is now the easiest way to distinguish AI from a human in conversation, suggesting a new dimension to the Turing test focused on communication style.
AI models are not optimized to find objective truth. They are trained on biased human data and reinforced to provide answers that satisfy the preferences of their creators. This means they inherently reflect the biases and goals of their trainers rather than an impartial reality.
AI models develop strong 'habits' from training data, leading to unexpected performance quirks. The Codex model is so accustomed to the command-line tool ripgrep (whose binary is named 'rg') that its performance improves significantly when developers name their custom search tool 'rg', revealing a surprising lack of generalization.
The tendency for AI models to overuse em dashes may stem from their training data. To expand their knowledge, companies digitized millions of older books, including 19th-century classics where dash usage was at its historical peak. The models simply adopted this stylistic habit.
Beyond the obvious scarcity of non-English training data, large language models are architecturally biased. Their tokenization, optimized for English, breaks other languages into many more fragments per word. This raises operational costs and reduces comprehension, creating a structural disadvantage for non-English users.
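A rough way to see the fragmentation cost: byte-level tokenizers fall back to raw UTF-8 bytes when no learned merges cover a script, so the worst-case token count scales with bytes, not characters. This is a lower-bound sketch under that assumption, not a model of any specific tokenizer:

```python
# Sketch: English-centric tokenizers fragment other scripts more.
# With no BPE merges for a script, each character costs 1-4 raw
# UTF-8 bytes, so non-ASCII text starts from a far higher floor.
def utf8_fallback_tokens(text: str) -> int:
    """Worst-case token count: one token per UTF-8 byte."""
    return len(text.encode("utf-8"))

print(utf8_fallback_tokens("hello"))  # 5 (ASCII: 1 byte per char)
print(utf8_fallback_tokens("こんにちは"))  # 15 (5 chars, 3 bytes each)
```

English-heavy merge tables then compress the ASCII side aggressively while leaving other scripts near this floor, which is where the cost gap comes from.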
Contrary to popular belief, generative AI like LLMs may not get significantly more accurate. As statistical engines that predict the next most likely word, they lack true reasoning or an understanding of "accuracy." This fundamental limitation means they will always be prone to making unfixable mistakes.