Sequence LLM Training: Use SFT for Structure, Then DPO for Behavior

Related Insights

Generate AI System Prompts by Feeding an LLM Your Ideal Conversations

Instead of manually crafting a system prompt, feed an LLM multiple "golden conversation" examples. Then, ask the LLM to analyze these examples and generate a system prompt that would produce similar conversational flows. This reverses the typical prompt engineering process, letting the ideal output define the instructions.
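As a rough illustration of the workflow described above, the sketch below assembles a single request that asks an LLM to infer a system prompt from example conversations. The function name `build_system_prompt_request`, the (role, text) tuple format, and the wording of the request are all assumptions made for this example; none of them come from the episode, and the call to an actual model is left out.

```python
def build_system_prompt_request(golden_conversations):
    """Assemble one prompt asking an LLM to infer a system prompt from examples.

    Each conversation is a list of (role, text) tuples; this format is an
    assumption for the sketch, not a required structure.
    """
    sections = []
    for i, conversation in enumerate(golden_conversations, start=1):
        turns = "\n".join(f"{role}: {text}" for role, text in conversation)
        sections.append(f"### Example conversation {i}\n{turns}")
    examples = "\n\n".join(sections)
    return (
        "Below are example conversations that show the ideal behavior of an assistant.\n\n"
        f"{examples}\n\n"
        "Analyze these examples and write a system prompt that would lead an "
        "assistant to produce conversations with the same flow, tone, and structure."
    )


# Hypothetical golden conversation, invented for illustration.
golden = [
    [
        ("user", "Find me a plumber nearby."),
        ("assistant", "Sure. What is the issue you need fixed?"),
    ],
]
request = build_system_prompt_request(golden)
```

The resulting `request` string would then be sent to whichever model you use; supplying several conversations rather than one gives the model more signal about what the examples have in common.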

How this Yelp AI PM works backward from “golden conversations” to create high-quality prototypes using Claude Artifacts and Magic Patterns | Priya Badger

How I AI·4 months ago

Automated LLM Metrics Are Insufficient; Use a 'Golden Set' for Evaluation

Standard automated metrics like perplexity and loss measure a model's statistical confidence, not its ability to follow instructions. To properly evaluate a fine-tuned model, establish a curated "golden set" of evaluation samples and check, manually or programmatically, whether the model actually performs the desired task.
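A programmatic golden-set check could look something like the minimal sketch below. It assumes each golden sample pairs a prompt with a pass/fail check function; `evaluate_on_golden_set`, `stub_model`, and the two sample cases are hypothetical names and data invented for illustration, with the stub standing in for a real fine-tuned model.

```python
def evaluate_on_golden_set(generate, golden_set):
    """Run every golden sample through the model and report the pass rate."""
    failures = []
    for case in golden_set:
        output = generate(case["prompt"])
        if not case["check"](output):
            failures.append(case["prompt"])
    pass_rate = (len(golden_set) - len(failures)) / len(golden_set)
    return pass_rate, failures


def stub_model(prompt):
    # Hypothetical stand-in for a fine-tuned model's generate call.
    canned = {
        "Reply in JSON: name a color": '{"color": "blue"}',
        "Translate 'hello' to French": "hello",
    }
    return canned[prompt]


# Each sample carries its own task-specific check instead of a generic metric.
golden_set = [
    {
        "prompt": "Reply in JSON: name a color",
        "check": lambda out: out.strip().startswith("{"),
    },
    {
        "prompt": "Translate 'hello' to French",
        "check": lambda out: "bonjour" in out.lower(),
    },
]

pass_rate, failures = evaluate_on_golden_set(stub_model, golden_set)
```

Returning the failing prompts alongside the pass rate makes it easier to read the failures by hand, which is often where the useful detail is.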

Fine-Tuning LLMs: A Comprehensive Tutorial

Machine Learning Tech Brief By HackerNoon·16 days ago

Use AI/ML Jargon Like 'Think Step-by-Step' to Unlock Advanced Reasoning in LLMs

Anthropic suggests that LLMs, trained on text about AI, respond to field-specific terms. Using phrases like 'Think step by step' or 'Critique your own response' acts as a cheat code, activating more sophisticated, accurate, and self-correcting operational modes in the model.

Prompt Claude better than 99% of people

The Startup Ideas Podcast·2 months ago

Treat LLM Interactions as a Multi-Stage Project, Not a Single Prompt

Achieve higher-quality results by first asking the AI to generate an outline or plan. Then refine that plan with follow-up prompts before asking for the final execution. This catches problems early and avoids wasted effort on flawed one-shot outputs.

Prompt Claude better than 99% of people

The Startup Ideas Podcast·2 months ago

Enterprise AI Value Is Unlocked by Reinforcement Fine-Tuning, Not Simple SFT

Basic supervised fine-tuning (SFT) only adjusts a model's style. The real unlock for enterprises is reinforcement fine-tuning (RFT), which leverages proprietary datasets to create state-of-the-art models for specific, high-value tasks, moving beyond mere 'tone improvements.'

How OpenAI Builds for 800 Million Weekly Users: Model Specialization and Fine-Tuning

a16z Podcast·3 months ago

Prompt AI for Multiple Variations, Then Ask "Which is Best?" to Force Self-Critique

Instead of accepting an AI's first output, request multiple variations of the content. Then, ask the AI to identify the best option. This forces the model to re-evaluate its own work against the project's goals and target audience, leading to a more refined final product.

SPECIAL GUEST!! Michael Stelzner from AI Explored 🔥 Claude > Custom GPT 😮 | Ep. 476

Do This, NOT That: Marketing Tips with Jay Schwedelson·a month ago

To Improve AI Writing Style, Directly Edit the Model's Output as an Inline Example

When an LLM produces text with the wrong style, re-prompting is often ineffective. A superior technique is to use a tool that allows you to directly edit the model's output. This act of editing creates a perfect, in-context example for the next turn, teaching the LLM your preferred style much more effectively than descriptive instructions.

Evals, error analysis, and better prompts: A systematic approach to improving your AI products | Hamel Husain (ML engineer)

How I AI·4 months ago

Refine Failing AI Prompts by Asking the LLM Itself to Critique and Rewrite Them

When a prompt yields poor results, use a meta-prompting technique. Feed the failing prompt back to the AI, describe the incorrect output, specify the desired outcome, and explicitly grant it permission to rewrite, add, or delete. The AI will then debug and improve its own instructions.
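The four ingredients listed above (the failing prompt, the incorrect output, the desired outcome, and explicit permission to edit) can be combined into one message. The sketch below is one possible template; the function name `build_meta_prompt`, the section labels, and the sample inputs are assumptions for illustration, not wording taken from the episode.

```python
def build_meta_prompt(failing_prompt, observed_output, desired_outcome):
    """Wrap a failing prompt in a request for the model to critique and rewrite it."""
    return (
        "The prompt below is not producing the results I need.\n\n"
        f"--- FAILING PROMPT ---\n{failing_prompt}\n\n"
        f"--- WHAT IT PRODUCED ---\n{observed_output}\n\n"
        f"--- WHAT I WANT INSTEAD ---\n{desired_outcome}\n\n"
        "Explain why the prompt fails, then rewrite it. "
        "You may rewrite, add, or delete any part of it."
    )


# Hypothetical inputs, invented for illustration.
meta_prompt = build_meta_prompt(
    failing_prompt="Summarize this candidate's resume.",
    observed_output="A long paragraph that repeats the resume verbatim.",
    desired_outcome="Three bullet points covering skills, experience, and gaps.",
)
```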

ChatGPT agent mode: The “little helper” that transformed recruiting, crafted user personas, and solved parking nightmares | Michal Peled (Honeybook)

How I AI·2 months ago

High-Signal Fine-Tuning Data Comes From the Difficult Examples Where Your AI Fails

Fine-tuning an AI model is most effective when you use high-signal data. The best source for this is the set of difficult examples where your system consistently fails. The processes of error analysis and evaluation naturally curate this valuable dataset, making fine-tuning a logical and powerful next step after prompt engineering.

Evals, error analysis, and better prompts: A systematic approach to improving your AI products | Hamel Husain (ML engineer)

How I AI·4 months ago

Mask Question Tokens During Fine-Tuning to Focus Learning on Answers

When fine-tuning a model for question-answering, tokenize questions and answers separately. Then mask the question tokens so that they are excluded from the loss calculation. This concentrates the model's learning on generating correct answers instead of on reproducing the question.
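A minimal sketch of this masking, using plain Python lists and made-up token IDs in place of a real tokenizer. It relies on the common convention that label positions set to -100 are skipped by the loss (this is the default `ignore_index` of PyTorch's `CrossEntropyLoss`); the function name `build_masked_example` is an assumption for this example, and the tutorial's own code may differ.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def build_masked_example(question_ids, answer_ids):
    """Concatenate question and answer tokens, masking the question in the labels."""
    input_ids = list(question_ids) + list(answer_ids)
    # Question positions get IGNORE_INDEX so they contribute nothing to the loss;
    # answer positions keep their token IDs as training targets.
    labels = [IGNORE_INDEX] * len(question_ids) + list(answer_ids)
    return {"input_ids": input_ids, "labels": labels}


# Made-up token IDs standing in for separately tokenized question and answer.
example = build_masked_example(question_ids=[101, 7, 8, 9], answer_ids=[42, 43, 102])
```

The model still sees the full question as input; only the training targets are masked, so the gradient comes from the answer tokens alone.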

Fine-Tuning LLMs: A Comprehensive Tutorial

Machine Learning Tech Brief By HackerNoon·16 days ago