Journalist Casey Newton uses AI tools not to write his columns, but to fact-check them after they're written. He finds that feeding his completed text into an LLM is a surprisingly effective way to catch factual errors, a capability that has improved significantly over the past year.
Simply creating an LLM judge prompt isn't enough. Before deploying it, you must test its alignment with human judgment. Run the judge on your manually labeled data and analyze the results in a confusion matrix. This helps you see where it disagrees with you (false positives/negatives) so you can refine the prompt and build trust.
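As a minimal sketch of that alignment check, assuming the judge and the human reviewer each give a pass/fail verdict per example: the labels below are made-up illustration data, and `confusion_matrix` and `disagreements` are hypothetical helper names, not part of any library.

```python
from collections import Counter

def confusion_matrix(human_labels, judge_labels):
    """Tally judge verdicts against human verdicts (True = pass, False = fail)."""
    if len(human_labels) != len(judge_labels):
        raise ValueError("label lists must be the same length")
    counts = Counter(zip(human_labels, judge_labels))
    return {
        "true_pos": counts[(True, True)],    # both say pass
        "false_pos": counts[(False, True)],  # judge passes what the human failed
        "false_neg": counts[(True, False)],  # judge fails what the human passed
        "true_neg": counts[(False, False)],  # both say fail
    }

def disagreements(examples, human_labels, judge_labels):
    """Return the examples where the judge and the human disagree, for manual review."""
    return [
        (example, human, judge)
        for example, human, judge in zip(examples, human_labels, judge_labels)
        if human != judge
    ]

# Hypothetical labeled set: six outputs, labeled by a human and by the LLM judge.
examples = ["out-1", "out-2", "out-3", "out-4", "out-5", "out-6"]
human = [True, True, False, False, True, False]
judge = [True, False, False, True, True, False]

matrix = confusion_matrix(human, judge)
agreement = (matrix["true_pos"] + matrix["true_neg"]) / len(human)
to_review = disagreements(examples, human, judge)

print(matrix)
print(f"agreement: {agreement:.2f}")
print(to_review)
```

Reading the disagreeing examples one by one is usually more informative than the agreement rate alone, since they show which kind of case the judge prompt mishandles.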
A significant portion (30-50%) of the statistics, news, and niche details ChatGPT provides are inferred rather than drawn from a source, and are not factually accurate. Users must be aware that even official-sounding stats can be completely fabricated, and that repeating them unverified risks credibility in professional work like presentations.
To maintain quality, 6AM City's AI newsletters don't generate content from scratch. Instead, they use "extractive generative" AI to summarize information from existing, verified sources. This minimizes the risk of AI "hallucinations" and factual errors, which are common when AI is asked to expand upon a topic or create net-new content.
Before publishing, feed your work to an AI and ask it to find all potential criticisms and holes in your reasoning. This pre-publication stress test helps identify blind spots you would otherwise miss, leading to stronger, more defensible arguments.
LLMs used to analyze unstructured data like interview transcripts often hallucinate compelling but non-existent quotes. To maintain integrity, always include a specific prompt instruction like "use quotes and cite your sources from the transcript for each quote." This pushes the AI to ground its analysis in actual data and makes each quote checkable against the transcript.
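Even with that instruction in place, quoted passages can be checked mechanically. The following is a rough sketch, assuming quotes in the analysis appear in double quotation marks and should match the transcript verbatim apart from case, whitespace, and trailing punctuation. The function names, the `min_words` threshold, and the sample texts are all hypothetical.

```python
import re

def normalize(text):
    """Straighten curly quotes, collapse whitespace, and lowercase."""
    text = text.replace("\u201c", '"').replace("\u201d", '"').replace("\u2019", "'")
    return re.sub(r"\s+", " ", text).strip().lower()

def find_unsupported_quotes(analysis, transcript, min_words=3):
    """Return quoted passages from the analysis that do not appear in the transcript.

    Quotes shorter than min_words are skipped, since short phrases in
    quotation marks are often labels rather than citations.
    """
    haystack = normalize(transcript)
    unsupported = []
    for quote in re.findall(r'"([^"]+)"', normalize(analysis)):
        needle = quote.strip(" .,;:!?")
        if len(needle.split()) >= min_words and needle not in haystack:
            unsupported.append(needle)
    return unsupported

# Made-up transcript and LLM analysis for illustration.
transcript = (
    "Interviewer: How was onboarding?\n"
    "User: Honestly, the setup took me   two full days, and I almost gave up."
)
analysis = (
    'The user said the "setup took me two full days" and that '
    '"support never answered my emails".'
)

missing = find_unsupported_quotes(analysis, transcript)
print(missing)
```

A substring match like this only catches fabricated or altered wording. It does not tell you whether a real quote was taken out of context, so a human read of the flagged and unflagged quotes is still needed.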
Prompting a second LLM to review code generated by a first provides a powerful, non-defensive critique. This "second opinion" can rapidly identify architectural issues, bugs, and alternative approaches without the human ego involved in traditional code reviews.
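One way to wire this up, sketched under the assumption that each model is reachable through a plain function taking a prompt string and returning text: `cross_review`, `REVIEW_PROMPT`, and the two stub models are hypothetical stand-ins, and real API clients would replace the stubs.

```python
from typing import Callable

# Hypothetical review prompt; adjust the focus areas to your codebase.
REVIEW_PROMPT = (
    "You are reviewing code written by another model.\n"
    "Task: {task}\n\n"
    "Code:\n{code}\n\n"
    "List bugs, architectural issues, and alternative approaches."
)

def cross_review(
    task: str,
    author: Callable[[str], str],
    reviewer: Callable[[str], str],
) -> dict:
    """Generate code with one model, then ask a different model to critique it."""
    code = author(task)
    critique = reviewer(REVIEW_PROMPT.format(task=task, code=code))
    return {"code": code, "critique": critique}

# Stub models so the sketch runs without network access.
def stub_author(task: str) -> str:
    return "def add(a, b):\n    return a - b\n"

def stub_reviewer(prompt: str) -> str:
    if "a - b" in prompt:
        return "Possible bug: add() subtracts instead of adding."
    return "No issues found."

result = cross_review("Write add(a, b).", stub_author, stub_reviewer)
print(result["critique"])
```

Keeping the two models behind separate callables makes it easy to swap in a different reviewer without touching the generation step.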
Treat ChatGPT like a human assistant. Instead of manually editing its imperfect outputs, provide direct feedback and corrections within the chat. This gives the AI context about your specific preferences, making its responses progressively more accurate over the conversation and reducing your future workload.
AI models tend to be overly optimistic. To get a balanced market analysis, explicitly instruct AI research tools like Perplexity to act as a "devil's advocate." This helps uncover risks and challenge assumptions, making it easier for product managers to say "no" to weak ideas quickly.
Unlike consumer chatbots, AlphaSense's AI is designed for verification in high-stakes environments. The UI makes it easy to see the source documents for every claim in a generated summary. This focus on traceable citations is crucial for building the user confidence required for multi-billion dollar decisions.
Advanced AI tools like "deep research" models can produce vast amounts of information, like 30-page reports, in minutes. This creates a new productivity paradox: the AI's output capacity far exceeds a human's finite ability to verify sources, apply critical thought, and transform the raw output into authentic, usable insights.