
AI transformation playbook
Use this playbook to execute a battle-tested strategy for AI transformation that will make your business AI-native.
Are you looking to build with AI? This is your go-to spot for all the resources you need. From step-by-step guides to deep technical analysis, we've got everything to help you build AI apps.

Explore the top alternatives to LangChain for building production-grade AI apps and agents.

Learn about common architectures, frameworks and discover best practices for building agents from AI experts.

Learn about the current rate limits and strategies like exponential backoff and caching to help you avoid them.
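The exponential-backoff strategy mentioned above can be sketched in a few lines. This is a minimal illustration, not any provider's official client; the `with_backoff` helper and its parameters are hypothetical names chosen for the example:

```python
import random
import time

def with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=60.0):
    """Retry fn on exception, doubling the wait each attempt (with jitter)."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the original error
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay + random.uniform(0, delay * 0.1))
```

Wrapping a rate-limited API call in `with_backoff` turns transient 429 errors into short waits instead of hard failures; caching responses on top of this reduces how often you hit the limit at all.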

Learn what OpenAI's logprobs are and how you can use them in your LLM applications.

A look at AI's evolution from basic, rule-based systems to fully creative agentic workflows.

A choice dependent on specific needs, document types and business requirements.

See how GPT-5 performs across benchmarks, with a big focus on health.

A practical guide to building production-grade, multi-agent AI systems using context engineering best practices.

A practical prompting guide to get GPT-5 to work for your use case.

How verifiable mandates are creating a secure foundation for AI-driven commerce.

A deep dive into the performance of Google's latest model.

A deep dive into Anthropic's latest flagship model, Claude Opus 4.5.

Think your APM tool has your AI covered? Think again. LLMs need their own observability playbook.

A practical guide on understanding and implementing AI automations for all industries and teams.

Exploring zero-shot & few-shot prompting: usage, application methods, and limits.

We break down when Chain-of-Thought adds value, when it doesn’t, and how to use it in today’s LLMs.

You can’t improve what you can’t see, so start tracking every decision your agent makes.

Why forcing LLMs to output structured data is a flawed paradigm, and what might come next for developers.

A practical guide to deploying agentic capabilities: what works, what doesn’t, and how to keep it reliable in prod.

Capture edge cases in production and fix them in a couple of minutes without redeploying your application.

Building AI agents is 10x easier with 10,000+ tools and built-in LLM tooling support.

A curated list of best practices, techniques and practical advice on how to get better at prompt engineering.

LLMs carry hidden traits from their training data, and we have no idea how.

You can’t have effective agents without context engineering.

A side-by-side look at Humanloop and 10 other LLM platforms.

Four core practices that enable teams to move 100x faster, without sacrificing reliability.

Build a functional chatbot using Vellum AI Workflows and Lovable with just a few prompts.

A quick guide to picking the right framework for testing your AI workflows.

A wake-up call to not underestimate the unique challenges of working with LLMs.

LLMs are stepping outside the sandbox. Should you let them?

Ground truths help build confidence, but they shouldn’t block progress.

Time to see if I’ve automated myself out of a job.

Feels more natural, hallucinates less, can be persuaded—and it’s not a game-changer.

Evaluating the 'thinking' of Claude 3.7 Sonnet and other reasoning models to understand how they really reason.

Learn how DeepSeek achieved OpenAI o1-level reasoning with pure RL and solved issues through multi-stage training.

Explore the fundamentals of neural scaling laws and discover the next frontier in AI model development.

Rate limiting and downtime are common issues with LLMs — here’s how to manage it in production.

Easily test your AI workflows with Vellum: generate tons of test cases automatically and catch those tricky edge cases.

We’re simplifying the complex world of AI development for teams of all sizes.

I’d Pay $2,000 Out of My Own Pocket to Keep Using Cursor: the tab + context is next level.

Learn how to use guardrails, online/offline evaluation metrics for various LLM use-cases.

Learn how to prompt OpenAI o1 models, understand their limits and the opportunities ahead.

Learn how to use Vellum to convert any PDF into CSV: Examples with invoice, restaurant menu and product spec.

Understand the latest benchmarks, their limitations, and how models compare.

Learn how and when to use JSON mode, structured outputs, and function calling in your AI application.

Learn more about the expected GPT-5 features: improved reasoning, multimodality, and accuracy on math & coding.

Learn how to build an AI-powered Slackbot that can answer customer queries in real-time.

Vellum now offers VPC installations for secure AI development in your cloud, keeping data private and compliant.

Learn critical strategies to build and launch AI systems quickly and reliably.

Learn how combining knowledge graphs with vector stores can make your AI applications more accurate and reliable.

Learn how to enhance long-context prompts with corpus-in-context prompting and discover the best use-cases.

Read InvestInData's guest post on their decision to invest in Vellum.

Learn the key strategies and tools for building production-ready AI systems.

Learn to build an Agent that analyzes keywords, generates articles, and refines content to meet criteria.

How can I make my prompts better if I don't know the latest prompt engineering techniques?

Discover what are the main differences between LangChain and LlamaIndex, and when to use them.

Learn how RAG compares to fine-tuning and the impact of both model techniques on LLM performance.

Learn how to use OpenAI function calling in your AI apps to enable reliable, structured outputs.

Learn how to use Tiktoken and Vellum to programmatically count tokens before running OpenAI API requests.

Learn how to improve LLM outputs, and make your setup more reliable using prompt chaining.
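The core of prompt chaining is feeding one step's completion into the next step's prompt. A bare-bones sketch of that loop, with the model call stubbed out (`run_chain` and `call_llm` are illustrative names, not a real library's API):

```python
def run_chain(steps, call_llm, initial_input):
    """Run prompt templates in sequence; each completion fills the next
    template's {input} slot, so later steps refine earlier output."""
    text = initial_input
    for template in steps:
        text = call_llm(template.format(input=text))
    return text
```

In practice `call_llm` would wrap your model provider's completion endpoint; splitting one big prompt into several focused steps like this tends to make each step easier to evaluate and debug.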

Will long context replace RAG? An analysis of the pros and cons of both approaches.

Learn how to use retrieval and content generation metrics to consistently evaluate and improve your RAG system.

Learn prompt engineering tips to make GPT-3.5 perform as well as GPT-4.

Tips to most effectively use memory for your LLM chatbot.

Learn how to prompt Claude with these 11 prompt engineering tips.

Learn how successful companies develop reliable AI products by following a proven approach.

Learn how to build and evaluate intent handler logic in your chatbot workflow.

Methods and techniques to reduce hallucinations and maintain more reliable LLMs in production.

What LLM hallucination is, the four most common hallucination types, and their causes.

Comparing Gemini Pro's performance with zero-shot and few-shot prompting when classifying customer support tickets.

Learn how to use Tree of Thought prompting to improve LLM results.

Discover how recent OpenAI developments have influenced user confidence and interest in OpenAI alternatives.

Step-by-step instructions for configuring OpenAI on Azure.

Assistants API: Easy assistant setup with memory management - but what's under the hood?

How to use Multimodal AI models to build apps that solve new tasks and offer unique experiences for end users.

LLMs can label data at the same or better quality compared to human annotators, but ~20x faster and ~7x cheaper.

Collaborating with colleagues to test prompts yields good results but it's challenging.

RAG vs. fine-tuning vs. prompt engineering: learn how to pick the best option for your use case.

Dynamically swapping LoRA weights can significantly lower the costs of a fine-tuned model.

Why fine-tuning is now relevant with open-source models.

Tips on how to monitor your in-production LLM traffic.

Tips for experimenting with your LLM prompts.

Despite their high potential, LLMs are not a one-size-fits-all solution; choosing the right use case for LLMs is important.

Fine-tuning can provide significant benefits in cost, quality, and latency when compared to prompting.