
Introducing Vellum for Agents
Today we're introducing Vellum. All you do is chat and let Vellum build reliable Agents for you.

Workflow Sandbox upgrades, Vellum Voice Input, compare agent building changes, and more

Breaking down OpenAI's GPT 5.2 model performance across coding, reasoning, and long-horizon planning.

How we used foreground, background, and code review agents to double engineering velocity

A practical guide to the leading AI voice agent platforms in 2026

Use this playbook to execute a battle-tested strategy for AI transformation that will make your business AI-native.

Explore the top alternatives to Langchain for building production-grade AI apps and agents.

Learn about common architectures, frameworks and discover best practices for building agents from AI experts.

A practical guide to the top AI workflow platforms, with comparisons to help you choose the best fit for your team.

Learn about the current rate limits and strategies like exponential backoff and caching to help you avoid them.
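As a quick illustration of the backoff strategy this guide covers, here is a minimal retry sketch in Python; `request_fn`, the retry count, and the delay values are placeholder assumptions rather than OpenAI-documented settings.

```python
import random
import time

def call_with_backoff(request_fn, max_retries=5, base_delay=1.0):
    """Retry a rate-limited API call with exponential backoff and jitter.

    `request_fn` is a placeholder for any callable that raises an
    exception when the provider returns a 429 rate-limit error.
    """
    for attempt in range(max_retries):
        try:
            return request_fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            # Double the wait on each attempt; jitter avoids thundering herds.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)
```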

Learn what OpenAI's logprobs are and how you can use them in your LLM applications.
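For a flavor of what the article discusses, here is a minimal sketch of converting logprobs into probabilities; the flat `token_logprobs` list is a simplified stand-in for the richer structure the actual API returns.

```python
import math

def logprobs_to_confidence(token_logprobs):
    """Convert per-token log probabilities into probabilities and an
    overall sequence confidence.

    `token_logprobs` is a hypothetical list of floats, one per generated
    token; the real API response nests this under a `logprobs` field.
    """
    # exp() turns each log probability back into a probability in [0, 1].
    probs = [math.exp(lp) for lp in token_logprobs]
    # The joint probability of the whole sequence is the product of the
    # token probabilities, i.e. exp of the summed logprobs.
    joint = math.exp(sum(token_logprobs))
    return probs, joint
```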

A look at AI's evolution from basic, rule-based systems to fully creative agentic workflows.

A choice dependent on specific needs, document types and business requirements.

See how GPT-5 performs across benchmarks, with a big focus on health.

A practical guide to building production-grade, multi-agent AI systems using context engineering best practices.

A practical prompting guide to get GPT-5 to work for your use case.

We reviewed and compared 27 platforms to narrow down to the 15 best n8n alternatives in 2026 for your team's needs.

A guide to 2026's best AI agent builder platforms to help you find your enterprise's perfect agent-building solution.

How verifiable mandates are creating a secure foundation for AI-driven commerce.

A practical guide to the 10 best low-code AI workflow automation tools in 2026 to help you choose your team's best fit.

A practical guide to choosing the best AI agent framework for developers.

Explore this curated list of the top AI agent platforms for product managers to help find your ideal solution.

A 2026 guide to the top no-code AI workflow tools and how they compare, to help you find your team's best fit.

The 2026 guide to the best AI workflow builders for automating, scaling, and governing business processes.

Complete 2026 guide to the top platforms to build, govern, and scale secure AI agents across the enterprise.

A guide to how anyone can design, build, and launch intelligent, no-code agents using Vellum.

A deep dive into Google's latest model performance

A clear, honest comparison of Gumloop, n8n, and Vellum to help teams choose the right AI automation platform.

A deep dive and breakdown into Anthropic's latest flagship model Claude Opus 4.5

Workflow triggers, multimodal outputs, 40+ integrations, and other updates making agent building easier and faster.

Compare the top Gumloop alternatives of 2026 to pick the best solution for your automation use cases.

A report on the latest flagship model benchmarks and trends they signal for the AI agent space in 2026

Explore AI agent use cases to learn how to unlock AI ROI in your organization.

Native integrations, Agent Builder Threads, and upgrades that make agent building faster than ever in Vellum.

Four lessons from building an agent that builds other agents

Think your APM tool has your AI covered? Think again. LLMs need their own observability playbook.

A practical guide on understanding and implementing AI automations for all industries and teams.

A breakdown of OpenAI’s new Agent Builder and what it signals for the future of building and deploying AI agents.

Agent Builder (beta), Custom Nodes, AI Apps, and more for faster and more complex agent building in Vellum.

AI Apps turn your deployed Workflows into no-code apps your whole team can share to use directly in Vellum.

Exploring zero-shot & few-shot prompting: usage, application methods, and limits.

We break down when Chain-of-Thought adds value, when it doesn’t, and how to use it in today’s LLMs.

Compare top AI platforms for fast, reliable development in 2025.

Introducing Agent Node: Multi-tool use with automatic schema, loop logic and context tracking.

Learn about MCP UI and how it enables AI agents with the missing UI layer for the future of agentic commerce.

You can’t improve what you can’t see, so start tracking every decision your agent makes.

Why forcing LLMs to output structured data is a flawed paradigm, and what might come next for developers.

Learn how Coursemojo uses Vellum to unlock engineering productivity and deploy AI-powered edTech solutions faster.

Learn how Marveri's lawyers use Vellum to build and evaluate AI workflows and save countless engineering hours.

A practical guide to deploying agentic capabilities: what works, what doesn’t, and how to keep it reliable in prod.

Capture edge cases in production and fix them in a couple of minutes without redeploying your application.

A practical guide to building effective LLM agents for yourself or your customers.

MCP-powered Agent Nodes, public Workflow sharing, and a new Workflow Console for easier, collaborative building.

Upgraded Environments, Workflow, and Prompt Builder plus a new Agent Node for faster and easier building on Vellum.

Building AI agents is 10x easier with 10,000+ tools and built-in LLM tooling support

Just another eval confirming GPT-OSS 120b's top performance at a 90% discount.

A curated list of best practices, techniques and practical advice on how to get better at prompt engineering.

LLMs carry hidden traits in their data and we have no idea how.

Go from idea to AI workflow in seconds and continue to build in the UI or your IDE.

A first-class way to manage your work across Development, Staging, and Production.

Complete control over the business logic and runtime of your AI workflows in Vellum.

You can’t have effective agents without context engineering.

Full control in code and real-time visibility in UI, built for teams shipping reliable AI.

AI Development needs a standard & we’re building it at Vellum

AI-powered features and easier ways to customize and build together, across both the SDK and visual builder.

What’s shaping AI products, agents, and infrastructure in 2025.

A side-by-side look at Humanloop and 10 other LLM platforms.

Learn how LLM and GenAI models compare: their differences, applications, and use cases.

Helping a leading financial institution speed up legal reviews, without compromising quality.

Four core practices that enable teams to move 100x faster, without sacrificing reliability.

Analyzing the difference in performance, cost and speed between the world's best reasoning models.

Build a functional chatbot using Vellum AI Workflows and Lovable with just a few prompts.

We have a bunch of quality-of-life upgrades including protected tags, smoother Workflows, and more!

A quick guide to picking the right framework for testing your AI workflows.

Evaluating whether SOTA models can really reason.

A wake-up call not to underestimate the unique challenges of working with LLMs.

LLMs are stepping outside the sandbox. Should you let them?

Our biggest product feature drop ever: 27 updates in a single month (a Vellum record!)

Ground truths help build confidence, but they shouldn’t block progress.

Learn how DeepScribe uses Vellum to refine AI, act on feedback, and build clinician trust.

Time to see if I’ve automated myself out of a job.

See how Drata leveraged Vellum to build enterprise-grade AI workflows that enhance GRC automation.

This month we improved how you find models, preview Workflows SDK code, and more!

Support for IBM Granite models in Vellum.

Comparing GPT-4.5 and Claude 3.7 Sonnet on cost, speed, SAT math equations, and adaptive reasoning skills.

Feels more natural, hallucinates less, can be persuaded—and it’s not a game-changer.

Learn how Anthropic's latest model compares to similar top-tier reasoning models on the market.

Learn how Vellum enables Rely Health to rapidly build, test, and deploy AI-powered patient care solutions.

Discover how combining agents with RAG can make your AI workflows more context-aware, and proactive.

Vellum 2025: Workflows SDK Beta, self-serve org setup, and new model support!

Learn how to optimize prompt versioning, debug efficiently, and make real-time updates to boost AI performance.

Evaluating the 'thinking' of Claude 3.7 Sonnet and other reasoning models to understand how they really reason.

Explore how O1 and R1 perform on well-known reasoning puzzles—now tested in new contexts.

Learn how DeepSeek achieved OpenAI o1-level reasoning with pure RL and solved issues through multi-stage training.

Unwrap Vellum's latest features: optional inputs, error handling, JSON indexing!

Capture and use end-user feedback as ground truth data to improve your AI system’s accuracy.

Explore the fundamentals of neural scaling laws and discover the next frontier in AI model development.

Learn how OpenAI o1 compares to GPT-4o and Sonnet 3.5 on speed, math, reasoning and classification tasks.

Rate limiting and downtime are common issues with LLMs — here’s how to manage it in production.

Learn how the latest model from Meta, Llama 3.3 70b compares to GPT-4o on three tasks

Now you can run Llama 3.1 405b, with 200 t/s via SambaNova on Vellum!

Something special is coming, plus new models and quality of life improvements

Learn how to build modular, reusable, and version-controlled tools (subworkflows) to keep your workflows efficient.

Share your AI process in our 4-minute anonymous survey. Get early insights and a chance to win a MacBook M4 Pro.

Easily test your AI workflows with Vellum—generate tons of test cases automatically and catch those tricky edge cases.

Write and execute Python or TypeScript directly in your workflow

New debugging features for AI workflows to get visibility down to every decision and detail

Workflow execution timeline revamp, higher performance for evals, improved Map node debugging and more

Starting today, you can unlock 2,100 t/s with Llama 3.1 70B in Vellum for real-time AI apps.

We’re simplifying the complex world of AI development for teams of all sizes.

I’d Pay $2,000 Out of My Own Pocket to Keep Using Cursor - The tab + context is next level.

Discover how Glowing leverages Vellum's Workflows to create innovative AI solutions for the hospitality industry.

Learn how to use guardrails, online/offline evaluation metrics for various LLM use-cases.

Learn how to prompt OpenAI o1 models, understand their limits and the opportunities ahead.

Learn how to use Vellum to convert any PDF into CSV: Examples with invoice, restaurant menu and product spec.

Understand the latest benchmarks, their limitations, and how models compare.

More control with workflow replays, cost and latency tracking, and new Workflow Editor UI

Learn how Woflow sped up AI development by 50% — making it easier to handle errors, improve models and ship updates.

Learn how and when to use JSON mode, structured outputs, and function calling in your AI application.
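As a small sketch of the JSON-mode side of this topic, the helper below validates a model response that was asked to return a JSON object; the key names are hypothetical, and real validation needs will vary per application.

```python
import json

def parse_structured_output(raw, required_keys=("intent", "confidence")):
    """Validate a model response produced with JSON mode.

    Assumes the model was prompted to return a JSON object containing
    the hypothetical keys in `required_keys`.
    """
    # json.loads raises ValueError (JSONDecodeError) if the output
    # isn't valid JSON at all.
    data = json.loads(raw)
    missing = [k for k in required_keys if k not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data
```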

Learn more about the expected GPT-5 features: improved reasoning, multimodality, and accuracy on math & coding.

Learn how to build an AI-powered Slackbot that can answer customer queries in real-time.

Explore how a leading EdTech company saves 50 eng hours per month and empowers everyone on the team to contribute.

Vellum now offers VPC installations for secure AI development in your cloud, keeping data private and compliant.

Learn critical strategies to build and launch AI systems quickly and reliably.

Learn how Odyseek used Vellum to simplify AI development and improve team collaboration.

Learn about the latest features and improvements shipped by the Vellum team in July.

Learn how combining knowledge graphs with vector stores can make your AI applications more accurate and reliable.

Discover how Llama 3.1 405b stacks up against GPT-4o, Gemini 1.5 Pro, and Claude 3.5 Sonnet on three tasks.

Explore Llama 3.1 70b's upgrades and see how it stacks up against same-tier closed-source models.

A comparison between the latest low cost, low latency models

Learn more about the latest updates at Vellum: Map Nodes, Inline Subworkflows, API updates and more

Learn how to enhance long-context prompts with corpus-in-context prompting and discover the best use-cases.

Learn how Claude 3.5 Sonnet compares to GPT-4o on data extraction, classification, and verbal reasoning tasks.

Read InvestInData's guest post on their decision to invest in Vellum.

Run Workflows from Node, evaluate function call outputs, Guardrail nodes, RAGAS metrics, image support & more.

Learn the key strategies and tools for building production-ready AI systems.

Learn to build an Agent that analyzes keywords, generates articles, and refines content to meet criteria.

How can I make my prompts better if I don't know the latest prompt engineering techniques?

Learn how GPT-4o compares to GPT-4 Turbo on classification, reasoning, and data extraction tasks.

Find out how Llama 3 70B stacks up against GPT-4 in terms of cost, speed, and performance on specific tasks.

Prompt editor, prompt blocks, reusable evaluation metrics, new models, and more.

Learn how Rentgrata used Vellum to evaluate their chatbot, and cut development time in half.

Discover what are the main differences between LangChain and LlamaIndex, and when to use them.

Learn how RAG compares to fine-tuning and the impact of both model techniques on LLM performance.

Learn how to use OpenAI function calling in your AI apps to enable reliable, structured outputs.

Learn how Drata used Vellum to quickly validate AI ideas, and speed up AI development.

Iterating on prompts using OpenAI's playground & Azure AI studio was challenging, until Autobound discovered Vellum.

Discover how Redfin used Vellum to develop and evaluate a production-ready AI assistant, now live in 14 markets.

Explore Opus and GPT-4's performance in tasks like summarization, graph interpretation, math, coding, and more.

Subworkflow nodes, image support in the UI, error nodes, node mocking, workflow graphs and so much more.

Learn how Vellum is helping their team to iterate faster and build reliable AI Assistants for health and wellness.

Learn how to use Tiktoken and Vellum to programmatically count tokens before running OpenAI API requests.

Learn how to improve LLM outputs, and make your setup more reliable using prompt chaining.

Will long context replace RAG? An analysis of the pros and cons of both approaches.

SOC 2 Type 2 Compliant, Prompt Node retries, Evaluation reports, Custom release tags, Cloning workflow nodes & more.

Learn how to use retrieval and content generation metrics to consistently evaluate and improve your RAG system.

Enhanced prompt comparison, more metrics, flexibility, and new reports for effective LLM evaluation.

Learn prompt engineering tips on how to make GPT-3.5 perform as well as GPT-4.

Tips to most effectively use memory for your LLM chatbot.

Learn how Lavender develops and manages more than 20 LLM features in production.

January: Folders, tracking usage, better collaboration, more OpenAI controls, image support.

Learn how to prompt Claude with these 11 prompt engineering tips.

Learn how Vellum helped Codingscape to ship AI apps quicker and win more projects.

Learn how successful companies develop reliable AI products by following a proven approach.

Learn how to build and evaluate intent handler logic in your chatbot workflow

Introducing a new way to invoke your Vellum stored prompts!

Methods and techniques to reduce hallucinations and maintain more reliable LLMs in production.

What LLM hallucination is, plus the four most common hallucination types and their causes.

December: fine-grained control over your prompt release process, powerful new APIs for executing Prompts, and more

Comparing the performance of Gemini Pro with zero and few shot prompting when classifying customer support tickets

Comparing GPT3.5 Turbo, GPT-4 Turbo, Claude, and Gemini Pro on classifying customer support tickets.

Learn how to use Tree of Thought prompting to improve LLM results

November: major Test Suite improvements, arbitrary code execution, and new models!

Discover how recent OpenAI developments have influenced user confidence and interest in OpenAI alternatives

Step-by-step instructions for configuring OpenAI on Azure

Assistants API: Easy assistant setup with memory management - but what's under the hood?

How to use Multimodal AI models to build apps that solve new tasks and offer unique experiences for end users.

LLMs can label data at the same or better quality compared to human annotators, but ~20x faster and ~7x cheaper.

October: universal LLM support, new Test Suite metrics, and performance

Learn how Vellum helped Narya.AI save time and make AI easy for everyone on their team.

How Miri built a powerful chat experience using Vellum's platform

September is full of enhancements to Workflows, Security, Support, and more!

Collaborating with colleagues to test prompts yields good results but it's challenging.

August brings the introduction of Vellum Workflows, Metadata Filtering in Search, and a new design

RAG vs. Fine-Tuning vs. Prompt Engineering: learn how to pick the best option for your use case.

We did an analysis comparing the latency of OpenAI, Anthropic and Google. Here are the results!

Vellum Workflows help you quickly prototype, deploy, and manage complex chains of LLM calls

Learn how Left Field Labs used Vellum for LLM prompt versioning, evaluation and monitoring once in production.

Dynamically swapping LoRA weights can significantly lower the costs of a fine-tuned model.

We've continued to build out our platform; here's a look at the latest from us and a sneak peek of what's coming!

Why fine-tuning is now relevant with open-source models.

We've raised $5m to double down on our mission to help companies build production use cases of LLMs

If you're versioning in Jupyter notebooks or Google Docs, or running custom scripts for testing, you need to read this.

Tips on how to monitor your in-production LLM traffic

We've shipped a lot of features recently, here's a look at the latest updates from us!

Tips to experiment with your LLM related prompts

Details about how to best leverage the Vellum <> LlamaIndex integration

Use Vellum Test Suites to test the quality of prompts in bulk before production. Unit testing for LLMs is here!

Compare model quality across OpenAI's GPT-4, Anthropic's Claude and now Google's PaLM LLM in our platform

Vellum Search, the latest addition to our platform helps companies use proprietary data in LLM applications

Despite high potential, LLMs are not a one-size-fits-all solution. Choosing the right use case for LLMs is important.

Fine-tuning can provide significant benefits in cost, quality & latency when compared to prompting

We’re excited to publicly announce the start of our new adventure: Vellum