
Introducing Vellum for Agents
Today we're introducing Vellum. All you do is chat and let Vellum build reliable Agents for you.

Workflow Sandbox upgrades, Vellum Voice Input, compare agent building changes, and more

Breaking down OpenAI's GPT 5.2 model performance across coding, reasoning, and long-horizon planning.

How we used foreground, background, and code review agents to double engineering velocity

A practical guide to the leading AI voice agent platforms in 2026

Use this playbook to execute a battle-tested strategy for AI transformation that will make your business AI-native.

Explore the top alternatives to Langchain for building production-grade AI apps and agents.

Learn about common architectures, frameworks and discover best practices for building agents from AI experts.

A practical guide to the top AI workflow platforms, with comparisons to help you choose the best fit for your team.

Learn about the current rate limits and strategies like exponential backoff and caching to help you avoid them.
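As a quick illustration of the backoff strategy this guide covers, here is a minimal retry sketch in Python; `request_fn`, the retry count, and the delay values are placeholder assumptions rather than OpenAI-documented settings.

```python
import random
import time

def call_with_backoff(request_fn, max_retries=5, base_delay=1.0):
    """Retry a rate-limited API call with exponential backoff and jitter.

    `request_fn` is a placeholder for any callable that raises an
    exception when the provider returns a 429 rate-limit error.
    """
    for attempt in range(max_retries):
        try:
            return request_fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            # Double the wait on each attempt; jitter avoids thundering herds.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)
```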

Learn what OpenAI's logprobs are and how you can use them in your LLM applications.
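For a flavor of what the article discusses, here is a minimal sketch of converting logprobs into probabilities; the flat `token_logprobs` list is a simplified stand-in for the richer structure the actual API returns.

```python
import math

def logprobs_to_confidence(token_logprobs):
    """Convert per-token log probabilities into probabilities and an
    overall sequence confidence.

    `token_logprobs` is a hypothetical list of floats, one per generated
    token; the real API response nests this under a `logprobs` field.
    """
    # exp() turns each log probability back into a probability in [0, 1].
    probs = [math.exp(lp) for lp in token_logprobs]
    # The joint probability of the whole sequence is the product of the
    # token probabilities, i.e. exp of the summed logprobs.
    joint = math.exp(sum(token_logprobs))
    return probs, joint
```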

A look at AI's evolution from basic, rule-based systems to fully creative agentic workflows.

A choice dependent on specific needs, document types and business requirements.

See how GPT-5 performs across benchmarks, with a big focus on health.

A practical guide to building production-grade, multi-agent AI systems using context engineering best practices.

A practical prompting guide to get GPT-5 to work for your use case.

We reviewed and compared 27 platforms to narrow down to the 15 best n8n alternatives in 2026 for your team's needs.

A guide to 2026's best AI agent builder platforms to help you find your enterprise's perfect agent-building solution.

How verifiable mandates are creating a secure foundation for AI-driven commerce.

A practical guide to the 10 best low-code AI workflow automation tools in 2026 to help you choose your team's best fit.

A practical guide to choosing the best AI agent framework for developers.

Explore this curated list of the top AI agent platforms for product managers to help find your ideal solution.

A 2026 guide to the top no-code AI workflow tools and how they compare, to help you find your team's best fit.

The 2026 guide to the best AI workflow builders for automating, scaling, and governing business processes.

Complete 2026 guide to the top platforms to build, govern, and scale secure AI agents across the enterprise.

A guide to how anyone can design, build, and launch intelligent, no-code agents using Vellum.

A deep dive into Google's latest model performance

A clear, honest comparison of Gumloop, n8n, and Vellum to help teams choose the right AI automation platform.

A deep dive and breakdown into Anthropic's latest flagship model Claude Opus 4.5

Workflow triggers, multimodal outputs, 40+ integrations, and other updates making agent building easier and faster.

Compare the top Gumloop alternatives of 2026 to pick the best solution for your automation use cases.

A report on the latest flagship model benchmarks and trends they signal for the AI agent space in 2026

Explore AI agent use cases to learn how to unlock AI ROI in your organization.

Native integrations, Agent Builder Threads, and upgrades that make agent building faster than ever in Vellum.

Four lessons from building an agent that builds other agents

Think your APM tool has your AI covered? Think again. LLMs need their own observability playbook.

A practical guide on understanding and implementing AI automations for all industries and teams.

A breakdown of OpenAI’s new Agent Builder and what it signals for the future of building and deploying AI agents.

Agent Builder (beta), Custom Nodes, AI Apps, and more for faster and more complex agent building in Vellum.

AI Apps turn your deployed Workflows into no-code apps your whole team can share to use directly in Vellum.

Exploring zero-shot & few-shot prompting: usage, application methods, and limits.

We break down when Chain-of-Thought adds value, when it doesn’t, and how to use it in today’s LLMs.

Compare top AI platforms for fast, reliable development in 2025.

Introducing Agent Node: Multi-tool use with automatic schema, loop logic and context tracking.

Learn about MCP UI and how it enables AI agents with the missing UI layer for the future of agentic commerce.

You can’t improve what you can’t see, so start tracking every decision your agent makes.

Why forcing LLMs to output structured data is a flawed paradigm, and what might come next for developers.

Learn how Coursemojo uses Vellum to unlock engineering productivity and deploy AI-powered edTech solutions faster.

Learn how Marveri's lawyers use Vellum to build and evaluate AI workflows and save countless engineering hours.

A practical guide to deploying agentic capabilities: what works, what doesn’t, and how to keep it reliable in prod.

Capture edge cases in production and fix them in a couple of minutes without redeploying your application.

A practical guide to building effective LLM agents for yourself or your customers.

MCP-powered Agent Nodes, public Workflow sharing, and a new Workflow Console for easier, collaborative building.

Upgraded Environments, Workflow, and Prompt Builder plus a new Agent Node for faster and easier building on Vellum.

Building AI agents is 10x easier with 10,000+ tools and built-in LLM tooling support

Just another eval confirming GPT-OSS 120b's top performance at a 90% discount.

A curated list of best practices, techniques and practical advice on how to get better at prompt engineering.

LLMs carry hidden traits in their data and we have no idea how.

Go from idea to AI workflow in seconds and continue to build in the UI or your IDE.

A first-class way to manage your work across Development, Staging, and Production.

Complete control over the business logic and runtime of your AI workflows in Vellum.

You can’t have effective agents without context engineering.

Full control in code and real-time visibility in UI, built for teams shipping reliable AI.

AI Development needs a standard & we’re building it at Vellum

AI-powered features and easier ways to customize and build together, across both the SDK and visual builder.

What’s shaping AI products, agents, and infrastructure in 2025.

A side-by-side look at Humanloop and 10 other LLM platforms.

Learn how LLM and GenAI models compare: their differences, applications, and use cases.

Helping a leading financial institution speed up legal reviews, without compromising quality.

Four core practices that enable teams to move 100x faster, without sacrificing reliability.

Analyzing the difference in performance, cost and speed between the world's best reasoning models.

Build a functional chatbot using Vellum AI Workflows and Lovable with just a few prompts.

We have a bunch of quality-of-life upgrades including protected tags, smoother Workflows, and more!

A quick guide to picking the right framework for testing your AI workflows.

Evaluating whether SOTA models can really reason.

A wake-up call not to underestimate the unique challenges of working with LLMs.

LLMs are stepping outside the sandbox. Should you let them?

Our biggest product feature drop ever: 27 updates in a single month (a Vellum record!)

Ground truths help build confidence, but they shouldn’t block progress.

Learn how DeepScribe uses Vellum to refine AI, act on feedback, and build clinician trust.

Time to see if I’ve automated myself out of a job.

See how Drata leveraged Vellum to build enterprise-grade AI workflows that enhance GRC automation.

This month we improved how you find models, preview Workflows SDK code, and more!

Support for IBM Granite models in Vellum.

Comparing GPT-4.5 and Claude 3.7 Sonnet on cost, speed, SAT math equations, and adaptive reasoning skills.

Feels more natural, hallucinates less, can be persuaded—and it’s not a game-changer.

Learn how Anthropic's latest model compares to similar top-tier reasoning models on the market.

Learn how Vellum enables Rely Health to rapidly build, test, and deploy AI-powered patient care solutions.

Discover how combining agents with RAG can make your AI workflows more context-aware, and proactive.

Vellum 2025: Workflows SDK Beta, self-serve org setup, and new model support!

Learn how to optimize prompt versioning, debug efficiently, and make real-time updates to boost AI performance.

Evaluating the 'thinking' of Claude 3.7 Sonnet and other reasoning models to understand how they really reason.

Explore how O1 and R1 perform on well-known reasoning puzzles—now tested in new contexts.

Learn how DeepSeek achieved OpenAI o1-level reasoning with pure RL and solved issues through multi-stage training.

Unwrap Vellum's latest features: optional inputs, error handling, JSON indexing!

Capture and use end-user feedback as ground truth data to improve your AI system’s accuracy.

Explore the fundamentals of neural scaling laws and discover the next frontier in AI model development.

Learn how OpenAI o1 compares to GPT-4o and Sonnet 3.5 on speed, math, reasoning and classification tasks.

Rate limiting and downtime are common issues with LLMs — here’s how to manage it in production.

Learn how the latest model from Meta, Llama 3.3 70b compares to GPT-4o on three tasks

Now you can run Llama 3.1 405b, with 200 t/s via SambaNova on Vellum!

Something special is coming, plus new models and quality of life improvements

Learn how to build modular, reusable, and version-controlled tools (subworkflows) to keep your workflows efficient.

Share your AI process in our 4-minute anonymous survey. Get early insights and a chance to win a MacBook M4 Pro.

Easily test your AI workflows with Vellum—generate tons of test cases automatically and catch those tricky edge cases.

Write and execute Python or TypeScript directly in your workflow

New debugging features for AI workflows to get visibility down to every decision and detail

Workflow execution timeline revamp, higher performance for evals, improved Map node debugging and more

Starting today, you can unlock 2,100 t/s with Llama 3.1 70B in Vellum for real-time AI apps.

We’re simplifying the complex world of AI development for teams of all sizes.

I’d Pay $2,000 Out of My Own Pocket to Keep Using Cursor - The tab + context is next level.

Discover how Glowing leverages Vellum's Workflows to create innovative AI solutions for the hospitality industry.

Learn how to use guardrails, online/offline evaluation metrics for various LLM use-cases.

Learn how to prompt OpenAI o1 models, understand their limits and the opportunities ahead.

Learn how to use Vellum to convert any PDF into CSV: Examples with invoice, restaurant menu and product spec.

Understand the latest benchmarks, their limitations, and how models compare.

More control with workflow replays, cost and latency tracking, and new Workflow Editor UI

Learn how Woflow sped up AI development by 50% — making it easier to handle errors, improve models and ship updates.

Learn how and when to use JSON mode, structured outputs, and function calling in your AI application.
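As a small sketch of the JSON-mode side of this topic, the helper below validates a model response that was asked to return a JSON object; the key names are hypothetical, and real validation needs will vary per application.

```python
import json

def parse_structured_output(raw, required_keys=("intent", "confidence")):
    """Validate a model response produced with JSON mode.

    Assumes the model was prompted to return a JSON object containing
    the hypothetical keys in `required_keys`.
    """
    # json.loads raises ValueError (JSONDecodeError) if the output
    # isn't valid JSON at all.
    data = json.loads(raw)
    missing = [k for k in required_keys if k not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data
```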

Learn more about the expected GPT-5 features: improved reasoning, multimodality, and accuracy on math & coding.

Learn how to build an AI-powered Slackbot that can answer customer queries in real-time.

Explore how a leading EdTech company saves 50 eng hours per month and empowers everyone on the team to contribute.

Vellum now offers VPC installations for secure AI development in your cloud, keeping data private and compliant.

Learn critical strategies to build and launch AI systems quickly and reliably.

Learn how Odyseek used Vellum to simplify AI development and improve team collaboration.

Learn about the latest features and improvements shipped by the Vellum team in July.

Learn how combining knowledge graphs with vector stores can make your AI applications more accurate and reliable.

Discover how Llama 3.1 405b stacks up against GPT-4o, Gemini 1.5 Pro, and Claude 3.5 Sonnet on three tasks.

Explore Llama 3.1 70b's upgrades and see how it stacks up against same-tier closed-source models.

A comparison between the latest low cost, low latency models

Learn more about the latest updates at Vellum: Map Nodes, Inline Subworkflows, API updates and more

Learn how to enhance long-context prompts with corpus-in-context prompting and discover the best use-cases.

Learn how Claude 3.5 Sonnet compares to GPT-4o on data extraction, classification, and verbal reasoning tasks.

Read InvestInData's guest post on their decision to invest in Vellum.

Run Workflows from Node, evaluate function call outputs, Guardrail nodes, RAGAS metrics, image support & more.

Learn the key strategies and tools for building production-ready AI systems.

Learn to build an Agent that analyzes keywords, generates articles, and refines content to meet criteria.

How can I make my prompts better if I don't know the latest prompt engineering techniques?

Learn how GPT-4o compares to GPT-4 Turbo on classification, reasoning, and data extraction tasks.

Find out how Llama 3 70B stacks up against GPT-4 in terms of cost, speed, and performance on specific tasks.

Prompt editor, prompt blocks, reusable evaluation metrics, new models, and more.

Learn how Rentgrata used Vellum to evaluate their chatbot, and cut development time in half.

Discover what are the main differences between LangChain and LlamaIndex, and when to use them.

Learn how RAG compares to fine-tuning and the impact of both model techniques on LLM performance.

Learn how to use OpenAI function calling in your AI apps to enable reliable, structured outputs.

Learn how Drata used Vellum to quickly validate AI ideas, and speed up AI development.

Iterating on prompts using OpenAI's playground & Azure AI studio was challenging, until Autobound discovered Vellum.

Discover how Redfin used Vellum to develop and evaluate a production-ready AI assistant, now live in 14 markets.

Explore Opus and GPT-4's performance in tasks like summarization, graph interpretation, math, coding, and more.

Subworkflow nodes, image support in the UI, error nodes, node mocking, workflow graphs and so much more.

Learn how Vellum is helping their team to iterate faster and build reliable AI Assistants for health and wellness.

Learn how to use Tiktoken and Vellum to programmatically count tokens before running OpenAI API requests.

Learn how to improve LLM outputs, and make your setup more reliable using prompt chaining.

Will long context replace RAG? An analysis of the pros and cons of both approaches.

SOC 2 Type 2 Compliant, Prompt Node retries, Evaluation reports, Custom release tags, Cloning workflow nodes & more.

Learn how to use retrieval and content generation metrics to consistently evaluate and improve your RAG system.

Enhanced prompt comparison, more metrics, flexibility, and new reports for effective LLM evaluation.

Learn prompt engineering tips on how to make GPT-3.5 perform as well as GPT-4.

Tips to most effectively use memory for your LLM chatbot.

Learn how Lavender develops and manages more than 20 LLM features in production.

January: Folders, tracking usage, better collaboration, more OpenAI controls, image support.

Learn how to prompt Claude with these 11 prompt engineering tips.

Learn how Vellum helped Codingscape to ship AI apps quicker and win more projects.

Learn how successful companies develop reliable AI products by following a proven approach.

Learn how to build and evaluate intent handler logic in your chatbot workflow

Introducing a new way to invoke your Vellum stored prompts!

Methods and techniques to reduce hallucinations and maintain more reliable LLMs in production.

What LLM hallucination is, plus the four most common hallucination types and their causes.

December: fine-grained control over your prompt release process, powerful new APIs for executing Prompts, and more

Comparing the performance of Gemini Pro with zero and few shot prompting when classifying customer support tickets

Comparing GPT3.5 Turbo, GPT-4 Turbo, Claude, and Gemini Pro on classifying customer support tickets.

Learn how to use Tree of Thought prompting to improve LLM results

November: major Test Suite improvements, arbitrary code execution, and new models!

Discover how recent OpenAI developments have influenced user confidence and interest in OpenAI alternatives

Step-by-step instructions for configuring OpenAI on Azure

Assistants API: Easy assistant setup with memory management - but what's under the hood?

How to use Multimodal AI models to build apps that solve new tasks and offer unique experiences for end users.

LLMs can label data at the same or better quality compared to human annotators, but ~20x faster and ~7x cheaper.

October: universal LLM support, new Test Suite metrics, and performance

Learn how Vellum helped Narya.AI save time and make AI easy for everyone on their team.

How Miri built a powerful chat experience using Vellum's platform

September is full of enhancements to Workflows, Security, Support, and more!

Collaborating with colleagues to test prompts yields good results but it's challenging.

August brings the introduction of Vellum Workflows, Metadata Filtering in Search, and a new design

RAG vs. Fine-Tuning vs. Prompt Engineering: learn how to pick the best option for your use case.

We did an analysis comparing the latency of OpenAI, Anthropic and Google. Here are the results!

Vellum Workflows help you quickly prototype, deploy, and manage complex chains of LLM calls

Learn how Left Field Labs used Vellum for LLM prompt versioning, evaluation and monitoring once in production.

Dynamically swapping LoRA weights can significantly lower the costs of a fine-tuned model.

We've continued to build out our platform; here's a look at the latest from us and a sneak peek of what's coming!

Why fine-tuning is now relevant with open-source models.

We've raised $5m to double down on our mission to help companies build production use cases of LLMs

If you're versioning in Jupyter notebooks or Google Docs, or running custom scripts for testing, you need to read this.

Tips on how to monitor your in-production LLM traffic

We've shipped a lot of features recently, here's a look at the latest updates from us!

Tips to experiment with your LLM related prompts

Details about how to best leverage the Vellum <> LlamaIndex integration

Use Vellum Test Suites to test the quality of prompts in bulk before production. Unit testing for LLMs is here!

Compare model quality across OpenAI's GPT-4, Anthropic's Claude and now Google's PaLM LLM in our platform

Vellum Search, the latest addition to our platform helps companies use proprietary data in LLM applications

Despite high potential, LLMs are not a one-size-fits-all solution. Choosing the right use case for LLMs is important.

Fine-tuning can provide significant benefits in cost, quality & latency when compared to prompting

We’re excited to publicly announce the start of our new adventure: Vellum