How to Upgrade Your Payload Website with RAG and Vector Search
Use Upstash + OpenAI embeddings and Payload Native Jobs to add RAG, semantic search, and context-aware chatbots to…

Context: This guide assumes you have a running Payload CMS 3.0 project.
Imagine if your CMS didn't just store content, but actually understood it. By integrating a Vector Store (Upstash) with Payload, you unlock Chatbots, RAG (Retrieval Augmented Generation), and Semantic Search.
But there is a trap: AI operations are slow. Generating embeddings and syncing to Upstash can take 2-3 seconds—too long for a user to wait when saving a post.
This guide shows you how to implement a Background Job pipeline to sync your content asynchronously using Payload's native Jobs queue.
0. Prerequisites
Before writing code, we need to set up our environment.
Install Dependencies
npm install @upstash/vector openai
Environment Variables
Add these to your .env file:
# Get keys from https://console.upstash.com/vector
UPSTASH_VECTOR_REST_URL="https://your-index-url.upstash.io"
UPSTASH_VECTOR_REST_TOKEN="your-token"
# Get key from https://platform.openai.com/
OPENAI_API_KEY="sk-..."
Create the Upstash Index
CRITICAL: When creating your index in the Upstash Console, you MUST set the dimensions to 1024 to match the dimensions: 1024 parameter we will pass to OpenAI's text-embedding-3-small model (its default output is 1536 dimensions, but the API supports shortened embeddings).
- Metric: Cosine (recommended)
- Dimensions: 1024
1. Vector Infrastructure
Let's set up the base clients first.
File: src/lib/vector/client.ts
import { Index } from '@upstash/vector'

if (!process.env.UPSTASH_VECTOR_REST_URL || !process.env.UPSTASH_VECTOR_REST_TOKEN) {
  throw new Error('Missing Upstash Vector env vars')
}

export const vectorIndex = new Index({
  url: process.env.UPSTASH_VECTOR_REST_URL,
  token: process.env.UPSTASH_VECTOR_REST_TOKEN,
})
File: src/lib/vector/embedding.ts
import OpenAI from 'openai'
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })
export async function generateEmbedding(text: string): Promise<number[]> {
  const sanitizedText = text.replace(/\n/g, ' ')

  // IMPORTANT: dimensions must match your Upstash index
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: sanitizedText,
    dimensions: 1024,
  })

  return response.data[0].embedding
}
2. The Logic: Operations
We need a function to handle the actual embedding logic. This is the code that will run inside our job.
File: src/lib/vector/operations.ts
import { vectorIndex } from './client'
import { generateEmbedding } from './embedding'
export async function embedDocument({ id, collection, text }: { id: string; collection: string; text: string }) {
  const embedding = await generateEmbedding(text)

  await vectorIndex.upsert([
    {
      id: `${collection}-${id}`,
      vector: embedding,
      metadata: {
        docId: id,
        collection,
        // Add other metadata here
      },
    },
  ])

  console.log(`[Vector] Synced ${collection}/${id}`)
}
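One caveat worth flagging: embedDocument sends the entire document as a single embedding input. text-embedding-3-small caps input at roughly 8,000 tokens, and a single vector for a very long post hurts retrieval precision. A minimal chunking sketch (the helper name and the size/overlap values are my own assumptions, not part of the guide's pipeline):

```typescript
// Hypothetical helper: split long content into overlapping character-based
// chunks before embedding. Overlap keeps context that straddles a boundary.
export function chunkText(text: string, maxChars = 2000, overlap = 200): string[] {
  if (text.length <= maxChars) return [text]

  const chunks: string[] = []
  let start = 0
  while (start < text.length) {
    chunks.push(text.slice(start, start + maxChars))
    if (start + maxChars >= text.length) break
    // Step forward by less than a full chunk so consecutive chunks overlap
    start += maxChars - overlap
  }
  return chunks
}
```

You could then call embedDocument once per chunk, suffixing the vector id with the chunk index (e.g. `${collection}-${id}-0`), so each chunk gets its own vector.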
3. The Job: Upsert Handler
Now we create the Payload Task Handler. This runs in the background.
Task Input Interface: First, define what data we pass to the job.
export interface VectorUpsertInput {
  docId: string
  collection: string
}
File: src/payload/jobs/vector/upsert.ts
import type { TaskHandler } from 'payload'
import { embedDocument } from '@/lib/vector/operations'
export interface VectorUpsertInput {
  docId: string
  collection: string
}

export const vectorUpsertHandler: TaskHandler<VectorUpsertInput> = async ({ input, req }) => {
  const { docId, collection } = input
  req.payload.logger.info(`[Job] Starting vector sync for ${collection}/${docId}`)

  try {
    const doc = await req.payload.findByID({ collection, id: docId })

    if (doc._status && doc._status !== 'published') {
      return { output: { message: 'Skipped: Not published' } }
    }

    // Extract text content based on collection
    // Note: For richText fields, you'd want a lexicalToMarkdown utility here
    const content = (doc as any).content || (doc as any).description || ''
    if (!content) return { output: { message: 'Skipped: No content' } }

    await embedDocument({
      id: docId.toString(),
      collection,
      text: typeof content === 'string' ? content : JSON.stringify(content),
    })

    return { output: { message: 'Success' } }
  } catch (error) {
    req.payload.logger.error(`[Job] Failed: ${error instanceof Error ? error.message : String(error)}`)
    throw error // Trigger retry
  }
}
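The handler's comment mentions a lexicalToMarkdown utility for richText fields. As a rough, self-contained sketch of that idea (assuming Payload's Lexical editor stores its state as a { root: { children: [...] } } tree; real content also contains links, uploads, and formatting marks that this ignores):

```typescript
// Hypothetical utility (a sketch, not Payload's own API): flatten a Lexical
// richText value into plain text by walking the node tree and collecting
// every `text` property, joining top-level blocks with newlines.
type LexicalNode = { text?: string; children?: LexicalNode[] }

export function lexicalToPlainText(editorState: { root?: LexicalNode } | null | undefined): string {
  const walk = (node: LexicalNode): string => {
    if (typeof node.text === 'string') return node.text
    if (Array.isArray(node.children)) return node.children.map(walk).join('')
    return ''
  }

  if (!editorState?.root?.children) return ''
  // Treat each top-level child as a block (paragraph, heading, list...)
  return editorState.root.children.map(walk).filter(Boolean).join('\n')
}
```

With something like this in place, the handler could pass lexicalToPlainText(doc.content) to embedDocument instead of JSON.stringify, which otherwise embeds Lexical's structural noise along with the prose.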
4. Register the Job
Tell Payload about the job.
File: payload.config.ts
import { buildConfig } from 'payload'
import { vectorUpsertHandler } from '@/payload/jobs/vector/upsert'

export default buildConfig({
  // ...
  jobs: {
    // Determine who can manually trigger jobs via API (if needed)
    access: { run: ({ req }) => !!req.user },
    tasks: [
      {
        slug: 'vector-upsert',
        handler: vectorUpsertHandler,
        retries: 3,
      },
    ],
  },
})
5. The Trigger: Collection Hook
Attach a hook to your collections that checks the publishing state and dispatches the job.
File: src/payload/hooks/syncToVectorStore.ts
import type { CollectionAfterChangeHook } from 'payload'

export const syncToVectorStoreAfterChange: CollectionAfterChangeHook = async ({
  doc,
  req,
  collection,
}) => {
  if (doc._status !== 'published') return doc

  await req.payload.jobs.queue({
    task: 'vector-upsert',
    input: {
      docId: doc.id,
      collection: collection.slug,
    },
  })

  return doc
}
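The afterChange hook keeps the index fresh on publish, but nothing removes a vector when a document is deleted. A hedged sketch of a companion afterDelete hook (the 'vector-delete' task slug and its handler are assumptions — this guide does not define them; its handler would call vectorIndex.delete with the same `${collection}-${id}` key used in operations.ts):

```typescript
// Hypothetical companion hook (not part of the guide): on delete, queue a job
// to remove the document's vector from Upstash. Typed structurally here to
// stay self-contained; in a real project use Payload's
// CollectionAfterDeleteHook type instead.
type AfterDeleteArgs = {
  id: string | number
  collection: { slug: string }
  req: {
    payload: {
      jobs: { queue: (args: { task: string; input: Record<string, unknown> }) => Promise<unknown> }
    }
  }
}

export const removeFromVectorStoreAfterDelete = async ({ id, req, collection }: AfterDeleteArgs) => {
  await req.payload.jobs.queue({
    task: 'vector-delete', // assumed slug — register it alongside 'vector-upsert'
    input: { docId: String(id), collection: collection.slug },
  })
}
```

You would attach it via the collection's hooks.afterDelete array, the same way afterChange is attached below.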
CRITICAL STEP: Attach the hook to your collections. You must add it to every collection you want indexed!
File: src/collections/Posts.ts
import type { CollectionConfig } from 'payload'
import { syncToVectorStoreAfterChange } from '@/payload/hooks/syncToVectorStore'

export const Posts: CollectionConfig = {
  slug: 'posts',
  hooks: {
    afterChange: [syncToVectorStoreAfterChange], // <--- Add this!
  },
  // ...
}
6. Running the Jobs
Defining the job isn't enough; something needs to run it.
Local Development
In a separate terminal window, run:
npx payload jobs:run
This starts a long-running process that polls the payload-jobs collection.
Production (Vercel/Serverless)
Since you don't have a long-running server, use Vercel Cron or an external cron service to hit Payload's job endpoint.
- Enable Vercel Cron.
- Payload automatically configures the endpoint at /api/payload-jobs/run.
- Ensure your vercel.json calls this endpoint periodically.
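As a sketch, the vercel.json cron entry might look like this (the five-minute schedule is an arbitrary choice, and lower-tier Vercel plans restrict how often crons may run):

```json
{
  "crons": [
    {
      "path": "/api/payload-jobs/run",
      "schedule": "*/5 * * * *"
    }
  ]
}
```

If your jobs access.run check requires a logged-in user, check your Payload version's docs for how to authorize this endpoint from a cron (e.g. via a secret token), since Vercel Cron won't have a user session.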
Summary
- Dependencies: Installed @upstash/vector & openai.
- Config: Created the Upstash index (1024 dims) & .env variables.
- Code: Added client, embedding, operations, and the upsert job handler.
- Registration: Registered the job in payload.config.ts.
- Trigger: Added the hook to the Posts collection.
- Runner: Started npx payload jobs:run.
Now, when you publish a post, Payload queues the task, your worker picks it up, and your Vector Store stays perfectly in sync—users never wait.