How to Upgrade Your Payload Website with RAG and Vector Search
Use Upstash + OpenAI embeddings and Payload Native Jobs to add RAG, semantic search, and context-aware chatbots to…

Context: This guide assumes you have a running Payload CMS 3.0 project.
Imagine if your CMS didn't just store content, but actually understood it. By integrating a Vector Store (Upstash) with Payload, you unlock Chatbots, RAG (Retrieval Augmented Generation), and Semantic Search.
But there is a trap: AI operations are slow. Generating embeddings and syncing to Upstash can take 2-3 seconds—too long for a user to wait when saving a post.
This guide shows you how to implement a Background Job pipeline to sync your content asynchronously using Payload's native Jobs queue.
0. Prerequisites
Before writing code, we need to set up our environment.
Install Dependencies
```bash
npm install @upstash/vector openai
```
Environment Variables
Add these to your .env file:
```
# Get keys from https://console.upstash.com/vector
UPSTASH_VECTOR_REST_URL="https://your-index-url.upstash.io"
UPSTASH_VECTOR_REST_TOKEN="your-token"

# Get key from https://platform.openai.com/
OPENAI_API_KEY="sk-..."
```
Create the Upstash Index
CRITICAL: When creating your index in the Upstash Console, you MUST set the dimensions to 1024 to match the `dimensions: 1024` we pass to OpenAI's text-embedding-3-small model (its default is 1536, but it supports shortened embeddings).
- Metric: Cosine (recommended)
- Dimensions: 1024
1. Vector Infrastructure
Let's wire up the basics first: a shared Upstash client and an embedding helper.
File: src/lib/vector/client.ts
```typescript
import { Index } from '@upstash/vector'

if (!process.env.UPSTASH_VECTOR_REST_URL || !process.env.UPSTASH_VECTOR_REST_TOKEN) {
  throw new Error('Missing Upstash Vector env vars')
}

export const vectorIndex = new Index({
  url: process.env.UPSTASH_VECTOR_REST_URL,
  token: process.env.UPSTASH_VECTOR_REST_TOKEN,
})
```
File: src/lib/vector/embedding.ts
```typescript
import OpenAI from 'openai'

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })

export async function generateEmbedding(text: string): Promise<number[]> {
  const sanitizedText = text.replace(/\n/g, ' ')

  // IMPORTANT: dimensions must match your Upstash index
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: sanitizedText,
    dimensions: 1024,
  })

  return response.data[0].embedding
}
```
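One caveat: OpenAI's embedding endpoint has an input token limit, so very long posts should be split before embedding. Below is a minimal sketch of a character-based chunker with overlap; `chunkText` is a hypothetical helper (not part of the files above), and real projects often chunk by tokens or paragraphs instead.

```typescript
// Hypothetical helper: naive character-based chunking with overlap.
// Overlap keeps context that straddles a chunk boundary searchable.
export function chunkText(text: string, chunkSize = 2000, overlap = 200): string[] {
  if (chunkSize <= overlap) throw new Error('chunkSize must exceed overlap')
  const chunks: string[] = []
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize))
  }
  return chunks
}
```

If you adopt chunking, embed each chunk separately and upsert them under distinct IDs (for example `${collection}-${id}-${chunkIndex}`).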
2. The Logic: Operations
We need a function to handle the actual embedding logic. This is the code that will run inside our job.
File: src/lib/vector/operations.ts
import { vectorIndex } from './client'
import { generateEmbedding } from './embedding'
export async function embedDocument({ id, collection, text }: { id: string, collection: string, text: string }) {
const embedding = await generateEmbedding(text)
await vectorIndex.upsert([{
id: `${collection}-${id}`,
vector: embedding,
metadata: {
docId: id,
collection,
// Add other metadata here
}
}])
console.log(`[Vector] Synced ${collection}/${id}`)
}
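The query side is the mirror image: embed the user's question, then ask Upstash for the nearest vectors. The sketch below is a hypothetical `semanticSearch` helper, not part of the guide's files; its dependencies are passed in as functions so it stays self-contained. In your project you would wire it to `generateEmbedding` and `(opts) => vectorIndex.query(opts)` (in `@upstash/vector`, `query` accepts `{ vector, topK, includeMetadata }`).

```typescript
type QueryMatch = { id: string | number; score: number; metadata?: Record<string, unknown> }

type SearchDeps = {
  embed: (text: string) => Promise<number[]>
  query: (opts: { vector: number[]; topK: number; includeMetadata: boolean }) => Promise<QueryMatch[]>
}

// Pure helper: drop weak matches below a cosine-similarity threshold.
export function filterMatches(matches: QueryMatch[], minScore = 0.75): QueryMatch[] {
  return matches.filter((m) => m.score >= minScore)
}

// Hypothetical search helper; dependencies are injected so the sketch is
// self-contained. Wire it to generateEmbedding and vectorIndex.query.
export async function semanticSearch(deps: SearchDeps, text: string, topK = 5): Promise<QueryMatch[]> {
  const vector = await deps.embed(text)
  const matches = await deps.query({ vector, topK, includeMetadata: true })
  return filterMatches(matches)
}
```

The 0.75 threshold is an assumption; tune it against your own content, since cosine scores vary with corpus and model.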
3. The Job: Upsert Handler
Now we create the Payload Task Handler. This runs in the background.
File: src/payload/jobs/vector/upsert.ts
```typescript
import type { TaskHandler } from 'payload'

import { embedDocument } from '@/lib/vector/operations'

export interface VectorUpsertInput {
  docId: string
  collection: string
}

export const vectorUpsertHandler: TaskHandler<VectorUpsertInput> = async ({ input, req }) => {
  const { docId, collection } = input
  req.payload.logger.info(`[Job] Starting vector sync for ${collection}/${docId}`)

  try {
    const doc = await req.payload.findByID({ collection, id: docId })

    if (doc._status && doc._status !== 'published') {
      return { output: { message: 'Skipped: Not published' } }
    }

    // Extract text content based on collection
    // Note: for richText fields, you'd want a lexical-to-plain-text utility here
    const content = (doc as any).content || (doc as any).description || ''
    if (!content) return { output: { message: 'Skipped: No content' } }

    await embedDocument({
      id: docId.toString(),
      collection,
      text: typeof content === 'string' ? content : JSON.stringify(content),
    })

    return { output: { message: 'Success' } }
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error)
    req.payload.logger.error(`[Job] Vector sync failed for ${collection}/${docId}: ${message}`)
    throw error // Rethrow so Payload retries the task
  }
}
```
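The handler above falls back to `JSON.stringify` for non-string content, which pollutes embeddings with Lexical node metadata. Below is a minimal sketch of a plain-text extractor, assuming the standard Lexical shape (`{ root: { children: [...] } }` with text on `text` properties); `lexicalToPlainText` is a hypothetical helper, and Payload's lexical package also ships richText converters you may prefer in a real project.

```typescript
// Hypothetical sketch: walk a Lexical richText tree and collect text nodes.
type LexicalNode = { text?: string; children?: LexicalNode[] }

export function lexicalToPlainText(value: { root?: LexicalNode } | null | undefined): string {
  const walk = (node: LexicalNode): string => {
    if (typeof node.text === 'string') return node.text
    return (node.children ?? []).map(walk).join(' ')
  }
  // Collapse whitespace so the embedding input stays clean
  return value?.root ? walk(value.root).replace(/\s+/g, ' ').trim() : ''
}
```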
4. Register the Job
Tell Payload about the job.
File: payload.config.ts
```typescript
import { buildConfig } from 'payload'

import { vectorUpsertHandler } from '@/payload/jobs/vector/upsert'

export default buildConfig({
  // ...
  jobs: {
    // Determine who can manually trigger jobs via API (if needed)
    access: { run: ({ req }) => !!req.user },
    tasks: [
      {
        slug: 'vector-upsert',
        handler: vectorUpsertHandler,
        retries: 3,
      },
    ],
  },
})
```
5. The Trigger: Collection Hook
Attach an afterChange hook to your collections to check the publishing state and dispatch the job.
File: src/payload/hooks/syncToVectorStore.ts
```typescript
import type { CollectionAfterChangeHook } from 'payload'

export const syncToVectorStoreAfterChange: CollectionAfterChangeHook = async ({
  doc,
  req,
  collection,
}) => {
  if (doc._status !== 'published') return doc

  await req.payload.jobs.queue({
    task: 'vector-upsert',
    input: {
      docId: doc.id,
      collection: collection.slug,
    },
  })

  return doc
}
```
CRITICAL STEP: Attach to Collection. You must add this hook to every collection you want indexed!
File: src/collections/Posts.ts
```typescript
import type { CollectionConfig } from 'payload'

import { syncToVectorStoreAfterChange } from '@/payload/hooks/syncToVectorStore'

export const Posts: CollectionConfig = {
  slug: 'posts',
  hooks: {
    afterChange: [syncToVectorStoreAfterChange], // <-- Add this!
  },
  // ...
}
```
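Publishing is only half the lifecycle: when a document is deleted, its vector should be removed too, or stale results will keep surfacing. Below is a sketch of a hypothetical companion afterDelete hook (not part of the guide's files); the delete function is injected so the sketch stays self-contained, and in your project you would pass `(id) => vectorIndex.delete(id)` from `@upstash/vector`.

```typescript
// Vector IDs follow the `${collection}-${id}` convention used by embedDocument.
export function vectorId(collection: string, id: string | number): string {
  return `${collection}-${id}`
}

// Hypothetical afterDelete hook factory; `deleteVector` stands in for
// `vectorIndex.delete`.
export function makeRemoveFromVectorStore(deleteVector: (id: string) => Promise<unknown>) {
  return async ({ id, collection }: { id: string | number; collection: { slug: string } }) => {
    await deleteVector(vectorId(collection.slug, id))
  }
}
```

Wired up, this would look like `afterDelete: [makeRemoveFromVectorStore((id) => vectorIndex.delete(id))]` alongside the afterChange hook above.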
6. Running the Jobs
Defining the job isn't enough; something needs to run it.
Local Development
In a separate terminal window, run:
```bash
npx payload jobs:run
```
This starts a long-running process that polls the payload-jobs collection.
Production (Vercel/Serverless)
Since you don't have a long-running server, use Vercel Cron (or an external cron service) to hit Payload's job endpoint.
- Enable Vercel Cron.
- Payload automatically exposes the run endpoint at `/api/payload-jobs/run`.
- Ensure your `vercel.json` calls this endpoint periodically.
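For example, a `vercel.json` that triggers the runner every five minutes might look like this (the path assumes Payload's default `/api` route prefix; adjust the schedule to how fresh your index needs to be):

```json
{
  "crons": [
    {
      "path": "/api/payload-jobs/run",
      "schedule": "*/5 * * * *"
    }
  ]
}
```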
Summary
- Dependencies: Installed `@upstash/vector` & `openai`.
- Config: Created the Upstash index (1024 dims) & `.env` keys.
- Code: Added `client`, `embedding`, `operations`, and the `upsert` job handler.
- Registration: Registered the job in `payload.config.ts`.
- Trigger: Added the hook to the `Posts` collection.
- Runner: Started `npx payload jobs:run`.
Now, when you publish a post, Payload queues the task, your worker picks it up, and your Vector Store stays perfectly in sync—users never wait.