Context: This guide assumes you have a running Payload CMS 3.0 project.
Imagine if your CMS didn't just store content, but actually understood it. By integrating a Vector Store (Upstash) with Payload, you unlock Chatbots, RAG (Retrieval Augmented Generation), and Semantic Search.
But there is a trap: AI operations are slow. Generating embeddings and syncing to Upstash can take 2-3 seconds, which is too long for a user to wait when saving a post.
This guide shows you how to implement a Background Job pipeline to sync your content asynchronously using Payload's native Jobs queue.
0. Prerequisites
Before writing code, we need to set up our environment.
Install Dependencies
npm install @upstash/vector openai
Environment Variables
Add these to your .env file:
UPSTASH_VECTOR_REST_URL="https://your-index-url.upstash.io"
UPSTASH_VECTOR_REST_TOKEN="your-token"
OPENAI_API_KEY="sk-..."
Create the Upstash Index
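CRITICAL: When creating your index in the Upstash Console, you MUST set the dimensions to 1024. OpenAI's text-embedding-3-small model returns 1536-dimensional vectors by default, but this guide requests 1024 through the dimensions parameter, and the index has to match that value.
Metric: Cosine (recommended)
Dimensions: 1024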
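To make the two index settings concrete, here is a minimal sketch of what a cosine metric computes and why mismatched dimensions cannot be compared. The function name cosineSimilarity is hypothetical and only for illustration; Upstash does this work server-side, and the score it reports may be scaled differently.

```typescript
// Illustrative only: cosine similarity between two embedding vectors.
// Upstash computes similarity on its side; this just shows the idea.
export function cosineSimilarity(a: number[], b: number[]): number {
  // Vectors of different lengths cannot be compared, which is why the
  // index dimensions must equal the embedding dimensions.
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`)
  }

  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }

  // A zero vector has no direction; treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0

  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}
```

Vectors pointing the same way score 1, orthogonal vectors score 0, and opposite vectors score -1, regardless of their length.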
1. Vector Infrastructure
Let's set up the basics first: an Upstash client and an embedding helper.
File: src/lib/vector/client.ts
import { Index } from '@upstash/vector'

if (!process.env.UPSTASH_VECTOR_REST_URL || !process.env.UPSTASH_VECTOR_REST_TOKEN) {
  throw new Error('Missing Upstash Vector env vars')
}

export const vectorIndex = new Index({
  url: process.env.UPSTASH_VECTOR_REST_URL,
  token: process.env.UPSTASH_VECTOR_REST_TOKEN,
})
File: src/lib/vector/embedding.ts
import OpenAI from 'openai'

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })

export async function generateEmbedding(text: string): Promise<number[]> {
  // Collapse newlines so the input is a single line of text.
  const sanitizedText = text.replace(/\n/g, ' ')

  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: sanitizedText,
    dimensions: 1024, // Must match the dimensions of the Upstash index.
  })

  return response.data[0].embedding
}
2. The Logic: Operations
We need a function to handle the actual embedding logic. This is the code that will run inside our job.
File: src/lib/vector/operations.ts
import { vectorIndex } from './client'
import { generateEmbedding } from './embedding'

export async function embedDocument({ id, collection, text }: { id: string, collection: string, text: string }) {
  const embedding = await generateEmbedding(text)

  await vectorIndex.upsert([{
    id: `${collection}-${id}`,
    vector: embedding,
    metadata: {
      docId: id,
      collection,
    }
  }])

  console.log(`[Vector] Synced ${collection}/${id}`)
}
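Once documents are stored with docId and collection metadata, reading them back for semantic search could look roughly like the sketch below. Everything named here (QueryMatch, QueryableIndex, toSearchHits, semanticSearch) is hypothetical. The QueryableIndex interface is a structural stand-in that assumes the index exposes a query method accepting vector, topK, and includeMetadata, so check it against the client you actually use.

```typescript
// Assumed shape of one match returned by the vector index.
interface QueryMatch {
  id: string | number
  score: number
  metadata?: { docId?: string; collection?: string }
}

// Structural stand-in for the index, so this sketch has no runtime dependency.
interface QueryableIndex {
  query(args: { vector: number[]; topK: number; includeMetadata: boolean }): Promise<QueryMatch[]>
}

export interface SearchHit {
  docId: string
  collection: string
  score: number
}

// Keep only matches that carry the metadata written by embedDocument.
export function toSearchHits(matches: QueryMatch[]): SearchHit[] {
  const hits: SearchHit[] = []
  for (const match of matches) {
    const docId = match.metadata?.docId
    const collection = match.metadata?.collection
    if (docId && collection) {
      hits.push({ docId, collection, score: match.score })
    }
  }
  return hits
}

// Embed the query text, ask the index for the nearest vectors, and map the result.
export async function semanticSearch(
  index: QueryableIndex,
  embed: (text: string) => Promise<number[]>,
  queryText: string,
  topK = 5,
): Promise<SearchHit[]> {
  const vector = await embed(queryText)
  const matches = await index.query({ vector, topK, includeMetadata: true })
  return toSearchHits(matches)
}
```

Passing the index and the embedding function in as arguments keeps the search logic easy to test with fakes. In this guide's setup you would presumably pass vectorIndex and generateEmbedding.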
3. The Job: Upsert Handler
Now we create the Payload Task Handler. This runs in the background.
Task Input Interface: First, define what data we pass to the job.
export interface VectorUpsertInput {
  docId: string
  collection: string
}
File: src/payload/jobs/vector/upsert.ts
import type { CollectionSlug, TaskHandler } from 'payload'
import { embedDocument } from '@/lib/vector/operations'

export interface VectorUpsertInput {
  docId: string
  collection: string
}

// TaskHandler expects a task slug or an { input, output } shape as its type argument.
export const vectorUpsertHandler: TaskHandler<{
  input: VectorUpsertInput
  output: { message: string }
}> = async ({ input, req }) => {
  const { docId, collection } = input
  req.payload.logger.info(`[Job] Starting vector sync for ${collection}/${docId}`)

  try {
    // Re-fetch the document so the job always embeds the latest saved version.
    const doc = (await req.payload.findByID({
      collection: collection as CollectionSlug,
      id: docId,
    })) as Record<string, any>

    // Collections without drafts have no _status; only skip explicit non-published docs.
    if (doc._status && doc._status !== 'published') {
      return { output: { message: 'Skipped: Not published' } }
    }

    const content = doc.content || doc.description || ''
    if (!content) return { output: { message: 'Skipped: No content' } }

    await embedDocument({
      id: docId.toString(),
      collection,
      text: typeof content === 'string' ? content : JSON.stringify(content),
    })

    return { output: { message: 'Success' } }
  } catch (error) {
    // The caught value is typed as unknown, so narrow it before reading .message.
    const message = error instanceof Error ? error.message : String(error)
    req.payload.logger.error(`[Job] Failed: ${message}`)
    throw error // Rethrow so Payload can retry the task.
  }
}
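The handler falls back to JSON.stringify when the content is not a plain string, which means rich text gets embedded together with its structural keys. If that noise turns out to hurt search quality, one option is to pull out only the readable text first. The sketch below is a hypothetical helper (extractPlainText is not a Payload API) and it assumes a Lexical-style tree where text lives under a text key and nesting happens through root and children.

```typescript
// Hypothetical helper: collect the strings stored under `text` keys while
// walking `root` and `children`. The tree shape is an assumption.
export function extractPlainText(node: unknown): string {
  // Plain strings are already usable as-is.
  if (typeof node === 'string') return node

  // Arrays: extract from each item and join the non-empty results.
  if (Array.isArray(node)) {
    return node
      .map((item) => extractPlainText(item))
      .filter(Boolean)
      .join(' ')
  }

  if (node !== null && typeof node === 'object') {
    const record = node as Record<string, unknown>
    const parts: string[] = []

    if (typeof record.text === 'string') parts.push(record.text)

    // Only descend into the keys that are assumed to hold nested nodes.
    for (const key of ['root', 'children']) {
      if (key in record) parts.push(extractPlainText(record[key]))
    }

    return parts.filter(Boolean).join(' ')
  }

  // Numbers, booleans, null, and undefined contribute nothing.
  return ''
}
```

In the handler, this could replace the JSON.stringify branch so the embedding reflects what a reader sees instead of the editor's internal structure.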
4. Register the Job
Tell Payload about the job.
File: payload.config.ts
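import { buildConfig } from 'payload'
import { vectorUpsertHandler } from '@/payload/jobs/vector/upsert'

export default buildConfig({
  // ...the rest of your existing config
  jobs: {
    // Only authenticated users may trigger the job runner endpoint.
    access: { run: ({ req }) => !!req.user },
    tasks: [
      {
        slug: 'vector-upsert',
        handler: vectorUpsertHandler,
        retries: 3,
      },
    ],
  },
})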
5. The Trigger: Collection Hook
Attach a hook to your collections to verify the publishing state and dispatch the job.
File: src/payload/hooks/syncToVectorStore.ts
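import type { CollectionAfterChangeHook } from 'payload'

export const syncToVectorStoreAfterChange: CollectionAfterChangeHook = async ({
  doc,
  req,
  collection,
}) => {
  // Skip drafts. Collections without drafts enabled have no _status and are
  // always synced, which matches the check in the job handler.
  if (doc._status && doc._status !== 'published') return doc

  // Queueing only writes a job record; the slow embedding work happens later.
  await req.payload.jobs.queue({
    task: 'vector-upsert',
    input: {
      docId: doc.id,
      collection: collection.slug,
    },
  })

  return doc
}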
CRITICAL STEP: Attach to Collection
You must add this hook to every collection you want indexed!
File: src/collections/Posts.ts
import type { CollectionConfig } from 'payload'
import { syncToVectorStoreAfterChange } from '@/payload/hooks/syncToVectorStore'

export const Posts: CollectionConfig = {
  slug: 'posts',
  // ...your existing fields and settings
  hooks: {
    afterChange: [syncToVectorStoreAfterChange],
  },
}
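If you index many collections, repeating the hook by hand gets easy to forget. A small wrapper can append it for you. The sketch below is one possible approach with hypothetical names (withVectorSync, CollectionLike); it uses a simplified local type instead of Payload's CollectionConfig so it stays self-contained.

```typescript
// Simplified stand-ins for the real Payload types, for illustration only.
type Hook = (args: any) => unknown

export interface CollectionLike {
  slug: string
  hooks?: { afterChange?: Hook[] }
}

// Return a copy of the collection with the hook appended to afterChange.
// The original object is not mutated, and the hook is never added twice.
export function withVectorSync<T extends CollectionLike>(collection: T, hook: Hook): T {
  const existing = collection.hooks?.afterChange ?? []
  if (existing.includes(hook)) return collection

  return {
    ...collection,
    hooks: {
      ...collection.hooks,
      afterChange: [...existing, hook],
    },
  }
}
```

With something like this, each indexed collection could be wrapped once where it is registered, for example withVectorSync(Posts, syncToVectorStoreAfterChange), while existing afterChange hooks keep running in their original order.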
6. Running the Jobs
Defining the job isn't enough; something needs to run it.
Local Development
In a separate terminal window, run:
npx payload jobs:run
This starts a long-running process that polls the payload-jobs collection.
Production (Vercel/Serverless)
Since you don't have a long-running server, use Vercel Cron or an external cron service to call Payload's job endpoint.
Enable Vercel Cron.
Payload automatically configures the endpoint at /api/payload-jobs/run.
Ensure your vercel.json calls this endpoint periodically.
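One thing to watch: the access rule registered earlier only lets logged-in users run jobs, and a cron request has no user. A common way around this is a shared secret. The sketch below assumes the scheduler sends an Authorization: Bearer header containing a secret you also store in an environment variable such as CRON_SECRET; the names canRunJobs and JobRunRequest are hypothetical, so adapt the header lookup to the request object you actually receive.

```typescript
// Hypothetical, simplified view of the incoming request.
export interface JobRunRequest {
  user?: unknown
  authorizationHeader?: string | null
}

// Allow the run if a user is logged in, or if the request carries the
// expected bearer secret. A missing secret never grants access.
export function canRunJobs(request: JobRunRequest, cronSecret: string | undefined): boolean {
  if (request.user) return true
  if (!cronSecret) return false
  return request.authorizationHeader === `Bearer ${cronSecret}`
}
```

Inside the jobs access.run function this might be called with the request's user, its authorization header, and process.env.CRON_SECRET. Refusing access when the secret is unset avoids accidentally matching a header of "Bearer undefined".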
Summary
Dependencies: Installed @upstash/vector and openai.
Config: Created Index (1024 dims) and .env.
Code: Added client, embedding, operations, and upsert job handler.
Registration: Registered Job in payload.config.ts.
Trigger: Added hook to Posts collection.
Runner: Started npx payload jobs:run.
Now, when you publish a post, Payload queues the task, your worker picks it up, and your Vector Store stays perfectly in sync while users never wait.