How to Upgrade Your Payload Website with RAG and Vector Search

Use Upstash + OpenAI embeddings and Payload Native Jobs to add RAG, semantic search, and context-aware chatbots to…

By Matija Žiberna


Context: This guide assumes you have a running Payload CMS 3.0 project.

Imagine if your CMS didn't just store content, but actually understood it. By integrating a vector store (Upstash) with Payload, you unlock chatbots, RAG (Retrieval-Augmented Generation), and semantic search.

But there is a trap: AI operations are slow. Generating embeddings and syncing to Upstash can take 2-3 seconds—too long for a user to wait when saving a post.

This guide shows you how to implement a Background Job pipeline to sync your content asynchronously using Payload's native Jobs queue.


0. Prerequisites

Before writing code, we need to set up our environment.

Install Dependencies

npm install @upstash/vector openai

Environment Variables

Add these to your .env file:

# Get keys from https://console.upstash.com/vector
UPSTASH_VECTOR_REST_URL="https://your-index-url.upstash.io"
UPSTASH_VECTOR_REST_TOKEN="your-token"

# Get key from https://platform.openai.com/
OPENAI_API_KEY="sk-..."

Create the Upstash Index

CRITICAL: When creating your index in the Upstash Console, you MUST set the dimensions to 1024. OpenAI's text-embedding-3-small model outputs 1536 dimensions by default, but we will request 1024 via the dimensions parameter, and the index must match.

  • Metric: Cosine (recommended)
  • Dimensions: 1024

1. Vector Infrastructure

Let's set up the basics first.

File: src/lib/vector/client.ts

import { Index } from '@upstash/vector'

if (!process.env.UPSTASH_VECTOR_REST_URL || !process.env.UPSTASH_VECTOR_REST_TOKEN) {
  throw new Error('Missing Upstash Vector env vars')
}

export const vectorIndex = new Index({
  url: process.env.UPSTASH_VECTOR_REST_URL,
  token: process.env.UPSTASH_VECTOR_REST_TOKEN,
})

File: src/lib/vector/embedding.ts

import OpenAI from 'openai'

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })

export async function generateEmbedding(text: string): Promise<number[]> {
  const sanitizedText = text.replace(/\n/g, ' ')
  // IMPORTANT: dimensions must match your Upstash index
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: sanitizedText,
    dimensions: 1024, 
  })
  return response.data[0].embedding
}
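The helper above embeds a single string, but long posts can exceed the model's input limit (around 8k tokens for text-embedding-3-small). A common fix is to split content into overlapping chunks and embed each one. Here is a minimal sketch using characters as a rough proxy for tokens; chunkText is a hypothetical helper, not part of the files above:

```typescript
// Hypothetical helper (not part of this guide's files): split long content
// into overlapping chunks so each one stays under the embedding model's
// input limit. Sizes are in characters as a rough proxy for tokens.
export function chunkText(text: string, maxChars = 2000, overlap = 200): string[] {
  if (maxChars <= overlap) throw new Error('maxChars must exceed overlap')
  const chunks: string[] = []
  let start = 0
  while (start < text.length) {
    chunks.push(text.slice(start, start + maxChars))
    // Step forward by less than the chunk size so neighbors share context
    start += maxChars - overlap
  }
  return chunks
}
```

Each chunk would then get its own vector ID, e.g. `${collection}-${id}-${chunkIndex}`, so one document maps to several vectors.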

2. The Logic: Operations

We need a function to handle the actual embedding logic. This is the code that will run inside our job.

File: src/lib/vector/operations.ts

import { vectorIndex } from './client'
import { generateEmbedding } from './embedding'

export async function embedDocument({ id, collection, text }: { id: string, collection: string, text: string }) {
  const embedding = await generateEmbedding(text)
  
  await vectorIndex.upsert([{
    id: `${collection}-${id}`,
    vector: embedding,
    metadata: {
      docId: id,
      collection,
      // Add other metadata here
    }
  }])
  
  console.log(`[Vector] Synced ${collection}/${id}`)
}
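Upserts are only half the lifecycle: when a document is deleted or unpublished, its vector should be removed too. Here is a sketch of the counterpart operation, with the index client injected so the ID logic stays testable; removeDocument and vectorId are hypothetical helpers, and the article's code inlines the `${collection}-${id}` ID instead:

```typescript
// Hypothetical counterpart to embedDocument: remove a document's vector when
// it is deleted or unpublished. The index client is injected; the assumption
// is any client exposing a delete(ids) method, like @upstash/vector's Index.
type VectorDeleter = { delete: (ids: string[]) => Promise<unknown> }

// Build the same ID scheme used by embedDocument above
export function vectorId(collection: string, id: string): string {
  return `${collection}-${id}`
}

export async function removeDocument(
  index: VectorDeleter,
  { id, collection }: { id: string; collection: string },
): Promise<void> {
  await index.delete([vectorId(collection, id)])
}
```

You could call this from an afterDelete hook, mirroring the afterChange hook later in this guide, by queueing a second task (e.g. a hypothetical 'vector-delete').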

3. The Job: Upsert Handler

Now we create the Payload Task Handler. This runs in the background.

File: src/payload/jobs/vector/upsert.ts

import type { TaskHandler } from 'payload'
import { embedDocument } from '@/lib/vector/operations'

export interface VectorUpsertInput {
  docId: string
  collection: string
}

export const vectorUpsertHandler: TaskHandler<VectorUpsertInput> = async ({ input, req }) => {
  const { docId, collection } = input

  req.payload.logger.info(`[Job] Starting vector sync for ${collection}/${docId}`)

  try {
    const doc = await req.payload.findByID({ collection, id: docId })

    if (doc._status && doc._status !== 'published') {
      return { output: { message: 'Skipped: Not published' } }
    }

    // Extract text content based on collection
    // Note: For richText fields, you'd want a lexicalToMarkdown utility here
    const content = (doc as any).content || (doc as any).description || ''
    
    if (!content) return { output: { message: 'Skipped: No content' } }

    await embedDocument({
      id: docId.toString(),
      collection,
      text: typeof content === 'string' ? content : JSON.stringify(content) 
    })

    return { output: { message: 'Success' } }
  } catch (error) {
    // In strict TypeScript, `error` is unknown, so narrow before reading .message
    const message = error instanceof Error ? error.message : String(error)
    req.payload.logger.error(`[Job] Failed: ${message}`)
    throw error // Trigger retry
  }
}
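The handler above falls back to JSON.stringify for richText, which pollutes embeddings with Lexical node metadata. Here is a minimal sketch of flattening a Lexical tree to plain text, assuming the standard { root: { children } } JSON shape; Payload's lexical package also ships full converters, so treat this as the core idea only:

```typescript
// Minimal sketch (hypothetical helper): recursively collect the `text`
// leaves of a Lexical richText tree, joining them with spaces.
type LexicalNode = { text?: string; children?: LexicalNode[] }

export function lexicalToText(node: LexicalNode): string {
  if (typeof node.text === 'string') return node.text
  return (node.children ?? []).map(lexicalToText).join(' ')
}
```

In the handler you could then use `lexicalToText((doc as any).content?.root ?? {})` instead of stringifying.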

4. Register the Job

Tell Payload about the job.

File: payload.config.ts

import { vectorUpsertHandler } from '@/payload/jobs/vector/upsert'

export default buildConfig({
  // ...
  jobs: {
    // Determine who can manually trigger jobs via API (if needed)
    access: { run: ({ req }) => !!req.user }, 
    tasks: [
      {
        slug: 'vector-upsert',
        handler: vectorUpsertHandler,
        retries: 3,
      },
    ],
  },
})

5. The Trigger: Collection Hook

Attach a hook to your collections to check the publishing state and dispatch the job.

File: src/payload/hooks/syncToVectorStore.ts

import type { CollectionAfterChangeHook } from 'payload'

export const syncToVectorStoreAfterChange: CollectionAfterChangeHook = async ({
  doc,
  req,
  collection,
}) => {
  // Only sync published docs (assumes drafts/versions are enabled on the collection)
  if (doc._status !== 'published') return doc

  await req.payload.jobs.queue({
    task: 'vector-upsert',
    input: {
      docId: doc.id,
      collection: collection.slug,
    },
  })

  return doc
}

CRITICAL STEP: Attach to Collection. You must add this hook to every collection you want indexed!

File: src/collections/Posts.ts

import { syncToVectorStoreAfterChange } from '@/payload/hooks/syncToVectorStore'

export const Posts: CollectionConfig = {
  slug: 'posts',
  hooks: {
    afterChange: [syncToVectorStoreAfterChange], // <--- Add this!
  },
  // ...
}

6. Running the Jobs

Defining the job isn't enough; something needs to run it.

Local Development

In a separate terminal window, run:

npx payload jobs:run

This starts a long-running process that polls the payload-jobs collection.

Production (Vercel/Serverless)

Since you don't have a long-running server, use Vercel Cron or an external cron service to hit Payload's job endpoint.

  1. Enable Vercel Cron.
  2. Payload automatically configures the endpoint at /api/payload-jobs/run.
  3. Ensure your vercel.json calls this endpoint periodically.
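For example, a minimal vercel.json that invokes the runner every five minutes (the schedules available depend on your Vercel plan, and you may want to protect the endpoint):

```json
{
  "crons": [
    {
      "path": "/api/payload-jobs/run",
      "schedule": "*/5 * * * *"
    }
  ]
}
```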

Summary

  1. Dependencies: Installed @upstash/vector & openai.
  2. Config: Created Index (1024 dims) & .env.
  3. Code: Added client, embedding, operations, and upsert job handler.
  4. Registration: Registered Job in payload.config.ts.
  5. Trigger: Added hook to Posts collection.
  6. Runner: Started npx payload jobs:run.

Now, when you publish a post, Payload queues the task, your worker picks it up, and your Vector Store stays perfectly in sync—users never wait.
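With content synced, querying is the payoff. Here is a sketch of what a semantic search helper could look like on top of the pieces above; both dependencies are injected for testability (in the app, embed would wrap generateEmbedding and query would wrap vectorIndex.query), and the helper itself is an illustration rather than code from this guide:

```typescript
// Hypothetical search helper (not part of this guide's files). `embed` maps
// a query string to a vector; `query` hits the vector index and returns
// scored matches. Injecting both keeps the filtering logic testable offline.
type Match = { id: string; score: number; metadata?: Record<string, unknown> }

export async function semanticSearch(
  embed: (text: string) => Promise<number[]>,
  query: (args: { vector: number[]; topK: number; includeMetadata: boolean }) => Promise<Match[]>,
  text: string,
  topK = 5,
): Promise<Match[]> {
  const vector = await embed(text)
  const matches = await query({ vector, topK, includeMetadata: true })
  // Drop weak matches; 0.7 is an arbitrary starting threshold for cosine similarity
  return matches.filter((m) => m.score >= 0.7)
}
```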



Matija Žiberna
Full-stack developer, co-founder

I'm Matija Žiberna, a self-taught full-stack developer and co-founder passionate about building products, writing clean code, and figuring out how to turn ideas into businesses. I write about web development with Next.js, lessons from entrepreneurship, and the journey of learning by doing. My goal is to provide value through code—whether it's through tools, content, or real-world software.
