How to Auto-Sync Your CMS Content to OpenAI Vector Store with Webhooks

Build a production-ready webhook system that syncs blog posts to OpenAI for AI-powered search

By Matija Žiberna

After building an AI chat interface with ChatKit (see my previous guide on ChatKit integration), I needed to make my blog content searchable through the chat. The challenge? Getting CMS content into OpenAI's Vector Store automatically when I publish, without manual exports or batch uploads.

I built a webhook-based system that syncs posts the moment I hit publish. This guide shows the complete implementation using Sanity CMS as the example, but the webhook pattern works with any headless CMS that supports webhooks—Payload, Strapi, Contentful, you name it.

By the end, you'll have a production-ready sync system with author control, status tracking, and proper error handling. Your content will be ready for semantic search through ChatKit or any OpenAI Assistant.

The Problem: Manual Syncing Doesn't Scale

When you build an AI-powered search feature, OpenAI needs your content in its Vector Store. The typical workflow looks like this:

  1. Write a blog post in your CMS
  2. Publish it
  3. Export the content manually
  4. Upload to OpenAI's dashboard or via API
  5. Repeat for every new post or update

This breaks down immediately. You forget to upload posts. Your search results become stale. The AI doesn't know about your latest content.

What we need is automation: publish in the CMS, content automatically appears in the vector store. That's what webhooks enable.

The Solution: Webhook-Triggered Sync

The architecture is straightforward. Your CMS fires a webhook when content is published. Your Next.js API route catches it, formats the content, uploads to OpenAI, and updates status fields back in the CMS.

Here's the flow:

  1. Author publishes post in CMS (with "sync to vector store" checkbox enabled)
  2. CMS webhook fires to your API endpoint
  3. Background process fetches full content, formats as markdown with metadata
  4. Uploads to OpenAI Files API, then adds to Vector Store
  5. Updates sync status in CMS (pending → synced or failed)

The entire webhook response takes under 100ms. The actual upload happens in the background, so publishing feels instant to authors.

This guide uses Sanity CMS, but the core pattern is CMS-agnostic. The vector store service I'll show you works with any content source. Only the webhook handler and status tracking need CMS-specific code.

What You'll Build

By the end of this guide, you'll have:

  • A reusable vector store service (CMS-agnostic)
  • Webhook handler for post-publish events
  • Author-controlled sync via CMS checkbox
  • Real-time status tracking (pending, synced, failed)
  • Production-ready error handling
  • Local testing with development mode

The example uses Sanity, but I'll point out which parts are CMS-specific so you can adapt it to your setup.

Prerequisites

Before starting, make sure you have:

  • A Next.js project (App Router)
  • A headless CMS with webhook support (this guide uses Sanity)
  • An OpenAI account with API access
  • An OpenAI Vector Store created (I'll show you how)
  • Node.js 18+ and TypeScript

You'll also need the openai npm package installed:

npm install openai
# or
pnpm add openai

Step 1: Create Your OpenAI Vector Store

First, you need a vector store to upload content to. Head to the OpenAI platform and create one via API:

curl https://api.openai.com/v1/vector_stores \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -H "OpenAI-Beta: assistants=v2" \
  -d '{
    "name": "Blog Content Store"
  }'

Save the id from the response—you'll need it for environment variables. It looks like vs_abc123xyz.

The vector store is where OpenAI will index your content for semantic search. When users ask questions through ChatKit or an Assistant, OpenAI searches this store and retrieves relevant chunks from your blog posts.

Step 2: Set Up Environment Variables

Add these to your .env.local file:

# OpenAI Configuration
OPENAI_API_KEY=your-openai-api-key
OPENAI_VECTOR_STORE_ID=vs_abc123xyz

# Your CMS API credentials (example with Sanity)
SANITY_API_TOKEN=your-sanity-editor-token
NEXT_PUBLIC_SANITY_PROJECT_ID=your-project-id
NEXT_PUBLIC_SANITY_DATASET=production
SANITY_WEBHOOK_SECRET=your-webhook-secret

# Application
NEXT_PUBLIC_BASE_URL=https://yourdomain.com
NODE_ENV=development

The OpenAI key needs access to Files and Vector Stores. For your CMS token, you'll need write permissions—the webhook updates status fields after syncing.

The NODE_ENV=development setting lets you test webhooks locally without signature verification. In production, this automatically enables security checks.

Step 3: Build the Vector Store Service

This is the CMS-agnostic core. It handles formatting content and uploading to OpenAI. Create this file:

// File: src/lib/openai/vector-store-service.ts

import OpenAI from 'openai'
import { toFile } from 'openai/uploads'

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
})

const VECTOR_STORE_ID = process.env.OPENAI_VECTOR_STORE_ID!

export interface BlogPostContent {
  title: string
  slug: string
  subtitle?: string
  excerpt?: string
  author: string
  publishedAt: string
  categories: string[]
  markdownContent: string
  url: string
}

function formatBlogPostForVectorStore(post: BlogPostContent): string {
  const parts: string[] = []

  // Add metadata header
  parts.push('---')
  parts.push(`Title: ${post.title}`)
  if (post.subtitle) parts.push(`Subtitle: ${post.subtitle}`)
  if (post.excerpt) parts.push(`Excerpt: ${post.excerpt}`)
  parts.push(`Author: ${post.author}`)
  parts.push(`Published: ${new Date(post.publishedAt).toLocaleDateString('en-US', {
    year: 'numeric',
    month: 'long',
    day: 'numeric'
  })}`)
  if (post.categories.length > 0) {
    parts.push(`Categories: ${post.categories.join(', ')}`)
  }
  parts.push(`URL: ${post.url}`)
  parts.push('---')
  parts.push('')

  // Add main content
  parts.push(`# ${post.title}`)
  if (post.subtitle) {
    parts.push('')
    parts.push(`*${post.subtitle}*`)
  }
  parts.push('')
  parts.push(post.markdownContent)

  return parts.join('\n')
}

export async function uploadBlogPostToVectorStore(
  post: BlogPostContent
): Promise<{ fileId: string; success: boolean; error?: string }> {
  try {
    console.log(`[VectorStore] Starting upload for post: ${post.slug}`)

    // Format content as markdown with metadata
    const formattedContent = formatBlogPostForVectorStore(post)

    // Convert to file buffer
    const fileBuffer = Buffer.from(formattedContent, 'utf-8')
    const fileName = `${post.slug}.md`

    // Create file object using OpenAI's toFile helper
    const file = await toFile(fileBuffer, fileName, {
      type: 'text/markdown',
    })

    console.log(`[VectorStore] Uploading file: ${fileName}`)

    // Step 1: Upload file to OpenAI
    const uploadedFile = await openai.files.create({
      file: file,
      purpose: 'assistants',
    })

    console.log(`[VectorStore] File uploaded with ID: ${uploadedFile.id}`)

    // Step 2: Add file to vector store via REST API
    // Note: SDK v6.2.0 doesn't expose vectorStores, so we use fetch
    const vectorStoreResponse = await fetch(
      `https://api.openai.com/v1/vector_stores/${VECTOR_STORE_ID}/files`,
      {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
          'Content-Type': 'application/json',
          'OpenAI-Beta': 'assistants=v2',
        },
        body: JSON.stringify({
          file_id: uploadedFile.id,
        }),
      }
    )

    if (!vectorStoreResponse.ok) {
      const errorData = await vectorStoreResponse.json()
      throw new Error(`Failed to add file to vector store: ${JSON.stringify(errorData)}`)
    }

    console.log(`[VectorStore] ✅ File added to vector store: ${VECTOR_STORE_ID}`)

    return {
      fileId: uploadedFile.id,
      success: true,
    }
  } catch (error) {
    console.error(`[VectorStore] Error uploading post ${post.slug}:`, error)
    return {
      fileId: '',
      success: false,
      error: error instanceof Error ? error.message : 'Unknown error occurred',
    }
  }
}

This service does three things: formats your content as markdown with metadata, uploads to OpenAI's Files API, then adds it to the vector store.

The formatting matters. OpenAI's vector store performs better when content has structure. The metadata header lets you include context like author, publish date, and categories without cluttering the main text. When users search, OpenAI can reference "this post by Matija from January 2025 about Next.js."

The BlogPostContent interface is generic—title, content, metadata. Any CMS can populate this. That's why this service works regardless of your content source.

One quirk: the OpenAI Node.js SDK (version 6.2.0) doesn't expose vector store methods, so we use fetch() for the second API call. It's a known limitation. The SDK handles file uploads fine; we just need the REST API to add files to the store.
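To make the CMS-agnostic claim concrete, here's a sketch of mapping a raw record from some other CMS into `BlogPostContent`. The `RawCmsRecord` field names are hypothetical placeholders, not a real Contentful or Strapi schema, and the interface is re-declared here so the snippet stands alone:

```typescript
// Hypothetical raw record from another CMS — the field names here are
// assumptions for illustration, not a real CMS contract.
interface RawCmsRecord {
  heading: string
  urlSlug: string
  summary?: string
  writer?: string
  publishDate: string
  tags?: string[]
  bodyMarkdown: string
}

// Mirrors the interface exported by the vector store service.
interface BlogPostContent {
  title: string
  slug: string
  subtitle?: string
  excerpt?: string
  author: string
  publishedAt: string
  categories: string[]
  markdownContent: string
  url: string
}

// Map the raw record into the generic shape the service expects.
function toBlogPostContent(record: RawCmsRecord, baseUrl: string): BlogPostContent {
  return {
    title: record.heading,
    slug: record.urlSlug,
    excerpt: record.summary,
    author: record.writer ?? 'Unknown Author',
    publishedAt: record.publishDate,
    categories: record.tags ?? [],
    markdownContent: record.bodyMarkdown,
    url: `${baseUrl}/blog/${record.urlSlug}`,
  }
}
```

Once the record is in this shape, you'd pass it straight to `uploadBlogPostToVectorStore`. Nothing in the service cares where it came from.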

Step 4: Add CMS Fields for Sync Control

Now we add fields to your CMS so authors can control syncing and see status. This part is CMS-specific, but the pattern applies everywhere: checkbox to enable sync, fields to track status.

Here's the Sanity implementation:

// File: src/lib/sanity/schemaTypes/postType.ts

import { defineField } from 'sanity'

// Add these fields to your existing post schema
defineField({
  name: 'includeInVectorStore',
  title: 'Include in Vector Store',
  type: 'boolean',
  description: 'Enable this to sync this post to OpenAI vector store for AI search',
  initialValue: false,
}),
defineField({
  name: 'vectorStoreFileId',
  title: 'Vector Store File ID',
  type: 'string',
  description: 'OpenAI file ID (auto-populated)',
  readOnly: true,
  hidden: true,
}),
defineField({
  name: 'vectorStoreSyncedAt',
  title: 'Last Synced',
  type: 'datetime',
  description: 'When this post was last synced',
  readOnly: true,
  hidden: ({ document }) => !document?.vectorStoreFileId,
}),
defineField({
  name: 'vectorStoreSyncStatus',
  title: 'Sync Status',
  type: 'string',
  options: {
    list: [
      { title: 'Not Synced', value: 'not_synced' },
      { title: 'Pending', value: 'pending' },
      { title: 'Synced', value: 'synced' },
      { title: 'Failed', value: 'failed' },
    ],
  },
  readOnly: true,
  hidden: ({ document }) => !document?.includeInVectorStore,
  initialValue: 'not_synced',
}),
defineField({
  name: 'vectorStoreSyncError',
  title: 'Sync Error',
  type: 'text',
  description: 'Error message if sync failed',
  readOnly: true,
  hidden: ({ document }) => document?.vectorStoreSyncStatus !== 'failed',
}),

The checkbox (includeInVectorStore) is always visible. Authors decide which posts to sync. The status fields appear only when relevant—no clutter.

When an author publishes with the checkbox enabled, the webhook sees it and triggers the sync. The status goes from not_synced to pending to synced (or failed if something breaks). Authors can refresh and see progress.

For other CMS platforms:

  • Payload CMS: Add these as fields in your collection config with admin.readOnly and admin.condition for visibility
  • Strapi: Create these fields in your content type builder, use lifecycle hooks for read-only enforcement
  • Contentful: Add fields to your content model, use UI extensions for conditional visibility

The pattern is universal: one writable boolean, several read-only status fields.
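As a rough illustration for Payload, the same fields might look like the sketch below. This assumes Payload's field conventions (`checkbox`, `select`, `admin.readOnly`, `admin.condition`) — check the field API for your Payload version before copying it:

```typescript
// Sketch of equivalent fields for a Payload CMS collection config.
// Field options here follow Payload's documented conventions but are
// untested assumptions — verify against your Payload version.
const vectorStoreFields: Record<string, any>[] = [
  {
    name: 'includeInVectorStore',
    label: 'Include in Vector Store',
    type: 'checkbox',
    defaultValue: false,
  },
  {
    name: 'vectorStoreFileId',
    type: 'text',
    admin: { readOnly: true, hidden: true },
  },
  {
    name: 'vectorStoreSyncStatus',
    type: 'select',
    options: [
      { label: 'Not Synced', value: 'not_synced' },
      { label: 'Pending', value: 'pending' },
      { label: 'Synced', value: 'synced' },
      { label: 'Failed', value: 'failed' },
    ],
    defaultValue: 'not_synced',
    admin: {
      readOnly: true,
      // Only show the status once the author has opted in
      condition: (data: any) => Boolean(data?.includeInVectorStore),
    },
  },
]
```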

Step 5: Create the Webhook Handler

The webhook handler is where CMS events trigger actions. Your CMS calls this endpoint when a post is published. The handler validates the request, checks if syncing is enabled, and kicks off the background upload.

Here's the Next.js App Router implementation:

// File: src/app/api/revalidate/route.ts

import { NextRequest, NextResponse } from 'next/server'
import { createClient } from '@sanity/client'
import { uploadBlogPostToVectorStore } from '@/lib/openai/vector-store-service'

// Sanity client with write permissions
const sanityClient = createClient({
  projectId: process.env.NEXT_PUBLIC_SANITY_PROJECT_ID!,
  dataset: process.env.NEXT_PUBLIC_SANITY_DATASET!,
  apiVersion: '2024-01-01',
  token: process.env.SANITY_API_TOKEN,
  useCdn: false,
})

async function syncToVectorStore(postId: string, slug: string | undefined) {
  try {
    console.log(`[VectorStore] Syncing post: ${postId} (slug: ${slug ?? 'unknown'})`)

    // Update status to pending
    await sanityClient
      .patch(postId)
      .set({
        vectorStoreSyncStatus: 'pending',
        vectorStoreSyncError: null,
      })
      .commit()

    // Fetch full post data from CMS
    const post = await sanityClient.fetch(
      `*[_id == $postId][0]{
        title,
        "slug": slug.current,
        subtitle,
        excerpt,
        "author": author->name,
        publishedAt,
        "categories": categories[]->title,
        markdownContent,
      }`,
      { postId }
    )

    if (!post) {
      throw new Error('Post not found')
    }

    if (!post.markdownContent) {
      throw new Error('Post has no markdown content')
    }

    // Prepare content for vector store
    const blogPostContent = {
      title: post.title,
      slug: post.slug,
      subtitle: post.subtitle,
      excerpt: post.excerpt,
      author: post.author || 'Unknown Author',
      publishedAt: post.publishedAt,
      categories: post.categories || [],
      markdownContent: post.markdownContent,
      url: `${process.env.NEXT_PUBLIC_BASE_URL}/blog/${post.slug}`,
    }

    console.log(`[VectorStore] Uploading to OpenAI...`)

    // Upload to vector store
    const result = await uploadBlogPostToVectorStore(blogPostContent)

    if (result.success) {
      // Update CMS with success
      await sanityClient
        .patch(postId)
        .set({
          vectorStoreFileId: result.fileId,
          vectorStoreSyncStatus: 'synced',
          vectorStoreSyncedAt: new Date().toISOString(),
          vectorStoreSyncError: null,
        })
        .commit()

      console.log(`[VectorStore] ✅ Successfully synced: ${post.slug}`)
    } else {
      // Update CMS with failure
      await sanityClient
        .patch(postId)
        .set({
          vectorStoreSyncStatus: 'failed',
          vectorStoreSyncError: result.error || 'Unknown error',
        })
        .commit()

      console.error(`[VectorStore] ❌ Failed to sync: ${post.slug}`)
    }
  } catch (error) {
    console.error(`[VectorStore] Sync error:`, error)

    // Update CMS with error
    try {
      await sanityClient
        .patch(postId)
        .set({
          vectorStoreSyncStatus: 'failed',
          vectorStoreSyncError: error instanceof Error ? error.message : 'Unknown error',
        })
        .commit()
    } catch (updateError) {
      console.error('[VectorStore] Failed to update error status:', updateError)
    }
  }
}

export async function POST(request: NextRequest) {
  try {
    const body = await request.json()
    const { _type, slug, _id, includeInVectorStore } = body

    console.log('Webhook received:', { _type, slug: slug?.current, _id })

    if (_type === 'post') {
      // Handle vector store sync (non-blocking)
      if (includeInVectorStore === true) {
        console.log(`[VectorStore] Post marked for sync: ${slug?.current}`)

        // Trigger async sync (don't await - keep webhook fast)
        syncToVectorStore(_id, slug?.current)
          .catch(error => {
            console.error(`[VectorStore] Background sync failed:`, error)
          })
      }

      return NextResponse.json({
        revalidated: true,
        vectorStoreSync: includeInVectorStore ? 'initiated' : 'skipped',
        timestamp: new Date().toISOString()
      })
    }

    return NextResponse.json({
      message: 'Webhook processed',
      timestamp: new Date().toISOString()
    })
  } catch (error) {
    console.error('Webhook error:', error)
    return NextResponse.json(
      { error: 'Failed to process webhook' },
      { status: 500 }
    )
  }
}

The key design choice here is the fire-and-forget pattern. The webhook returns immediately—usually under 100ms. The actual sync happens in the background via syncToVectorStore() without await. This keeps publishing fast for authors. They don't wait for OpenAI's upload to complete.

The sync function updates status twice: first to pending when it starts, then to synced or failed when it finishes. Authors can refresh the post in the CMS and see real-time progress.
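One trade-off of fire-and-forget: a transient OpenAI error (rate limit, timeout) loses the sync until the author republishes. A minimal retry wrapper — a hypothetical helper, not part of the handler above — can absorb those without blocking the webhook response:

```typescript
// Minimal retry helper with exponential backoff. Hypothetical addition —
// wrap the background sync call in it to survive transient failures.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn()
    } catch (error) {
      lastError = error
      // Back off 500ms, 1000ms, 2000ms... before the next attempt
      if (i < attempts - 1) {
        await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** i))
      }
    }
  }
  throw lastError
}
```

You'd call it as `withRetry(() => syncToVectorStore(_id, slug?.current))` in place of the bare call. One caveat if you deploy to a serverless platform: the runtime may freeze the function once the response is sent, so check your host's background-task support (Vercel exposes `waitUntil` for exactly this) before relying on fire-and-forget.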

For CMS integration, the Sanity-specific parts are the sanityClient calls. Here's how you'd adapt this to other platforms:

Payload CMS:

// Instead of sanityClient.patch()
await payload.update({
  collection: 'posts',
  id: postId,
  data: {
    vectorStoreSyncStatus: 'pending',
    vectorStoreSyncError: null,
  }
})

Strapi:

// Using Strapi's SDK
await strapi.entityService.update('api::post.post', postId, {
  data: {
    vectorStoreSyncStatus: 'pending',
    vectorStoreSyncError: null,
  }
})

The pattern is identical: receive webhook, validate, fetch content, upload to OpenAI, update status. Only the CMS client calls change.

Step 6: Configure Your CMS Webhook

Your CMS needs to know where to send publish events. The setup varies by platform, but the concept is universal: tell your CMS to POST to your webhook URL when content is published.

For Sanity, go to your project settings in the Sanity dashboard:

  1. Navigate to API → Webhooks
  2. Click "Create webhook"
  3. Configure:
    • URL: https://yourdomain.com/api/revalidate
    • Dataset: Your dataset (usually production)
    • Trigger on: Create, Update, Delete
    • HTTP method: POST
    • Secret: Generate a secure string (save in SANITY_WEBHOOK_SECRET)

The secret is important for production. It lets you verify webhooks are actually from your CMS, not random POST requests. In development, we skip verification with NODE_ENV=development for easier testing.

For other platforms:

Payload CMS: Use hooks in your collection config:

hooks: {
  afterChange: [
    async ({ doc, req, operation }) => {
      if (operation === 'create' || operation === 'update') {
        // Trigger your webhook endpoint
        await fetch(`${process.env.APP_URL}/api/revalidate`, {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify(doc)
        })
      }
    }
  ]
}

Strapi: Create a lifecycle hook in src/api/post/content-types/post/lifecycles.js:

module.exports = {
  async afterCreate(event) {
    await fetch(`${process.env.APP_URL}/api/revalidate`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(event.result)
    })
  }
}

The webhook URL stays the same. Your Next.js handler doesn't care which CMS sent the data—it just needs the post ID and the includeInVectorStore flag.
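One way to keep the handler itself CMS-agnostic is to normalize whatever payload arrives into the two values it needs. The field paths below are illustrative guesses at common payload shapes, not exact CMS contracts — check what your CMS actually sends:

```typescript
interface NormalizedEvent {
  postId: string
  slug?: string
  includeInVectorStore: boolean
}

// Normalize a few common webhook payload shapes into what the handler needs.
// Field paths are illustrative assumptions — verify against your CMS payload.
function normalizeWebhookPayload(body: any): NormalizedEvent | null {
  // Sanity-style payload: { _id, slug: { current }, includeInVectorStore }
  if (typeof body?._id === 'string') {
    return {
      postId: body._id,
      slug: body.slug?.current,
      includeInVectorStore: body.includeInVectorStore === true,
    }
  }
  // Payload/Strapi-style payload: { id, slug, includeInVectorStore }
  if (body?.id != null) {
    return {
      postId: String(body.id),
      slug: body.slug,
      includeInVectorStore: body.includeInVectorStore === true,
    }
  }
  return null // unrecognized payload — ignore the event
}
```

With a normalizer in front, switching CMS means adding one branch here instead of rewriting the handler.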

Step 7: Test Locally

Before deploying, test the entire flow locally. Start your development server:

pnpm dev

Use cURL to simulate a webhook from your CMS:

curl -X POST http://localhost:3000/api/revalidate \
  -H "Content-Type: application/json" \
  -d '{
    "_type": "post",
    "_id": "test-post-id",
    "slug": { "current": "test-post" },
    "includeInVectorStore": true
  }'

Watch your terminal. You should see logs like:

Webhook received: { _type: 'post', slug: 'test-post', _id: 'test-post-id' }
[VectorStore] Post marked for sync: test-post
[VectorStore] Syncing post: test-post-id
[VectorStore] Starting upload for post: test-post
[VectorStore] Uploading file: test-post.md
[VectorStore] File uploaded with ID: file-abc123
[VectorStore] ✅ File added to vector store: vs_abc123
[VectorStore] ✅ Successfully synced: test-post

Check your OpenAI dashboard at https://platform.openai.com/storage/vector_stores. You should see the new file in your vector store with a .md extension.

If you get errors, the most common issues are:

  • "Invalid vector_store_id: undefined": OPENAI_VECTOR_STORE_ID not set in .env.local
  • "Insufficient permissions": Your CMS API token needs write permissions
  • "Post has no markdown content": Make sure your test post has actual content

The local test proves the entire pipeline works: webhook → fetch content → format → upload → update status. Once this works, deployment is straightforward.

Step 8: Deploy to Production

Deploy your Next.js app to your hosting platform (Vercel, Railway, etc.). Make sure all environment variables are set in your production environment—especially OPENAI_API_KEY and OPENAI_VECTOR_STORE_ID.

One critical difference in production: webhook signature verification. You'll want to validate that webhooks actually come from your CMS. Here's how to add that to your webhook handler:

// File: src/app/api/revalidate/route.ts

import { isValidSignature, SIGNATURE_HEADER_NAME } from '@sanity/webhook'

const WEBHOOK_SECRET = process.env.SANITY_WEBHOOK_SECRET

export async function POST(request: NextRequest) {
  try {
    const body = await request.text()

    // Verify signature in production
    if (WEBHOOK_SECRET && process.env.NODE_ENV !== 'development') {
      const signature = request.headers.get(SIGNATURE_HEADER_NAME)

      if (!signature) {
        return NextResponse.json(
          { error: 'Missing signature' },
          { status: 401 }
        )
      }

      const isValid = await isValidSignature(body, signature, WEBHOOK_SECRET)
      if (!isValid) {
        return NextResponse.json(
          { error: 'Invalid signature' },
          { status: 401 }
        )
      }
    }

    const data = JSON.parse(body)
    // ... rest of your handler
  } catch (error) {
    // ... error handling
  }
}

The process.env.NODE_ENV !== 'development' check lets you skip verification locally while requiring it in production. This protects your webhook from unauthorized requests.

For non-Sanity platforms, signature verification works similarly. Most CMS platforms sign webhooks with HMAC. Check your CMS documentation for the specific header name and verification method.
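For a CMS that signs webhooks with a plain HMAC-SHA256 digest of the raw body, verification with Node's built-in crypto looks roughly like this. The header name and digest encoding vary by platform — this sketch assumes a hex digest, so adapt it to your CMS's documentation:

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto'

// Generic HMAC-SHA256 verification. Assumes the CMS sends a hex digest of
// the raw request body — header name and encoding vary, check your CMS docs.
function verifyHmacSignature(rawBody: string, signature: string, secret: string): boolean {
  const expected = createHmac('sha256', secret).update(rawBody).digest('hex')
  const a = Buffer.from(expected)
  const b = Buffer.from(signature)
  // timingSafeEqual throws on length mismatch, so guard first; the
  // constant-time compare prevents timing attacks on the signature
  return a.length === b.length && timingSafeEqual(a, b)
}
```

The important details carry over from the Sanity version: verify against the raw body string (before `JSON.parse`), and compare in constant time rather than with `===`.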

Testing in Production

Once deployed, test with a real post in your CMS:

  1. Create a new blog post with markdown content
  2. Enable the "Include in Vector Store" checkbox
  3. Publish the post
  4. Wait 10-20 seconds (OpenAI needs time to process)
  5. Refresh the post in your CMS
  6. Check that "Sync Status" shows "Synced"

If the status shows "Failed," check your deployment logs for error messages. The vectorStoreSyncError field in your CMS will show what went wrong.

Verify the file uploaded by checking your OpenAI dashboard. The file should appear in your vector store with the post's slug as the filename (e.g., test-post.md).

Using Your Synced Content with ChatKit

Now that your posts are in the vector store, you can search them through ChatKit or any OpenAI Assistant. This is where the webhook automation pays off—every new post is immediately searchable.

If you followed my previous guide on ChatKit integration, you already have the chat interface set up. To enable searching your blog content, update your ChatKit configuration to use your vector store:

// File: src/lib/chatkit-config.ts

export const chatkitConfig = {
  assistantId: process.env.OPENAI_ASSISTANT_ID!,
  vectorStoreIds: [process.env.OPENAI_VECTOR_STORE_ID!],
  // ... other config
}

When users ask questions like "How do I optimize React performance?" or "What's your take on server components?", ChatKit searches your vector store and pulls relevant excerpts from your blog posts. The AI can cite specific articles and provide context.

The metadata we added (author, date, categories) helps the AI provide better context. Instead of just "here's some information," it can say "according to your January 2025 article about Next.js..."

What We Built

You now have a production-ready system that automatically syncs CMS content to OpenAI's Vector Store. Authors control which posts to sync via checkbox. Status updates happen in real-time. Error handling ensures failures are visible and debuggable.

The core pattern—webhook triggers upload, background processing, status tracking—works with any headless CMS that supports webhooks. I showed Sanity as the example, but the vector store service is completely CMS-agnostic. The webhook handler needs CMS-specific client code, but the structure stays the same.

Your content is now searchable through AI. As you publish new posts, they automatically appear in the vector store within seconds. No manual exports, no batch uploads, no stale search results.

This setup scales. Whether you publish once a week or ten times a day, the automation handles it. Authors don't think about syncing—they just write and publish.

If you're building AI-powered search or chat interfaces for content, this webhook pattern is the foundation. Combine it with ChatKit (see my previous guide) and you have a complete AI-powered knowledge base.

Next Steps

This implementation handles new posts being published. You might want to extend it with:

  • Update support: Re-sync posts when edited
  • Delete support: Remove posts from vector store when unpublished
  • Batch sync: Upload all existing posts at once
  • Search interface: Build a dedicated search UI beyond chat

I'm considering writing guides for these extensions. Let me know in the comments what you'd like to see next.

Using Payload CMS instead of Sanity? I'd love to adapt this guide for Payload. Drop a comment if that would be useful for your project.

Thanks for reading! Subscribe for more practical guides on building with AI APIs and modern content workflows.

Thanks, Matija


Matija Žiberna
Full-stack developer, co-founder

I'm Matija Žiberna, a self-taught full-stack developer and co-founder passionate about building products, writing clean code, and figuring out how to turn ideas into businesses. I write about web development with Next.js, lessons from entrepreneurship, and the journey of learning by doing. My goal is to provide value through code—whether it's through tools, content, or real-world software.