Self-Host a Qdrant Vector Store for Semantic Blog Search (TypeScript + Docker)

TypeScript + Docker walkthrough to deploy Qdrant on a VPS, generate OpenAI embeddings, and enable fast semantic search

1st June 2026 · Updated on: 4th June 2026

About the author

Matija Žiberna

Full-stack developer, co-founder

Self-taught full-stack developer sharing lessons from building software and startups.

I'm Matija Žiberna, a self-taught full-stack developer and co-founder passionate about building products, writing clean code, and figuring out how to turn ideas into businesses. I write about web development with Next.js, lessons from entrepreneurship, and the journey of learning by doing. My goal is to provide value through code—whether it's through tools, content, or real-world software.

Contents

  • Prerequisites
  • Why Qdrant Over pgvector
  • The Architecture
  • Step 1: Spin Up Qdrant with Docker Compose
  • Step 2: Define the TypeScript Types
  • Step 3: Build the Qdrant Client Wrapper
  • Step 4: Generate Embeddings with OpenAI
  • Step 5: Run the End-to-End Integration
  • Step 6: Snapshot Backups
  • Security Checklist for VPS Deployment
  • Frequently Asked Questions
  • Conclusion
Standard blog search breaks the moment a reader phrases their query differently from what you wrote. A user asking "how do I run apps in isolated environments" will find nothing in a database full of Docker articles because LIKE '%docker%' has no concept of meaning — only characters.

The fix is a vector store. You embed your article content into numerical vectors using an AI model, store them in Qdrant running on your VPS, and then embed the user's query at search time and retrieve the closest semantic matches. The result is search that understands intent, synonyms, and conceptual relationships — all self-hosted with zero ongoing database bills.

This guide walks through the full setup: Docker Compose for Qdrant, a TypeScript client wrapper with automatic collection initialization, OpenAI embedding generation, an end-to-end integration script, and a snapshot backup service.

Tested on: Qdrant v1.18.2, Node.js v20, @qdrant/js-client-rest 1.9.x, openai 4.x. Last updated June 2026 by Matija Žiberna, full-stack developer at buildwithmatija.com.


I built this setup for buildwithmatija.com, which has over 200 technical articles. Keyword search was missing conceptually related content constantly — someone searching "containerize a node app" would get zero hits on articles that used "Docker" throughout but never that exact phrase. After running Qdrant alongside the existing Payload CMS stack on the same Hetzner VPS, query quality improved substantially without adding a managed database bill. The full pipeline took about a day to wire up, and most of the time was spent figuring out the collection initialization step that Qdrant's docs underexplain.


Prerequisites

Before starting, you need the following in place:

  • A VPS running Linux: any Ubuntu instance on Hetzner, DigitalOcean, or AWS with 1–2 GB of RAM or more works fine.
  • Docker and Docker Compose — installed on the VPS.
  • Node.js v18+ and TypeScript — on your development machine.
  • An OpenAI API key — we use the text-embedding-3-small model for generating vector embeddings.

Why Qdrant Over pgvector

The two main open-source options for self-hosted vector search are Qdrant and pgvector. Both work, but they serve different use cases.

| Feature | pgvector (PostgreSQL) | Qdrant (Rust-native) |
| --- | --- | --- |
| System footprint | Shared with your main database. Heavy RAM during indexing. | Isolated Rust binary. Runs comfortably alongside other services on a cheap VPS. |
| Search speed | Good at small scale, degrades as the dataset grows. | Built for high-concurrency vector search from the ground up. |
| Advanced retrieval | Requires complex SQL extension code. | Native support for named vectors, sparse vectors (hybrid search), and ColBERT. |
| Dashboard | None; requires SQL clients. | Built-in web UI for inspecting points and collections. |
| Backups | Full SQL database dumps. | Native collection-level snapshots that are lightweight and instant. |

If you are already on PostgreSQL and have fewer than 10,000 vectors, pgvector keeps the stack simple. For dedicated search workloads with production backup requirements, Qdrant is the better choice.


The Architecture

The pipeline runs in four stages:

text
Payload CMS / Markdown Files
          ↓
  Text Chunking & Embedding (Node.js + OpenAI)
          ↓
  Qdrant (Self-Hosted on VPS via Docker)
          ↓
  Semantic Search & Retrieval

Each article gets split into paragraph-sized chunks, each chunk is converted into a 1536-dimensional vector using OpenAI's embedding model, and those vectors are stored in Qdrant. At query time, the user's search string goes through the same embedding step and Qdrant returns the closest matches by cosine similarity.
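
The chunking stage is not shown elsewhere in this guide, so here is a minimal sketch of one way to do it, assuming articles are plain text or markdown with blank lines between paragraphs. The function name `chunkByParagraph` and the `maxChars` limit are illustrative choices, not part of the original pipeline.

```typescript
// Sketch only: splits on blank lines, then packs consecutive paragraphs
// into chunks of at most maxChars characters.
// `chunkByParagraph` and `maxChars` are hypothetical names for illustration.
function chunkByParagraph(text: string, maxChars = 1500): string[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map(p => p.trim())
    .filter(p => p.length > 0);

  const chunks: string[] = [];
  let current = "";

  for (const paragraph of paragraphs) {
    const candidate = current ? `${current}\n\n${paragraph}` : paragraph;
    if (candidate.length <= maxChars || current === "") {
      // Either the paragraph fits, or it is an oversized paragraph
      // starting a new chunk, which is kept whole rather than cut mid-sentence.
      current = candidate;
    } else {
      chunks.push(current);
      current = paragraph;
    }
  }

  if (current) chunks.push(current);
  return chunks;
}
```

Character counts are only a rough stand-in for tokens. If chunk size matters for your embedding budget, a tokenizer-based limit would be more precise.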


Step 1: Spin Up Qdrant with Docker Compose

The Qdrant Docker image is a single Rust binary that exposes two ports: 6333 for the REST API and dashboard, and 6334 for the gRPC API. We bind both strictly to 127.0.0.1 so the database is never reachable from the public internet directly.

Create a docker-compose.yml file on your VPS:

yaml
# File: docker-compose.yml
services:
  qdrant:
    image: qdrant/qdrant:v1.18.2
    container_name: qdrant
    restart: unless-stopped

    # Bound to localhost only — never expose to 0.0.0.0 in production
    ports:
      - "127.0.0.1:6333:6333"
      - "127.0.0.1:6334:6334"

    environment:
      QDRANT__SERVICE__API_KEY: "${QDRANT_API_KEY}"
      QDRANT__SERVICE__READ_ONLY_API_KEY: "${QDRANT_READ_ONLY_API_KEY}"

    volumes:
      - ./qdrant_storage:/qdrant/storage
      - ./qdrant_snapshots:/qdrant/snapshots

Qdrant natively supports two separate API keys configured via environment variables. QDRANT__SERVICE__API_KEY grants full write access and is used only by your indexing scripts. QDRANT__SERVICE__READ_ONLY_API_KEY is what your public-facing search endpoints use — read operations only, no ability to modify or delete data. Splitting these means a leaked frontend key cannot corrupt your index.

Create a .env file in the same directory. Generate the key values with openssl rand -hex 32:

env
# File: .env
QDRANT_URL=http://127.0.0.1:6333
QDRANT_API_KEY=your_secure_admin_write_key
QDRANT_READ_ONLY_API_KEY=your_secure_public_search_key
OPENAI_API_KEY=your_openai_api_key

Start the container:

bash
docker compose up -d

Qdrant will be running and persisting data to ./qdrant_storage on the VPS host. The snapshots directory will be used in Step 6.
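
Since every later step depends on the four variables in .env, it can help to fail fast when one is missing or still holds a placeholder. The following is a sketch under that assumption; `missingEnvKeys` is a hypothetical helper, and the `your_` prefix check simply mirrors the placeholder values in the template above.

```typescript
// Sketch only: reports which required variables are unset, empty,
// or still set to the "your_..." placeholders from the .env template.
const REQUIRED_KEYS = [
  "QDRANT_URL",
  "QDRANT_API_KEY",
  "QDRANT_READ_ONLY_API_KEY",
  "OPENAI_API_KEY",
] as const;

function missingEnvKeys(env: Record<string, string | undefined>): string[] {
  return REQUIRED_KEYS.filter(key => {
    const value = env[key];
    return value === undefined || value.trim() === "" || value.startsWith("your_");
  });
}

// Typical use in a Node.js script, after dotenv.config():
//   const missing = missingEnvKeys(process.env);
//   if (missing.length > 0) throw new Error(`Missing env vars: ${missing.join(", ")}`);
```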


Step 2: Define the TypeScript Types

Before building the client, define the shared types that flow through the entire pipeline. Create src/types.ts:

typescript
// File: src/types.ts

export interface RagChunk {
  id?: string;
  url: string;
  title: string;
  content: string;
  project: string;
  source?: string;
  heading?: string;
  language?: string;
  embedding?: number[];
  embeddingModel: string;
  chunkerVersion: string;
  metadata?: Record<string, unknown>;
}

export interface RagFilters {
  project?: string;
  language?: string;
  source?: string;
  url?: string;
}

export interface RagResult {
  id: string;
  score: number;
  url: string;
  title: string;
  content: string;
  project: string;
  payload: Record<string, unknown>;
}

RagChunk represents a paragraph-sized piece of article content alongside its embedding vector and metadata. RagResult is what gets returned to the search caller: the raw payload plus the cosine similarity score from Qdrant. Cosine similarity ranges from -1 to 1, with higher meaning more semantically similar; for text embeddings the scores you see in practice mostly fall between 0 and 1.
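
To make the score field less abstract, here is a small sketch of how cosine similarity is computed for two vectors. Qdrant does this internally; the `cosineSimilarity` function below is only an illustration and is not used by the pipeline.

```typescript
// Sketch only: cosine similarity = dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");

  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }

  if (normA === 0 || normB === 0) throw new Error("A zero vector has no direction");
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Same direction scores 1, orthogonal scores 0, opposite scores -1.
console.log(cosineSimilarity([1, 0], [1, 0]));
console.log(cosineSimilarity([1, 0], [0, 1]));
console.log(cosineSimilarity([1, 0], [-1, 0]));
```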


Step 3: Build the Qdrant Client Wrapper

Install the official Qdrant JavaScript client:

bash
npm install @qdrant/js-client-rest

Now create src/qdrant.ts. The two critical implementation details here are collection initialization and stable point ID generation.

Qdrant does not auto-create collections. If you attempt to upsert into a collection that does not exist, the API returns a 404 and the upsert fails. The ensureCollection method runs a collection list check before every upsert and creates the collection with the correct vector size and distance metric if it is missing. This makes the script safe to run repeatedly: on first run it creates the collection, and on subsequent runs it skips that step.

The second detail is stable IDs. Qdrant uses numeric or UUID point IDs. If you generate random IDs on each indexing run, re-indexing the same article creates duplicate points with different IDs, polluting the index. Instead, we hash the chunk's content and metadata into a deterministic UUID — the same chunk always produces the same ID, so upserting twice is a clean overwrite.
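
The stable-ID idea can be tried in isolation. The sketch below hashes the same fields the wrapper uses and formats the first 32 hex characters of the SHA-256 digest as a UUID-shaped string; `stableUuid` is a standalone illustration with a hypothetical name, not a replacement for the class method.

```typescript
import { createHash } from "node:crypto";

// Sketch only: the same inputs always produce the same UUID-shaped ID.
function stableUuid(project: string, url: string, chunkerVersion: string, content: string): string {
  const hex = createHash("sha256")
    .update(`${project}:${url}:${chunkerVersion}:${content}`)
    .digest("hex");

  // 8-4-4-4-12 grouping, taken from the first 32 hex characters
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20, 32),
  ].join("-");
}

const first = stableUuid("buildwithmatija", "/blog/docker-guide", "v1", "Docker containers run isolated processes.");
const second = stableUuid("buildwithmatija", "/blog/docker-guide", "v1", "Docker containers run isolated processes.");
console.log(first === second); // true: re-indexing overwrites instead of duplicating
```

One consequence worth knowing: because the content is part of the hash, editing a paragraph gives it a new ID, and the point for the old text stays in the collection. One option is to delete all points for that article's url before re-indexing it.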

typescript
// File: src/qdrant.ts
import { QdrantClient } from "@qdrant/js-client-rest";
import { createHash } from "crypto";
import { RagChunk, RagFilters, RagResult } from "./types";

export class QdrantVectorStore {
  private client: QdrantClient;
  private collectionName = "blog_chunks";

  constructor() {
    this.client = new QdrantClient({
      url: process.env.QDRANT_URL ?? "http://127.0.0.1:6333",
      apiKey: process.env.QDRANT_API_KEY,
    });
  }

  async ensureCollection(vectorSize: number): Promise<void> {
    const response = await this.client.getCollections();
    const exists = response.collections.some(c => c.name === this.collectionName);

    if (!exists) {
      console.log(`Creating collection '${this.collectionName}' with vector size ${vectorSize}...`);
      await this.client.createCollection(this.collectionName, {
        vectors: {
          size: vectorSize,
          distance: "Cosine"
        }
      });
    }
  }

  private generateStableId(chunk: RagChunk): string {
    if (chunk.id) return chunk.id;
    const sourceData = `${chunk.project}:${chunk.url}:${chunk.chunkerVersion}:${chunk.content}`;
    const hash = createHash("sha256").update(sourceData).digest("hex");
    return [
      hash.substring(0, 8),
      hash.substring(8, 12),
      hash.substring(12, 16),
      hash.substring(16, 20),
      hash.substring(20, 32)
    ].join("-");
  }

  async upsertChunks(chunks: RagChunk[]): Promise<void> {
    if (chunks.length === 0) return;

    const firstVectorSize = chunks[0].embedding?.length;
    if (!firstVectorSize) throw new Error("No embedding vector found on chunks");

    await this.ensureCollection(firstVectorSize);

    const points = chunks.map(chunk => ({
      id: this.generateStableId(chunk),
      vector: chunk.embedding!,
      payload: {
        project: chunk.project,
        url: chunk.url,
        title: chunk.title,
        content: chunk.content,
        source: chunk.source ?? "website",
        language: chunk.language ?? "en",
        ...chunk.metadata
      }
    }));

    await this.client.upsert(this.collectionName, {
      wait: true,
      points
    });
  }

  async search(queryVector: number[], filters?: RagFilters): Promise<RagResult[]> {
    const qdrantFilter: { must: unknown[] } = { must: [] };

    if (filters) {
      for (const [key, value] of Object.entries(filters)) {
        if (value) qdrantFilter.must.push({ key, match: { value } });
      }
    }

    const response = await this.client.query(this.collectionName, {
      query: queryVector,
      filter: qdrantFilter.must.length > 0 ? qdrantFilter : undefined,
      with_payload: true,
      limit: 8
    });

    return response.points.map(hit => ({
      id: String(hit.id),
      score: hit.score ?? 0,
      url: String(hit.payload?.url),
      title: String(hit.payload?.title),
      content: String(hit.payload?.content),
      project: String(hit.payload?.project),
      payload: hit.payload ?? {}
    }));
  }
}

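The search method accepts an optional filters object, which maps to Qdrant's must filter clauses. Filtering by project is useful if you run multiple sites or content sources in the same collection, because it scopes results without needing a separate collection per project. Note that client.query goes through Qdrant's Query API, which arrived in Qdrant 1.10; if your installed server or @qdrant/js-client-rest version predates it, the older client.search method covers the same use case.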
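
The filter-building logic inside search is easy to test on its own when pulled into a pure function. This is a sketch of that extraction; `buildFilter`, `MatchCondition`, and `QdrantFilter` are names introduced here for illustration, and the output shape follows the must clause used in the wrapper above.

```typescript
// Sketch only: turns optional filter fields into a Qdrant-style `must` filter.
interface MatchCondition {
  key: string;
  match: { value: string };
}

interface QdrantFilter {
  must: MatchCondition[];
}

function buildFilter(filters?: {
  project?: string;
  language?: string;
  source?: string;
  url?: string;
}): QdrantFilter | undefined {
  if (!filters) return undefined;

  const must: MatchCondition[] = [];
  for (const [key, value] of Object.entries(filters)) {
    // Skip unset and empty fields so they do not constrain the search
    if (typeof value === "string" && value.length > 0) {
      must.push({ key, match: { value } });
    }
  }

  // Returning undefined means "no filter", i.e. search the whole collection
  return must.length > 0 ? { must } : undefined;
}

console.log(JSON.stringify(buildFilter({ project: "buildwithmatija", language: "en" })));
```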


Step 4: Generate Embeddings with OpenAI

Install the OpenAI SDK:

bash
npm install openai

Create src/embedder.ts:

typescript
// File: src/embedder.ts
import { OpenAI } from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY
});

export async function getEmbedding(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text
  });

  return response.data[0].embedding;
}

text-embedding-3-small produces 1536-dimensional vectors and costs roughly $0.02 per million tokens. For a blog with 200 articles chunked into 1000-token paragraphs, the full initial indexing run costs well under $1. The getEmbedding function is intentionally minimal — pass in a string, get back the number array that Qdrant stores or queries against.
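
The cost claim is simple arithmetic, sketched here so you can plug in your own numbers. The figure of 5 chunks per article is an assumption made for illustration; the article count, chunk size, and price per million tokens come from the paragraph above.

```typescript
// Sketch only: cost = (total tokens / 1,000,000) * price per million tokens.
function estimateEmbeddingCostUsd(totalTokens: number, usdPerMillionTokens = 0.02): number {
  return (totalTokens / 1_000_000) * usdPerMillionTokens;
}

const articles = 200;
const chunksPerArticle = 5; // assumption, adjust for your content
const tokensPerChunk = 1000;

const totalTokens = articles * chunksPerArticle * tokensPerChunk; // 1,000,000
const estimatedCost = estimateEmbeddingCostUsd(totalTokens);

console.log(`~$${estimatedCost.toFixed(2)} for ${totalTokens} tokens`); // ~$0.02 for 1000000 tokens
```

Even at ten times that chunk count, the initial run stays far below a dollar. Prices change, so check the current rate before relying on the default.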


Step 5: Run the End-to-End Integration

Install dotenv to load the .env file:

bash
npm install dotenv

Create src/main.ts as the full execution loop:

typescript
// File: src/main.ts
import * as dotenv from "dotenv";
import { QdrantVectorStore } from "./qdrant";
import { getEmbedding } from "./embedder";

dotenv.config();

async function run() {
  const store = new QdrantVectorStore();

  // A single mock article chunk representing one paragraph of content
  const blogPost = {
    url: "https://www.buildwithmatija.com/blog/docker-guide",
    title: "Docker Compose for VPS Deployments",
    content: "Docker containers run isolated processes on your server. Using Compose is ideal for managing multiple services securely.",
    project: "buildwithmatija"
  };

  console.log("Generating embedding...");
  const embedding = await getEmbedding(blogPost.content);

  console.log("Indexing chunk in Qdrant...");
  await store.upsertChunks([{
    ...blogPost,
    embedding,
    embeddingModel: "text-embedding-3-small",
    chunkerVersion: "v1"
  }]);
  console.log("Chunk indexed.");

  // Semantic search — note the query uses completely different vocabulary
  const userQuery = "How can I secure database applications on my host server?";
  console.log(`\nQuerying: "${userQuery}"`);

  const queryVector = await getEmbedding(userQuery);
  const results = await store.search(queryVector, { project: "buildwithmatija" });

  for (const match of results) {
    console.log(`[${match.score.toFixed(4)}] ${match.title}`);
    console.log(`  ${match.content.substring(0, 100)}...`);
  }
}

run().catch(console.error);

Run it with npx ts-node src/main.ts. On first run, you will see the collection creation log line, then the indexing confirmation, then search results. The query "secure database applications on my host server" does not contain the words Docker, containers, or Compose, but the chunk still surfaces with a meaningful similarity score because the vectors are semantically close. Keep in mind that with a single point in the collection, any query returns that point, so the score is the signal to look at. For a stronger check, index a second chunk on an unrelated topic and confirm the Docker chunk ranks first.
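
The script above indexes a single chunk. For a real run over many chunks, it usually makes sense to group them so that each embedding request and each upsert handles a batch rather than one item (the OpenAI embeddings endpoint accepts an array of strings as input). Below is a sketch of the grouping step only; `toBatches` is a hypothetical helper and the batch size of 2 in the example is arbitrary.

```typescript
// Sketch only: splits a list into consecutive batches of at most batchSize items.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }

  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

const chunkTexts = ["chunk one", "chunk two", "chunk three", "chunk four", "chunk five"];
console.log(toBatches(chunkTexts, 2).length); // 3
```

In the indexing loop, each batch would be embedded in one request and then passed to upsertChunks in one call.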


Step 6: Snapshot Backups

Qdrant's snapshot API lets you take a binary backup of a collection without touching the live index. The backup is written to Qdrant's internal snapshot directory, then downloaded to the VPS filesystem, then deleted from Qdrant's storage to keep disk usage down. From there, you can pull it to your local machine or push it to S3 on a cron schedule.

Create src/backup.ts:

typescript
// File: src/backup.ts
// Uses the global fetch API, which requires Node.js 18 or newer.
import * as fs from "fs/promises";
import * as path from "path";

export class BackupService {
  private url = process.env.QDRANT_URL ?? "http://127.0.0.1:6333";
  private apiKey = process.env.QDRANT_API_KEY!;
  private localBackupDir = "./backups";

  async createAndDownloadBackup(collectionName: string): Promise<string> {
    await fs.mkdir(this.localBackupDir, { recursive: true });
    const headers = { "api-key": this.apiKey };
    const snapshotsUrl = `${this.url}/collections/${collectionName}/snapshots`;

    // 1. Ask Qdrant to create a snapshot of the collection
    const createRes = await fetch(snapshotsUrl, { method: "POST", headers });
    if (!createRes.ok) throw new Error(`Snapshot creation failed: HTTP ${createRes.status}`);
    const createData = (await createRes.json()) as { result: { name: string } };
    const snapshotName = createData.result.name;

    // 2. Download the snapshot file into the local backup directory
    const downloadRes = await fetch(`${snapshotsUrl}/${snapshotName}`, { headers });
    if (!downloadRes.ok) throw new Error(`Snapshot download failed: HTTP ${downloadRes.status}`);
    const buffer = Buffer.from(await downloadRes.arrayBuffer());

    const localFilePath = path.join(this.localBackupDir, snapshotName);
    await fs.writeFile(localFilePath, buffer);

    // 3. Remove the snapshot from Qdrant's storage once the local copy is written
    const deleteRes = await fetch(`${snapshotsUrl}/${snapshotName}`, { method: "DELETE", headers });
    if (!deleteRes.ok) console.warn(`Snapshot cleanup failed: HTTP ${deleteRes.status}`);

    return localFilePath;
  }
}

The ./backups directory here is on the VPS host filesystem, not your local laptop. To pull the backup files to your development machine, run rsync from your local terminal:

bash
rsync -avz user@your-vps-ip:/path/to/project/backups/ ~/Desktop/qdrant_backups/
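
If the backup service runs on a schedule, downloaded snapshots accumulate in ./backups. The guide does not prescribe a retention policy, so the following is only a sketch of one: keep the newest N files and delete the rest. `selectBackupsToDelete` and `BackupFile` are hypothetical names; the file list would come from fs.readdir and fs.stat in a real script.

```typescript
// Sketch only: given backup files with modification times, returns the
// names of the files that fall outside the newest `keep` entries.
interface BackupFile {
  name: string;
  mtimeMs: number;
}

function selectBackupsToDelete(files: BackupFile[], keep: number): string[] {
  if (!Number.isInteger(keep) || keep < 0) {
    throw new Error("keep must be a non-negative integer");
  }

  return [...files]
    .sort((a, b) => b.mtimeMs - a.mtimeMs) // newest first
    .slice(keep)                            // everything after the newest `keep`
    .map(file => file.name);
}

const exampleFiles: BackupFile[] = [
  { name: "monday.snapshot", mtimeMs: 1000 },
  { name: "wednesday.snapshot", mtimeMs: 3000 },
  { name: "tuesday.snapshot", mtimeMs: 2000 },
];
console.log(selectBackupsToDelete(exampleFiles, 2)); // [ 'monday.snapshot' ]
```

The function copies the input array before sorting, so the caller's list is left in its original order.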

Security Checklist for VPS Deployment

Three rules cover the security surface for this setup:

Bind to localhost only. The docker-compose.yml above already does this with 127.0.0.1:6333:6333. This means nothing outside the VPS can reach Qdrant directly. All application code calling Qdrant runs on the same machine.

Access the dashboard via SSH tunnel. Since port 6333 is not publicly exposed, open it temporarily on your local machine with:

bash
ssh -L 6333:127.0.0.1:6333 user@your-vps-ip

Then navigate to http://localhost:6333/dashboard in your browser and supply the admin API key when prompted. Close the tunnel when finished.

Proxy public search through your application backend. Your frontend search UI should call your own API route, not Qdrant directly. The API route uses the QDRANT_READ_ONLY_API_KEY to query Qdrant and returns results to the client. This keeps both API keys server-side and gives you a place to add rate limiting, query sanitization, or result filtering without touching the Qdrant configuration.
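
As one example of the query sanitization mentioned above, the API route can normalize and cap the search string before it is embedded. Every search triggers an embedding request, so a length cap also bounds the cost per query. This is a sketch; `sanitizeQuery` and the 300-character default are assumptions, not values from the original setup.

```typescript
// Sketch only: returns a cleaned query string, or null if the input
// is not usable as a search query.
function sanitizeQuery(raw: unknown, maxLength = 300): string | null {
  if (typeof raw !== "string") return null;

  // Collapse runs of whitespace (including newlines) and trim the ends
  const cleaned = raw.replace(/\s+/g, " ").trim();
  if (cleaned.length === 0) return null;

  // Cap the length so a single request cannot submit an arbitrarily long text
  return cleaned.slice(0, maxLength);
}

console.log(sanitizeQuery("  how   do I\n run docker ")); // "how do I run docker"
```

A route handler would return a 400 response when this yields null, and otherwise pass the cleaned string to getEmbedding.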

If another server needs to call Qdrant directly, add an Nginx or Caddy reverse proxy that handles TLS and routes to the local port:

nginx
location / {
    proxy_pass http://127.0.0.1:6333;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
}

Frequently Asked Questions

Can I use a different embedding model instead of OpenAI? Yes. Any model that produces a fixed-length float array works. Voyage AI's voyage-3-lite is one alternative worth comparing on price and quality. The only constraint is that the vector size you use when creating the collection must match the size the model produces; OpenAI's text-embedding-3-small outputs 1536 dimensions, for example. Switch the model in embedder.ts and make sure ensureCollection receives the correct size from the first chunk.

What happens if I switch embedding models after indexing? You need to re-index everything. Vectors from different models are not comparable — querying with a Voyage embedding against a collection indexed with OpenAI embeddings will return semantically meaningless results. The clean approach is to delete the collection, create a new one with the correct vector size, and run the full indexing script again.

How much RAM does Qdrant use on a small VPS? For a blog-scale dataset of a few thousand chunks, Qdrant comfortably runs within 200–400 MB of RAM. The Rust binary has a small base footprint, and although Qdrant keeps vectors in memory by default, a few thousand 1536-dimensional float32 vectors add up to only tens of megabytes. A 2 GB Hetzner instance running Qdrant alongside a Node.js application and Nginx has enough headroom.
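
For a rough sense of scale, the raw vector data can be estimated as count × dimensions × bytes per value. The sketch below assumes float32 storage (4 bytes per value) and 5,000 chunks as a stand-in for "a few thousand"; it leaves out the search index, payloads, and process overhead, so treat it as a lower bound.

```typescript
// Sketch only: raw size of the stored vectors, excluding index and payload.
function rawVectorBytes(vectorCount: number, dimensions: number, bytesPerValue = 4): number {
  return vectorCount * dimensions * bytesPerValue;
}

const vectorBytes = rawVectorBytes(5000, 1536); // 30,720,000 bytes
const vectorMiB = vectorBytes / (1024 * 1024);

console.log(`${vectorMiB.toFixed(1)} MiB of raw vector data`); // 29.3 MiB of raw vector data
```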

Do I need a separate collection for each project or site? Not necessarily. The project field in the payload and the RagFilters type in this guide are designed to let you segment multiple sites within a single collection. Filtering by project at query time scopes results correctly. Separate collections make sense if you want completely isolated backup and recovery — you can snapshot one without touching the other.

What does wait: true do in the upsert call? It tells Qdrant to respond only after the upsert has actually been applied, so the points are visible to searches as soon as the call returns. Without it, Qdrant acknowledges the request as soon as it is accepted, and the data may not yet appear in search results. For indexing scripts that run sequentially, wait: true is the safe default.


Conclusion

Running Qdrant in Docker on a VPS gives you production-grade semantic search at zero extra database cost. The setup here — localhost binding, separate read and write API keys, stable point IDs, automatic collection initialization, and snapshot backups — covers everything needed to run this in production without surprises.

From here, the natural next step is wiring this into your actual CMS. If you are on Payload CMS, you can hook the afterChange collection hook to trigger upsertChunks automatically whenever an article is published or updated. The embedder and client wrapper in this guide slot directly into that pattern.

Let me know in the comments if you run into issues, and subscribe for more practical development guides.

Thanks, Matija Žiberna