Standard blog search breaks the moment a reader phrases their query differently from what you wrote. A user asking "how do I run apps in isolated environments" will find nothing in a database full of Docker articles, because a LIKE '%...%' match compares characters, not meaning.
The fix is a vector store. You embed your article content into numerical vectors using an AI model, store them in Qdrant running on your VPS, and then embed the user's query at search time and retrieve the closest semantic matches. The result is search that understands intent, synonyms, and conceptual relationships — all self-hosted with zero ongoing database bills.
This guide walks through the full setup: Docker Compose for Qdrant, a TypeScript client wrapper with automatic collection initialization, OpenAI embedding generation, an end-to-end integration script, and a snapshot backup service.
Tested on: Qdrant v1.18.2, Node.js v20, @qdrant/js-client-rest 1.9.x, openai 4.x. Last updated June 2026 — Matija Ziberna, full-stack developer at buildwithmatija.com.
I built this setup for buildwithmatija.com, which has over 200 technical articles. Keyword search was missing conceptually related content constantly — someone searching "containerize a node app" would get zero hits on articles that used "Docker" throughout but never that exact phrase. After running Qdrant alongside the existing Payload CMS stack on the same Hetzner VPS, query quality improved substantially without adding a managed database bill. The full pipeline took about a day to wire up, and most of the time was spent figuring out the collection initialization step that Qdrant's docs underexplain.
Prerequisites
Before starting, you need the following in place:
A VPS running Linux — any Ubuntu instance on Hetzner, DigitalOcean, or AWS with at least 1–2 GB of RAM works fine.
Docker and Docker Compose — installed on the VPS.
Node.js v18+ and TypeScript — on your development machine.
An OpenAI API key — we use the text-embedding-3-small model for generating vector embeddings.
Why Qdrant Over pgvector
The two main open-source options for self-hosted vector search are Qdrant and pgvector. Both work, but they serve different use cases.
| Feature | pgvector (PostgreSQL) | Qdrant (Rust-native) |
| --- | --- | --- |
| System footprint | Shared with your main database. Heavy RAM during indexing. | Isolated Rust binary. Runs comfortably alongside other services on a cheap VPS. |
| Search speed | Good at small scale, degrades as the dataset grows. | Built for high-concurrency vector search from the ground up. |
| Advanced retrieval | Requires complex SQL extension code. | Native support for named vectors, sparse vectors (hybrid search), and ColBERT. |
| Dashboard | None; requires SQL clients. | Built-in web UI for inspecting points and collections. |
| Backups | Full SQL database dumps. | Native collection-level snapshots that are lightweight and instant. |
If you are already on PostgreSQL and have fewer than 10,000 vectors, pgvector keeps the stack simple. For dedicated search workloads with production backup requirements, Qdrant is the better choice.
The Architecture
The pipeline runs in four stages:
text
Payload CMS / Markdown Files
↓
Text Chunking & Embedding (Node.js + OpenAI)
↓
Qdrant (Self-Hosted on VPS via Docker)
↓
Semantic Search & Retrieval
Each article gets split into paragraph-sized chunks, each chunk is converted into a 1536-dimensional vector using OpenAI's embedding model, and those vectors are stored in Qdrant. At query time, the user's search string goes through the same embedding step and Qdrant returns the closest matches by cosine similarity.
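To make the chunk-and-compare idea concrete, here is a minimal sketch. It assumes paragraphs are separated by blank lines and uses a character cap as a rough stand-in for token counting. The names splitIntoChunks and cosineSimilarity and the 2,000-character default are illustrative, not taken from the original pipeline. In the real setup Qdrant computes the similarity server-side; the function is here only to show what "closest match" means.

```typescript
// Hypothetical helper: split an article into paragraph-sized chunks.
// Assumes paragraphs are separated by one or more blank lines.
function splitIntoChunks(article: string, maxChars = 2000): string[] {
  const chunks: string[] = [];
  for (const paragraph of article.split(/\n\s*\n/)) {
    const text = paragraph.trim();
    if (text.length === 0) continue;
    // Hard-wrap oversized paragraphs so no chunk exceeds maxChars.
    for (let i = 0; i < text.length; i += maxChars) {
      chunks.push(text.slice(i, i + maxChars));
    }
  }
  return chunks;
}

// Cosine similarity between two equal-length vectors.
// Returns 0 for a zero vector instead of dividing by zero.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vector sizes differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Vectors pointing the same way score 1, unrelated (orthogonal) vectors score 0, and opposite vectors score -1.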
Step 1: Spin Up Qdrant with Docker Compose
The Qdrant Docker image is a single Rust binary that exposes two ports: 6333 for the REST API and dashboard, and 6334 for the gRPC API. We bind both strictly to 127.0.0.1 so the database is never reachable from the public internet directly.
Create a docker-compose.yml file on your VPS:
yaml
# File: docker-compose.yml
services:
  qdrant:
    image: qdrant/qdrant:v1.18.2
    container_name: qdrant
    restart: unless-stopped
    # Bound to localhost only; never expose to 0.0.0.0 in production
    ports:
      - "127.0.0.1:6333:6333"
      - "127.0.0.1:6334:6334"
    environment:
      QDRANT__SERVICE__API_KEY: "${QDRANT_API_KEY}"
      QDRANT__SERVICE__READ_ONLY_API_KEY: "${QDRANT_READ_ONLY_API_KEY}"
    volumes:
      - ./qdrant_storage:/qdrant/storage
      - ./qdrant_snapshots:/qdrant/snapshots
Qdrant natively supports two separate API keys configured via environment variables. QDRANT__SERVICE__API_KEY grants full write access and is used only by your indexing scripts. QDRANT__SERVICE__READ_ONLY_API_KEY is what your public-facing search endpoints use — read operations only, no ability to modify or delete data. Splitting these means a leaked frontend key cannot corrupt your index.
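Create a .env file in the same directory that defines QDRANT_API_KEY and QDRANT_READ_ONLY_API_KEY, the two variables docker-compose.yml references. Generate each value with openssl rand -hex 32.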
Step 2: Define the Types
RagChunk represents a paragraph-sized piece of article content alongside its embedding vector and metadata. RagResult is what gets returned to the search caller: the raw payload plus the cosine similarity score from Qdrant. Cosine scores can range from -1 to 1, but for text embeddings they mostly land between 0 and 1 in practice, with higher meaning more semantically similar.
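As a rough sketch, the types could look like the following. The field names are inferred from how chunks and results are used in the integration script later in this guide, so treat the exact shapes as assumptions. RagPayload and toPayload are hypothetical helpers added for illustration.

```typescript
// Assumed shape: one paragraph-sized chunk plus its vector and metadata.
interface RagChunk {
  url: string;
  title: string;
  content: string;
  project: string;
  embedding: number[];
  embeddingModel: string;
  chunkerVersion: string;
}

// Hypothetical helper type: everything except the vector is stored as payload.
type RagPayload = Omit<RagChunk, "embedding">;

// Assumed shape: the stored payload plus the similarity score from Qdrant.
type RagResult = RagPayload & { score: number };

// Optional search filters; only `project` is used in this guide.
interface RagFilters {
  project?: string;
}

// Hypothetical helper: strip the vector so it is not duplicated in the payload.
function toPayload(chunk: RagChunk): RagPayload {
  const { embedding: _embedding, ...payload } = chunk;
  return payload;
}
```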
Step 3: Build the Qdrant Client Wrapper
Install the official Qdrant JavaScript client:
bash
npm install @qdrant/js-client-rest
Now create src/qdrant.ts. The two critical implementation details here are collection initialization and stable point ID generation.
Qdrant does not auto-create collections. If you attempt to upsert into a collection that does not exist, the API returns a 404 and your indexing script fails silently or throws. The ensureCollection method runs a collection list check before every upsert and creates the collection with the correct vector size and distance metric if it is missing. This makes the script safe to run repeatedly — on first run it creates the collection, on subsequent runs it skips that step.
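The check-then-create logic can be sketched as below. To keep the example runnable without a Qdrant server, it is written against a small CollectionApi interface instead of importing the client. The assumption is that the official client's getCollections and createCollection methods fit this shape. The collection name you pass in is up to you.

```typescript
// Minimal structural view of the two client calls this sketch relies on.
// Assumption: the official QdrantClient exposes methods matching these shapes.
interface CollectionApi {
  getCollections(): Promise<{ collections: { name: string }[] }>;
  createCollection(
    name: string,
    config: { vectors: { size: number; distance: "Cosine" } }
  ): Promise<unknown>;
}

// Creates the collection only if it is missing.
// Returns true when a collection was created, false when it already existed.
async function ensureCollection(
  api: CollectionApi,
  name: string,
  vectorSize: number
): Promise<boolean> {
  const { collections } = await api.getCollections();
  if (collections.some((c) => c.name === name)) return false;

  await api.createCollection(name, {
    vectors: { size: vectorSize, distance: "Cosine" },
  });
  console.log(`Created collection "${name}" (${vectorSize} dimensions, cosine)`);
  return true;
}
```

Taking the vector size from the first chunk's embedding length, rather than hard-coding it, keeps the collection in step with whichever embedding model you use.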
The second detail is stable IDs. Qdrant uses numeric or UUID point IDs. If you generate random IDs on each indexing run, re-indexing the same article creates duplicate points with different IDs, polluting the index. Instead, we hash the chunk's content and metadata into a deterministic UUID — the same chunk always produces the same ID, so upserting twice is a clean overwrite.
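One way to derive such an ID is to hash the identifying fields and format the first 16 bytes of the digest as a UUID. The sketch below assumes the URL, chunker version, and content are the fields worth hashing; the original wrapper may hash a different set. stablePointId is a hypothetical name.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: derive a deterministic, UUID-formatted point ID.
// Assumption: url + chunkerVersion + content uniquely identify a chunk.
function stablePointId(url: string, content: string, chunkerVersion: string): string {
  const bytes = createHash("sha256")
    .update(`${url}\n${chunkerVersion}\n${content}`)
    .digest()
    .subarray(0, 16);

  // Set the version nibble to 8 (custom, RFC 9562) and the RFC 4122 variant bits
  // so the result is a well-formed UUID.
  bytes[6] = (bytes[6] & 0x0f) | 0x80;
  bytes[8] = (bytes[8] & 0x3f) | 0x80;

  const hex = bytes.toString("hex");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20, 32),
  ].join("-");
}
```

Because the content is part of the hash, editing a paragraph produces a new ID. The old point then stays behind unless you delete points for that URL before re-indexing, which is the trade-off of hashing content rather than position.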
The search method accepts an optional filters object, which maps to Qdrant's must filter clauses. Filtering by project is useful if you run multiple sites or content sources in the same collection — it scopes results without needing a separate collection per project.
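The mapping from a filters object to Qdrant's filter format is small enough to show in full. This sketch assumes project is the only supported filter field. toQdrantFilter and the QdrantFilter type are illustrative names.

```typescript
// Assumed filter input; only `project` is supported in this sketch.
interface RagFilters {
  project?: string;
}

// The subset of Qdrant's filter format used here: exact-match `must` clauses.
interface QdrantFilter {
  must: { key: string; match: { value: string } }[];
}

// Hypothetical helper: translate optional filters into a Qdrant filter.
// Returns undefined when there is nothing to filter on.
function toQdrantFilter(filters?: RagFilters): QdrantFilter | undefined {
  const must: QdrantFilter["must"] = [];
  if (filters?.project) {
    must.push({ key: "project", match: { value: filters.project } });
  }
  return must.length > 0 ? { must } : undefined;
}
```

Every clause in must has to match, so adding a second field later narrows the results rather than widening them.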
Step 4: Generate Embeddings with OpenAI
text-embedding-3-small produces 1536-dimensional vectors and costs roughly $0.02 per million tokens. For a blog with 200 articles chunked into 1000-token paragraphs, the full initial indexing run costs well under $1. The getEmbedding function is intentionally minimal: pass in a string, get back the number array that Qdrant stores or queries against.
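A minimal sketch of the embedding call follows. Unlike the one-argument getEmbedding described above, this version takes the API as a parameter so it can run without network access. EmbeddingsApi is a hypothetical interface, and the assumption is that the embeddings resource of the openai package's client fits it.

```typescript
// Structural stand-in for the embeddings resource of the OpenAI SDK client.
// Assumption: `create` accepts { model, input } and returns { data: [{ embedding }] }.
interface EmbeddingsApi {
  create(args: { model: string; input: string }): Promise<{
    data: { embedding: number[] }[];
  }>;
}

const EMBEDDING_MODEL = "text-embedding-3-small";

// Pass in a string, get back the vector for it.
async function getEmbedding(api: EmbeddingsApi, text: string): Promise<number[]> {
  const input = text.trim();
  if (input.length === 0) throw new Error("Cannot embed an empty string");

  const response = await api.create({ model: EMBEDDING_MODEL, input });
  const embedding = response.data[0]?.embedding;
  if (!embedding) throw new Error("Embedding response contained no vector");
  return embedding;
}
```

Rejecting empty input up front avoids spending an API call on a string that cannot produce a useful vector.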
Step 5: Run the End-to-End Integration
Install dotenv to load the .env file:
bash
npm install dotenv
Create src/main.ts as the full execution loop:
typescript
// File: src/main.ts
import * as dotenv from "dotenv";
import { QdrantVectorStore } from "./qdrant";
import { getEmbedding } from "./embedder";

dotenv.config();

async function run() {
  const store = new QdrantVectorStore();

  // A single mock article chunk representing one paragraph of content
  const blogPost = {
    url: "https://www.buildwithmatija.com/blog/docker-guide",
    title: "Docker Compose for VPS Deployments",
    content: "Docker containers run isolated processes on your server. Using Compose is ideal for managing multiple services securely.",
    project: "buildwithmatija"
  };

  console.log("Generating embedding...");
  const embedding = await getEmbedding(blogPost.content);

  console.log("Indexing chunk in Qdrant...");
  await store.upsertChunks([{
    ...blogPost,
    embedding,
    embeddingModel: "text-embedding-3-small",
    chunkerVersion: "v1"
  }]);
  console.log("Chunk indexed.");

  // Semantic search: note the query uses completely different vocabulary
  const userQuery = "How can I secure database applications on my host server?";
  console.log(`\nQuerying: "${userQuery}"`);
  const queryVector = await getEmbedding(userQuery);
  const results = await store.search(queryVector, { project: "buildwithmatija" });

  for (const match of results) {
    console.log(`[${match.score.toFixed(4)}] ${match.title}`);
    console.log(`  ${match.content.substring(0, 100)}...`);
  }
}

run().catch(console.error);
Run it with npx ts-node src/main.ts. On first run, you will see the collection creation log line, then the indexing confirmation, then search results. The query "secure database applications on my host server" does not contain the words Docker, containers, or Compose — but the chunk will still surface because the vectors are semantically close. That is the test that confirms the pipeline is working correctly.
Step 6: Snapshot Backups
Qdrant's snapshot API lets you take a binary backup of a collection without taking the live index offline. The snapshot is written to Qdrant's internal snapshot directory, downloaded to the VPS filesystem, and then deleted from Qdrant's storage so old snapshots do not pile up on disk. From there, you can pull it to your local machine or push it to S3 on a cron schedule.
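The create, download, delete sequence can be sketched as follows. The three operations correspond to Qdrant's REST endpoints POST /collections/{collection}/snapshots, GET /collections/{collection}/snapshots/{snapshot}, and DELETE on that same path. SnapshotApi, backupCollection, and the saveFile callback are hypothetical names used to keep the example runnable without a server, and ./backups is assumed as the target directory.

```typescript
// Hypothetical abstraction over the three snapshot operations.
interface SnapshotApi {
  create(collection: string): Promise<{ name: string }>;
  download(collection: string, snapshot: string): Promise<Uint8Array>;
  remove(collection: string, snapshot: string): Promise<void>;
}

// Takes a snapshot, saves it under backupDir, then removes it from Qdrant.
// Returns the path the snapshot was written to.
async function backupCollection(
  api: SnapshotApi,
  collection: string,
  saveFile: (path: string, data: Uint8Array) => Promise<void>,
  backupDir = "./backups"
): Promise<string> {
  const { name } = await api.create(collection);
  const data = await api.download(collection, name);

  const path = `${backupDir}/${name}`;
  await saveFile(path, data);

  // Delete from Qdrant only after the local copy is safely written.
  await api.remove(collection, name);
  return path;
}
```

If saveFile throws, the function exits before the delete step, so a failed write never leaves you without a copy of the snapshot.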
The ./backups directory here is on the VPS host filesystem, not your local laptop. To pull the backup files to your development machine, run rsync from your local terminal against that directory on the VPS.
Security
Three rules cover the security surface for this setup:
Bind to localhost only. The docker-compose.yml above already does this with 127.0.0.1:6333:6333. This means nothing outside the VPS can reach Qdrant directly. All application code calling Qdrant runs on the same machine.
Access the dashboard via SSH tunnel. Since port 6333 is not publicly exposed, open it temporarily on your local machine with:
bash
ssh -L 6333:127.0.0.1:6333 user@your-vps-ip
Then navigate to http://localhost:6333/dashboard in your browser and supply the admin API key when prompted. Close the tunnel when finished.
Proxy public search through your application backend. Your frontend search UI should call your own API route, not Qdrant directly. The API route uses the QDRANT_READ_ONLY_API_KEY to query Qdrant and returns results to the client. This keeps both API keys server-side and gives you a place to add rate limiting, query sanitization, or result filtering without touching the Qdrant configuration.
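The third rule can be sketched as a framework-agnostic handler. The embed and search dependencies are hypothetical and injected here; in a real route, search would be backed by a Qdrant client constructed with QDRANT_READ_ONLY_API_KEY. The 200-character cap, the limit of 5 results, and the response shape are illustrative choices, not requirements.

```typescript
interface SearchHit {
  score: number;
  title: string;
  url: string;
  content: string;
}

// Hypothetical dependencies, injected so the handler stays testable.
interface SearchDeps {
  embed(text: string): Promise<number[]>;
  // Expected to query Qdrant using the read-only API key.
  search(vector: number[], limit: number): Promise<SearchHit[]>;
}

const MAX_QUERY_LENGTH = 200;

// Validates the query, runs the search, and returns only public-safe fields.
async function handleSearch(
  rawQuery: unknown,
  deps: SearchDeps
): Promise<{ status: number; body: unknown }> {
  if (typeof rawQuery !== "string") {
    return { status: 400, body: { error: "Query must be a string" } };
  }
  const query = rawQuery.trim().slice(0, MAX_QUERY_LENGTH);
  if (query.length < 2) {
    return { status: 400, body: { error: "Query is too short" } };
  }

  const vector = await deps.embed(query);
  const hits = await deps.search(vector, 5);
  const results = hits.map((hit) => ({
    title: hit.title,
    url: hit.url,
    score: hit.score,
    excerpt: hit.content.slice(0, 160),
  }));
  return { status: 200, body: { results } };
}
```

Capping the query length before embedding also caps the per-request embedding cost, which matters once the endpoint is public.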
If another server needs to call Qdrant directly, add an Nginx or Caddy reverse proxy that handles TLS and routes to the local port.
Frequently Asked Questions
Can I use a different embedding model instead of OpenAI?
Yes. Any model that produces a fixed-length float array works. Voyage AI's voyage-3-lite is a strong alternative with lower cost. The only constraint is that the vector size you use when creating the collection must match the size the model produces — OpenAI's text-embedding-3-small outputs 1536 dimensions, for example. Switch the model in embedder.ts and make sure ensureCollection receives the correct size from the first chunk.
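A small guard makes a size mismatch fail loudly before anything is written. This is a sketch; assertVectorSize is a hypothetical helper, and where you call it (for example just before an upsert) is up to you.

```typescript
// Hypothetical guard: throw if an embedding does not match the collection's vector size.
function assertVectorSize(collectionSize: number, embedding: number[], model: string): void {
  if (embedding.length !== collectionSize) {
    throw new Error(
      `Model "${model}" produced ${embedding.length} dimensions, ` +
        `but the collection expects ${collectionSize}`
    );
  }
}
```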
What happens if I switch embedding models after indexing?
You need to re-index everything. Vectors from different models are not comparable — querying with a Voyage embedding against a collection indexed with OpenAI embeddings will return semantically meaningless results. The clean approach is to delete the collection, create a new one with the correct vector size, and run the full indexing script again.
How much RAM does Qdrant use on a small VPS?
For a blog-scale dataset of a few thousand chunks, Qdrant comfortably runs within 200–400 MB of RAM. The Rust binary has a small base footprint and only loads vectors into memory when indexing or querying. A 2 GB Hetzner instance running Qdrant alongside a Node.js application and Nginx has enough headroom.
Do I need a separate collection for each project or site?
Not necessarily. The project field in the payload and the RagFilters type in this guide are designed to let you segment multiple sites within a single collection. Filtering by project at query time scopes results correctly. Separate collections make sense if you want completely isolated backup and recovery — you can snapshot one without touching the other.
What does wait: true do in the upsert call?
It tells Qdrant to respond only after the upsert has been fully applied, so the points are visible to subsequent searches. Without it, the call returns as soon as the request is acknowledged and the data may not yet appear in search results. For indexing scripts that run sequentially, wait: true is the safe default.
Conclusion
Running Qdrant in Docker on a VPS gives you production-grade semantic search at zero extra database cost. The setup here — localhost binding, separate read and write API keys, stable point IDs, automatic collection initialization, and snapshot backups — covers everything needed to run this in production without surprises.
From here, the natural next step is wiring this into your actual CMS. If you are on Payload CMS, you can hook the afterChange collection hook to trigger upsertChunks automatically whenever an article is published or updated. The embedder and client wrapper in this guide slot directly into that pattern.
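As a starting point, such a hook might look like the sketch below. It assumes a collection with drafts enabled, so documents carry a _status field, and an indexArticle function that chunks, embeds, and upserts one article. ArticleDoc, makeIndexOnPublish, and indexArticle are hypothetical names, and the argument type is a simplified stand-in for what Payload passes to afterChange hooks.

```typescript
// Simplified stand-in for an article document in Payload.
interface ArticleDoc {
  id: string;
  title: string;
  slug: string;
  content: string;
  _status?: "draft" | "published";
}

// Simplified stand-in for the afterChange hook arguments.
type AfterChangeArgs = {
  doc: ArticleDoc;
  operation: "create" | "update";
};

// Builds a hook that re-indexes an article whenever it is saved as published.
function makeIndexOnPublish(indexArticle: (doc: ArticleDoc) => Promise<void>) {
  return async ({ doc }: AfterChangeArgs): Promise<ArticleDoc> => {
    if (doc._status === "published") {
      try {
        await indexArticle(doc);
      } catch (error) {
        // Indexing problems are logged but do not block the save.
        console.error(`Failed to index article ${doc.id}:`, error);
      }
    }
    return doc;
  };
}
```

Swallowing the error keeps editors from being blocked by an embedding or Qdrant outage, at the cost of needing a periodic full re-index to catch anything that was missed.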
Let me know in the comments if you run into issues, and subscribe for more practical development guides.