Self-Host Next.js and Payload on VPS: Complete Guide

Step-by-step VPS deployment using Docker Compose, GitHub Actions self-hosted runner, staging→production, TLS, backups…

22nd May 2026 · Updated on: 3rd June 2026

About the author

Matija Žiberna

Self-taught full-stack developer sharing lessons from building software and startups.

I'm Matija Žiberna, a self-taught full-stack developer and co-founder passionate about building products, writing clean code, and figuring out how to turn ideas into businesses. I write about web development with Next.js, lessons from entrepreneurship, and the journey of learning by doing. My goal is to provide value through code—whether it's through tools, content, or real-world software.

Contents

  • Assumptions
  • The Runner Workspace Model
  • Target Architecture
  • Canonical Host State Before First Deploy
  • Build/Staging VPS
  • Production VPS
  • Provision Servers
  • Create users
  • Install Docker and the Compose plugin
  • Install Node.js and pnpm on the build/staging VPS
  • Install Postgres client tools on the build/staging VPS
  • Basic firewall
  • Provision DNS
  • Provision Databases and Object Storage
  • Databases
  • Object storage
  • Canonical Secret Locations
  • Environment Files
  • `/srv/my-app-secrets/.env.staging`
  • `/srv/my-app-secrets/.env.production`
  • Configure SSH From Runner To Production
  • Generate or place the key on the build/staging VPS
  • Install the public key on production
  • Populate `known_hosts`
  • Optional SSH config
  • Install the Self-Hosted GitHub Runner
  • Dockerfile and Build-Time Environment Injection
  • Build env generation
  • Docker build
  • Dockerfile consumption
  • Compose Templates
  • Staging compose
  • Production compose
  • nginx and TLS
  • Production nginx config
  • The Let's Encrypt decision
  • Certbot command
  • Certificate install and reload
  • Staging nginx
  • nginx config sync policy
  • GitHub Actions Workflow
  • Required triggers
  • Canonical workflow shape
  • Migration Policy
  • The canonical rule
  • The production backup rule
  • Direct DB connection
  • If migration succeeds but restart fails
  • Irreversible migrations
  • First Staging Deploy
  • First Production Deploy
  • Smoke Tests
  • Staging
  • Production
  • Rollback Policy
  • App rollback
  • Schema rollback
  • Data rollback
  • Image Retention and Cleanup
  • Troubleshooting
  • Farmica Working Implementation Map
  • Conclusion

If you have ever wanted to run a Next.js or Payload-style app on your own infrastructure without leaning on Vercel for hosting, you already know the gap between "I can deploy this manually" and "another developer could reproduce this from scratch." I kept running into that gap. Manual deploys work right up until you need a second environment, a second person, or a clean rollback at 11pm. So I wrote this down as the canonical version of how I actually do it, grounded in a real working deployment rather than an idealized one.

This is the full implementation guide for running an app with separate DEV, STAGING, and PRODUCTION environments, Docker Compose for the runtime, a self-hosted GitHub Actions runner, automatic staging deploys, manual production promotion, and separate .env files, databases, and object storage per environment. It is written to be reusable, but every major decision is anchored in this repo's working model, so you can see why each choice exists rather than just taking it on faith.

Reach for this when you want a deployment system that does not depend on Vercel for app hosting, that another developer can reproduce from scratch, and that stays explicit about secrets, permissions, networking, and rollback rather than hiding them inside a platform.

A quick naming note that runs through everything below: my-app is a placeholder image and repository name, prod-app is a placeholder SSH host alias from ~/.ssh/config, example.com is a placeholder domain, and placeholder filenames like example.com.conf or setup-example-letsencrypt.sh should be swapped for your project's real filenames.

For deeper context, see also:

  • Self-hosting Payload CMS: Stop Vercel's Hidden Costs
  • Deploy Payload CMS with Next.js 16: Self-Hosted Guide
  • Docker Certbot Auto-Renewal: SSL Setup with Nginx
  • Payload CMS Logging: Queue-Based Production Best Practices

Assumptions

Every deployment guide carries hidden assumptions, so let me make mine explicit up front. This guide assumes Ubuntu 24.04 LTS or Debian 12 on both VPS hosts, with one VPS used for build and staging and a separate VPS used for production. GitHub Actions runs on the build/staging VPS, deploys go through Docker Compose, and staging and production share the same container image shape. Production is promoted by a specific commit SHA rather than a floating main, secrets live on the servers rather than in git, and wildcard TLS uses DNS-01 rather than HTTP-01.

Your stack will probably differ in places, and that's fine. What matters is that you preserve the same control points even if the surrounding details change: one canonical secret location, one canonical runner user, one canonical deploy directory per environment, and one canonical production promotion input. Hold those four constant and the rest can flex.

The Runner Workspace Model

Before going further, it helps to be precise about where things actually live on disk, because this is a common source of confusion later. This guide assumes GitHub Actions uses the default runner workspace, which the runner creates under its own install directory and actions/checkout populates on each run. That means the only persistent directories you need to care about are:

  • /srv/my-app-staging
  • /srv/my-app-prod
  • /srv/my-app-secrets
  • /srv/actions-runner

Notably, do not rely on /srv/my-app unless you intentionally maintain a persistent local clone for manual debugging or ad-hoc operator commands. The CI flow does not need it, and treating it as required will lead you astray.
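
As a quick preflight, here is a minimal sketch that confirms each persistent directory exists and is writable by the current user. The function name check_persistent_dirs is my own illustration and not something the repo ships; the paths you pass in are the placeholders listed above.

```shell
# Hypothetical preflight helper (not part of the repo).
# Prints one "MISSING <dir>" or "NOT-WRITABLE <dir>" line per problem
# and returns non-zero if any directory failed the check.
check_persistent_dirs() {
  local rc=0 dir
  for dir in "$@"; do
    if [ ! -d "$dir" ]; then
      echo "MISSING $dir"
      rc=1
    elif [ ! -w "$dir" ]; then
      echo "NOT-WRITABLE $dir"
      rc=1
    fi
  done
  return "$rc"
}
```

On the build/staging VPS you would run it as the runner user, for example check_persistent_dirs /srv/my-app-staging /srv/my-app-secrets /srv/actions-runner, and fix anything it reports before the first workflow run.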

Target Architecture

With those assumptions in place, here is the shape of the whole system. It is worth picturing this before touching any commands, because every later step is just filling in one of these boxes.

text
Developer machine
  └── pushes code to GitHub

Build/Staging VPS
  ├── GitHub Actions workspace for CI checkouts
  ├── self-hosted GitHub Actions runner
  ├── staging deploy directory
  ├── staging nginx
  └── Docker daemon used for builds and staging runtime

Production VPS
  ├── production deploy directory
  ├── production nginx with TLS
  ├── production app + worker containers
  ├── production database or direct DB access
  └── production object storage or direct object-store access

The delivery flow connects those two machines through a deliberate two-phase rhythm: staging happens automatically on every push, and production happens only when you choose to promote a verified commit.

text
push to main
  ├── build image on self-hosted runner
  ├── migrate staging
  ├── deploy staging
  └── smoke-test staging

manual workflow_dispatch with SHA
  ├── rebuild or reuse image for that SHA
  ├── verify production backup policy
  ├── migrate production with direct DB connection
  ├── stream image to production over SSH
  ├── restart production
  └── smoke-test production

Canonical Host State Before First Deploy

Architecture diagrams are aspirational until the hosts are actually prepared, so this section lists the minimum required state on each machine. Get this right once and the first deploy stops being a guessing game.

Build/Staging VPS

This machine carries the heaviest load because it both builds images and runs staging. It must have Docker Engine, the Docker Compose plugin, git, the Node.js version required by the app, pnpm, and postgresql-client if backups or direct DB inspection run on the runner. It also needs the self-hosted GitHub Actions runner service, an SSH private key that can reach production, a known_hosts entry for production, a staging deploy directory, and a server-side secret directory. On the network side, it needs access to the staging DB and, if production migrations run from the runner, to the production direct DB port.

The recommended directory layout keeps each concern in its own place:

text
/srv/my-app-staging/            # staging docker compose directory
/srv/my-app-secrets/            # env files and build env sources
/srv/actions-runner/            # GitHub runner

Production VPS

Production is leaner by design, since it should only ever run, never build. It must have Docker Engine, the Docker Compose plugin, a production deploy directory, an nginx config directory, a cert directory, and an env file for runtime. For the database it needs either production Postgres plus PgBouncer on the same host or network access to a production DB host, along with object storage credentials. Finally it needs inbound firewall rules for 80/tcp and 443/tcp.

Its directory layout mirrors that single-purpose intent:

text
/srv/my-app-prod/
├── .env.production
├── docker-compose.yml
├── nginx/
├── certs/example.com/
├── certbot/www/
└── scripts/

Provision Servers

Now that you know the target state, the next job is bringing each server up to it. This starts with users and permissions, because every later command inherits whatever identity you set up here.

Create users

The reusable recommendation is a dedicated deploy user on both the build/staging VPS and the production VPS. For transparency, the Farmica repo this guide is based on currently runs build/staging under the VPS login user and deploys production to root, but a cleaner setup prefers a dedicated deploy user. On the build/staging VPS:

bash
sudo adduser deploy
sudo usermod -aG docker deploy
sudo mkdir -p /srv/my-app-staging /srv/my-app-secrets /srv/actions-runner
sudo chown -R deploy:deploy /srv/my-app-staging /srv/my-app-secrets /srv/actions-runner

And on production:

bash
sudo adduser deploy
sudo usermod -aG docker deploy
sudo mkdir -p /srv/my-app-prod/nginx /srv/my-app-prod/certs/example.com /srv/my-app-prod/certbot/www /srv/my-app-prod/scripts
sudo chown -R deploy:deploy /srv/my-app-prod

One easy-to-miss detail: re-login after adding a user to the docker group, otherwise the new group membership won't take effect in your shell.

Install Docker and the Compose plugin

With the user in place, install Docker. The steps differ slightly between distributions, so pick the one that matches your host. If you're on Ubuntu and need the latest Docker Engine version rather than Ubuntu's bundled docker.io, the Docker upgrade guide covers the full migration. On Ubuntu:

bash
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo systemctl enable --now docker
docker version
docker compose version

On Debian the only real change is the repository URL pointing at linux/debian instead of linux/ubuntu:

bash
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/debian \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo systemctl enable --now docker
docker version
docker compose version

Install Node.js and pnpm on the build/staging VPS

This step is only required if migrations run from source on the runner, which this repo does. Install the Node.js version your app requires, deriving it from .nvmrc, the package.json engines field, or your CI or Dockerfile conventions. This repo specifically requires Node.js 24:

bash
curl -fsSL https://deb.nodesource.com/setup_24.x | sudo -E bash -
sudo apt-get install -y nodejs
sudo corepack enable
node -v
pnpm -v
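
If the app pins its Node.js version in .nvmrc, a small check can catch a mismatch before a migration fails in CI. This is a sketch under assumptions: the helper names are hypothetical, and it only understands plain version strings such as 24, v24, or v24.1.0, not aliases like lts/*.

```shell
# Hypothetical helper: print the major version pinned in an .nvmrc-style file.
# Handles "24", "v24" and "v24.1.0"; aliases such as "lts/*" are not resolved.
nvmrc_major() {
  local raw
  raw=$(tr -d '[:space:]' < "$1")
  raw="${raw#v}"          # drop an optional leading "v"
  echo "${raw%%.*}"       # keep everything before the first dot
}

# Succeeds when the major version of a `node -v` style string (e.g. "v24.3.0")
# equals the major version pinned in the given file.
node_matches_nvmrc() {
  local pinned installed
  pinned=$(nvmrc_major "$1")
  installed="${2#v}"
  [ "${installed%%.*}" = "$pinned" ]
}
```

A typical call on the runner would be node_matches_nvmrc .nvmrc "$(node -v)", followed by an error message of your choice when it returns non-zero.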

Install Postgres client tools on the build/staging VPS

Closely related, if you create backups or run direct DB checks from the runner, you need the Postgres client tools and a backup directory the runner can write to:

bash
sudo apt-get update
sudo apt-get install -y postgresql-client
sudo mkdir -p /srv/backups
sudo chown deploy:deploy /srv/backups

Basic firewall

Finally, lock down the network surface. The build/staging VPS should allow 22/tcp, plus 80/tcp and 443/tcp if staging is public. Production should allow 22/tcp, 80/tcp, and 443/tcp. With ufw:

bash
sudo ufw allow 22/tcp
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw enable
sudo ufw status

Provision DNS

Servers are useless without names pointing at them, so DNS comes next. The recommended split gives each environment its own domain space: dev on *.dev.example.com, staging on *.staging.example.com, and production on *.example.com. That translates to records like:

text
example.com            A  <PRODUCTION_VPS_IP>
*.staging.example.com  A  <STAGING_VPS_IP>
staging.example.com    A  <STAGING_VPS_IP>
*.example.com          A  <PRODUCTION_VPS_IP>

If you need wildcard TLS for production, use DNS-01 and keep the production records DNS only, not proxied, during initial validation. For reference, the Farmica repo uses *.farmica.online for staging and *.farmica.si for production, with the apex farmica.si intentionally split for unrelated reasons. That apex split is not something this architecture requires, so don't feel obliged to copy it.

Provision Databases and Object Storage

It is tempting to treat data stores as something you wire up later, but that is exactly how staging ends up writing into production. Treat this as a mandatory first-deploy step.

Databases

Create a separate database and credentials per environment so the three never touch each other:

text
my_app_dev
my_app_staging
my_app_prod

For production specifically, the recommended shape splits pooled and unpooled access: DATABASE_URL points to PgBouncer or another pooled endpoint, DATABASE_URL_UNPOOLED points to direct Postgres, the app runtime uses the pooled connection, and migrations use the unpooled one. For staging and dev, direct Postgres is usually enough, though you should still keep separate DB names and credentials.
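
To make the pooled versus unpooled split checkable, here is a minimal sketch that pulls the host:port pair out of a Postgres URL and confirms the two variables point at different endpoints. The function names are hypothetical, and the parsing assumes credentials are URL-encoded so they contain no raw slash.

```shell
# Hypothetical helper: print "host:port" from a URL shaped like
# postgresql://user:pass@host:port/dbname
db_hostport() {
  local rest="${1#*://}"   # drop the scheme
  rest="${rest%%/*}"       # drop /dbname and anything after it
  echo "${rest##*@}"       # drop the credentials
}

# Succeeds when the pooled and unpooled URLs resolve to different endpoints.
pooling_is_split() {
  [ "$(db_hostport "$1")" != "$(db_hostport "$2")" ]
}
```

With the production values sourced into the shell, pooling_is_split "$DATABASE_URL" "$DATABASE_URL_UNPOOLED" should succeed. For staging and dev, where both variables may point at direct Postgres, a failure here is expected and harmless.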

Object storage

Apply the same isolation to buckets, with separate buckets and credentials per environment:

text
my-app-dev-media
my-app-staging-media
my-app-prod-media

The non-negotiable rule here is that staging and dev must never write into the production bucket. If you are using S3-compatible storage such as Garage or MinIO, create a separate bucket, access key, and secret key for each environment.
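
One way to enforce that isolation rather than trust it is to compare two env files and flag any data-store value they share. The sketch below is an assumption-laden illustration: the function names are hypothetical, the key list mirrors the variable names used in this guide's env files, and the parsing expects simple KEY=value lines without quoting.

```shell
# Hypothetical helper: print the value of KEY from a simple KEY=value env file.
env_value() {
  { grep -E "^$2=" "$1" || true; } | tail -n 1 | cut -d= -f2-
}

# Prints "SHARED <KEY>" for every data-store key whose value is identical
# in both files, and returns non-zero if any were found.
check_env_isolation() {
  local a="$1" b="$2" key va vb rc=0
  for key in DATABASE_URL DATABASE_URL_UNPOOLED S3_BUCKET \
             S3_ACCESS_KEY_ID S3_SECRET_ACCESS_KEY; do
    va=$(env_value "$a" "$key")
    vb=$(env_value "$b" "$key")
    if [ -n "$va" ] && [ "$va" = "$vb" ]; then
      echo "SHARED $key"
      rc=1
    fi
  done
  return "$rc"
}
```

Run it against the staging and production env files on the build/staging VPS. Any SHARED line means one environment can reach the other's data.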

Canonical Secret Locations

This is the single most important clarity rule in the entire guide, so it gets its own section: never copy .env files from the git checkout. Secrets belong on the server, in one canonical place. On the build/staging VPS, those canonical locations are:

text
/srv/my-app-secrets/.env.staging
/srv/my-app-secrets/.env.production
/srv/my-app-secrets/.env.staging.build
/srv/my-app-secrets/.env.production.build

And the canonical deployed env locations are:

text
/srv/my-app-staging/.env.staging
/srv/my-app-prod/.env.production

The meaning behind these is straightforward: .env.staging and .env.production are full runtime env files, while .env.staging.build and .env.production.build are optional pre-trimmed build env files. If you don't maintain separate build env files, you generate them from the full env files before each build. This repo uses that second pattern:

bash
bash deployment-templates/prepare-narocilnica-build-env.sh /srv/my-app-secrets/.env.production /tmp/my-app-build.env

That script extracts only what the build actually needs: PAYLOAD_SECRET, DATABASE_URL, NEXT_PUBLIC_VAPID_PUBLIC_KEY, and optional Sentry build vars. The working implementation lives at prepare-narocilnica-build-env.sh.
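
If you need to write your own equivalent, the core idea is an allowlist filter. The following is a sketch, not the repo's script: the function name is hypothetical, and matching Sentry variables by a SENTRY_ prefix is my assumption about what "optional Sentry build vars" covers.

```shell
# Hypothetical allowlist filter (not the repo's prepare script).
# Copies only the build-time keys from a full env file into a trimmed one.
make_build_env() {
  local src="$1" dst="$2"
  (
    umask 077  # a newly created output file is readable by its owner only
    # The trailing "=" keeps DATABASE_URL_UNPOOLED out of the DATABASE_URL match.
    { grep -E '^(PAYLOAD_SECRET|DATABASE_URL|NEXT_PUBLIC_VAPID_PUBLIC_KEY|SENTRY_[A-Z0-9_]+)=' "$src" || true; } > "$dst"
  )
}
```

Everything not on the list, including object storage keys and the unpooled database URL, stays out of the file that the build can see.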

Environment Files

To make those canonical locations concrete, here is what the files themselves look like in sanitized form. Notice how staging and production are structurally identical but never share a single value.

/srv/my-app-secrets/.env.staging

dotenv
NODE_ENV=production
APP_ENV=staging

SERVER_URL=https://demo.staging.example.com
TENANT_STOREFRONT_BASE_DOMAIN=staging.example.com

DATABASE_URL=postgresql://staging_user:staging_pass@10.0.0.10:5432/my_app_staging
DATABASE_URL_UNPOOLED=postgresql://staging_user:staging_pass@10.0.0.10:5432/my_app_staging

S3_BUCKET=my-app-staging-media
S3_REGION=garage
S3_ENDPOINT=http://10.0.0.20:9000
S3_ACCESS_KEY_ID=staging-access-key
S3_SECRET_ACCESS_KEY=staging-secret-key

PAYLOAD_SECRET=replace-me
CRON_SECRET=replace-me
NEXT_PUBLIC_VAPID_PUBLIC_KEY=replace-me

/srv/my-app-secrets/.env.production

Production carries everything staging does, plus the extra credentials needed for TLS issuance and observability:

dotenv
NODE_ENV=production
APP_ENV=production

SERVER_URL=https://demo.example.com
TENANT_STOREFRONT_BASE_DOMAIN=example.com

DATABASE_URL=postgresql://prod_app:prod_pass@10.0.1.10:6432/my_app_prod
DATABASE_URL_UNPOOLED=postgresql://prod_app:prod_pass@10.0.1.10:5432/my_app_prod

S3_BUCKET=my-app-prod-media
S3_REGION=garage
S3_ENDPOINT=http://10.0.1.20:9000
S3_ACCESS_KEY_ID=prod-access-key
S3_SECRET_ACCESS_KEY=prod-secret-key

PAYLOAD_SECRET=replace-me
CRON_SECRET=replace-me
NEXT_PUBLIC_VAPID_PUBLIC_KEY=replace-me

CLOUDFLARE_DNS_API_TOKEN=replace-me
CERTBOT_EMAIL=ops@example.com
GRAFANA_ADMIN_PASSWORD=replace-me
OBSERVABILITY_NGINX_USER=ops
OBSERVABILITY_NGINX_PASSWORD=replace-me

Configure SSH From Runner To Production

With secrets sorted, the next link to forge is the one between the two machines, because production deploys happen by the runner reaching across to production over SSH. The runner user must be able to do this non-interactively.

Generate or place the key on the build/staging VPS

As the runner user, create a dedicated key:

bash
mkdir -p ~/.ssh
chmod 700 ~/.ssh
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 -N ""
chmod 600 ~/.ssh/id_ed25519

Install the public key on production

Then push the public half to production:

bash
ssh-copy-id -i ~/.ssh/id_ed25519.pub deploy@<PRODUCTION_IP>

Populate known_hosts

And record production's host key so the connection verifies cleanly:

bash
ssh-keyscan -H <PRODUCTION_IP> >> ~/.ssh/known_hosts
chmod 644 ~/.ssh/known_hosts

Optional SSH config

To make the later workflow readable, give production a short alias:

ssh
Host prod-app
  HostName <PRODUCTION_IP>
  User deploy
  IdentityFile ~/.ssh/id_ed25519
  IdentitiesOnly yes

Then confirm the whole chain works:

bash
ssh prod-app 'docker ps'

One security note worth flagging: the current Farmica workflow uses StrictHostKeyChecking=no, but for a new setup you should prefer known_hosts plus normal host verification.
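
Non-interactive SSH tends to fail in unhelpful ways when the private key's permissions are too open, because OpenSSH refuses to use such a key. A small guard like this sketch can run before the deploy step. The function name is hypothetical, and stat -c is the GNU coreutils form, which matches the Ubuntu and Debian hosts this guide assumes.

```shell
# Hypothetical guard: succeed only if the private key has the mode set
# earlier in this guide (600) or the stricter read-only 400.
key_mode_ok() {
  local mode
  mode=$(stat -c '%a' "$1") || return 2   # GNU stat; returns 2 if the file is missing
  case "$mode" in
    600|400) return 0 ;;
    *) echo "insecure mode $mode on $1" >&2; return 1 ;;
  esac
}
```

For example, key_mode_ok ~/.ssh/id_ed25519 as the runner user should succeed silently after the chmod 600 step above.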

Install the Self-Hosted GitHub Runner

Now that the runner host can talk to production, give it the actual CI brain. Run this on the build/staging VPS as the runner user:

bash
mkdir -p /srv/actions-runner
cd /srv/actions-runner
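# Release assets carry the version in their filename; take the current one from the actions/runner releases page
RUNNER_VERSION="<RUNNER_VERSION>"
curl -L -o actions-runner.tar.gz "https://github.com/actions/runner/releases/download/v${RUNNER_VERSION}/actions-runner-linux-x64-${RUNNER_VERSION}.tar.gz"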
tar xzf actions-runner.tar.gz
./config.sh --url https://github.com/your-org/your-repo --token YOUR_RUNNER_TOKEN
sudo ./svc.sh install
sudo ./svc.sh start

For this to be useful, the runner needs a specific set of capabilities: it must be able to run docker build and docker compose, read /srv/my-app-secrets/*, SSH to production, reach the staging DB, and reach the production direct DB port if migrations run from the runner. Verify all of that before moving on:

bash
docker ps
docker compose version
ssh prod-app 'docker ps'
sudo systemctl status 'actions.runner.*'

If you want a persistent manual checkout for debugging, create it separately. The workflow examples in this guide do not depend on it.

Dockerfile and Build-Time Environment Injection

A subtle but important decision is how secrets reach the build without leaking into the image. You must choose one build-time pattern and document it end to end. This repo uses BuildKit secrets for the build env file, which keeps the secret out of the final image layers.

Build env generation

First, generate the trimmed build env from the full production env:

bash
bash deployment-templates/prepare-narocilnica-build-env.sh \
  /srv/my-app-secrets/.env.production \
  /tmp/my-app-build.env

Docker build

Then build, mounting that file as a secret rather than baking it in:

bash
docker build \
  --secret id=env,src=/tmp/my-app-build.env \
  -t my-app:<git-sha> \
  .

Dockerfile consumption

Inside the Dockerfile, the secret is sourced only for the duration of the build step that needs it:

dockerfile
RUN --mount=type=secret,id=env,required=true \
    set -a && . /run/secrets/env && set +a && \
    pnpm run build

The working implementation is at Dockerfile. One practical detail about healthchecks: this repo's image installs curl so that the curl-based healthchecks in the Compose templates work out of the box. If your image does not contain curl, either add it or use a Node-based healthcheck.

Compose Templates

The build produces an image, but Compose is what actually runs it. Keep these deploy templates in git and sync them on every deploy so the deploy directories stay reproducible rather than drifting into hand-edited snowflakes.

Staging compose

The defining rule for staging is that it must not expose the app on 0.0.0.0; it binds to localhost and sits behind nginx. Alongside the app, it runs dedicated workers for media and inventory queues:

yaml
services:
  app:
    image: ${IMAGE}
    restart: unless-stopped
    env_file:
      - .env.staging
    ports:
      - '127.0.0.1:48592:65434'
    healthcheck:
      test: ['CMD', 'curl', '-f', 'http://localhost:65434/']
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  worker-media:
    image: ${IMAGE}
    restart: unless-stopped
    env_file:
      - .env.staging
    environment:
      NODE_OPTIONS: --no-deprecation
    command:
      - node
      - node_modules/payload/dist/bin/index.js
      - jobs:run
      - --cron
      - '* * * * *'
      - --queue
      - media
      - --limit
      - '5'

  worker-inventory:
    image: ${IMAGE}
    restart: unless-stopped
    env_file:
      - .env.staging
    environment:
      NODE_OPTIONS: --no-deprecation
    command:
      - node
      - node_modules/payload/dist/bin/index.js
      - jobs:run
      - --cron
      - '* * * * *'
      - --queue
      - inventory
      - --limit
      - '10'

The working implementation is at docker-compose.staging.yml.

Production compose

Production builds on the same app-plus-workers core but adds the public-facing layer: nginx, cert mounts, and nginx config mounts, with an optional observability profile. The canonical shape:

yaml
services:
  app:
    image: ${IMAGE}
    restart: unless-stopped
    env_file:
      - .env.production
    healthcheck:
      test: ['CMD', 'curl', '-f', 'http://localhost:65434/']
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  worker-media:
    image: ${IMAGE}
    restart: unless-stopped
    env_file:
      - .env.production
    command:
      - node
      - node_modules/payload/dist/bin/index.js
      - jobs:run
      - --cron
      - '* * * * *'
      - --queue
      - media
      - --limit
      - '5'

  worker-inventory:
    image: ${IMAGE}
    restart: unless-stopped
    env_file:
      - .env.production
    command:
      - node
      - node_modules/payload/dist/bin/index.js
      - jobs:run
      - --cron
      - '* * * * *'
      - --queue
      - inventory
      - --limit
      - '10'

  nginx:
    image: nginx:1.27-alpine
    restart: unless-stopped
    ports:
      - '80:80'
      - '443:443'
    volumes:
      - ./nginx/default.conf:/etc/nginx/conf.d/default.conf:ro
      - ./certs/example.com:/etc/nginx/certs/example.com:ro
      - ./certbot/www:/var/www/certbot:ro
    depends_on:
      app:
        condition: service_healthy

The working implementation is at docker-compose.production.yml.

nginx and TLS

That nginx service in the production compose needs a config and a certificate, which brings us to TLS. As with the build pattern, the rule is to choose one strategy and document it completely. This guide uses nginx inside the production Compose stack, a Let's Encrypt wildcard certificate, and DNS-01 validation.

Production nginx config

The config does two jobs: it serves the ACME challenge and redirects HTTP to HTTPS on port 80, then terminates TLS and proxies to the app on 443:

nginx
server {
  listen 80;
  server_name ~^(.+)\.example\.com$;

  location ^~ /.well-known/acme-challenge/ {
    root /var/www/certbot;
    default_type "text/plain";
  }

  location / {
    return 301 https://$host$request_uri;
  }
}

server {
  listen 443 ssl;
  http2 on;
  server_name ~^(.+)\.example\.com$;

  ssl_certificate /etc/nginx/certs/example.com/fullchain.pem;
  ssl_certificate_key /etc/nginx/certs/example.com/privkey.pem;

  location / {
    proxy_pass http://app:65434;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
  }
}

The working implementation is at farmica.si.conf.

The Let's Encrypt decision

The reason for DNS-01 is simple: when you need a wildcard certificate like *.example.com, HTTP-01 will not do it because it does not support wildcards. DNS-01 does.

Certbot command

Install certbot and the DNS plugin first, and drop the Cloudflare credentials in a locked-down file:

bash
sudo apt-get update
sudo apt-get install -y certbot python3-certbot-dns-cloudflare
sudo mkdir -p /root/.secrets
sudo sh -c 'printf "%s\n" "dns_cloudflare_api_token = REPLACE_ME" > /root/.secrets/cloudflare.ini'
sudo chmod 600 /root/.secrets/cloudflare.ini

If you use a non-root deploy user, place the credentials file under that user's home and update the command accordingly. The canonical issuance command:

bash
sudo certbot certonly \
  --dns-cloudflare \
  --dns-cloudflare-credentials /root/.secrets/cloudflare.ini \
  --dns-cloudflare-propagation-seconds 60 \
  -d "*.example.com" \
  --email ops@example.com \
  --agree-tos \
  --non-interactive \
  --keep-until-expiring

The working implementation is at setup-farmica-si-letsencrypt.sh.

Certificate install and reload

The certs need to land where the nginx container expects them:

text
/srv/my-app-prod/certs/example.com/fullchain.pem
/srv/my-app-prod/certs/example.com/privkey.pem

After copying or renewing certs, reload nginx so it picks them up:

bash
cd /srv/my-app-prod
docker compose --env-file .env.production exec -T nginx nginx -s reload
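
The copy step itself can be sketched as a small function. This is an illustration under assumptions: the function name is hypothetical, and the source directory is whatever certbot wrote, typically /etc/letsencrypt/live/example.com if the certificate lineage is named after the domain. Files in certbot's live directory are symlinks, so the copy dereferences them.

```shell
# Hypothetical helper: copy a certificate pair into the directory that the
# nginx container mounts, resolving symlinks along the way.
install_certs() {
  local src="$1" dst="$2"
  mkdir -p "$dst"
  # -L follows symlinks so real files are copied instead of dangling links
  cp -L "$src/fullchain.pem" "$dst/fullchain.pem"
  cp -L "$src/privkey.pem" "$dst/privkey.pem"
  chmod 644 "$dst/fullchain.pem"
  chmod 600 "$dst/privkey.pem"   # keep the private key owner-readable only
}
```

You would call it with the certbot directory as the first argument and /srv/my-app-prod/certs/example.com as the second, with enough privileges to read the source, and then run the nginx reload shown above.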

Staging nginx

Staging should also sit behind nginx, either via a separate host-level nginx or a separate proxy Compose project. The critical rule, repeated because it matters, is that the staging app itself binds only to 127.0.0.1.
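
Since that rule is easy to break with a one-character edit, a rough lint over the Compose template can catch it before deploy. This sketch is a text heuristic, not a YAML parser: the function name is hypothetical, and it only recognizes short-form port mappings written as list items such as '80:80' or '127.0.0.1:48592:65434'.

```shell
# Hypothetical lint: print every short-form port mapping in a Compose file
# that is not bound to 127.0.0.1, and return non-zero if any exist.
find_public_ports() {
  local file="$1" hits
  hits=$(grep -nE "^[[:space:]]*-[[:space:]]*['\"]?([0-9.]+:)?[0-9]+:[0-9]+['\"]?[[:space:]]*\$" "$file" \
         | grep -vE "['\"[:space:]]127\.0\.0\.1:" || true)
  if [ -n "$hits" ]; then
    printf '%s\n' "$hits"
    return 1
  fi
}
```

Pointed at the staging template it should print nothing. The production template will legitimately report the nginx ports 80 and 443, so only run it where localhost-only binding is the rule.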

nginx config sync policy

Finally, make one explicit choice about how nginx config is managed: either manage it in CI or treat it as a one-time manual bootstrap. This guide recommends managing nginx config in CI so the production deploy directory stays reproducible. In the workflow examples below, deployment-templates/nginx/example.com.conf and deployment-templates/scripts/setup-example-letsencrypt.sh are placeholders for your real files.

GitHub Actions Workflow

Everything so far has been preparation; the workflow is where it all comes together. The most important thing it must define is one canonical production promotion mechanism, so there is never ambiguity about what is going live.

Required triggers

The workflow responds to two events: an automatic push to main, and a manual dispatch that takes an exact SHA to promote:

yaml
on:
  push:
    branches: [main]
  workflow_dispatch:
    inputs:
      sha:
        description: Commit SHA to promote to production
        required: true
        type: string
      force_rebuild:
        description: Rebuild image even if it exists locally
        required: false
        type: boolean
        default: false

The reasoning behind the SHA input is worth internalizing: staging verifies one exact commit, so production must promote that same commit. Checking out a floating main at click time is not deterministic, and that non-determinism is precisely what bites you during an incident.
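
To keep the promotion input honest, a guard step can reject anything that is not a full commit SHA, such as a branch name or an abbreviated hash. This is a minimal sketch with a hypothetical function name, assuming lowercase 40-character SHA-1 identifiers as GitHub displays them.

```shell
# Hypothetical guard: succeed only for a full 40-character lowercase hex SHA.
is_full_sha() {
  printf '%s' "$1" | grep -Eq '^[0-9a-f]{40}$'
}
```

In a workflow step you would pass the dispatch input to it and exit non-zero on failure, so a value like main never reaches the build.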

Canonical workflow shape

The full workflow splits into three jobs. The first two react to a push, building and migrating staging then deploying and smoke-testing it. The third reacts only to a manual dispatch and handles the more careful production path, including a backup before migration and streaming the image over SSH:

yaml
jobs:
  build:
    runs-on: self-hosted
    if: github.event_name == 'push'
    steps:
      - uses: actions/checkout@v6
      - name: Prepare build env
        run: |
          bash deployment-templates/prepare-narocilnica-build-env.sh \
            /srv/my-app-secrets/.env.staging \
            /tmp/my-app-build.env
      - name: Build image
        run: |
          docker build \
            --secret id=env,src=/tmp/my-app-build.env \
            -t my-app:${{ github.sha }} .
      - name: Install deps for migrate
        run: pnpm install --frozen-lockfile
      - name: Migrate staging
        run: |
          set -a && source /srv/my-app-secrets/.env.staging && set +a
          pnpm payload migrate

  deploy-staging:
    runs-on: self-hosted
    needs: build
    if: github.event_name == 'push'
    steps:
      - uses: actions/checkout@v6
      - name: Sync runtime env
        run: cp /srv/my-app-secrets/.env.staging /srv/my-app-staging/.env.staging
      - name: Sync compose template
        run: cp deployment-templates/docker-compose.staging.yml /srv/my-app-staging/docker-compose.yml
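      - name: Deploy staging
        run: |
          cd /srv/my-app-staging
          export IMAGE=my-app:${{ github.sha }}
          # --wait blocks until the app healthcheck passes, so the smoke test does not race startup
          docker compose up -d --wait --remove-orphans --no-build
      - name: Smoke test staging
        run: |
          cd /srv/my-app-staging
          # Each step runs in a fresh shell, so the image variable must be set again for Compose
          export IMAGE=my-app:${{ github.sha }}
          docker compose exec -T app curl -fsS http://localhost:65434/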

  deploy-production:
    runs-on: self-hosted
    if: github.event_name == 'workflow_dispatch'
    steps:
      - uses: actions/checkout@v6
        with:
          ref: ${{ inputs.sha }}
      - name: Build env
        run: |
          bash deployment-templates/prepare-narocilnica-build-env.sh \
            /srv/my-app-secrets/.env.production \
            /tmp/my-app-build.env
      - name: Build or reuse image
        run: |
          TAG="my-app:${{ inputs.sha }}"
          if [ "${{ inputs.force_rebuild }}" = "true" ] || ! docker image inspect "$TAG" >/dev/null 2>&1; then
            docker build --secret id=env,src=/tmp/my-app-build.env -t "$TAG" .
          fi
      - name: Install deps for migrate
        run: pnpm install --frozen-lockfile
      - name: Create production backup before migrate
        run: |
          set -a && source /srv/my-app-secrets/.env.production && set +a
          mkdir -p /srv/backups
          pg_dump "$DATABASE_URL_UNPOOLED" > /srv/backups/my-app-prod-${{ inputs.sha }}.sql
      - name: Migrate production
        run: |
          set -a && source /srv/my-app-secrets/.env.production && set +a
          DATABASE_URL="$DATABASE_URL_UNPOOLED" pnpm payload migrate
      - name: Sync runtime env
        run: scp /srv/my-app-secrets/.env.production prod-app:/srv/my-app-prod/.env.production
      - name: Sync compose
        run: scp deployment-templates/docker-compose.production.yml prod-app:/srv/my-app-prod/docker-compose.yml
      - name: Sync nginx config and scripts
        run: |
          ssh prod-app 'mkdir -p /srv/my-app-prod/nginx /srv/my-app-prod/scripts /srv/my-app-prod/certbot/www /srv/my-app-prod/certs/example.com'
          scp deployment-templates/nginx/example.com.conf prod-app:/srv/my-app-prod/nginx/default.conf
          scp deployment-templates/scripts/setup-example-letsencrypt.sh prod-app:/srv/my-app-prod/scripts/setup-example-letsencrypt.sh
          ssh prod-app 'chmod +x /srv/my-app-prod/scripts/setup-example-letsencrypt.sh'
      - name: Stream image
        run: docker save my-app:${{ inputs.sha }} | gzip | ssh prod-app 'gzip -d | docker load'
      - name: Restart production
        run: |
          ssh prod-app '
            cd /srv/my-app-prod
            export IMAGE=my-app:${{ inputs.sha }}
            docker compose --env-file .env.production up -d --remove-orphans
          '

To be candid about the current state: the live Farmica workflow still dispatches against whatever main points to at click time, but for a new canonical setup you should prefer the SHA input shown above. The working implementation reference is deploy.yml.

Migration Policy

Migrations are where a deploy stops being reversible by a simple image swap, so the policy around them has to be explicit rather than assumed.

The canonical rule

For this repo's shape, the order is: build the image first, install dependencies on the runner, run migrations from source on the runner, and fail the deploy before restart if migration fails. Migrations run from source rather than from the image because the shipped image does not contain runnable TypeScript migration files.
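
GitHub Actions already gives you this ordering, since a failed step stops the job. If you ever run the same sequence by hand, the sketch below keeps the same guarantee. The helper run_in_order and the four step functions are hypothetical names, and the step bodies assume the environment file is already loaded and SHA is set.

```bash
#!/usr/bin/env bash
# Run the given step functions in order and stop at the first failure,
# so a failed migration can never be followed by a restart.
run_in_order() {
  local step
  for step in "$@"; do
    echo "==> $step"
    if ! "$step"; then
      echo "step failed: $step; remaining steps skipped" >&2
      return 1
    fi
  done
}

# Placeholder steps; each wraps a command taken from the staging workflow.
build_image()  { docker build -t "my-app:${SHA:?SHA is required}" .; }
install_deps() { pnpm install --frozen-lockfile; }
migrate()      { pnpm payload migrate; }
restart_app()  {
  ( cd /srv/my-app-staging \
      && IMAGE="my-app:${SHA:?SHA is required}" docker compose up -d --remove-orphans --no-build )
}

# Example:
#   run_in_order build_image install_deps migrate restart_app
```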

The production backup rule

Before any production migration, you either verify that a restorable backup already exists or you create one. A direct backup looks like:

bash
pg_dump "$DATABASE_URL_UNPOOLED" > /srv/backups/my-app-prod-$(date +%F-%H%M%S).sql

If you do not want to take a fresh dump on every deploy, at minimum enforce a check that the scheduled backup succeeded recently. Either way, this is why the runner needs postgresql-client and write access to /srv/backups or another backup path.
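
A recency check of that kind could look like the sketch below. The function name backup_is_fresh is hypothetical, and it assumes the scheduled job writes plain .sql dumps into a single directory. It only proves that a non-empty file was written recently, not that the dump actually restores.

```bash
#!/usr/bin/env bash
# Succeed only if DIR holds at least one non-empty *.sql dump modified
# within the last MAX_AGE_MIN minutes.
backup_is_fresh() {
  local dir="$1" max_age_min="$2" hit
  hit="$(find "$dir" -maxdepth 1 -type f -name '*.sql' \
           ! -empty -mmin "-${max_age_min}" -print 2>/dev/null | head -n 1)"
  [ -n "$hit" ]
}

# Example: refuse to migrate unless a dump from the last 24 hours exists.
#   backup_is_fresh /srv/backups 1440 || { echo "no recent backup" >&2; exit 1; }
```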

Direct DB connection

Migrations always use the unpooled connection, overriding DATABASE_URL for the duration of the command:

bash
set -a && source /srv/my-app-secrets/.env.production && set +a
DATABASE_URL="$DATABASE_URL_UNPOOLED" pnpm payload migrate
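
One caveat with the inline override: if DATABASE_URL_UNPOOLED is missing from the environment file, DATABASE_URL is silently set to an empty string. A small guard makes that failure explicit. This is a sketch, and migration_url is a hypothetical helper name rather than something from the repo.

```bash
#!/usr/bin/env bash
# Print the direct (unpooled) connection string, or fail loudly if it is missing.
migration_url() {
  if [ -z "${DATABASE_URL_UNPOOLED:-}" ]; then
    echo "DATABASE_URL_UNPOOLED is not set; refusing to migrate" >&2
    return 1
  fi
  printf '%s\n' "$DATABASE_URL_UNPOOLED"
}

# Example:
#   url="$(migration_url)" || exit 1
#   DATABASE_URL="$url" pnpm payload migrate
```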

If migration succeeds but restart fails

This is the dangerous in-between state, and it is no longer a pure app rollback. You now have a new schema running against an old or failed application process. Your options are to fix and redeploy a compatible image, or to restore the DB backup if the migration is incompatible and rollback is required.

Irreversible migrations

Some migrations cannot be undone by an app rollback at all, so document them before merge. Dropped columns, renamed tables without a compatibility layer, and destructive data transforms all fall in this category. App rollback alone does not undo them. A working helper reference is docker-run-payload.sh.

First Staging Deploy

With the policy understood, you are ready for the first real deploy. Thanks to the default runner workspace model, you do not need a manual persistent clone under /srv/my-app for CI to work. Before pushing, verify that DNS points to the staging VPS, the staging DB exists, the staging S3 bucket exists, /srv/my-app-secrets/.env.staging exists, the runner service is online, the runner can run docker ps, and staging nginx is already configured.
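
Part of that checklist can be scripted. The sketch below covers only the checks that are easy to automate from the runner host (env file, Docker access, DNS resolution). The database, bucket, runner service, and nginx checks are left out because their exact commands depend on your providers. The names check and preflight_staging are hypothetical, and getent assumes a Linux host.

```bash
#!/usr/bin/env bash
# Run one named check; print the result and count failures instead of
# stopping, so a single run reports everything that is missing.
failures=0
check() {
  local label="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "ok    $label"
  else
    echo "FAIL  $label"
    failures=$((failures + 1))
  fi
}

# Run this as the runner user so the Docker check reflects its permissions.
preflight_staging() {
  check "staging env file exists" test -f /srv/my-app-secrets/.env.staging
  check "runner can reach docker" docker ps
  check "staging DNS resolves"    getent hosts demo.staging.example.com
  [ "$failures" -eq 0 ]
}
```

Counting failures rather than exiting on the first one means a single run lists every missing prerequisite, which is usually what you want before a first deploy.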

Then push a test commit to main and confirm the whole chain: the image built on the runner, migrations succeeded, /srv/my-app-staging/docker-compose.yml and /srv/my-app-staging/.env.staging both exist, docker compose ps shows the app and workers healthy, and https://demo.staging.example.com/ returns 200.

First Production Deploy

Once staging is proven, production follows the same spirit with more guardrails. Before the first production deploy, verify that DNS points to the production VPS, the production DB and bucket exist, /srv/my-app-secrets/.env.production exists on the runner host, /srv/my-app-prod/ exists on the production host, the runner can SSH to production, production can run docker compose, the production nginx config is synced, the TLS certificate exists or the issuance script is ready, and the production firewall allows 80 and 443.

If you are using DNS-01 wildcard TLS, issue the certificate before the first public cutover:

bash
ssh prod-app
cd /srv/my-app-prod
set -a && source .env.production && set +a
certbot certonly \
  --dns-cloudflare \
  --dns-cloudflare-credentials /root/.secrets/cloudflare.ini \
  --dns-cloudflare-propagation-seconds 60 \
  -d "*.example.com" \
  --email ops@example.com \
  --agree-tos \
  --non-interactive \
  --keep-until-expiring

Then run the workflow manually with the exact staging-verified SHA, and afterward verify the result:

bash
curl -fsS https://demo.example.com/
cd /srv/my-app-prod
docker compose --env-file .env.production ps
docker compose --env-file .env.production logs --tail=100 app
docker compose --env-file .env.production logs --tail=100 worker-media
docker compose --env-file .env.production logs --tail=100 worker-inventory

Smoke Tests

Those final checks deserve their own section, because internal localhost checks are necessary but not sufficient. A container can answer on localhost while the public route is broken, so you test both layers.
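
A freshly restarted container may need a few seconds before it answers, so a single curl immediately after docker compose up can fail for the wrong reason. The retry wrapper below is a minimal sketch for that case. The name retry and the attempt and delay values in the example are assumptions, not part of the Farmica workflow.

```bash
#!/usr/bin/env bash
# Retry a command up to ATTEMPTS times, sleeping DELAY seconds between tries.
# Returns 0 as soon as the command succeeds, 1 once the attempts run out.
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "gave up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Example: allow up to 10 tries, 3 seconds apart, for the public route.
#   retry 10 3 curl -fsS -o /dev/null https://demo.staging.example.com/
```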

Staging

bash
cd /srv/my-app-staging
docker compose ps
docker compose exec -T app curl -fsS http://localhost:65434/
curl -fsS https://demo.staging.example.com/
docker compose logs --tail=100 app
docker compose logs --tail=100 worker-media
docker compose logs --tail=100 worker-inventory

Production

bash
cd /srv/my-app-prod
docker compose --env-file .env.production ps
docker compose --env-file .env.production exec -T app curl -fsS http://localhost:65434/
curl -fsS https://demo.example.com/
curl -fsS https://tenant-a.example.com/
docker compose --env-file .env.production logs --tail=100 app
docker compose --env-file .env.production logs --tail=100 worker-media
docker compose --env-file .env.production logs --tail=100 worker-inventory

For a tenant-routed app, check at least one real tenant hostname, not only the base domain, since base-domain success can mask broken tenant routing.
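
If you have more than a couple of tenant hostnames, a loop keeps the check honest. The sketch below probes every hostname passed to it and reports all failures rather than stopping at the first. The function name smoke_hosts and the PROBE variable are hypothetical, and the 10-second timeout is an arbitrary choice.

```bash
#!/usr/bin/env bash
# Probe every hostname and report all failures, not just the first one.
# PROBE is overridable so the loop can be exercised without network access.
PROBE="${PROBE:-curl -fsS -o /dev/null --max-time 10}"

smoke_hosts() {
  local host failed=0
  for host in "$@"; do
    # PROBE is intentionally unquoted so its flags split into separate words.
    if $PROBE "https://${host}/"; then
      echo "ok    $host"
    else
      echo "FAIL  $host"
      failed=1
    fi
  done
  return "$failed"
}

# Example:
#   smoke_hosts demo.example.com tenant-a.example.com
```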

Rollback Policy

No matter how careful the deploy is, you eventually need to undo one, and not all rollbacks are equal. It helps to split them into three classes so you reach for the right tool under pressure.

App rollback

Use this when the image or config is bad but the schema is still compatible. It is the cheap, fast case:

bash
ssh prod-app '
  cd /srv/my-app-prod
  export IMAGE=my-app:<previous-good-sha>
  docker compose --env-file .env.production up -d --remove-orphans
'
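
The rollback command is only fast if you know which SHA was previously good. One option, sketched below, is to append each SHA to a history file after its smoke tests pass and read the previous entry back when needed. The file and both function names are hypothetical; nothing in the Farmica repo writes such a file. The lookup is also only useful while that image is still retained on the production host.

```bash
#!/usr/bin/env bash
# Append a SHA to the history file after a deploy has passed its smoke tests.
record_deploy() {
  local history_file="$1" sha="$2"
  printf '%s\n' "$sha" >> "$history_file"
}

# Print the most recent recorded SHA that differs from the one running now.
# Fails (non-zero) when the history holds no other SHA.
previous_good_sha() {
  local history_file="$1" current="$2" prev
  prev="$(grep -vxF -- "$current" "$history_file" | tail -n 1)"
  [ -n "$prev" ] && printf '%s\n' "$prev"
}

# Example (the history path is an assumption):
#   previous_good_sha /srv/my-app-prod/deploy-history.log "$CURRENT_SHA"
```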

Schema rollback

Use this when a migration changed the schema incompatibly and the previous image cannot run against it. Here you restore a backup, or run a documented down-migration if your project supports one.

Data rollback

Use this when background jobs changed data format, object storage writes changed structure, or partial job execution created inconsistent state. The procedure is to stop workers if needed, then restore data from backup or run a manual repair plan.

The rule that ties all three together, and the one most worth remembering: image rollback is not data rollback.

Image Retention and Cleanup

Because images are built locally and streamed over SSH, both hosts accumulate tags over time, and left unchecked that quietly fills disks. A minimum policy is to keep the current production image, keep the previous known-good image, and prune unused images regularly.

Inspect what you have:

bash
docker images my-app --format '{{.Repository}}:{{.Tag}}\t{{.CreatedAt}}\t{{.Size}}'

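The simple cleanup is a blunt prune. Without -a it removes only dangling (untagged) images, so old my-app:<sha> tags remain until you remove them explicitly: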

bash
docker image prune -f

The safer cleanup is more deliberate: list all my-app:* tags, confirm which are still referenced by running containers, and remove only the unreferenced ones. This repo's production workflow already applies that safer pattern after each deploy.
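
The core of that pattern is a set difference between all tags and the tags you want to keep. The sketch below shows one way to compute it; it is not the code from the production workflow, and removable_tags plus the keep-file layout are hypothetical.

```bash
#!/usr/bin/env bash
# Read candidate image tags on stdin and print those that are NOT listed,
# one per line, in KEEP_FILE. Matching is exact and whole-line.
removable_tags() {
  local keep_file="$1"
  grep -vxFf "$keep_file" || true
}

# Example wiring on a host (review the list before piping it to docker rmi):
#   docker ps --format '{{.Image}}' | sort -u > /tmp/keep.txt
#   echo "my-app:<previous-good-sha>" >> /tmp/keep.txt
#   docker images my-app --format '{{.Repository}}:{{.Tag}}' \
#     | removable_tags /tmp/keep.txt
```

Building the keep list from running containers plus the previous known-good tag matches the minimum retention policy described above.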

Troubleshooting

Even a well-built pipeline fails sometimes, so here are the failure modes I actually hit and where to look first.

When staging is unreachable, check DNS, staging nginx, that the app bind address is 127.0.0.1:<port>, docker compose ps, and the container healthcheck.

When production restarted but serves the old version, check that the workflow promoted the intended SHA, that the image was loaded on production, that IMAGE in the shell matches the target tag, and that docker compose up -d --remove-orphans actually ran.

When a production migration fails, check DATABASE_URL_UNPOOLED, DB privileges, the network path from the runner to direct Postgres, and recent backup availability.

When a curl healthcheck fails in the container, check whether curl is installed in the image and whether the app listens on the expected internal port.

And when TLS fails in the browser, check that DNS points at the production VPS, that nginx has the expected cert paths mounted, that the wildcard cert actually covers the hostname, and that the cert was renewed and nginx reloaded.

Farmica Working Implementation Map

Since this whole guide is grounded in a real deployment rather than a hypothetical one, here is the concrete mapping between the reusable placeholders and the actual Farmica repo that proves the pattern:

| Layer | Current implementation |
| --- | --- |
| Build/staging VPS | build-staging-vps |
| Production VPS | farmica |
| Source checkout | /srv/narocilnica |
| Staging deploy dir | /srv/narocilnica-staging |
| Production deploy dir | /srv/narocilnica-prod |
| Runner dir | /srv/actions-runner/ |
| Workflow | deploy.yml |
| Staging compose | docker-compose.staging.yml |
| Production compose | docker-compose.production.yml |
| nginx vhost | farmica.si.conf |
| LE script | setup-farmica-si-letsencrypt.sh |
| Build env extraction | prepare-narocilnica-build-env.sh |

For completeness, the live repo differs from the reusable recommendation in three ways: production is currently accessed as root, production dispatch currently uses current main instead of a required SHA input, and the live workflow still uses StrictHostKeyChecking=no. For a new setup, prefer the stricter canonical pattern described throughout this guide.

Conclusion

The problem this guide set out to solve was the one that quietly blocks most self-hosting efforts: it is easy to deploy an app by hand, and hard to build a deployment system that another developer can reproduce, that keeps environments truly separate, and that gives you a clean answer when something breaks at the worst possible moment. The approach here solves that by being explicit about the things platforms usually hide from you, with one canonical secret location, one runner user, one deploy directory per environment, and one deterministic production promotion driven by a verified commit SHA.

Walking through it, you have set up two VPS hosts, isolated databases and object storage per environment, wired a self-hosted runner that builds images and streams them to production over SSH, terminated wildcard TLS with DNS-01, and given yourself a layered rollback story that distinguishes app, schema, and data. The result is a system you own end to end, where every moving part is visible and reproducible rather than abstracted away.

Let me know in the comments if you have questions, and subscribe for more practical development guides.

Thanks, Matija