Self-Host Next.js and Payload on VPS: Complete Guide

If you have ever wanted to run a Next.js or Payload-style app on your own infrastructure without leaning on Vercel for hosting, you already know the gap between "I can deploy this manually" and "another developer could reproduce this from scratch." I kept running into that gap. Manual deploys work right up until you need a second environment, a second person, or a clean rollback at 11pm. So I wrote this down as the canonical version of how I actually do it, grounded in a real working deployment rather than an idealized one.

This is the full implementation guide for running an app with separate DEV, STAGING, and PRODUCTION environments, Docker Compose for the runtime, a self-hosted GitHub Actions runner, automatic staging deploys, manual production promotion, and separate .env files, databases, and object storage per environment. It is written to be reusable, but every major decision is anchored in this repo's working model, so you can see why each choice exists rather than just taking it on faith.

Reach for this when you want a deployment system that does not depend on Vercel for app hosting, that another developer can reproduce from scratch, and that stays explicit about secrets, permissions, networking, and rollback rather than hiding them inside a platform.

A quick naming note that runs through everything below: my-app is a placeholder image and repository name, prod-app is a placeholder SSH host alias from ~/.ssh/config, example.com is a placeholder domain, and placeholder filenames like example.com.conf or setup-example-letsencrypt.sh should be swapped for your project's real filenames.

Assumptions

Every deployment guide carries hidden assumptions, so let me make mine explicit up front. This guide assumes Ubuntu 24.04 LTS or Debian 12 on both VPS hosts, with one VPS used for build and staging and a separate VPS used for production. GitHub Actions runs on the build/staging VPS, deploys go through Docker Compose, and staging and production share the same container image shape. Production is promoted by a specific commit SHA rather than a floating main, secrets live on the servers rather than in git, and wildcard TLS uses DNS-01 rather than HTTP-01.

Your stack will probably differ in places, and that's fine. What matters is that you preserve the same control points even if the surrounding details change: one canonical secret location, one canonical runner user, one canonical deploy directory per environment, and one canonical production promotion input. Hold those four constant and the rest can flex.

The Runner Workspace Model

Before going further, it helps to be precise about where things actually live on disk, because this is a common source of confusion later. This guide assumes GitHub Actions uses the default runner workspace created by actions/checkout. That means the only persistent directories you need to care about are:

/srv/my-app-staging
/srv/my-app-prod
/srv/my-app-secrets
/srv/actions-runner

Notably, do not rely on /srv/my-app unless you intentionally maintain a persistent local clone for manual debugging or ad-hoc operator commands. The CI flow does not need it, and treating it as required will lead you astray.
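
To confirm those persistent directories exist before the first CI run, a small preflight check can help. This is a minimal sketch rather than anything from the repo: the function name check_dirs is hypothetical, and the paths in the example call are the placeholders used in this guide.

```bash
# Hypothetical helper: report any expected directory that is missing.
# Returns 0 when every directory exists, 1 otherwise.
check_dirs() {
  missing=0
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      echo "ok       $dir"
    else
      echo "MISSING  $dir"
      missing=1
    fi
  done
  return "$missing"
}

# Example call on the build/staging VPS (placeholder paths from this guide):
# check_dirs /srv/my-app-staging /srv/my-app-secrets /srv/actions-runner
```

On the production host the same function would be called with /srv/my-app-prod instead.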

Target Architecture

With those assumptions in place, here is the shape of the whole system. It is worth picturing this before touching any commands, because every later step is just filling in one of these boxes.

text

Developer machine
  └── pushes code to GitHub

Build/Staging VPS
  ├── GitHub Actions workspace for CI checkouts
  ├── self-hosted GitHub Actions runner
  ├── staging deploy directory
  ├── staging nginx
  └── Docker daemon used for builds and staging runtime

Production VPS
  ├── production deploy directory
  ├── production nginx with TLS
  ├── production app + worker containers
  ├── production database or direct DB access
  └── production object storage or direct object-store access

The delivery flow connects those two machines through a deliberate two-phase rhythm: staging happens automatically on every push, and production happens only when you choose to promote a verified commit.

text

push to main
  ├── build image on self-hosted runner
  ├── migrate staging
  ├── deploy staging
  └── smoke-test staging

manual workflow_dispatch with SHA
  ├── rebuild or reuse image for that SHA
  ├── verify production backup policy
  ├── migrate production with direct DB connection
  ├── stream image to production over SSH
  ├── restart production
  └── smoke-test production

Canonical Host State Before First Deploy

Architecture diagrams are aspirational until the hosts are actually prepared, so this section lists the minimum required state on each machine. Get this right once and the first deploy stops being a guessing game.

Build/Staging VPS

This machine carries the heaviest load because it both builds images and runs staging. It must have Docker Engine, the Docker Compose plugin, git, the Node.js version required by the app, pnpm, and postgresql-client if backups or direct DB inspection run on the runner. It also needs the self-hosted GitHub Actions runner service, an SSH private key that can reach production, a known_hosts entry for production, a staging deploy directory, and a server-side secret directory. On the network side, it needs access to the staging DB and, if production migrations run from the runner, to the production direct DB port.

The recommended directory layout keeps each concern in its own place:

text

/srv/my-app-staging/            # staging docker compose directory
/srv/my-app-secrets/            # env files and build env sources
/srv/actions-runner/            # GitHub runner

Production VPS

Production is leaner by design, since it should only ever run, never build. It must have Docker Engine, the Docker Compose plugin, a production deploy directory, an nginx config directory, a cert directory, and an env file for runtime. For the database it needs either production Postgres plus PgBouncer on the same host or network access to a production DB host, along with object storage credentials. Finally it needs inbound firewall rules for 80/tcp and 443/tcp.

Its directory layout mirrors that single-purpose intent:

text

/srv/my-app-prod/
├── .env.production
├── docker-compose.yml
├── nginx/
├── certs/example.com/
├── certbot/www/
└── scripts/

Provision Servers

Now that you know the target state, the next job is bringing each server up to it. This starts with users and permissions, because every later command inherits whatever identity you set up here.

Create users

The reusable recommendation is a dedicated deploy user on both the build/staging VPS and the production VPS. For transparency, the Farmica repo this guide is based on currently runs build/staging under the VPS login user and deploys production to root, but a cleaner setup prefers a dedicated deploy user. On the build/staging VPS:

bash

sudo adduser deploy
sudo usermod -aG docker deploy
sudo mkdir -p /srv/my-app-staging /srv/my-app-secrets /srv/actions-runner
sudo chown -R deploy:deploy /srv/my-app-staging /srv/my-app-secrets /srv/actions-runner

And on production:

bash

sudo adduser deploy
sudo usermod -aG docker deploy
sudo mkdir -p /srv/my-app-prod/nginx /srv/my-app-prod/certs/example.com /srv/my-app-prod/certbot/www /srv/my-app-prod/scripts
sudo chown -R deploy:deploy /srv/my-app-prod

One easy-to-miss detail: re-login after adding a user to the docker group, otherwise the new group membership won't take effect in your shell.

Install Docker and the Compose plugin

With the user in place, install Docker. The steps differ slightly between distributions, so pick the one that matches your host. If you're on Ubuntu and need the latest Docker Engine version rather than Ubuntu's bundled docker.io, the Docker upgrade guide covers the full migration. On Ubuntu:

bash

sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo systemctl enable --now docker
docker version
docker compose version

On Debian the only real change is the repository URL pointing at linux/debian instead of linux/ubuntu:

bash

sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/debian \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo systemctl enable --now docker
docker version
docker compose version

Install Node.js and pnpm on the build/staging VPS

This step is only required if migrations run from source on the runner, which this repo does. Install the Node.js version your app requires, deriving it from .nvmrc, the package.json engines field, or your CI or Dockerfile conventions. This repo specifically requires Node.js 24:

bash

curl -fsSL https://deb.nodesource.com/setup_24.x | sudo -E bash -
sudo apt-get install -y nodejs
sudo corepack enable
node -v
pnpm -v

Install Postgres client tools on the build/staging VPS

Closely related, if you create backups or run direct DB checks from the runner, you need the Postgres client tools and a backup directory the runner can write to:

bash

sudo apt-get update
sudo apt-get install -y postgresql-client
sudo mkdir -p /srv/backups
sudo chown deploy:deploy /srv/backups

Basic firewall

Finally, lock down the network surface. The build/staging VPS should allow 22/tcp, plus 80/tcp and 443/tcp if staging is public. Production should allow 22/tcp, 80/tcp, and 443/tcp. With ufw:

bash

sudo ufw allow 22/tcp
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw enable
sudo ufw status

Provision DNS

Servers are useless without names pointing at them, so DNS comes next. The recommended split gives each environment its own domain space: dev on *.dev.example.com, staging on *.staging.example.com, and production on *.example.com. That translates to records like:

text

example.com            A  <PRODUCTION_VPS_IP>
*.staging.example.com  A  <STAGING_VPS_IP>
staging.example.com    A  <STAGING_VPS_IP>
*.example.com          A  <PRODUCTION_VPS_IP>

If you need wildcard TLS for production, use DNS-01 and keep the production records DNS only, not proxied, during initial validation. For reference, the Farmica repo uses *.farmica.online for staging and *.farmica.si for production, with the apex farmica.si intentionally split for unrelated reasons. That apex split is not something this architecture requires, so don't feel obliged to copy it.

Provision Databases and Object Storage

It is tempting to treat data stores as something you wire up later, but that is exactly how staging ends up writing into production. Treat this as a mandatory first-deploy step.

Databases

Create a separate database and credentials per environment so the three never touch each other:

text

my_app_dev
my_app_staging
my_app_prod

For production specifically, the recommended shape splits pooled and unpooled access: DATABASE_URL points to PgBouncer or another pooled endpoint, DATABASE_URL_UNPOOLED points to direct Postgres, the app runtime uses the pooled connection, and migrations use the unpooled one. For staging and dev, direct Postgres is usually enough, though you should still keep separate DB names and credentials.

Object storage

Apply the same isolation to buckets, with separate buckets and credentials per environment:

text

my-app-dev-media
my-app-staging-media
my-app-prod-media

The non-negotiable rule here is that staging and dev must never write into the production bucket. If you are using S3-compatible storage such as Garage or MinIO, create a separate bucket, access key, and secret key for each environment.
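
One way to enforce that rule mechanically is a guard that runs before any staging deploy. The sketch below is an assumption of mine, not part of the repo: assert_env_isolated is a hypothetical name, and the patterns in the example are the placeholder database and bucket names from this guide. It deliberately prints only the pattern, never the matching line, so credentials do not end up in CI logs.

```bash
# Hypothetical guard: fail if an env file mentions a forbidden marker,
# e.g. a staging env file that references a production database or bucket.
# Usage: assert_env_isolated <env-file> <forbidden-pattern>...
assert_env_isolated() {
  env_file=$1
  shift
  if [ ! -r "$env_file" ]; then
    echo "ERROR: cannot read $env_file" >&2
    return 2
  fi
  status=0
  for pattern in "$@"; do
    # -q: no output (avoids leaking secrets), -F: match as a fixed string
    if grep -qF -- "$pattern" "$env_file"; then
      echo "ERROR: $env_file references '$pattern'" >&2
      status=1
    fi
  done
  return "$status"
}

# Example (placeholder names from this guide):
# assert_env_isolated /srv/my-app-secrets/.env.staging my_app_prod my-app-prod-media
```

A substring match is crude, so treat it as a tripwire rather than proof of isolation. Separate credentials per environment remain the real safeguard.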

Canonical Secret Locations

This is the single most important clarity rule in the entire guide, so it gets its own section: never copy .env files from the git checkout. Secrets belong on the server, in one canonical place. On the build/staging VPS, those canonical locations are:

text

/srv/my-app-secrets/.env.staging
/srv/my-app-secrets/.env.production
/srv/my-app-secrets/.env.staging.build
/srv/my-app-secrets/.env.production.build

And the canonical deployed env locations are:

text

/srv/my-app-staging/.env.staging
/srv/my-app-prod/.env.production

The meaning behind these is straightforward: .env.staging and .env.production are full runtime env files, while .env.staging.build and .env.production.build are optional pre-trimmed build env files. If you don't maintain separate build env files, you generate them from the full env files before each build. This repo uses that second pattern:

bash

bash deployment-templates/prepare-narocilnica-build-env.sh /srv/my-app-secrets/.env.production /tmp/my-app-build.env

That script extracts only what the build actually needs: PAYLOAD_SECRET, DATABASE_URL, NEXT_PUBLIC_VAPID_PUBLIC_KEY, and optional Sentry build vars. The working implementation lives at prepare-narocilnica-build-env.sh.
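
If you need to write your own equivalent, the core idea is an allowlist filter over the full env file. The following is a simplified sketch under my own assumptions, not a copy of the repo's script: extract_build_env is a hypothetical name, and it only handles plain single-line KEY=value entries.

```bash
# Hypothetical sketch of an allowlist filter; the real script may differ.
# Usage: extract_build_env <full-env-file> <output-file> KEY...
extract_build_env() {
  src=$1
  dest=$2
  shift 2
  # Truncate or create the output; a newly created file is owner-only (umask 077).
  ( umask 077 && : > "$dest" ) || return 1
  for key in "$@"; do
    # Copy only lines of the form KEY=value; keys absent from the source are skipped.
    grep -E "^${key}=" "$src" >> "$dest" || true
  done
}

# Example with the keys named above (optional Sentry variables can be appended):
# extract_build_env /srv/my-app-secrets/.env.production /tmp/my-app-build.env \
#   PAYLOAD_SECRET DATABASE_URL NEXT_PUBLIC_VAPID_PUBLIC_KEY
```

An allowlist is safer than a denylist here because a newly added runtime secret stays out of the build unless someone explicitly opts it in.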

Environment Files

To make those canonical locations concrete, here is what the files themselves look like in sanitized form. Notice how staging and production are structurally identical but never share a single value.

`/srv/my-app-secrets/.env.staging`

dotenv

NODE_ENV=production
APP_ENV=staging

SERVER_URL=https://demo.staging.example.com
TENANT_STOREFRONT_BASE_DOMAIN=staging.example.com

DATABASE_URL=postgresql://staging_user:staging_pass@10.0.0.10:5432/my_app_staging
DATABASE_URL_UNPOOLED=postgresql://staging_user:staging_pass@10.0.0.10:5432/my_app_staging

S3_BUCKET=my-app-staging-media
S3_REGION=garage
S3_ENDPOINT=http://10.0.0.20:9000
S3_ACCESS_KEY_ID=staging-access-key
S3_SECRET_ACCESS_KEY=staging-secret-key

PAYLOAD_SECRET=replace-me
CRON_SECRET=replace-me
NEXT_PUBLIC_VAPID_PUBLIC_KEY=replace-me

`/srv/my-app-secrets/.env.production`

Production carries everything staging does, plus the extra credentials needed for TLS issuance and observability:

dotenv

NODE_ENV=production
APP_ENV=production

SERVER_URL=https://demo.example.com
TENANT_STOREFRONT_BASE_DOMAIN=example.com

DATABASE_URL=postgresql://prod_app:prod_pass@10.0.1.10:6432/my_app_prod
DATABASE_URL_UNPOOLED=postgresql://prod_app:prod_pass@10.0.1.10:5432/my_app_prod

S3_BUCKET=my-app-prod-media
S3_REGION=garage
S3_ENDPOINT=http://10.0.1.20:9000
S3_ACCESS_KEY_ID=prod-access-key
S3_SECRET_ACCESS_KEY=prod-secret-key

PAYLOAD_SECRET=replace-me
CRON_SECRET=replace-me
NEXT_PUBLIC_VAPID_PUBLIC_KEY=replace-me

CLOUDFLARE_DNS_API_TOKEN=replace-me
CERTBOT_EMAIL=ops@example.com
GRAFANA_ADMIN_PASSWORD=replace-me
OBSERVABILITY_NGINX_USER=ops
OBSERVABILITY_NGINX_PASSWORD=replace-me

Configure SSH From Runner To Production

With secrets sorted, the next link to forge is the one between the two machines, because production deploys happen by the runner reaching across to production over SSH. The runner user must be able to do this non-interactively.

Generate or place the key on the build/staging VPS

As the runner user, create a dedicated key:

bash

mkdir -p ~/.ssh
chmod 700 ~/.ssh
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 -N ""
chmod 600 ~/.ssh/id_ed25519

Install the public key on production

Then push the public half to production:

bash

ssh-copy-id -i ~/.ssh/id_ed25519.pub deploy@<PRODUCTION_IP>

Populate `known_hosts`

And record production's host key so the connection verifies cleanly:

bash

ssh-keyscan -H <PRODUCTION_IP> >> ~/.ssh/known_hosts
chmod 644 ~/.ssh/known_hosts

Optional SSH config

To make the later workflow readable, give production a short alias:

ssh

Host prod-app
  HostName <PRODUCTION_IP>
  User deploy
  IdentityFile ~/.ssh/id_ed25519
  IdentitiesOnly yes

Then confirm the whole chain works:

bash

ssh prod-app 'docker ps'

One security note worth flagging: the current Farmica workflow uses StrictHostKeyChecking=no, but for a new setup you should prefer known_hosts plus normal host verification.
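
Because CI cannot answer prompts, it is worth testing the connection the way the runner will experience it. A minimal sketch, assuming the prod-app alias from the SSH config above; the function name can_ssh_noninteractive is hypothetical, while BatchMode and ConnectTimeout are standard OpenSSH client options.

```bash
# Hypothetical check: BatchMode=yes makes ssh fail instead of prompting for a
# password or host-key confirmation, which is how a CI job experiences it.
can_ssh_noninteractive() {
  ssh -o BatchMode=yes -o ConnectTimeout=10 "$1" true
}

# Example, run as the runner user:
# can_ssh_noninteractive prod-app && echo "non-interactive SSH works"
```

If this fails while a plain ssh prod-app succeeds, the usual suspects are a missing known_hosts entry or a key that still needs a passphrase.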

Install the Self-Hosted GitHub Runner

Now that the runner host can talk to production, give it the actual CI brain. Run this on the build/staging VPS as the runner user:

bash

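mkdir -p /srv/actions-runner
cd /srv/actions-runner
# Runner release assets carry the version in their filename, so pin one explicitly.
RUNNER_VERSION="x.y.z"  # replace with the current release number, without the leading "v"
curl -L -o actions-runner.tar.gz "https://github.com/actions/runner/releases/download/v${RUNNER_VERSION}/actions-runner-linux-x64-${RUNNER_VERSION}.tar.gz"
tar xzf actions-runner.tar.gz
./config.sh --url https://github.com/your-org/your-repo --token YOUR_RUNNER_TOKEN
sudo ./svc.sh install
sudo ./svc.sh start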

For this to be useful, the runner needs a specific set of capabilities: it must be able to run docker build and docker compose, read /srv/my-app-secrets/*, SSH to production, reach the staging DB, and reach the production direct DB port if migrations run from the runner. Verify all of that before moving on:

bash

docker ps
docker compose version
ssh prod-app 'docker ps'
sudo systemctl status 'actions.runner.*'

If you want a persistent manual checkout for debugging, create it separately. The workflow examples in this guide do not depend on it.

Dockerfile and Build-Time Environment Injection

A subtle but important decision is how secrets reach the build without leaking into the image. You must choose one build-time pattern and document it end to end. This repo uses BuildKit secrets for the build env file, which keeps the secret out of the final image layers.

Build env generation

First, generate the trimmed build env from the full production env:

bash

bash deployment-templates/prepare-narocilnica-build-env.sh \
  /srv/my-app-secrets/.env.production \
  /tmp/my-app-build.env

Docker build

Then build, mounting that file as a secret rather than baking it in:

bash

docker build \
  --secret id=env,src=/tmp/my-app-build.env \
  -t my-app:<git-sha> \
  .

Dockerfile consumption

Inside the Dockerfile, the secret is sourced only for the duration of the build step that needs it:

dockerfile

RUN --mount=type=secret,id=env,required=true \
    set -a && . /run/secrets/env && set +a && \
    pnpm run build

The working implementation is at Dockerfile. One practical detail about healthchecks: this repo installs curl in the image precisely so that curl-based healthchecks work out of the box. If your image does not contain curl, either add it or use a Node-based healthcheck.

Compose Templates

The build produces an image, but Compose is what actually runs it. Keep these deploy templates in git and sync them on every deploy so the deploy directories stay reproducible rather than drifting into hand-edited snowflakes.

Staging compose

The defining rule for staging is that it must not expose the app on 0.0.0.0; it binds to localhost and sits behind nginx. Alongside the app, it runs dedicated workers for media and inventory queues:

yaml

services:
  app:
    image: ${IMAGE}
    restart: unless-stopped
    env_file:
      - .env.staging
    ports:
      - '127.0.0.1:48592:65434'
    healthcheck:
      test: ['CMD', 'curl', '-f', 'http://localhost:65434/']
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  worker-media:
    image: ${IMAGE}
    restart: unless-stopped
    env_file:
      - .env.staging
    environment:
      NODE_OPTIONS: --no-deprecation
    command:
      - node
      - node_modules/payload/dist/bin/index.js
      - jobs:run
      - --cron
      - '* * * * *'
      - --queue
      - media
      - --limit
      - '5'

  worker-inventory:
    image: ${IMAGE}
    restart: unless-stopped
    env_file:
      - .env.staging
    environment:
      NODE_OPTIONS: --no-deprecation
    command:
      - node
      - node_modules/payload/dist/bin/index.js
      - jobs:run
      - --cron
      - '* * * * *'
      - --queue
      - inventory
      - --limit
      - '10'

The working implementation is at docker-compose.staging.yml.

Production compose

Production builds on the same app-plus-workers core but adds the public-facing layer: nginx, cert mounts, and nginx config mounts, with an optional observability profile. The canonical shape:

yaml

services:
  app:
    image: ${IMAGE}
    restart: unless-stopped
    env_file:
      - .env.production
    healthcheck:
      test: ['CMD', 'curl', '-f', 'http://localhost:65434/']
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  worker-media:
    image: ${IMAGE}
    restart: unless-stopped
    env_file:
      - .env.production
    command:
      - node
      - node_modules/payload/dist/bin/index.js
      - jobs:run
      - --cron
      - '* * * * *'
      - --queue
      - media
      - --limit
      - '5'

  worker-inventory:
    image: ${IMAGE}
    restart: unless-stopped
    env_file:
      - .env.production
    command:
      - node
      - node_modules/payload/dist/bin/index.js
      - jobs:run
      - --cron
      - '* * * * *'
      - --queue
      - inventory
      - --limit
      - '10'

  nginx:
    image: nginx:1.27-alpine
    restart: unless-stopped
    ports:
      - '80:80'
      - '443:443'
    volumes:
      - ./nginx/default.conf:/etc/nginx/conf.d/default.conf:ro
      - ./certs/example.com:/etc/nginx/certs/example.com:ro
      - ./certbot/www:/var/www/certbot:ro
    depends_on:
      app:
        condition: service_healthy

The working implementation is at docker-compose.production.yml.

nginx and TLS

That nginx service in the production compose needs a config and a certificate, which brings us to TLS. As with the build pattern, the rule is to choose one strategy and document it completely. This guide uses nginx inside the production Compose stack, a Let's Encrypt wildcard certificate, and DNS-01 validation.

Production nginx config

The config does two jobs: it serves the ACME challenge and redirects HTTP to HTTPS on port 80, then terminates TLS and proxies to the app on 443:

nginx

server {
  listen 80;
  server_name ~^(.+)\.example\.com$;

  location ^~ /.well-known/acme-challenge/ {
    root /var/www/certbot;
    default_type "text/plain";
  }

  location / {
    return 301 https://$host$request_uri;
  }
}

server {
  listen 443 ssl;
  http2 on;
  server_name ~^(.+)\.example\.com$;

  ssl_certificate /etc/nginx/certs/example.com/fullchain.pem;
  ssl_certificate_key /etc/nginx/certs/example.com/privkey.pem;

  location / {
    proxy_pass http://app:65434;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
  }
}

The working implementation is at farmica.si.conf.

The Let's Encrypt decision

The reason for DNS-01 is simple: Let's Encrypt issues wildcard certificates such as *.example.com only through the DNS-01 challenge. HTTP-01 cannot validate a wildcard name.

Certbot command

Install certbot and the DNS plugin first, and drop the Cloudflare credentials in a locked-down file:

bash

sudo apt-get update
sudo apt-get install -y certbot python3-certbot-dns-cloudflare
sudo mkdir -p /root/.secrets
sudo sh -c 'printf "%s\n" "dns_cloudflare_api_token = REPLACE_ME" > /root/.secrets/cloudflare.ini'
sudo chmod 600 /root/.secrets/cloudflare.ini

If you use a non-root deploy user, place the credentials file under that user's home and update the command accordingly. The canonical issuance command:

bash

certbot certonly \
  --dns-cloudflare \
  --dns-cloudflare-credentials /root/.secrets/cloudflare.ini \
  --dns-cloudflare-propagation-seconds 60 \
  -d "*.example.com" \
  --email ops@example.com \
  --agree-tos \
  --non-interactive \
  --keep-until-expiring

The working implementation is at setup-farmica-si-letsencrypt.sh.

Certificate install and reload

The certs need to land where the nginx container expects them:

text

/srv/my-app-prod/certs/example.com/fullchain.pem
/srv/my-app-prod/certs/example.com/privkey.pem

After copying or renewing certs, reload nginx so it picks them up:

bash

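cd /srv/my-app-prod
# docker-compose.yml interpolates ${IMAGE}, so set it to the currently deployed tag
export IMAGE=my-app:<git-sha>
docker compose --env-file .env.production exec -T nginx nginx -s reload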

Staging nginx

Staging should also sit behind nginx, either via a separate host-level nginx or a separate proxy Compose project. The critical rule, repeated because it matters, is that the staging app itself binds only to 127.0.0.1.

nginx config sync policy

Finally, make one explicit choice about how nginx config is managed: either manage it in CI or treat it as a one-time manual bootstrap. This guide recommends managing nginx config in CI so the production deploy directory stays reproducible. In the workflow examples below, deployment-templates/nginx/example.com.conf and deployment-templates/scripts/setup-example-letsencrypt.sh are placeholders for your real files.

GitHub Actions Workflow

Everything so far has been preparation; the workflow is where it all comes together. The most important thing it must define is one canonical production promotion mechanism, so there is never ambiguity about what is going live.

Required triggers

The workflow responds to two events: an automatic push to main, and a manual dispatch that takes an exact SHA to promote:

yaml

on:
  push:
    branches: [main]
  workflow_dispatch:
    inputs:
      sha:
        description: Commit SHA to promote to production
        required: true
        type: string
      force_rebuild:
        description: Rebuild image even if it exists locally
        required: false
        type: boolean
        default: false

The reasoning behind the SHA input is worth internalizing: staging verifies one exact commit, so production must promote that same commit. Checking out a floating main at click time is not deterministic, and that non-determinism is precisely what bites you during an incident.
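
Since workflow_dispatch accepts any string, it can be worth rejecting anything that is not a full commit SHA before the job does real work. The check below is a sketch of my own rather than part of the live workflow: is_full_sha is a hypothetical name, and it assumes a SHA-1 repository with 40-character lowercase hashes.

```bash
# Hypothetical validator: accept only a full 40-character lowercase hex SHA.
# Short SHAs, tags, and branch names are rejected so the promoted ref is unambiguous.
is_full_sha() {
  case $1 in
    *[!0-9a-f]*) return 1 ;;   # contains a non-hex character
  esac
  [ "${#1}" -eq 40 ]
}

# Example as an early workflow step, with the input passed in via an env var:
# is_full_sha "$PROMOTE_SHA" || { echo "expected a full commit SHA" >&2; exit 1; }
```

Passing the input through an environment variable (PROMOTE_SHA is an illustrative name) rather than interpolating it directly into the script also keeps a malformed value from being interpreted as shell syntax.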

Canonical workflow shape

The full workflow splits into three jobs. The first two react to a push, building and migrating staging then deploying and smoke-testing it. The third reacts only to a manual dispatch and handles the more careful production path, including a backup before migration and streaming the image over SSH:

yaml

jobs:
  build:
    runs-on: self-hosted
    if: github.event_name == 'push'
    steps:
      - uses: actions/checkout@v6
      - name: Prepare build env
        run: |
          bash deployment-templates/prepare-narocilnica-build-env.sh \
            /srv/my-app-secrets/.env.staging \
            /tmp/my-app-build.env
      - name: Build image
        run: |
          docker build \
            --secret id=env,src=/tmp/my-app-build.env \
            -t my-app:${{ github.sha }} .
      - name: Install deps for migrate
        run: pnpm install --frozen-lockfile
      - name: Migrate staging
        run: |
          set -a && source /srv/my-app-secrets/.env.staging && set +a
          pnpm payload migrate

  deploy-staging:
    runs-on: self-hosted
    needs: build
    if: github.event_name == 'push'
    steps:
      - uses: actions/checkout@v6
      - name: Sync runtime env
        run: cp /srv/my-app-secrets/.env.staging /srv/my-app-staging/.env.staging
      - name: Sync compose template
        run: cp deployment-templates/docker-compose.staging.yml /srv/my-app-staging/docker-compose.yml
      - name: Deploy staging
        run: |
          cd /srv/my-app-staging
          export IMAGE=my-app:${{ github.sha }}
          docker compose up -d --remove-orphans --no-build
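      - name: Smoke test staging
        run: |
          cd /srv/my-app-staging
          # Each step runs in a fresh shell, so IMAGE must be set again for Compose interpolation
          export IMAGE=my-app:${{ github.sha }}
          docker compose exec -T app curl -fsS http://localhost:65434/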

  deploy-production:
    runs-on: self-hosted
    if: github.event_name == 'workflow_dispatch'
    steps:
      - uses: actions/checkout@v6
        with:
          ref: ${{ inputs.sha }}
      - name: Build env
        run: |
          bash deployment-templates/prepare-narocilnica-build-env.sh \
            /srv/my-app-secrets/.env.production \
            /tmp/my-app-build.env
      - name: Build or reuse image
        run: |
          TAG="my-app:${{ inputs.sha }}"
          if [ "${{ inputs.force_rebuild }}" = "true" ] || ! docker image inspect "$TAG" >/dev/null 2>&1; then
            docker build --secret id=env,src=/tmp/my-app-build.env -t "$TAG" .
          fi
      - name: Install deps for migrate
        run: pnpm install --frozen-lockfile
      - name: Create production backup before migrate
        run: |
          set -a && source /srv/my-app-secrets/.env.production && set +a
          mkdir -p /srv/backups
          pg_dump "$DATABASE_URL_UNPOOLED" > /srv/backups/my-app-prod-${{ inputs.sha }}.sql
      - name: Migrate production
        run: |
          set -a && source /srv/my-app-secrets/.env.production && set +a
          DATABASE_URL="$DATABASE_URL_UNPOOLED" pnpm payload migrate
      - name: Sync runtime env
        run: scp /srv/my-app-secrets/.env.production prod-app:/srv/my-app-prod/.env.production
      - name: Sync compose
        run: scp deployment-templates/docker-compose.production.yml prod-app:/srv/my-app-prod/docker-compose.yml
      - name: Sync nginx config and scripts
        run: |
          ssh prod-app 'mkdir -p /srv/my-app-prod/nginx /srv/my-app-prod/scripts /srv/my-app-prod/certbot/www /srv/my-app-prod/certs/example.com'
          scp deployment-templates/nginx/example.com.conf prod-app:/srv/my-app-prod/nginx/default.conf
          scp deployment-templates/scripts/setup-example-letsencrypt.sh prod-app:/srv/my-app-prod/scripts/setup-example-letsencrypt.sh
          ssh prod-app 'chmod +x /srv/my-app-prod/scripts/setup-example-letsencrypt.sh'
      - name: Stream image
        run: docker save my-app:${{ inputs.sha }} | gzip | ssh prod-app 'gzip -d | docker load'
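      - name: Restart production
        run: |
          ssh prod-app '
            cd /srv/my-app-prod
            export IMAGE=my-app:${{ inputs.sha }}
            docker compose --env-file .env.production up -d --remove-orphans
          '
      - name: Smoke test production
        run: |
          ssh prod-app '
            cd /srv/my-app-prod
            export IMAGE=my-app:${{ inputs.sha }}
            docker compose --env-file .env.production exec -T app curl -fsS http://localhost:65434/
          '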

For honesty about the current state: the live Farmica workflow still dispatches against current main, but for a new canonical setup you should prefer the SHA input shown above. The working implementation reference is deploy.yml.

Migration Policy

Migrations are where a deploy stops being reversible by a simple image swap, so the policy around them has to be explicit rather than assumed.

The canonical rule

For this repo's shape, the order is: build the image first, install dependencies on the runner, run migrations from source on the runner, and fail the deploy before restart if migration fails. The reason migrations run from source rather than from the image is that the shipped image does not contain runnable TS migration files.
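
The ordering rule boils down to one guard: never reach the restart unless the migration exited cleanly. GitHub Actions gives you this between steps for free, but if you ever collapse the flow into a single script, a sketch like the following keeps the same guarantee. The function name deploy_with_migration and its two arguments are hypothetical.

```bash
# Hypothetical orchestration sketch: each argument is the name of a command or
# shell function. Restart is attempted only when the migration succeeds.
deploy_with_migration() {
  migrate_cmd=$1
  restart_cmd=$2
  if ! "$migrate_cmd"; then
    echo "migration failed; leaving the running containers untouched" >&2
    return 1
  fi
  "$restart_cmd"
}

# Example wiring, assuming you define these two functions yourself:
# run_migrate() { DATABASE_URL="$DATABASE_URL_UNPOOLED" pnpm payload migrate; }
# run_restart() { ssh prod-app 'cd /srv/my-app-prod && docker compose up -d'; }
# deploy_with_migration run_migrate run_restart
```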

The production backup rule

Before any production migration, you either verify that a restorable backup already exists or you create one. A direct backup looks like:

bash

pg_dump "$DATABASE_URL_UNPOOLED" > /srv/backups/my-app-prod-$(date +%F-%H%M%S).sql

If you do not want to take a fresh dump on every deploy, at minimum enforce a check that the scheduled backup succeeded recently. Either way, this is why the runner needs postgresql-client and write access to /srv/backups or another backup path.
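
A recency check can be as simple as asking whether a non-empty dump file was written recently. This is a rough sketch under stated assumptions: has_recent_backup is a hypothetical name, backups are assumed to be *.sql files directly inside one directory, and it relies on the -maxdepth and -mmin options of find, which are available on Ubuntu and Debian but are not POSIX.

```bash
# Hypothetical check: succeed only if <dir> contains a non-empty *.sql file
# modified within the last <max-age-minutes>.
has_recent_backup() {
  backup_dir=$1
  max_age_minutes=$2
  recent=$(find "$backup_dir" -maxdepth 1 -type f -name '*.sql' \
    -size +0 -mmin "-$max_age_minutes" 2>/dev/null | head -n 1)
  [ -n "$recent" ]
}

# Example: require a dump from the last 24 hours before migrating.
# has_recent_backup /srv/backups 1440 || { echo "no recent backup" >&2; exit 1; }
```

A recent, non-empty file is only a proxy. It does not prove the dump is restorable, so periodic restore drills are still worth scheduling.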

Direct DB connection

Migrations always use the unpooled connection, overriding DATABASE_URL for the duration of the command:

bash

set -a && source /srv/my-app-secrets/.env.production && set +a
DATABASE_URL="$DATABASE_URL_UNPOOLED" pnpm payload migrate
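Because forgetting the override is easy, a small wrapper can make it the only way migrations run. This is a sketch under my own assumptions: with_direct_db is a hypothetical helper name, not part of Payload or the repo, and it expects the env file to have been sourced already.

```bash
# Sketch only: run a command with DATABASE_URL pointed at the direct
# (unpooled) connection, and refuse to run if that variable is missing.
# with_direct_db is a hypothetical helper name.

with_direct_db() {
  if [ -z "${DATABASE_URL_UNPOOLED:-}" ]; then
    echo "DATABASE_URL_UNPOOLED is not set; refusing to run: $*" >&2
    return 1
  fi
  # The assignment applies only to this one command, not to the calling shell.
  DATABASE_URL="$DATABASE_URL_UNPOOLED" "$@"
}

# Example:
#   set -a && source /srv/my-app-secrets/.env.production && set +a
#   with_direct_db pnpm payload migrate
```

The point of failing loudly is that a silent fallback to the pooled URL is exactly the mistake you are trying to rule out.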

If migration succeeds but restart fails

This is the dangerous in-between state, and it is no longer a pure app rollback. You now have a new schema running against an old or failed application process. Your options are to fix and redeploy a compatible image, or to restore the DB backup if the migration is incompatible and rollback is required.

Irreversible migrations

Some migrations cannot be undone by an app rollback at all, so document them before merge. Dropped columns, renamed tables without a compatibility layer, and destructive data transforms all fall in this category. A working helper reference is docker-run-payload.sh.

First Staging Deploy

With the policy understood, you are ready for the first real deploy. Thanks to the default runner workspace model, you do not need a manual persistent clone under /srv/my-app for CI to work. Before pushing, verify that DNS points to the staging VPS, the staging DB exists, the staging S3 bucket exists, /srv/my-app-secrets/.env.staging exists, the runner service is online, the runner can run docker ps, and staging nginx is already configured.
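Part of that checklist is mechanical enough to script. The sketch below covers only the file and command checks; DNS, the database, and the bucket still need their own verification. The function names missing_files and missing_commands are hypothetical, and the paths in the example come from the checklist above.

```bash
# Sketch only: report which required files and commands are missing.
# missing_files and missing_commands are hypothetical helper names.

# Prints every given path that does not exist.
missing_files() {
  local path
  for path in "$@"; do
    [ -e "$path" ] || echo "$path"
  done
}

# Prints every given command name that is not on PATH.
missing_commands() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# Example, run as the runner user on the build/staging VPS:
#   missing_files /srv/my-app-secrets/.env.staging
#   missing_commands docker pnpm
#   docker ps >/dev/null   # proves this user can reach the Docker daemon
```

Empty output from both functions means nothing is missing. Anything printed is a path or command to fix before the first push.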

Then push a test commit to main and confirm the whole chain: the image built on the runner, migrations succeeded, /srv/my-app-staging/docker-compose.yml and /srv/my-app-staging/.env.staging both exist, docker compose ps shows the app healthy and both workers running, and https://demo.staging.example.com/ returns 200.

First Production Deploy

Once staging is proven, production follows the same spirit with more guardrails. Before the first production deploy, verify that DNS points to the production VPS, the production DB and bucket exist, /srv/my-app-secrets/.env.production exists on the runner host, /srv/my-app-prod/ exists on the production host, the runner can SSH to production, production can run docker compose, the production nginx config is synced, the TLS certificate exists or the issuance script is ready, and the production firewall allows 80 and 443.

If you are using DNS-01 wildcard TLS, issue the certificate before the first public cutover:

bash

ssh prod-app
cd /srv/my-app-prod
set -a && source .env.production && set +a
certbot certonly \
  --dns-cloudflare \
  --dns-cloudflare-credentials /root/.secrets/cloudflare.ini \
  --dns-cloudflare-propagation-seconds 60 \
  -d "*.example.com" \
  --email ops@example.com \
  --agree-tos \
  --non-interactive \
  --keep-until-expiring

Then run the workflow manually with the exact staging-verified SHA, and afterward verify the result:

bash

curl -fsS https://demo.example.com/
cd /srv/my-app-prod
docker compose --env-file .env.production ps
docker compose --env-file .env.production logs --tail=100 app
docker compose --env-file .env.production logs --tail=100 worker-media
docker compose --env-file .env.production logs --tail=100 worker-inventory

Smoke Tests

Those final checks deserve their own section, because internal localhost checks are necessary but not sufficient. A container can answer on localhost while the public route is broken, so you test both layers.

Staging

bash

cd /srv/my-app-staging
docker compose ps
docker compose exec -T app curl -fsS http://localhost:65434/
curl -fsS https://demo.staging.example.com/
docker compose logs --tail=100 app
docker compose logs --tail=100 worker-media
docker compose logs --tail=100 worker-inventory

Production

bash

cd /srv/my-app-prod
docker compose --env-file .env.production ps
docker compose --env-file .env.production exec -T app curl -fsS http://localhost:65434/
curl -fsS https://demo.example.com/
curl -fsS https://tenant-a.example.com/
docker compose --env-file .env.production logs --tail=100 app
docker compose --env-file .env.production logs --tail=100 worker-media
docker compose --env-file .env.production logs --tail=100 worker-inventory

For a tenant-routed app, check at least one real tenant hostname, not only the base domain, since base-domain success can mask broken tenant routing.
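Right after a restart the first request can fail while the app is still booting, so a one-shot curl sometimes reports a false alarm. Here is a sketch of the public-route check with a few retries. The helper names retry and smoke_test_hosts, and the choice of 5 attempts 3 seconds apart, are my own assumptions rather than anything the live workflow does.

```bash
# Sketch only: retry a command a few times before declaring a smoke test failed.
# retry and smoke_test_hosts are hypothetical helper names.

# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Only sleep when another attempt is still coming.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# smoke_test_hosts HOST [HOST...]: check the public HTTPS route of every host.
smoke_test_hosts() {
  local host failed=0
  for host in "$@"; do
    if retry 5 3 curl -fsS -o /dev/null "https://${host}/"; then
      echo "ok   ${host}"
    else
      echo "FAIL ${host}" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Example: smoke_test_hosts demo.example.com tenant-a.example.com
```

The function checks every hostname even after one fails, so a single run tells you whether the problem is the whole site or just tenant routing.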

Rollback Policy

No matter how careful the deploy is, you eventually need to undo one, and not all rollbacks are equal. It helps to split them into three classes so you reach for the right tool under pressure.

App rollback

Use this when the image or config is bad but the schema is still compatible. It is the cheap, fast case:

bash

ssh prod-app '
  cd /srv/my-app-prod
  export IMAGE=my-app:<previous-good-sha>
  docker compose --env-file .env.production up -d --remove-orphans
'

Schema rollback

Use this when a migration changed the schema incompatibly and the previous image cannot run against it. Here you restore a backup, or run a documented down-migration if your project supports one.

Data rollback

Use this when background jobs changed data format, object storage writes changed structure, or partial job execution created inconsistent state. The procedure is to stop workers if needed, then restore data from backup or run a manual repair plan.

The rule that ties all three together, and the one most worth remembering: image rollback is not data rollback.

Image Retention and Cleanup

Because images are built locally and streamed over SSH, both hosts accumulate tags over time, and left unchecked that quietly fills disks. A minimum policy is to keep the current production image, keep the previous known-good image, and prune unused images regularly.

Inspect what you have:

bash

docker images my-app --format '{{.Repository}}:{{.Tag}}\t{{.CreatedAt}}\t{{.Size}}'

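The simple cleanup is a blunt prune. Without -a it removes only dangling (untagged) images, so old my-app:<sha> tags stay until you remove them explicitly: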

bash

docker image prune -f

The safer cleanup is more deliberate: list all my-app:* tags, confirm which are still referenced by running containers, and remove only the unreferenced ones. This repo's production workflow already applies that safer pattern after each deploy.
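As a rough illustration of that safer pattern, the sketch below lists the repository's tags, subtracts every image a container still references plus any tags you name explicitly, and removes the rest. It is not the live workflow's code. The function names are hypothetical, and it assumes containers were created by tag (as they are with image: ${IMAGE} in compose), so that docker ps reports my-app:<sha> rather than an image ID.

```bash
# Sketch only: remove tags of one repository that no container references
# and that are not explicitly kept. Function names are hypothetical.

# unreferenced_tags ALL_FILE KEEP_FILE
# Prints the lines of ALL_FILE that do not appear as a whole line in KEEP_FILE.
unreferenced_tags() {
  grep -vxFf "$2" "$1" || true
}

# prune_app_images REPO [KEEP_TAG...]
# Extra KEEP_TAG arguments protect images that have no container,
# such as the previous known-good one.
prune_app_images() {
  local repo="$1" all keep tag
  shift
  all="$(mktemp)"
  keep="$(mktemp)"
  # Every tag of the repository, skipping untagged entries.
  docker images "$repo" --format '{{.Repository}}:{{.Tag}}' \
    | grep -v ':<none>$' | sort -u > "$all"
  # Images referenced by any container (running or stopped) plus explicit keeps.
  { docker ps -a --format '{{.Image}}'; printf '%s\n' "$@"; } | sort -u > "$keep"
  unreferenced_tags "$all" "$keep" | while read -r tag; do
    echo "removing ${tag}"
    docker rmi "$tag"
  done
  rm -f "$all" "$keep"
}

# Example: prune_app_images my-app my-app:<previous-good-sha>
```

Passing the previous known-good tag as an extra argument is what keeps the minimum retention policy intact, since that image usually has no container pointing at it.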

Troubleshooting

Even a well-built pipeline fails sometimes, so here are the failure modes I actually hit and where to look first.

When staging is unreachable, check DNS, staging nginx, that the app bind address is 127.0.0.1:<port>, docker compose ps, and the container healthcheck.

When production restarted but serves the old version, check that the workflow promoted the intended SHA, that the image was loaded on production, that IMAGE in the shell matches the target tag, and that docker compose up -d --remove-orphans actually ran.

When a production migration fails, check DATABASE_URL_UNPOOLED, DB privileges, the network path from the runner to direct Postgres, and recent backup availability.

When a curl healthcheck fails in the container, check whether curl is installed in the image and whether the app listens on the expected internal port.

And when TLS fails in the browser, check that DNS points at the production VPS, that nginx has the expected cert paths mounted, that the wildcard cert actually covers the hostname, and that the cert was renewed and nginx reloaded.
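For the stale-version case in particular, it helps to ask Docker directly which image the service is running instead of inferring it from the workflow log. The sketch below does that comparison. The helper names running_image and assert_running_image are hypothetical, and it is meant to be run from the deploy directory on the host in question.

```bash
# Sketch only: compare the image a compose service is actually running with
# the tag you meant to deploy. running_image and assert_running_image are
# hypothetical helper names.

# running_image SERVICE [COMPOSE_ARGS...]
running_image() {
  local service="$1" id
  shift
  id="$(docker compose "$@" ps -q "$service" | head -n 1)"
  [ -n "$id" ] || return 1
  # Config.Image is the image reference the container was created from.
  docker inspect --format '{{.Config.Image}}' "$id"
}

# assert_running_image EXPECTED SERVICE [COMPOSE_ARGS...]
assert_running_image() {
  local expected="$1" actual
  shift
  actual="$(running_image "$@")" || {
    echo "no container for service: $1" >&2
    return 1
  }
  if [ "$actual" != "$expected" ]; then
    echo "expected ${expected}, running ${actual}" >&2
    return 1
  fi
  echo "ok: ${actual}"
}

# Example:
#   cd /srv/my-app-prod
#   assert_running_image my-app:<sha> app --env-file .env.production
```

A mismatch here usually points back at one of the causes listed above: the image never loaded, or IMAGE was not exported in the shell that ran docker compose up.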

Farmica Working Implementation Map

Since this whole guide is grounded in a real deployment rather than a hypothetical one, here is the concrete mapping between the reusable placeholders and the actual Farmica repo that proves the pattern:

Layer	Current implementation
Build/staging VPS	`build-staging-vps`
Production VPS	`farmica`
Source checkout	`/srv/narocilnica`
Staging deploy dir	`/srv/narocilnica-staging`
Production deploy dir	`/srv/narocilnica-prod`
Runner dir	`/srv/actions-runner/`
Workflow	deploy.yml
Staging compose	docker-compose.staging.yml
Production compose	docker-compose.production.yml
nginx vhost	farmica.si.conf
LE script	setup-farmica-si-letsencrypt.sh
Build env extraction	prepare-narocilnica-build-env.sh

For completeness, the live repo differs from the reusable recommendation in three ways: production is currently accessed as root, production dispatch currently uses current main instead of a required SHA input, and the live workflow still uses StrictHostKeyChecking=no. For a new setup, prefer the stricter canonical pattern described throughout this guide.

Conclusion

The problem this guide set out to solve was the one that quietly blocks most self-hosting efforts: it is easy to deploy an app by hand, and hard to build a deployment system that another developer can reproduce, that keeps environments truly separate, and that gives you a clean answer when something breaks at the worst possible moment. The approach here solves that by being explicit about the things platforms usually hide from you, with one canonical secret location, one runner user, one deploy directory per environment, and one deterministic production promotion driven by a verified commit SHA.

Walking through it, you have set up two VPS hosts, isolated databases and object storage per environment, wired a self-hosted runner that builds images and streams them to production over SSH, issued a wildcard TLS certificate through DNS-01 and terminated TLS in nginx, and given yourself a layered rollback story that distinguishes app, schema, and data. The result is a system you own end to end, where every moving part is visible and reproducible rather than abstracted away.

Let me know in the comments if you have questions, and subscribe for more practical development guides.

Thanks, Matija

Migration Policy

Migrations are where a deploy stops being reversible by a simple image swap, so the policy around them has to be explicit rather than assumed.

The canonical rule

The production backup rule

Before any production migration, you either verify that a restorable backup already exists or you create one. A direct backup looks like:

bash

pg_dump "$DATABASE_URL_UNPOOLED" > /backups/my-app-prod-$(date +%F-%H%M%S).sql

Direct DB connection

Migrations always use the unpooled connection, overriding DATABASE_URL for the duration of the command:

bash

set -a && source /srv/my-app-secrets/.env.production && set +a
DATABASE_URL="$DATABASE_URL_UNPOOLED" pnpm payload migrate

If migration succeeds but restart fails

Irreversible migrations

First Staging Deploy

First Production Deploy

If you are using DNS-01 wildcard TLS, issue the certificate before the first public cutover:

bash

ssh prod-app
cd /srv/my-app-prod
set -a && source .env.production && set +a
certbot certonly \
  --dns-cloudflare \
  --dns-cloudflare-credentials /root/.secrets/cloudflare.ini \
  --dns-cloudflare-propagation-seconds 60 \
  -d "*.example.com" \
  --email ops@example.com \
  --agree-tos \
  --non-interactive \
  --keep-until-expiring

Then run the workflow manually with the exact staging-verified SHA, and afterward verify the result:

bash

curl -fsS https://demo.example.com/
cd /srv/my-app-prod
docker compose --env-file .env.production ps
docker compose --env-file .env.production logs --tail=100 app
docker compose --env-file .env.production logs --tail=100 worker-media
docker compose --env-file .env.production logs --tail=100 worker-inventory

Smoke Tests

Staging

bash

cd /srv/my-app-staging
docker compose ps
docker compose exec -T app curl -fsS http://localhost:65434/
curl -fsS https://demo.staging.example.com/
docker compose logs --tail=100 app
docker compose logs --tail=100 worker-media
docker compose logs --tail=100 worker-inventory

Production

bash

cd /srv/my-app-prod
docker compose --env-file .env.production ps
docker compose --env-file .env.production exec -T app curl -fsS http://localhost:65434/
curl -fsS https://demo.example.com/
curl -fsS https://tenant-a.example.com/
docker compose --env-file .env.production logs --tail=100 app
docker compose --env-file .env.production logs --tail=100 worker-media
docker compose --env-file .env.production logs --tail=100 worker-inventory

For a tenant-routed app, check at least one real tenant hostname, not only the base domain, since base-domain success can mask broken tenant routing.

Rollback Policy

No matter how careful the deploy is, you eventually need to undo one, and not all rollbacks are equal. It helps to split them into three classes so you reach for the right tool under pressure.

App rollback

Use this when the image or config is bad but the schema is still compatible. It is the cheap, fast case:

bash

ssh prod-app '
  cd /srv/my-app-prod
  export IMAGE=my-app:<previous-good-sha>
  docker compose --env-file .env.production up -d --remove-orphans
'

Schema rollback

Use this when a migration changed the schema incompatibly and the previous image cannot run against it. Here you restore a backup, or run a documented down-migration if your project supports one.

Data rollback

The rule that ties all three together, and the one most worth remembering: image rollback is not data rollback.

Image Retention and Cleanup

Inspect what you have:

bash

docker images my-app --format '{{.Repository}}:{{.Tag}}\t{{.CreatedAt}}\t{{.Size}}'

The simple cleanup is a blunt prune:

bash

docker image prune -f
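
One caveat: `docker image prune -f` only removes dangling (untagged) images, so SHA-tagged `my-app` images keep piling up. The sketch below keeps the newest N tags and removes the rest. It relies on `docker images` listing newest first, which is what I see in practice but is worth confirming on your host. `stale_tags`, `prune_old_images`, and the `DOCKER` override are hypothetical names.

```bash
# Hypothetical retention helper. DOCKER can be overridden for testing.
DOCKER="${DOCKER:-docker}"

# Print every tagged my-app image except the $1 newest ones.
# Assumes `docker images` lists newest first.
stale_tags() {
  keep="$1"
  "$DOCKER" images my-app --format '{{.Repository}}:{{.Tag}}' \
    | grep -v ':<none>$' \
    | tail -n +"$((keep + 1))"
}

# Remove stale tags, keeping the newest N (default 5).
# No -f on purpose: Docker refuses to remove an image a container still uses.
prune_old_images() {
  stale_tags "${1:-5}" | while read -r tag; do
    "$DOCKER" image rm "$tag" || echo "skipped $tag (probably in use)" >&2
  done
}

# Example: prune_old_images 5
```

Keep N large enough that the previous good SHA is still on the host when you need an app rollback.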

Troubleshooting

Even a well-built pipeline fails sometimes, so here are the failure modes I actually hit and where to look first.

When staging is unreachable, check DNS, the staging nginx config, the app's bind address (it should be 127.0.0.1:<port>), docker compose ps, and the container healthcheck.

When a production migration fails, check DATABASE_URL_UNPOOLED, DB privileges, the network path from the runner to direct Postgres, and recent backup availability.

When a curl healthcheck fails in the container, check whether curl is installed in the image and whether the app listens on the expected internal port.
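
The curl healthcheck case can be narrowed down in two steps. This is a sketch under assumptions: the service is named `app` as in the smoke tests, the default port 65434 is taken from those same commands, and `diagnose_healthcheck` and the `DOCKER` override are hypothetical names. On production, add `--env-file .env.production` to both compose calls.

```bash
# Hypothetical diagnostic; run it from the deploy directory.
# DOCKER can be overridden for testing.
DOCKER="${DOCKER:-docker}"

diagnose_healthcheck() {
  port="${1:-65434}"

  # Step 1: is curl present in the image at all?
  if ! "$DOCKER" compose exec -T app sh -c 'command -v curl' >/dev/null 2>&1; then
    echo "curl is not installed in the app image"
    return 1
  fi

  # Step 2: does the app answer on the expected internal port?
  if ! "$DOCKER" compose exec -T app curl -fsS -o /dev/null "http://localhost:${port}/"; then
    echo "app did not answer on internal port ${port}"
    return 1
  fi

  echo "healthcheck path looks fine"
}

# Example: diagnose_healthcheck 65434
```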

Farmica Working Implementation Map

Layer	Current implementation
Build/staging VPS	`build-staging-vps`
Production VPS	`farmica`
Source checkout	`/srv/narocilnica`
Staging deploy dir	`/srv/narocilnica-staging`
Production deploy dir	`/srv/narocilnica-prod`
Runner dir	`/srv/actions-runner/`
Workflow	`deploy.yml`
Staging compose	`docker-compose.staging.yml`
Production compose	`docker-compose.production.yml`
nginx vhost	`farmica.si.conf`
LE script	`setup-farmica-si-letsencrypt.sh`
Build env extraction	`prepare-narocilnica-build-env.sh`

Conclusion

Let me know in the comments if you have questions, and subscribe for more practical development guides.

Thanks, Matija
