Dynamic Sitemap & Robots.txt for Next.js Multi-Tenant

Step-by-step guide to tenant detection, scoping Payload queries, and runtime sitemaps/robots for Next.js on Vercel

Matija Žiberna

Last week, I was deploying a multi-tenant Payload CMS application to Vercel when the build suddenly failed with a cryptic error: "[queryAllPageSlugs] Tenant is required but was not provided." The issue was that my sitemap.xml was trying to query the database without any tenant context, breaking the entire build process. After hours of debugging through the Next.js App Router and Payload's multi-tenant system, I discovered a clean solution that maintains full tenant isolation while keeping the build process efficient. This guide shows you exactly how to configure dynamic sitemap and robots.txt files that work flawlessly across all tenants.

Understanding the Challenge

In a single-tenant setup, sitemap.xml and robots.txt are straightforward - you hardcode your domain and generate URLs from your database. But in a multi-tenant Payload CMS setup, each tenant has its own domain and content subset, making this approach problematic. The challenge is threefold:

  1. Next.js generates these files at build time, before any tenant context is available
  2. Payload's database layer requires tenant parameters for all queries to prevent cross-tenant data leaks
  3. The files must respond differently based on the incoming request hostname

The typical solutions of generating multiple static files or using API routes all have significant drawbacks - either maintenance overhead or SEO implications. The ideal solution is to make these files truly dynamic, responding appropriately based on the tenant that's requesting them.
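
Concretely, this means the sitemap and robots files live as dynamic routes in the App Router, alongside the rest of the frontend route group (paths taken from the snippets below):

src/
  app/
    (frontend)/
      sitemap.ts   // served at /sitemap.xml (route groups don't affect the URL)
      robots.ts    // served at /robots.txt

Both files run at request time, so they can read the incoming hostname and query Payload with the correct tenant context.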

Setting Up Tenant Detection

The first step is creating a reliable way to detect which tenant is requesting the sitemap or robots file. We'll use the request hostname to identify the tenant, matching first on an exact domain and then falling back to a subdomain pattern.

Create a helper function that queries Payload's tenants collection to find the matching tenant:

// File: src/app/(frontend)/sitemap.ts
import { headers } from 'next/headers'
import { getPayload } from 'payload'
import configPromise from '@payload-config'
import { unstable_cache } from 'next/cache'

const getTenantByDomain = async (domain: string) => {
  return await unstable_cache(
    async () => {
      const payload = await getPayload({ config: configPromise })
      const tenants = await payload.find({
        collection: 'tenants',
        where: {
          or: [
            { domain: { equals: domain } },
            { slug: { equals: domain.split('.')[0] } } // Fallback to slug for subdomain patterns
          ]
        },
        limit: 1,
      })
      return tenants.docs[0] || null
    },
    [`tenant-by-domain-${domain}`],
    {
      tags: ['tenants'],
      revalidate: 3600, // Revalidate every hour
    }
  )()
}

This function does two important things: it queries Payload for a tenant matching either the exact domain or the first part of a subdomain, and it caches the result for one hour to avoid repeated database hits. The fallback logic allows setups like example-app.vercel.app to match the tenant with slug example-app.
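
To make the matching concrete, here is how the helper resolves two hypothetical hostnames (tenant-a.com and example-app.vercel.app are placeholders, not values from a real project):

// A custom domain matches the tenant's `domain` field exactly
const tenantA = await getTenantByDomain('tenant-a.com')

// A Vercel preview domain has no exact `domain` match, so the slug fallback applies:
// 'example-app.vercel.app'.split('.')[0] === 'example-app'
const previewTenant = await getTenantByDomain('example-app.vercel.app')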

Building the Dynamic Sitemap

With tenant detection in place, we can now build a sitemap that responds differently based on the requesting tenant. The key is to use Next.js' headers() function to get the current request hostname, then generate URLs using the tenant's specific domain.

// File: src/app/(frontend)/sitemap.ts (continued; the imports and helper from the snippet above are in the same file)
import type { MetadataRoute } from 'next'
import {
  queryAllPageSlugs,
  queryAllPostSlugs,
  queryAllProductSlugs,
  queryAllCaseStudySlugs,
  queryAllJobOpeningSlugs,
} from '@/payload/db'

export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  // Get hostname from request headers
  const hostname = (await headers()).get('host') || 'www.example.com'
  
  // Try to find tenant by domain or subdomain
  const tenant = await getTenantByDomain(hostname)
  
  // If no tenant found, use fallback (example)
  const baseUrl = tenant?.domain ? `https://${tenant.domain}` : `https://${hostname}`
  const tenantSlug = tenant?.slug || 'example'
  
  const [pages, posts, products, caseStudies, jobOpenings] = await Promise.all([
    queryAllPageSlugs(tenantSlug),
    queryAllPostSlugs(tenantSlug),
    queryAllProductSlugs(tenantSlug),
    queryAllCaseStudySlugs(tenantSlug),
    queryAllJobOpeningSlugs(tenantSlug),
  ])

  const entries: MetadataRoute.Sitemap = []

  // Home page
  entries.push({
    url: baseUrl,
    lastModified: new Date(),
    changeFrequency: 'yearly',
    priority: 1,
  })

  // Dynamic pages
  pages.forEach((slug) => {
    if (slug && slug !== 'home') {
      entries.push({
        url: `${baseUrl}/${slug}`,
        lastModified: new Date(),
        changeFrequency: 'monthly',
        priority: 0.8,
      })
    }
  })

  // Blog posts
  posts.forEach((slug) => {
    if (slug) {
      entries.push({
        url: `${baseUrl}/blog/${slug}`,
        lastModified: new Date(),
        changeFrequency: 'weekly',
        priority: 0.6,
      })
    }
  })

  // Products
  products.forEach((slug) => {
    if (slug) {
      entries.push({
        url: `${baseUrl}/products/${slug}`,
        lastModified: new Date(),
        changeFrequency: 'weekly',
        priority: 0.7,
      })
    }
  })

  // Case studies
  caseStudies.forEach((slug) => {
    if (slug) {
      entries.push({
        url: `${baseUrl}/case-studies/${slug}`,
        lastModified: new Date(),
        changeFrequency: 'monthly',
        priority: 0.7,
      })
    }
  })

  // Job openings
  jobOpenings.forEach((slug) => {
    if (slug) {
      entries.push({
        url: `${baseUrl}/careers/${slug}`,
        lastModified: new Date(),
        changeFrequency: 'weekly',
        priority: 0.6,
      })
    }
  })

  return entries
}

The critical insight here is that we're passing the tenant slug to all the database query functions. This ensures each query is properly scoped to the correct tenant, maintaining the security boundary that Payload's multi-tenant system provides. The baseUrl is constructed using the tenant's configured domain when available, falling back to the request hostname if needed.
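
The query helpers themselves come from @/payload/db in this project. Their exact implementation isn't shown here, but a minimal sketch of a tenant-scoped helper, assuming pages hold a tenant relationship and the tenant is filtered by its slug, could look like this:

// File: src/payload/db.ts (illustrative sketch, not the project's exact implementation)
import { getPayload } from 'payload'
import configPromise from '@payload-config'

export const queryAllPageSlugs = async (tenantSlug: string): Promise<string[]> => {
  if (!tenantSlug) {
    // Fail loudly instead of silently returning cross-tenant data
    throw new Error('[queryAllPageSlugs] Tenant is required but was not provided')
  }

  const payload = await getPayload({ config: configPromise })
  const pages = await payload.find({
    collection: 'pages',
    where: {
      'tenant.slug': { equals: tenantSlug }, // scope the query to the requesting tenant
    },
    depth: 0,
    pagination: false, // return all matching documents
  })

  return pages.docs.map((doc) => doc.slug).filter((slug): slug is string => Boolean(slug))
}

The other helpers (posts, products, case studies, job openings) follow the same pattern against their respective collections.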

Implementing the Robots.txt File

The robots.txt implementation follows the same pattern but with a different structure since it needs to return a single object rather than an array of URLs.

// File: src/app/(frontend)/robots.ts
import type { MetadataRoute } from "next"
import { headers } from "next/headers"
import { getPayload } from "payload"
import configPromise from "@payload-config"
import { unstable_cache } from "next/cache"

// Reuse the getTenantByDomain helper from the sitemap section here as well:
// either define it in this file (the Payload imports above are what it needs)
// or move it into a shared module and import it.

export default async function robots(): Promise<MetadataRoute.Robots> {
  // Get hostname from request headers
  const hostname = (await headers()).get('host') || 'www.example.com'
  
  // Try to find tenant by domain or subdomain
  const tenant = await getTenantByDomain(hostname)
  
  // If no tenant found, use fallback (example)
  const baseUrl = tenant?.domain ? `https://${tenant.domain}` : `https://${hostname}`
  
  return {
    rules: [
      {
        userAgent: "*",
        allow: "/",
        disallow: [
          "/admin",
          "/api",
        ],
        crawlDelay: 1,
      },
      {
        userAgent: "Googlebot",
        allow: "/",
        disallow: [
          "/admin",
          "/api",
        ],
      },
    ],
    sitemap: `${baseUrl}/sitemap.xml`,
    host: baseUrl,
  }
}

The key difference here is the return structure: robots.txt returns a single object with crawling rules and a reference to the tenant-specific sitemap. This ensures search engines get the correct sitemap URL for each tenant while keeping the crawling rules consistent.
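
For a tenant whose configured domain is, say, tenant-a.com (a placeholder), the rendered /robots.txt would look roughly like this:

User-Agent: *
Allow: /
Disallow: /admin
Disallow: /api
Crawl-delay: 1

User-Agent: Googlebot
Allow: /
Disallow: /admin
Disallow: /api

Host: https://tenant-a.com
Sitemap: https://tenant-a.com/sitemap.xml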

Common Pitfalls and Solutions

During implementation, I encountered several critical issues that you'll want to avoid:

Headers API is Asynchronous

The Next.js headers() function returns a promise, but it's easy to forget this and write synchronous code. This causes TypeScript errors and runtime failures. Always remember to await the headers call:

// ❌ This will fail
const headersList = headers()
const hostname = headersList.get('host')

// ✅ This works
const hostname = (await headers()).get('host') || 'www.example.com'

Build-Time vs Runtime Context

Initially, I tried to access request context during build time, which fails because there's no actual request. The solution is to keep the files dynamic and let Next.js handle the runtime execution. This is why the files work perfectly in production but may show fallback content during static analysis.
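
Because both files call headers(), Next.js already treats them as dynamic routes. If you want to make that intent explicit and guard against accidental static optimization, one option is to export the standard route segment config from the same files:

// File: src/app/(frontend)/sitemap.ts (the same export can go in robots.ts)
// Force per-request rendering so the host header is always read at runtime
export const dynamic = 'force-dynamic'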

Tenant Parameter Enforcement

Payload's database layer is designed to prevent cross-tenant data leaks by requiring tenant parameters. This caused our initial build error. The solution isn't to bypass this security but to properly provide the tenant context:

// ❌ This throws "Tenant is required but was not provided"
await queryAllPageSlugs()

// ✅ This works and maintains tenant isolation
await queryAllPageSlugs(tenantSlug)

Cache Key Collisions

When caching tenant queries, ensure your cache keys include the tenant identifier. Otherwise, a request for tenant A might return cached data from tenant B:

// ❌ Cache key doesn't include tenant
['tenants']

// ✅ Tenant-specific cache key
[`tenant-by-domain-${domain}`]

Testing and Verification

To verify your implementation works correctly, test both the sitemap and robots endpoints for each tenant:

# Test sitemap for different tenants
curl -H "Host: tenant-a.com" http://localhost:3000/sitemap.xml
curl -H "Host: tenant-b.com" http://localhost:3000/sitemap.xml

# Test robots.txt for different tenants  
curl -H "Host: tenant-a.com" http://localhost:3000/robots.txt
curl -H "Host: tenant-b.com" http://localhost:3000/robots.txt

Each request should return URLs and configuration specific to the respective tenant. The sitemap should only include URLs for pages belonging to that tenant, and the robots.txt should reference the correct sitemap URL.
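
As a sanity check, the XML that Next.js renders from the entries array looks roughly like this (abbreviated, with a placeholder domain and date):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://tenant-a.com</loc>
    <lastmod>2025-01-01T00:00:00.000Z</lastmod>
    <changefreq>yearly</changefreq>
    <priority>1</priority>
  </url>
  <url>
    <loc>https://tenant-a.com/blog/first-post</loc>
    <lastmod>2025-01-01T00:00:00.000Z</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.6</priority>
  </url>
  <!-- ...one <url> entry per page, post, product, case study, and job opening -->
</urlset>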

Performance Considerations

The caching strategy we implemented ensures that tenant lookups don't become a bottleneck. By caching for one hour with tenant-specific keys, we balance freshness with performance. You can adjust the revalidation period based on how frequently your tenant domains change.

For high-traffic sites, consider implementing a more aggressive caching strategy or using a CDN edge function to handle these endpoints, but for most applications, the built-in Next.js caching with our tenant-specific keys provides excellent performance.

Conclusion

By implementing dynamic tenant detection and properly scoping all database queries, we've solved the core challenge of multi-tenant sitemap and robots.txt generation in Payload 3 with Next.js. The solution maintains security boundaries, eliminates hardcoded URLs, and scales efficiently across any number of tenants.

You now have a complete understanding of how to configure dynamic sitemap.xml and robots.txt files that respond correctly to each tenant's domain while maintaining proper data isolation and performance. This approach works seamlessly with Payload's multi-tenant system and follows Next.js best practices for metadata file generation.

Let me know in the comments if you have questions, and subscribe for more practical development guides.

Thanks, Matija


Matija Žiberna
Full-stack developer, co-founder

I'm Matija Žiberna, a self-taught full-stack developer and co-founder passionate about building products, writing clean code, and figuring out how to turn ideas into businesses. I write about web development with Next.js, lessons from entrepreneurship, and the journey of learning by doing. My goal is to provide value through code—whether it's through tools, content, or real-world software.