Dynamic robots.txt in Next.js for Multi-Tenant Sites
I recently tackled a common challenge in multi-tenant architectures: how to serve unique robots.txt and sitemap.xml files for different domains running on the same application. A static file in /public just doesn't cut it when Tenant A needs to block AI bots while Tenant B wants full indexing.
This guide walks through the robust, cached solution I implemented using Next.js App Router and Payload CMS.
1. The Core Utility: Centralized Tenant Lookups
The first step was to stop repeating ourselves. We needed a single, reliable way to resolve the current tenant from the hostname—whether it's a custom domain (example.com) or a subdomain (tenant.app.com).
I centralized this logic in src/payload/db/index.ts using unstable_cache to keep performance high. This function is the backbone of our SEO strategy.
```typescript
// File: src/payload/db/index.ts
import { unstable_cache } from "next/cache";
// getPayloadClient, CACHE_KEY, and TAGS are project-level helpers
// (Payload client singleton plus shared cache-key/tag constants),
// assumed to be defined elsewhere in this module.

export const getTenantByDomain = async (domain: string) => {
  return await unstable_cache(
    async () => {
      const payload = await getPayloadClient();
      const tenants = await payload.find({
        collection: "tenants",
        where: {
          or: [
            { domain: { equals: domain } },
            { slug: { equals: domain.split(".")[0] } }, // Fallback to slug for subdomain patterns
          ],
        },
        limit: 1,
      });
      return tenants.docs[0] || null;
    },
    [CACHE_KEY.TENANT_BY_DOMAIN(domain)],
    {
      tags: [TAGS.TENANTS],
      revalidate: 3600, // Revalidate every hour
    },
  )();
};
```
Why this matters:
This function handles the heavy lifting of database queries and caching. By centralizing it, we ensure that robots.txt, sitemap.xml, and humans.txt all "agree" on which tenant is active.
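To make the fallback concrete, here is the lookup order extracted as a tiny pure function. This is a hypothetical helper for illustration only — in the real query both conditions run together in a single `or` clause:

```typescript
// Hypothetical helper mirroring the lookup keys inside getTenantByDomain:
// the full hostname matches a custom domain, while the first hostname
// label serves as the slug fallback (tenant.app.com -> "tenant").
function tenantLookupKeys(hostname: string): { domain: string; slug: string } {
  return {
    domain: hostname,             // matches tenants.domain (custom domains)
    slug: hostname.split(".")[0], // matches tenants.slug (subdomain pattern)
  };
}
```

Note that for an apex domain like `example.com` the slug fallback resolves to `example`, which is harmless as long as no tenant happens to use that slug.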
2. Dynamic Robots.txt with AI Protection
With the tenant lookup in place, I created a dynamic route handler for robots.txt. This isn't just a static file anymore; it's code. This allows us to inject the correct sitemap URL for the specific tenant and apply global rules, like blocking AI scrapers.
```typescript
// File: src/app/robots.ts
import type { MetadataRoute } from "next";
import { headers } from "next/headers";
import { getTenantByDomain } from "@/payload/db";

export default async function robots(): Promise<MetadataRoute.Robots> {
  // Get hostname from request headers
  const hostname = (await headers()).get("host") || "www.adart.com";

  // Try to find tenant by domain or subdomain
  const tenant = await getTenantByDomain(hostname);

  // If no tenant is found, fall back to the request hostname
  const baseUrl = tenant?.domain ? `https://${tenant.domain}` : `https://${hostname}`;

  return {
    rules: [
      // Block AI scraping bots
      {
        userAgent: ["GPTBot", "CCBot", "Google-Extended"],
        disallow: ["/"],
      },
      // Standard bots
      {
        userAgent: "*",
        allow: "/",
        disallow: ["/admin", "/api"],
        crawlDelay: 1,
      },
    ],
    sitemap: `${baseUrl}/sitemap.xml`,
    host: baseUrl,
  };
}
```
Key Features:
- Dynamic Host: the `sitemap` link automatically matches the visitor's domain.
- AI Blocking: explicit blocks for `GPTBot`, `CCBot`, and `Google-Extended` protect our content from AI training crawlers.
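For intuition, Next.js serializes the returned object into plain text. The sketch below approximates that serialization — it is not Next.js's actual implementation, just an illustration of the output shape:

```typescript
// Illustrative serializer: turns robots rules into robots.txt-style text.
type Rule = {
  userAgent: string | string[];
  allow?: string;
  disallow?: string | string[];
  crawlDelay?: number;
};

function renderRobots(rules: Rule[], sitemap: string): string {
  const lines: string[] = [];
  for (const rule of rules) {
    for (const ua of [rule.userAgent].flat()) lines.push(`User-Agent: ${ua}`);
    if (rule.allow) lines.push(`Allow: ${rule.allow}`);
    for (const d of [rule.disallow ?? []].flat()) lines.push(`Disallow: ${d}`);
    if (rule.crawlDelay !== undefined) lines.push(`Crawl-delay: ${rule.crawlDelay}`);
    lines.push(""); // blank line between user-agent groups
  }
  lines.push(`Sitemap: ${sitemap}`);
  return lines.join("\n");
}
```

Feeding it the two rule groups from the handler above yields one block disallowing everything for the AI bots, followed by the standard `*` block and the tenant-specific `Sitemap:` line.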
3. Dynamic Humans.txt
To give credit where it's due, I also implemented a humans.txt endpoint. This is a nice touch that adds personality and transparency to the site, dynamically acknowledging the specific tenant.
```typescript
// File: src/app/humans.txt/route.ts
// Note: humans.txt is not one of Next.js's metadata file conventions,
// so it needs an explicit route handler with a GET export.
import { headers } from "next/headers";
import { getTenantByDomain } from "@/payload/db";

export async function GET() {
  const hostname = (await headers()).get("host") || "";
  const tenant = await getTenantByDomain(hostname);
  const tenantName = tenant?.name || "Ad Art";

  const content = `/* TEAM */
Site built by: Ad Art Team
For: ${tenantName}

/* SITE */
Standards: HTML5, CSS3, TypeScript
Components: Payload CMS, Next.js`;

  return new Response(content, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```
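One small refinement worth considering: pulling the template into a pure function makes it unit-testable without the Next.js request context. The helper name below is hypothetical, not part of the original code:

```typescript
// Hypothetical extraction of the humans.txt template so it can be
// tested without stubbing headers() or the tenant lookup.
function humansBody(tenantName: string): string {
  return `/* TEAM */
Site built by: Ad Art Team
For: ${tenantName}

/* SITE */
Standards: HTML5, CSS3, TypeScript
Components: Payload CMS, Next.js`;
}
```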
4. The Critical Fix: Middleware Matcher
This was the tricky part. Even with the files in place, robots.txt was returning a 404.
The culprit was src/middleware.ts. The matcher regex was swallowing requests to files if they didn't match specific patterns. I updated the negative lookahead to explicitly exclude any path with a file extension (like .txt or .xml).
```typescript
// File: src/middleware.ts
export const config = {
  matcher: [
    /*
     * Match all request paths except for:
     * ...
     * 5. Static files (e.g. /favicon.ico, /robots.txt) - matched by .*\..*
     */
    '/((?!api|_next|_static|_vercel|.*\\..*).*)',
  ],
};
```
The Lesson:
If your middleware runs on file routes, it might try to rewrite them to tenant paths (e.g., /tenant-slugs/.../robots.txt), which don't exist. Excluding files from middleware ensures they hit the App Router handlers directly.
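You can sanity-check the matcher as a plain regular expression. This is illustrative only — Next.js compiles matchers via path-to-regexp, not a raw RegExp — but the negative lookahead behaves the same way for these paths:

```typescript
// The negative lookahead excludes API routes, Next internals, and any
// path containing a dot (i.e. paths that look like files).
const matcher = /^\/((?!api|_next|_static|_vercel|.*\..*).*)$/;

const runsMiddleware = (path: string): boolean => matcher.test(path);
```

With this pattern, `/robots.txt` and `/sitemap.xml` skip the middleware entirely and reach the App Router handlers, while page routes like `/about` are still rewritten to tenant paths.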
Conclusion
By moving away from static files and leveraging Next.js Route Handlers, we've created an SEO infrastructure that provides:
- Automatic Sitemaps per tenant.
- Smart Indexing Rules that protect against AI scraping.
- Zero Maintenance when onboarding new tenants.
Let me know if you have questions!
Thanks, Matija


