I recently tackled a common challenge in multi-tenant architectures: how to serve unique robots.txt and sitemap.xml files for different domains running on the same application. A static file in /public just doesn't cut it when Tenant A needs to block AI bots while Tenant B wants full indexing.
This guide walks through the robust, cached solution I implemented using Next.js App Router and Payload CMS.
1. The Core Utility: Centralized Tenant Lookups
The first step was to stop repeating ourselves. We needed a single, reliable way to resolve the current tenant from the hostname—whether it's a custom domain (example.com) or a subdomain (tenant.app.com).
I centralized this logic in src/payload/db/index.ts and wrapped it in Next.js's unstable_cache so repeated lookups for the same hostname stay fast. This function is the backbone of our SEO strategy.
Why this matters:
This function handles the heavy lifting of database queries and caching. By centralizing it, we ensure that robots.txt, sitemap.xml, and humans.txt all "agree" on which tenant is active.
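To make the lookup order concrete, here is a minimal sketch of the resolution logic only. It is not the project's getTenantByDomain: the Tenant shape, the ROOT_DOMAIN value, the "www." stripping, and the resolveTenant name are all assumptions, and an in-memory array plus a Map stand in for the Payload query and unstable_cache.

```typescript
// Hypothetical tenant shape; the real Payload collection may differ.
type Tenant = { name: string; slug: string; domain?: string };

// Assumed root domain under which tenant subdomains live (e.g. tenant.app.com).
const ROOT_DOMAIN = "app.com";

// Stand-in data source. In the real app this would be a Payload query
// wrapped in unstable_cache rather than an in-memory array.
const TENANTS: Tenant[] = [
  { name: "Tenant A", slug: "tenant-a", domain: "example.com" },
  { name: "Tenant B", slug: "tenant-b" },
];

// Lowercase the host, drop the port, and strip a leading "www." (an assumption).
function normalizeHost(host: string): string {
  const [name = ""] = host.toLowerCase().split(":");
  return name.replace(/^www\./, "");
}

// Simple per-process memo so repeated lookups for one host are cheap.
const tenantCache = new Map<string, Tenant | null>();

function resolveTenant(host: string): Tenant | null {
  const key = normalizeHost(host);
  const hit = tenantCache.get(key);
  if (hit !== undefined) return hit;

  // 1. Custom domain match (example.com).
  let tenant = TENANTS.find((t) => t.domain === key) ?? null;

  // 2. Subdomain match (tenant-b.app.com -> slug "tenant-b").
  if (!tenant && key.endsWith(`.${ROOT_DOMAIN}`)) {
    const slug = key.slice(0, -(ROOT_DOMAIN.length + 1));
    tenant = TENANTS.find((t) => t.slug === slug) ?? null;
  }

  tenantCache.set(key, tenant);
  return tenant;
}
```

Checking the custom domain first means a tenant with its own domain resolves the same way whether or not it also has a subdomain, and caching the "not found" result avoids repeating the lookup for unknown hosts.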
2. Dynamic Robots.txt with AI Protection
With the tenant lookup in place, I created a dynamic route handler for robots.txt. This isn't just a static file anymore; it's code. This allows us to inject the correct sitemap URL for the specific tenant and apply global rules, like blocking AI scrapers.
```typescript
// File: src/app/robots.ts
import type { MetadataRoute } from "next";
import { headers } from "next/headers";
import { getTenantByDomain } from "@/payload/db";

export default async function robots(): Promise<MetadataRoute.Robots> {
  // Get the hostname from the request headers
  const hostname = (await headers()).get('host') || 'www.adart.com';

  // Try to find the tenant by custom domain or subdomain
  const tenant = await getTenantByDomain(hostname);

  // Prefer the tenant's configured domain; otherwise fall back to the request hostname
  const baseUrl = tenant?.domain ? `https://${tenant.domain}` : `https://${hostname}`;

  return {
    rules: [
      // Block AI scraping bots
      {
        userAgent: ["GPTBot", "CCBot", "Google-Extended"],
        disallow: ["/"],
      },
      // Standard bots
      {
        userAgent: "*",
        allow: "/",
        disallow: [
          "/admin",
          "/api",
        ],
        crawlDelay: 1,
      },
    ],
    sitemap: `${baseUrl}/sitemap.xml`,
    host: baseUrl,
  };
}
```
Key Features:
Dynamic Host: The sitemap link automatically matches the visitor's domain.
AI Blocking: Explicit disallow rules for GPTBot, CCBot, and Google-Extended keep AI crawlers away from our content.
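The handler applies the AI block to every tenant. For the scenario from the introduction, where one tenant blocks AI bots and another wants full indexing, the rules could instead be derived from a per-tenant setting. The following is a minimal sketch under that assumption: the blockAiBots flag, the TenantSeoSettings type, and the buildRobots helper are hypothetical and not part of the original project, and RobotsRule is a local mirror of the rule shape so the snippet stands alone.

```typescript
// Local mirror of the rule shape used in robots.ts, so the sketch is self-contained.
type RobotsRule = {
  userAgent: string | string[];
  allow?: string | string[];
  disallow?: string | string[];
  crawlDelay?: number;
};

// Hypothetical tenant setting; `blockAiBots` is an assumed field.
type TenantSeoSettings = { domain: string; blockAiBots: boolean };

const AI_BOTS = ["GPTBot", "CCBot", "Google-Extended"];

function buildRobots(tenant: TenantSeoSettings) {
  const baseUrl = `https://${tenant.domain}`;
  const rules: RobotsRule[] = [];

  // Only tenants that opted in get the AI crawler block.
  if (tenant.blockAiBots) {
    rules.push({ userAgent: AI_BOTS, disallow: ["/"] });
  }

  // Every tenant keeps the standard rules.
  rules.push({
    userAgent: "*",
    allow: "/",
    disallow: ["/admin", "/api"],
    crawlDelay: 1,
  });

  return { rules, sitemap: `${baseUrl}/sitemap.xml`, host: baseUrl };
}
```

Inside robots(), the returned object could be used directly as the return value, since it has the same fields as the handler's current result.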
3. Dynamic Humans.txt
To give credit where it's due, I also implemented a humans.txt endpoint. This is a nice touch that adds personality and transparency to the site, dynamically acknowledging the specific tenant.
```typescript
// File: src/app/humans.txt/route.ts
// Unlike robots.ts, Next.js has no humans.ts file convention, so this is a
// Route Handler that exports GET and is served at /humans.txt.
import { headers } from "next/headers";
import { getTenantByDomain } from "@/payload/db";

export async function GET() {
  const hostname = (await headers()).get('host') || '';
  const tenant = await getTenantByDomain(hostname);
  const tenantName = tenant?.name || 'Ad Art';

  const content = `/* TEAM */
Site built by: Ad Art Team
For: ${tenantName}
/* SITE */
Standards: HTML5, CSS3, TypeScript
Components: Payload CMS, Next.js`;

  return new Response(content, {
    headers: { 'Content-Type': 'text/plain; charset=utf-8' },
  });
}
```
4. The Critical Fix: Middleware Matcher
This was the tricky part. Even with the files in place, robots.txt was returning a 404.
The culprit was src/middleware.ts. Its matcher regex only skipped a few specific file patterns, so the middleware still ran on requests like /robots.txt. I updated the negative lookahead to explicitly exclude any path with a file extension (like .txt or .xml).
The Lesson:
If your middleware runs on file routes, it might try to rewrite them to tenant paths (e.g., /tenant-slugs/.../robots.txt), which don't exist. Excluding files from middleware ensures they hit the App Router handlers directly.
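As an illustration of that kind of exclusion, here is a minimal sketch of a matcher that skips any path ending in a file extension. The pattern is an assumption modeled on the negative-lookahead examples in the Next.js docs, not the project's actual matcher, and the plain RegExp below only approximates how Next.js compiles the string, so treat it as a local sanity check.

```typescript
// Hypothetical matcher for src/middleware.ts (export it as `config` there).
// The negative lookahead skips /api, Next.js internals, and any path that
// ends in ".<extension>", such as /robots.txt or /sitemap.xml.
const config = {
  matcher: ["/((?!api|_next/static|_next/image|.*\\.\\w+$).*)"],
};

// Plain-RegExp approximation of the matcher, for checking paths locally.
const approxMatcher = new RegExp(
  "^/((?!api|_next/static|_next/image|.*\\.\\w+$).*)$"
);

// Returns true when the middleware would run for this pathname.
function runsMiddleware(pathname: string): boolean {
  return approxMatcher.test(pathname);
}

for (const path of ["/robots.txt", "/sitemap.xml", "/humans.txt", "/about", "/"]) {
  console.log(path, runsMiddleware(path) ? "middleware" : "skipped");
}
```

With this pattern the file routes bypass the middleware and reach their handlers directly, while regular pages such as /about still go through the tenant rewrite.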
Conclusion
By moving away from static files and leveraging Next.js Route Handlers, we've created an SEO infrastructure that provides:
Automatic Sitemaps per tenant.
Smart Indexing Rules that protect against AI scraping.