Payload CMS Large Imports: Fast Transactions & Drizzle
Use payload.db, PostgreSQL transactions, or raw Drizzle for safe, batched Payload imports and idempotent upserts

If you need to import tens of thousands of rows into Payload without taking the system down, there are two fast paths: payload.db.* direct database methods and raw Drizzle ORM via the Postgres adapter. Both bypass the Local API's hook and validation pipeline, which is exactly what you want for bulk writes. The key to keeping imports safe is PostgreSQL transactions — small batches committed often — so a failure mid-import rolls back cleanly and you can retry from a checkpoint instead of starting over.
This article covers how both approaches work, when to pick one over the other, and the production gotchas that only show up once you run it for real. I recently ran this pattern to seed an apparel demo catalog: 23 products and 290 variants inserted in 2.78 seconds using payload.db.create() with short committed batches. The full seeding script is included throughout.
What I was building
I needed a repeatable seed script for a Payload-powered apparel store: 23 products across 6 categories and 6 collections, each with color and size variants. The script had to be idempotent — run it twice and get the same result without duplicates — and fast enough that developers wouldn't wait on it between resets.
The Local API was the obvious first choice. It worked in development but was the wrong tool for the job: every document went through the full Payload lifecycle — hooks, access control, validation, re-fetching the document after save. For a single document that overhead is fine. For 290 variants in a loop, it turns a 3-second job into a minutes-long one.
What transactions are and why they matter here
A PostgreSQL transaction is a unit of work that either commits entirely or rolls back entirely. When you write to the database inside a transaction, the changes are invisible to other connections until you commit. If anything goes wrong before the commit — an error, a crash, a constraint violation — the entire batch reverts automatically.
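In raw SQL terms, a transaction is just a set of writes wrapped between BEGIN and COMMIT. A minimal sketch (table and column names here are illustrative, not the seed script's actual schema):

```sql
BEGIN;

-- Both inserts succeed or fail together; neither row is visible to
-- other connections until COMMIT.
INSERT INTO products (title, slug) VALUES ('Crewneck Tee', 'crewneck-tee');
INSERT INTO products (title, slug) VALUES ('Zip Hoodie', 'zip-hoodie');

COMMIT; -- or ROLLBACK; to discard both inserts
```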
For large imports, transactions give you two things. First, atomicity within a batch: a partial write never lands in the database. Second, a recovery point: if you checkpoint after each committed batch, a failure only loses the batch in progress, not everything before it.
The pattern that works is small batches committed often. A transaction holding 10,000 rows open is risky: it holds locks longer, consumes memory, and a failure loses the whole thing. A transaction holding 100–500 rows committed in a loop is safe and fast. The overhead of opening and closing a transaction is negligible compared to the risk reduction.
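The small-batches idea reduces to a simple slicing step. This helper (the name and shape are mine, not from the seed script) shows the structure the import loop later builds on: each batch maps to one short-lived transaction.

```typescript
// Split source rows into fixed-size batches; each batch becomes one
// short-lived transaction that is committed before the next one opens.
function toBatches<T>(rows: T[], batchSize: number): T[][] {
  const batches: T[][] = []
  for (let i = 0; i < rows.length; i += batchSize) {
    batches.push(rows.slice(i, i + batchSize))
  }
  return batches
}

// 290 variants at a batch size of 100 give three transactions: 100, 100, 90.
const batches = toBatches(Array.from({ length: 290 }, (_, i) => i), 100)
```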
Three ways to write to Payload — and which one to use for imports
Payload gives you three layers to write data.
The Local API (payload.create, payload.update) runs the full document lifecycle: access control, validation, hooks, re-fetching. It is the right default for application logic but the wrong tool for bulk imports. Every row triggers the same overhead as a web request.
payload.db.* is Payload's direct database layer. It skips hooks, access control, and validation and writes straight to the adapter. It still lives inside Payload's abstraction, so it works correctly with Payload's transaction model (beginTransaction, commitTransaction, rollbackTransaction). This is the right default for production imports into Payload collections.
Raw Drizzle is the Postgres adapter's underlying ORM, exposed directly via payload.db.drizzle. It gives you full SQL expressiveness: bulk upserts, set-based updates, join-heavy inserts, conflict handling. The trade-off is that you are working outside Payload's abstraction entirely and need to know the generated table schema. This is the right tool when a single SQL statement can replace thousands of row-by-row operations.
| Approach | Hooks & validation | Transaction model | Best for |
|---|---|---|---|
| Local API | Yes | Auto-managed | Application logic, single documents |
| payload.db.* | No | Manual via beginTransaction | Production bulk imports into Payload collections |
| Raw Drizzle | No | Drizzle tx callback | Set-based bulk upserts, join-heavy writes, conflict handling |
The hidden requirement for versioned collections
This is the part that catches almost everyone. If your collection has versions: { drafts: true } — which the Payload ecommerce plugin sets automatically via autosave: true — the Payload admin does not query the main products table for the list view. It calls queryDrafts(), which queries _products_v WHERE latest = true.
Raw inserts into products only → the admin shows 0 results, even though the data is there.
The reason payload.find() via the Local API returns results while the admin shows nothing is that the Local API hits the main table directly. This masks the bug during debugging: your script seems to work because payload.find() returns rows, but anyone opening the admin panel sees an empty collection.
The fix is to call payload.db.createVersion() after every payload.db.create() call on a versioned collection. It inserts a row into _products_v with latest = true, which is what the admin queries.
```ts
const productDoc = await payload.db.create({
  collection: 'products',
  data: productData,
  req: { transactionID } as any,
});

await payload.db.createVersion({
  collectionSlug: 'products',
  parent: productDoc.id as number,
  versionData: productDoc,
  autosave: false,
  createdAt: new Date().toISOString(),
  updatedAt: new Date().toISOString(),
  req: { transactionID } as any,
});
```
The required fields are collectionSlug, parent (the document ID from payload.db.create()), versionData (the full created document), and createdAt / updatedAt. Pass the same transactionID in req so the version entry lands in the same transaction as the document.
This applies to every versioned collection in your script — products, variants, or any other collection with drafts enabled. Skipping it on variants means variant rows land in _variants_v without a latest = true entry and become invisible in the admin.
The two-phase transaction pattern
The seeding script uses two separate phases because two separate transaction systems are involved, and mixing them is unreliable.
The raw pg client (payload.db.pool.connect()) is a direct PostgreSQL connection. It handles idempotent setup writes — upsert categories, upsert collections, ensure variant types and options exist. These writes use ON CONFLICT DO UPDATE, so running them twice is safe. No Payload transaction is needed here because every individual write is already idempotent.
payload.db.beginTransaction() is Payload's own transaction primitive. It wraps the product and variant inserts in a single atomic unit. A failure anywhere in the product loop rolls back all product and variant rows from that batch cleanly.
The key constraint: do not mix these two systems in the same atomic unit. The raw pg client connection and Payload's transactionID model are separate transaction propagation mechanisms. Mixing them produces undefined behavior in most Payload versions.
The structure looks like this:
```ts
// Phase 1: idempotent setup via raw SQL
const client = await payload.db.pool.connect();
try {
  await upsertCategory(client, category);
  await upsertCollection(client, collection);
  await ensureVariantType(client, 'size', 'Size');
  await ensureVariantType(client, 'color', 'Color');
  // ... variant options
} finally {
  client.release(); // always release in finally
}

// Phase 2: product/variant inserts via payload.db
const transactionID = await payload.db.beginTransaction();
try {
  for (const product of products) {
    const productDoc = await payload.db.create({ /* ... */ });
    await payload.db.createVersion({ /* ... */ });
    for (const variant of variants) {
      const variantDoc = await payload.db.create({ /* ... */ });
      await payload.db.createVersion({ /* ... */ });
    }
  }
  await payload.db.commitTransaction(transactionID);
} catch (error) {
  await payload.db.rollbackTransaction(transactionID!);
  throw error;
} finally {
  await payload.db.destroy?.();
}
```
Release the pg client in finally so it always returns to the pool, even on error. Call payload.db.destroy?.() in the Phase 2 finally block rather than after the try-catch, for the same reason.
Importing with payload.db.* and PostgreSQL transactions
This is the recommended default for most Payload import jobs. You get direct-to-database performance, Payload's transaction primitives, and no need to manage the generated schema yourself.
```ts
// File: src/scripts/import-products.ts
import { getPayload } from 'payload'
import config from '@payload-config'

type SourceRow = {
  externalId: string
  title: string
  slug: string
  tags: string[]
}

async function importBatch(rows: SourceRow[]): Promise<void> {
  const payload = await getPayload({ config })
  const transactionID = await payload.db.beginTransaction()
  try {
    for (const row of rows) {
      const doc = await payload.db.create({
        collection: 'products',
        data: {
          externalId: row.externalId,
          title: row.title,
          slug: row.slug,
          // Array fields need explicit IDs — payload.db.create() does not auto-generate them
          tags: row.tags.map((tag) => ({ id: crypto.randomUUID(), tag })),
        },
        req: { transactionID } as any,
      })
      // Required for versioned collections — without this the admin shows nothing
      await payload.db.createVersion({
        collectionSlug: 'products',
        parent: doc.id as number,
        versionData: doc,
        autosave: false,
        createdAt: new Date().toISOString(),
        updatedAt: new Date().toISOString(),
        req: { transactionID } as any,
      })
    }
    await payload.db.commitTransaction(transactionID)
  } catch (err) {
    await payload.db.rollbackTransaction(transactionID)
    throw err
  }
}

async function runImport(source: SourceRow[]): Promise<void> {
  const BATCH_SIZE = 250
  for (let i = 0; i < source.length; i += BATCH_SIZE) {
    const batch = source.slice(i, i + BATCH_SIZE)
    await importBatch(batch)
    console.log(`Committed batch ${i / BATCH_SIZE + 1} — rows ${i + 1}–${i + batch.length}`)
  }
}
```
A few things worth understanding in this pattern.
payload.db.beginTransaction() returns a transaction ID string. You pass that ID into req: { transactionID } on each create and createVersion call, which tells Payload's adapter to run the write inside the open transaction. After the loop, commitTransaction flushes everything. If any write throws, rollbackTransaction clears the whole batch cleanly.
The tags field uses crypto.randomUUID() on each item. The full Local API (payload.create()) runs field normalization that auto-generates IDs for array items — payload.db.create() skips this step entirely. If your collection has array fields with a varchar primary key on each item, you must provide those IDs yourself. Without them, the insert fails with a primary key violation.
The runImport function slices the source into 250-row batches. That number is a starting point — tune it based on row size and relation complexity. Wider rows with many relations should use smaller batches. Simple flat rows can go higher.
Importing with raw Drizzle
Raw Drizzle is the better tool when your import logic is naturally set-based: upsert by external key, bulk-update from a staging table, resolve conflicts with onConflictDoUpdate. One SQL statement can replace thousands of per-row operations.
First, generate and import your schema:
```sh
npx payload generate:db-schema
```
This outputs a file — typically payload-generated-schema.ts — containing the Drizzle table definitions, relations, and enums derived from your Payload collections. Every time you add or change a collection field you need to re-run this command to keep the schema in sync.
Then write the import:
```ts
// File: src/scripts/import-drizzle.ts
import { getPayload } from 'payload'
import config from '@payload-config'
import { products } from './payload-generated-schema'
import { sql } from '@payloadcms/db-postgres/drizzle'

type SourceRow = {
  externalId: string
  title: string
  slug: string
}

async function importBatchDrizzle(rows: SourceRow[]): Promise<void> {
  const payload = await getPayload({ config })
  await payload.db.drizzle.transaction(async (tx) => {
    const values = rows.map((row) => ({
      external_id: row.externalId,
      title: row.title,
      slug: row.slug,
    }))
    await tx
      .insert(products)
      .values(values)
      .onConflictDoUpdate({
        target: products.external_id,
        set: {
          title: sql`excluded.title`,
          slug: sql`excluded.slug`,
        },
      })
  })
}
```
This sends one insert statement per batch instead of one per row. The onConflictDoUpdate clause turns every batch into an idempotent upsert: rows that already exist update in place, new rows insert. That means you can replay a failed batch without creating duplicates.
The transaction callback here is Drizzle's own API — tx is a Drizzle transaction object, not a Payload transaction ID. Do not mix Payload's beginTransaction / req.transactionID model with Drizzle's tx callback in the same batch. They are separate transaction propagation mechanisms and mixing them is untested in most Payload versions.
Note that raw Drizzle writes do not create version entries — you are bypassing Payload's abstraction entirely. If the target collection uses drafts, you need to insert version rows manually via direct SQL into _collection_v.
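A hedged sketch of what such a manual version row can look like. The exact `_products_v` column names depend on your collection config (Payload typically prefixes copied document fields with `version_`), so verify them against the generated schema before using anything like this:

```sql
-- Assumed column names — check payload-generated-schema.ts for the real layout.
INSERT INTO _products_v
  (parent_id, version_title, version_slug, version__status, latest, created_at, updated_at)
SELECT id, title, slug, _status, true, now(), now()
FROM products
WHERE external_id = ANY($1);
```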
Production gotchas
Always run migrations before seeding. After payload migrate:reset or any database wipe, run pnpm payload migrate before executing the seed script. The products table and its enum types are created by your migrations. Without them, any code that introspects the schema — like resolving enum values for status or _status columns — fails with an error like Could not find enum for products.status. The table doesn't exist yet.
Resolve enum values at runtime, not compile time. Payload generates PostgreSQL enums for fields like status and _status, but the exact enum labels can vary depending on your Payload version and collection configuration ("published", "active", "draft"). Hard-coding a value like "published" works until you upgrade Payload or rename a status option, at which point your seed script silently inserts a null or throws a constraint violation. The safer pattern is to query the database at the start of the script and resolve the correct enum value from what's actually defined:
```ts
async function resolveColumnValue(
  client: { query: Function },
  tableName: string,
  columnName: string,
  preferred: string[],
): Promise<string> {
  const enumNameResult = await client.query(
    `SELECT udt_name FROM information_schema.columns
     WHERE table_schema = 'public' AND table_name = $1 AND column_name = $2 LIMIT 1`,
    [tableName, columnName],
  )
  const enumName = enumNameResult.rows[0]?.udt_name
  if (!enumName) throw new Error(`Could not find enum for ${tableName}.${columnName}`)

  const enumValuesResult = await client.query(
    `SELECT e.enumlabel as value FROM pg_type t
     JOIN pg_enum e ON e.enumtypid = t.oid
     WHERE t.typname = $1 ORDER BY e.enumsortorder`,
    [enumName],
  )
  const values = enumValuesResult.rows.map((r: { value: string }) => r.value)
  const resolved = preferred.find((candidate) => values.includes(candidate))
  if (!resolved) {
    throw new Error(
      `Could not resolve enum value for ${tableName}.${columnName}. Available: ${values.join(', ')}`,
    )
  }
  return resolved
}

// Usage at the start of the script, before any inserts
const productStatus = await resolveColumnValue(client, 'products', 'status', ['active', 'published', 'draft'])
const productPublishStatus = await resolveColumnValue(client, 'products', '_status', ['published', 'active', 'draft'])
```
Relationship IDs use camelCase field names, not column names. payload.db.create() uses upsertRow internally, which maps Payload field names to database columns. Pass the Payload field name, not the database column name — category not category_id, collections not products_rels. Pass hasMany relationships as arrays of IDs:
```ts
const productData = {
  category: categoryId, // maps to category_id column
  collections: [id1, id2], // maps to products_rels (hasMany)
  variantTypes: [colorTypeId, sizeTypeId], // same pattern
}
```
Call process.exit(0) explicitly. payload.db.destroy?.() does not always close all pool connections cleanly, which leaves the Node process hanging after the script finishes. The reliable fix is to call process.exit(0) after a successful seed and process.exit(1) after a failure:
```ts
main()
  .then(() => process.exit(0))
  .catch((error) => {
    console.error(error)
    process.exit(1)
  })
```
When to use which
Use payload.db.* when you are inserting or updating Payload-managed documents one by one within a batch, your rows map directly to collection fields, and you do not need SQL-level conflict resolution. This covers most production import jobs. The 290-variant apparel seed ran in 2.78 seconds using this approach — fast enough that it slots cleanly into a development reset workflow.
Use raw Drizzle when you can express the batch as a single SQL statement — a bulk upsert by external key, an update from a joined staging table, or a set-based deduplication pass. The gain is real: one statement replacing 500 round-trips is a meaningful difference. Row-by-row loops in raw Drizzle without set-based SQL offer no meaningful advantage over payload.db.* and add the burden of managing the generated schema yourself, plus manual version entry creation for draft collections.
In both cases, preprocess and validate data outside the transaction. Keep transactions short — open, write, commit, move on. Record a checkpoint after each successful commit so a failure only costs you the current batch.
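A minimal checkpoint sketch (the file path and record shape are my choice, not from the article's script): persist the count of committed batches after each commit, and read it back on the next run to skip work already done.

```typescript
import { readFileSync, writeFileSync, existsSync } from 'node:fs'

// Persist progress after each committed batch so a crash only costs
// the batch that was in flight, not everything before it.
function saveCheckpoint(path: string, committedBatches: number): void {
  writeFileSync(path, JSON.stringify({ committedBatches }))
}

// Returns how many batches were already committed (0 on a fresh run).
function loadCheckpoint(path: string): number {
  if (!existsSync(path)) return 0
  return JSON.parse(readFileSync(path, 'utf8')).committedBatches as number
}
```

In the import loop, you would call saveCheckpoint right after commitTransaction succeeds, and start the next run at an offset of loadCheckpoint(path) * BATCH_SIZE.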
FAQ
Can I use the Local API for imports if I run hooks later?
You can, but it is usually the wrong shape. The Local API runs validation, access control, and all hooks synchronously on every document. For hundreds or thousands of rows that overhead compounds quickly. A cleaner approach is to import directly with payload.db.* or Drizzle, then run any post-import hook logic as a separate background task using Payload Jobs.
How large should batches be?
Start at 100–250 rows and tune from there. Smaller batches reduce the blast radius of a failure, are easier to checkpoint, and hold locks for less time. Larger batches reduce transaction overhead. For flat collections with no relations, 500–1000 rows per batch is often fine. For collections with array fields or nested relations, keep batches smaller.
What happens if I forget to call rollbackTransaction after an error?
PostgreSQL will eventually roll back the transaction automatically when the connection closes or times out, but you should not rely on that. Always call payload.db.rollbackTransaction(transactionID) in a catch block. Leaving open transactions is a common source of lock contention in production.
Do direct DB writes fire Payload hooks?
No. Both payload.db.* and raw Drizzle bypass the Payload document lifecycle entirely. That means beforeChange, afterChange, beforeOperation, and afterOperation hooks do not run. If your collection relies on hooks to derive fields, generate slugs, or sync related documents, you need to handle that logic separately in your import pipeline.
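For example, if a collection normally derives its slug in a beforeChange hook, the import pipeline has to compute it before the rows reach payload.db.create(). A minimal stand-in (my own sketch, not Payload's actual slug logic):

```typescript
// Derive a URL slug the way a beforeChange hook otherwise would:
// lowercase, collapse non-alphanumeric runs to hyphens, trim edge hyphens.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '')
}
```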
Can I read back an inserted row inside the same transaction?
With payload.db.*, yes — the created document is returned directly from payload.db.create() and you can use it immediately (for example, to pass the product ID to variant inserts within the same loop). With raw Drizzle, you can also query inside the same tx callback and the write will be visible within the transaction. This is useful for resolving foreign key IDs during the import.
Why does my admin show 0 results after a successful seed?
Your collection almost certainly has versions: { drafts: true } enabled. The Payload admin queries _products_v WHERE latest = true for the list view — it does not query the main products table. Add payload.db.createVersion() after each payload.db.create() call, passing the created document as versionData. The admin will show results immediately after.
Conclusion
For large Payload imports, payload.db.* with short committed batches covers most production scenarios: it is fast, stays inside Payload's adapter model, and gives you clean transaction control. The 290-variant apparel seed ran in 2.78 seconds using this approach with a loop of payload.db.create() + payload.db.createVersion() per document.
The two critical additions over a basic payload.db.* loop are version entries for draft collections and explicit IDs for array fields — both are things the Local API handles automatically but payload.db.* skips entirely. Miss either one and you get data in the database that is either invisible in the admin or fails on insert.
Raw Drizzle via payload.db.drizzle is the right escalation when your logic is genuinely set-based — one SQL statement doing what thousands of row writes would otherwise do. Keep batches small, checkpoint after every commit, validate data before the transaction opens, and use the two-phase pattern to keep idempotent setup writes separate from your atomic import transaction.
Let me know in the comments if you have questions, and subscribe for more practical development guides.
Thanks, Matija