---
title: "Payload CMS Large Imports: Fast Transactions & Drizzle"
slug: "payload-cms-large-data-imports-transactions-drizzle"
published: "2026-03-26"
updated: "2026-04-06"
categories:
  - "Payload"
tags:
  - "Payload CMS imports"
  - "payload.db"
  - "Drizzle ORM"
  - "PostgreSQL transactions"
  - "bulk import Payload"
  - "batched imports"
  - "idempotent upsert"
  - "generate:db-schema"
  - "payload-generated-schema"
  - "payload.db.drizzle"
llm-intent: "reference"
audience-level: "intermediate"
framework-versions:
  - "payload@2"
  - "drizzle-orm@2"
  - "postgres@15"
  - "typescript@5"
status: "stable"
llm-purpose: "Payload CMS large imports: learn payload.db and Drizzle patterns with PostgreSQL transactions to import tens of thousands safely—batch, checkpoint, and…"
llm-prereqs:
  - "Access to Payload CMS"
  - "Access to payload.db"
  - "Access to payload.db.drizzle"
  - "Access to Drizzle ORM"
  - "Access to Postgres / PostgreSQL"
llm-outputs:
  - "Completed outcome: Payload CMS large imports: learn payload.db and Drizzle patterns with PostgreSQL transactions to import tens of thousands safely—batch, checkpoint, and…"
---

**Summary Triples**
- (payload.db direct methods, bypass, Payload Local API lifecycle (hooks, access control, validation) to achieve faster bulk writes)
- (raw Drizzle ORM with Postgres adapter, bypass, Payload Local API pipeline and enable raw SQL-level operations and typed queries)
- (PostgreSQL transactions, provide, atomic batch commits and rollback safety for large imports)
- (Batching strategy, uses, small, repeated transactions (e.g., 500–2000 rows) and checkpoints to limit data loss and retry scope)
- (Checkpoints, store, last committed offset or identifier externally (file, table, or job state) so imports can resume from the last successful batch)
- (Idempotent upserts, implemented with, Postgres ON CONFLICT or Drizzle upsert patterns using a unique key (external id or unique constraint))
- (npx payload generate:db-schema, produces, a Drizzle-compatible DB schema generated from Payload collections)
- (payload.db.drizzle, exposes, the generated schema and helpers to integrate Drizzle with Payload)
- (Local API per-document lifecycle, creates, significant overhead (hooks, re-fetch, validation) and is a bottleneck for tens of thousands of records)
- (When to pick payload.db, recommended for, simpler import scripts that need fast CRUD bypassing lifecycle without complex SQL)
- (When to pick raw Drizzle, recommended for, complex queries, joins, native upserts, and highly optimized SQL with typed schema)

### {GOAL}
Payload CMS large imports: learn payload.db and Drizzle patterns with PostgreSQL transactions to import tens of thousands safely—batch, checkpoint, and…

### {PREREQS}
- Access to Payload CMS
- Access to payload.db
- Access to payload.db.drizzle
- Access to Drizzle ORM
- Access to Postgres / PostgreSQL

### {STEPS}
1. Understand PostgreSQL transactions
2. Choose the right write layer
3. Generate the Drizzle schema
4. Implement payload.db.* batched import
5. Implement raw Drizzle bulk upserts
6. Tune batch size and checkpoints
7. Handle hooks and post-import tasks

<!-- llm:goal="Payload CMS large imports: learn payload.db and Drizzle patterns with PostgreSQL transactions to import tens of thousands safely—batch, checkpoint, and…" -->
<!-- llm:prereq="Access to Payload CMS" -->
<!-- llm:prereq="Access to payload.db" -->
<!-- llm:prereq="Access to payload.db.drizzle" -->
<!-- llm:prereq="Access to Drizzle ORM" -->
<!-- llm:prereq="Access to Postgres / PostgreSQL" -->
<!-- llm:output="Completed outcome: Payload CMS large imports: learn payload.db and Drizzle patterns with PostgreSQL transactions to import tens of thousands safely—batch, checkpoint, and…" -->

# Payload CMS Large Imports: Fast Transactions & Drizzle
> Payload CMS large imports: learn payload.db and Drizzle patterns with PostgreSQL transactions to import tens of thousands safely—batch, checkpoint, and…

Matija Žiberna · 2026-03-26

If you need to import tens of thousands of rows into Payload without taking the system down, there are two fast paths: `payload.db.*` direct database methods and raw Drizzle ORM via the Postgres adapter. Both bypass the Local API's hook and validation pipeline, which is exactly what you want for bulk writes. The key to keeping imports safe is PostgreSQL transactions — small batches committed often — so a failure mid-import rolls back cleanly and you can retry from a checkpoint instead of starting over.

This article covers how both approaches work, when to pick one over the other, and the production gotchas that only show up once you run it for real. I recently ran this pattern to seed an apparel demo catalog: 23 products and 290 variants inserted in **2.78 seconds** using `payload.db.create()` with short committed batches. The relevant pieces of the seeding script appear throughout the article.

## What I was building

I needed a repeatable seed script for a Payload-powered apparel store: 23 products across 6 categories and 6 collections, each with color and size variants. The script had to be idempotent — run it twice and get the same result without duplicates — and fast enough that developers wouldn't wait on it between resets.

The Local API was the obvious first choice. It worked in development but was the wrong tool for the job: every document went through the full Payload lifecycle — hooks, access control, validation, re-fetching the document after save. For a single document that overhead is fine. For 290 variants in a loop, it turns a 3-second job into a minutes-long one.

## What transactions are and why they matter here

A PostgreSQL transaction is a unit of work that either commits entirely or rolls back entirely. When you write to the database inside a transaction, the changes are invisible to other connections until you commit. If anything goes wrong before the commit — an error, a crash, a constraint violation — the entire batch reverts automatically.

For large imports, transactions give you two things. First, atomicity within a batch: a partial write never lands in the database. Second, a recovery point: if you checkpoint after each committed batch, a failure only loses the batch in progress, not everything before it.

The pattern that works is small batches committed often. A transaction holding 10,000 rows open is risky: it holds locks longer, consumes memory, and a failure loses the whole thing. A transaction holding 100–500 rows committed in a loop is safe and fast. The overhead of opening and closing a transaction is negligible compared to the risk reduction.
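
To make that concrete, here is a minimal sketch of the batch-per-transaction shape using node-postgres directly, writing to a hypothetical `import_staging` table. The rest of the article uses Payload's wrappers around the same BEGIN/COMMIT/ROLLBACK primitives.

```typescript
// Minimal batch-per-transaction sketch with node-postgres.
// `import_staging` is a hypothetical table; substitute your own target.
import { Pool } from 'pg'

const pool = new Pool({ connectionString: process.env.DATABASE_URL })

async function writeBatch(rows: { externalId: string; title: string }[]): Promise<void> {
  const client = await pool.connect()
  try {
    await client.query('BEGIN')
    for (const row of rows) {
      await client.query(
        'INSERT INTO import_staging (external_id, title) VALUES ($1, $2)',
        [row.externalId, row.title],
      )
    }
    await client.query('COMMIT') // the whole batch becomes visible at once
  } catch (err) {
    await client.query('ROLLBACK') // nothing from this batch lands
    throw err
  } finally {
    client.release() // always return the connection to the pool
  }
}
```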

## Three ways to write to Payload — and which one to use for imports

Payload gives you three layers to write data.

The **Local API** (`payload.create`, `payload.update`) runs the full document lifecycle: access control, validation, hooks, re-fetching. It is the right default for application logic but the wrong tool for bulk imports. Every row triggers the same overhead as a web request.

**`payload.db.*`** is Payload's direct database layer. It skips hooks, access control, and validation and writes straight to the adapter. It still lives inside Payload's abstraction, so it works correctly with Payload's transaction model (`beginTransaction`, `commitTransaction`, `rollbackTransaction`). This is the right default for production imports into Payload collections.

**Raw Drizzle** is the Postgres adapter's underlying ORM, exposed directly via `payload.db.drizzle`. It gives you full SQL expressiveness: bulk upserts, set-based updates, join-heavy inserts, conflict handling. The trade-off is that you are working outside Payload's abstraction entirely and need to know the generated table schema. This is the right tool when a single SQL statement can replace thousands of row-by-row operations.

| Approach | Hooks & validation | Transaction model | Best for |
|---|---|---|---|
| Local API | Yes | Auto-managed | Application logic, single documents |
| `payload.db.*` | No | Manual via `beginTransaction` | Production bulk imports into Payload collections |
| Raw Drizzle | No | Drizzle `tx` callback | Set-based bulk upserts, join-heavy writes, conflict handling |

## The hidden requirement for versioned collections

This is the part that catches almost everyone. If your collection has `versions: { drafts: true }` — which the Payload ecommerce plugin sets automatically via `autosave: true` — the Payload admin does not query the main `products` table for the list view. It calls `queryDrafts()`, which queries `_products_v WHERE latest = true`.

If you insert rows only into `products`, the admin shows 0 results, even though the data is there.

The reason `payload.find()` via the Local API returns results while the admin shows nothing is that the Local API hits the main table directly. This masks the bug during debugging: your script seems to work because `payload.find()` returns rows, but anyone opening the admin panel sees an empty collection.

The fix is to call `payload.db.createVersion()` after every `payload.db.create()` call on a versioned collection. It inserts a row into `_products_v` with `latest = true`, which is what the admin queries.

```typescript
const productDoc = await payload.db.create({
  collection: 'products',
  data: productData,
  req: { transactionID } as any,
});

await payload.db.createVersion({
  collectionSlug: 'products',
  parent: productDoc.id as number,
  versionData: productDoc,
  autosave: false,
  createdAt: new Date().toISOString(),
  updatedAt: new Date().toISOString(),
  req: { transactionID } as any,
});
```

The required fields are `collectionSlug`, `parent` (the document ID from `payload.db.create()`), `versionData` (the full created document), and `createdAt` / `updatedAt`. Pass the same `transactionID` in `req` so the version entry lands in the same transaction as the document.

This applies to every versioned collection in your script — products, variants, or any other collection with drafts enabled. Skipping it on variants means variant rows land in `_variants_v` without a `latest = true` entry and become invisible in the admin.
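
A quick way to confirm you are hitting this is to compare the main table against the version rows the admin actually reads. A sketch, assuming the default table names Payload generates for a drafts-enabled `products` collection:

```typescript
// Compare what payload.find() sees (products) with what the admin list view sees (_products_v).
// Table names assume Payload's default naming for a drafts-enabled `products` collection.
const client = await payload.db.pool.connect()
try {
  const main = await client.query('SELECT count(*)::int AS n FROM products')
  const visible = await client.query(
    'SELECT count(*)::int AS n FROM _products_v WHERE latest = true',
  )
  console.log(`rows in products: ${main.rows[0].n}, visible in admin: ${visible.rows[0].n}`)
} finally {
  client.release()
}
```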

## The two-phase transaction pattern

The seeding script uses two separate phases because two separate transaction systems are involved, and mixing them is unreliable.

The **raw pg client** (`payload.db.pool.connect()`) is a direct PostgreSQL connection. It handles idempotent setup writes — upsert categories, upsert collections, ensure variant types and options exist. These writes use `ON CONFLICT DO UPDATE`, so running them twice is safe. No Payload transaction is needed here because every individual write is already idempotent.

**`payload.db.beginTransaction()`** is Payload's own transaction primitive. It wraps the product and variant inserts in a single atomic unit. A failure anywhere in the product loop rolls back all product and variant rows from that batch cleanly.

The key constraint: do not mix these two systems in the same atomic unit. The raw pg client connection and Payload's `transactionID` model are separate transaction propagation mechanisms. Mixing them produces undefined behavior in most Payload versions.

The structure looks like this:

```typescript
// Phase 1: idempotent setup via raw SQL
const client = await payload.db.pool.connect();
try {
  await upsertCategory(client, category);
  await upsertCollection(client, collection);
  await ensureVariantType(client, 'size', 'Size');
  await ensureVariantType(client, 'color', 'Color');
  // ... variant options
} finally {
  client.release(); // always release in finally
}

// Phase 2: product/variant inserts via payload.db
const transactionID = await payload.db.beginTransaction();
try {
  for (const product of products) {
    const productDoc = await payload.db.create({ /* ... */ });
    await payload.db.createVersion({ /* ... */ });

    for (const variant of variants) {
      const variantDoc = await payload.db.create({ /* ... */ });
      await payload.db.createVersion({ /* ... */ });
    }
  }
  await payload.db.commitTransaction(transactionID);
} catch (error) {
  await payload.db.rollbackTransaction(transactionID!);
  throw error;
} finally {
  await payload.db.destroy?.();
}
```

Release the pg client in `finally` so it always returns to the pool, even on error. Call `payload.db.destroy?.()` in the Phase 2 `finally` block rather than after the try-catch, for the same reason.
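
The Phase 1 helpers are plain `ON CONFLICT` upserts over the pg client. As a sketch of what `upsertCategory` might look like, assuming a `categories` table with a unique `slug` column (adjust the table and column names to your generated schema):

```typescript
// Idempotent category upsert for Phase 1: safe to run on every seed.
// Assumes a `categories` table with a unique constraint on `slug`;
// verify the table and column names against your generated schema.
import type { PoolClient } from 'pg'

type CategorySeed = { title: string; slug: string }

async function upsertCategory(client: PoolClient, category: CategorySeed): Promise<number> {
  const result = await client.query(
    `INSERT INTO categories (title, slug, created_at, updated_at)
     VALUES ($1, $2, now(), now())
     ON CONFLICT (slug) DO UPDATE
       SET title = EXCLUDED.title, updated_at = now()
     RETURNING id`,
    [category.title, category.slug],
  )
  return result.rows[0].id as number
}
```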

## Importing with payload.db.* and PostgreSQL transactions

This is the recommended default for most Payload import jobs. You get direct-to-database performance, Payload's transaction primitives, and no need to manage the generated schema yourself.

```typescript
// File: src/scripts/import-products.ts
import { getPayload } from 'payload'
import config from '@payload-config'

type SourceRow = {
  externalId: string
  title: string
  slug: string
  tags: string[]
}

async function importBatch(rows: SourceRow[]): Promise<void> {
  const payload = await getPayload({ config })

  const transactionID = await payload.db.beginTransaction()

  try {
    for (const row of rows) {
      const doc = await payload.db.create({
        collection: 'products',
        data: {
          externalId: row.externalId,
          title: row.title,
          slug: row.slug,
          // Array fields need explicit IDs — payload.db.create() does not auto-generate them
          tags: row.tags.map((tag) => ({ id: crypto.randomUUID(), tag })),
        },
        req: { transactionID } as any,
      })

      // Required for versioned collections — without this the admin shows nothing
      await payload.db.createVersion({
        collectionSlug: 'products',
        parent: doc.id as number,
        versionData: doc,
        autosave: false,
        createdAt: new Date().toISOString(),
        updatedAt: new Date().toISOString(),
        req: { transactionID } as any,
      })
    }

    await payload.db.commitTransaction(transactionID)
  } catch (err) {
    await payload.db.rollbackTransaction(transactionID)
    throw err
  }
}

async function runImport(source: SourceRow[]): Promise<void> {
  const BATCH_SIZE = 250

  for (let i = 0; i < source.length; i += BATCH_SIZE) {
    const batch = source.slice(i, i + BATCH_SIZE)
    await importBatch(batch)
    console.log(`Committed batch ${i / BATCH_SIZE + 1} — rows ${i + 1}–${i + batch.length}`)
  }
}
```

A few things worth understanding in this pattern.

`payload.db.beginTransaction()` returns a transaction ID. You pass that ID into `req: { transactionID }` on each `create` and `createVersion` call, which tells Payload's adapter to run the write inside the open transaction. After the loop, `commitTransaction` flushes everything. If any write throws, `rollbackTransaction` clears the whole batch cleanly.

The `tags` field uses `crypto.randomUUID()` on each item. The full Local API (`payload.create()`) runs field normalization that auto-generates IDs for array items — `payload.db.create()` skips this step entirely. If your collection has array fields with a varchar primary key on each item, you must provide those IDs yourself. Without them, the insert fails with a primary key violation.

The `runImport` function slices the source into 250-row batches. That number is a starting point — tune it based on row size and relation complexity. Wider rows with many relations should use smaller batches. Simple flat rows can go higher.

## Importing with raw Drizzle

Raw Drizzle is the better tool when your import logic is naturally set-based: upsert by external key, bulk-update from a staging table, resolve conflicts with `onConflictDoUpdate`. One SQL statement can replace thousands of per-row operations.

First, generate and import your schema:

```bash
npx payload generate:db-schema
```

This outputs a file — typically `payload-generated-schema.ts` — containing the Drizzle table definitions, relations, and enums derived from your Payload collections. Every time you add or change a collection field you need to re-run this command to keep the schema in sync.

Then write the import:

```typescript
// File: src/scripts/import-drizzle.ts
import { getPayload } from 'payload'
import config from '@payload-config'
import { products } from './payload-generated-schema'
import { sql } from '@payloadcms/db-postgres/drizzle'

type SourceRow = {
  externalId: string
  title: string
  slug: string
}

async function importBatchDrizzle(rows: SourceRow[]): Promise<void> {
  const payload = await getPayload({ config })

  await payload.db.drizzle.transaction(async (tx) => {
    const values = rows.map((row) => ({
      external_id: row.externalId,
      title: row.title,
      slug: row.slug,
    }))

    await tx
      .insert(products)
      .values(values)
      .onConflictDoUpdate({
        target: products.external_id,
        set: {
          title: sql`excluded.title`,
          slug: sql`excluded.slug`,
        },
      })
  })
}
```

This sends one insert statement per batch instead of one per row. The `onConflictDoUpdate` clause turns every batch into an idempotent upsert: rows that already exist update in place, new rows insert. That means you can replay a failed batch without creating duplicates.

The transaction callback here is Drizzle's own API — `tx` is a Drizzle transaction object, not a Payload transaction ID. Do not mix Payload's `beginTransaction` / `req.transactionID` model with Drizzle's `tx` callback in the same batch. They are separate transaction propagation mechanisms and mixing them is untested in most Payload versions.

Note that raw Drizzle writes do not create version entries — you are bypassing Payload's abstraction entirely. If the target collection uses drafts, you need to insert version rows yourself via direct SQL into the collection's version table (`_products_v` for products).
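
If you go that route, the version insert is a second statement inside the same transaction. The column layout of Payload's version tables depends on your fields and config, so treat this as a sketch and verify the column names (`parent_id`, `latest`, and `version_`-prefixed field copies are the usual pattern) against your generated schema. It reuses the `sql` tag imported in the script above.

```typescript
// Sketch only: mirror version rows after a raw Drizzle upsert so the admin list view
// (which reads _products_v WHERE latest = true) can see the documents.
// Column names follow Payload's usual version-table layout but are not guaranteed.
// A repeated run would also need to refresh stale version rows; this covers a single pass.
async function mirrorProductVersions(tx: any, externalIds: string[]): Promise<void> {
  await tx.execute(sql`
    INSERT INTO _products_v (parent_id, latest, version_title, version_slug, created_at, updated_at)
    SELECT p.id, true, p.title, p.slug, now(), now()
    FROM products p
    WHERE p.external_id = ANY(${externalIds})
  `)
}
```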

## Production gotchas

**Always run migrations before seeding.** After `payload migrate:reset` or any database wipe, run `pnpm payload migrate` before executing the seed script. The products table and its enum types are created by your migrations. Without them, any code that introspects the schema — like resolving enum values for `status` or `_status` columns — fails with an error like `Could not find enum for products.status`. The table doesn't exist yet.

**Resolve enum values at runtime, not compile time.** Payload generates PostgreSQL enums for fields like `status` and `_status`, but the exact enum labels can vary depending on your Payload version and collection configuration (`"published"`, `"active"`, `"draft"`). Hard-coding a value like `"published"` works until you upgrade Payload or rename a status option, at which point your seed script silently inserts a null or throws a constraint violation. The safer pattern is to query the database at the start of the script and resolve the correct enum value from what's actually defined:

```typescript
async function resolveColumnValue(
  client: { query: Function },
  tableName: string,
  columnName: string,
  preferred: string[],
): Promise<string> {
  const enumNameResult = await client.query(
    `SELECT udt_name FROM information_schema.columns
     WHERE table_schema = 'public' AND table_name = $1 AND column_name = $2 LIMIT 1`,
    [tableName, columnName],
  )
  const enumName = enumNameResult.rows[0]?.udt_name
  if (!enumName) throw new Error(`Could not find enum for ${tableName}.${columnName}`)

  const enumValuesResult = await client.query(
    `SELECT e.enumlabel as value FROM pg_type t
     JOIN pg_enum e ON e.enumtypid = t.oid
     WHERE t.typname = $1 ORDER BY e.enumsortorder`,
    [enumName],
  )
  const values = enumValuesResult.rows.map((r: { value: string }) => r.value)
  const resolved = preferred.find((candidate) => values.includes(candidate))
  if (!resolved) throw new Error(`Could not resolve enum value for ${tableName}.${columnName}. Available: ${values.join(', ')}`)
  return resolved
}

// Usage at the start of the script, before any inserts
const productStatus = await resolveColumnValue(client, 'products', 'status', ['active', 'published', 'draft'])
const productPublishStatus = await resolveColumnValue(client, 'products', '_status', ['published', 'active', 'draft'])
```

**Relationship IDs use camelCase field names, not column names.** `payload.db.create()` uses `upsertRow` internally, which maps Payload field names to database columns. Pass the Payload field name, not the database column name — `category` not `category_id`, `collections` not `products_rels`. Pass hasMany relationships as arrays of IDs:

```typescript
const productData = {
  category: categoryId,       // maps to category_id column
  collections: [id1, id2],   // maps to products_rels (hasMany)
  variantTypes: [colorTypeId, sizeTypeId], // same pattern
}
```

**Call `process.exit(0)` explicitly.** `payload.db.destroy?.()` does not always close all pool connections cleanly, which leaves the Node process hanging after the script finishes. The reliable fix is to call `process.exit(0)` after a successful seed and `process.exit(1)` after a failure:

```typescript
main()
  .then(() => process.exit(0))
  .catch((error) => {
    console.error(error)
    process.exit(1)
  })
```

## When to use which

Use `payload.db.*` when you are inserting or updating Payload-managed documents one by one within a batch, your rows map directly to collection fields, and you do not need SQL-level conflict resolution. This covers most production import jobs. The 290-variant apparel seed ran in 2.78 seconds using this approach — fast enough that it slots cleanly into a development reset workflow.

Use raw Drizzle when you can express the batch as a single SQL statement — a bulk upsert by external key, an update from a joined staging table, or a set-based deduplication pass. The gain is real: one statement replacing 500 round-trips is a meaningful difference. Row-by-row loops in raw Drizzle without set-based SQL offer no meaningful advantage over `payload.db.*` and add the burden of managing the generated schema yourself, plus manual version entry creation for draft collections.

In both cases, preprocess and validate data outside the transaction. Keep transactions short — open, write, commit, move on. Record a checkpoint after each successful commit so a failure only costs you the current batch.
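
A checkpoint does not need to be elaborate. For a one-off script, a small JSON file updated after each commit is enough; a table or job-state record follows the same shape. Here is a sketch built around the `importBatch` helper from earlier, with a hypothetical file path:

```typescript
// File-based checkpoint so a failed run resumes from the last committed batch.
import { readFile, writeFile } from 'node:fs/promises'

const CHECKPOINT_FILE = './import-checkpoint.json' // hypothetical path

async function readCheckpoint(): Promise<number> {
  try {
    const raw = await readFile(CHECKPOINT_FILE, 'utf8')
    return (JSON.parse(raw) as { lastCommittedIndex: number }).lastCommittedIndex
  } catch {
    return 0 // no checkpoint yet, start from the beginning
  }
}

async function writeCheckpoint(lastCommittedIndex: number): Promise<void> {
  await writeFile(CHECKPOINT_FILE, JSON.stringify({ lastCommittedIndex }))
}

async function runImportWithCheckpoints(source: SourceRow[]): Promise<void> {
  const BATCH_SIZE = 250

  for (let i = await readCheckpoint(); i < source.length; i += BATCH_SIZE) {
    const batch = source.slice(i, i + BATCH_SIZE)
    await importBatch(batch) // commits its own transaction
    await writeCheckpoint(i + batch.length) // persist only after a successful commit
  }
}
```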

## FAQ

**Can I use the Local API for imports if I run hooks later?**

You can, but it is usually the wrong shape. The Local API runs validation, access control, and all hooks synchronously on every document. For hundreds or thousands of rows that overhead compounds quickly. A cleaner approach is to import directly with `payload.db.*` or Drizzle, then run any post-import hook logic as a separate background task using Payload Jobs.

**How large should batches be?**

Start at 100–250 rows and tune from there. Smaller batches reduce the blast radius of a failure, are easier to checkpoint, and hold locks for less time. Larger batches reduce transaction overhead. For flat collections with no relations, 500–1000 rows per batch is often fine. For collections with array fields or nested relations, keep batches smaller.

**What happens if I forget to call rollbackTransaction after an error?**

PostgreSQL will eventually roll back the transaction automatically when the connection closes or times out, but you should not rely on that. Always call `payload.db.rollbackTransaction(transactionID)` in a catch block. Leaving open transactions is a common source of lock contention in production.

**Do direct DB writes fire Payload hooks?**

No. Both `payload.db.*` and raw Drizzle bypass the Payload document lifecycle entirely. That means `beforeChange`, `afterChange`, `beforeOperation`, and `afterOperation` hooks do not run. If your collection relies on hooks to derive fields, generate slugs, or sync related documents, you need to handle that logic separately in your import pipeline.
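
For example, if a `beforeChange` hook normally derives the slug from the title, run the same derivation in the mapping step before the rows reach the database. A sketch, with a hypothetical `slugify` helper standing in for whatever your hook actually does:

```typescript
// Hook logic moved into preprocessing, since payload.db.create() will not run beforeChange.
// `slugify` is a hypothetical stand-in for the field derivation your hook performs.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '')
}

function prepareRows(raw: { externalId: string; title: string; tags?: string[] }[]): SourceRow[] {
  return raw.map((row) => ({
    externalId: row.externalId,
    title: row.title,
    slug: slugify(row.title), // what the hook would have produced
    tags: row.tags ?? [],
  }))
}
```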

**Can I read back an inserted row inside the same transaction?**

With `payload.db.*`, yes — the created document is returned directly from `payload.db.create()` and you can use it immediately (for example, to pass the product ID to variant inserts within the same loop). With raw Drizzle, you can also query inside the same `tx` callback and the write will be visible within the transaction. This is useful for resolving foreign key IDs during the import.

**Why does my admin show 0 results after a successful seed?**

Your collection almost certainly has `versions: { drafts: true }` enabled. The Payload admin queries `_products_v WHERE latest = true` for the list view — it does not query the main `products` table. Add `payload.db.createVersion()` after each `payload.db.create()` call, passing the created document as `versionData`. The admin will show results immediately after.

## Conclusion

For large Payload imports, `payload.db.*` with short committed batches covers most production scenarios: it is fast, stays inside Payload's adapter model, and gives you clean transaction control. The 290-variant apparel seed ran in 2.78 seconds using this approach with a loop of `payload.db.create()` + `payload.db.createVersion()` per document.

The two critical additions over a basic `payload.db.*` loop are version entries for draft collections and explicit IDs for array fields — both are things the Local API handles automatically but `payload.db.*` skips entirely. Miss either one and you get data in the database that is either invisible in the admin or fails on insert.

Raw Drizzle via `payload.db.drizzle` is the right escalation when your logic is genuinely set-based — one SQL statement doing what thousands of row writes would otherwise do. Keep batches small, checkpoint after every commit, validate data before the transaction opens, and use the two-phase pattern to keep idempotent setup writes separate from your atomic import transaction.

Let me know in the comments if you have questions, and subscribe for more practical development guides.

Thanks,
Matija

## LLM Response Snippet
```json
{
  "goal": "Payload CMS large imports: learn payload.db and Drizzle patterns with PostgreSQL transactions to import tens of thousands safely—batch, checkpoint, and…",
  "responses": [
    {
      "question": "How do I import ~80k rows into a Payload collection using payload.db safely and efficiently?",
      "answer": "Step 1: Prepare and validate your source data locally. Step 2: Add a durable checkpoint store (simple table or file) to persist last-successful record index or external id after each committed batch. Step 3: Open a DB connection through your Payload server so you can call payload.db methods. Step 4: Choose a batch size (500–2000 rows) that balances transaction size and memory; smaller batches minimize rollback work. Step 5: For each batch: begin a transaction, perform payload.db direct writes (create/update/upsert) for all items in the batch, commit the transaction, then persist the checkpoint. Step 6: On error, roll back the transaction (automatic on throw) and restart the import from the last checkpoint. Step 7: Monitor DB locks and connection counts; throttle or increase batch delay if you see contention. Note: payload.db methods bypass hooks/validation—if you need payload hooks to run, do them separately or run a post-processing pass."
    },
    {
      "question": "How do I generate a Drizzle schema from my Payload project and use it for imports?",
      "answer": "Step 1: From your project root run: npx payload generate:db-schema. Step 2: Import the generated schema into your codebase (the command writes Drizzle-compatible schema files). Step 3: Install and configure Drizzle ORM with the Postgres adapter in your import script. Step 4: Create a database client and begin transactions per batch (use drizzle.transaction or raw SQL BEGIN/COMMIT). Step 5: Use Drizzle's insert/upsert patterns (or raw SQL with ON CONFLICT) inside the transaction to perform idempotent upserts keyed by your unique identifier. Step 6: Commit the transaction and update your checkpoint store. Step 7: Use the generated schema types for safer TypeScript operations and optimized queries where needed."
    },
    {
      "question": "What's a safe batching and checkpoint pattern I can reuse?",
      "answer": "Step 1: Choose batchSize = 500–2000 depending on payload size and DB capacity. Step 2: Maintain a checkpoint table with columns (job_id, last_offset, last_external_id, updated_at). Step 3: For each loop iteration: SELECT next batch of rows from source starting at checkpoint, begin transaction, perform batched writes, COMMIT, then UPDATE checkpoint with last processed offset/ID. Step 4: On failure, leave checkpoint unchanged so you can reprocess the current batch. Step 5: Add a short delay or backoff when transient DB errors occur and monitor locks, long-running transactions, and connection pool exhaustion."
    },
    {
      "question": "How do I implement idempotent upserts for Payload records using Drizzle or raw SQL?",
      "answer": "Step 1: Identify a stable unique key for deduplication (external_id, sku, slug). Step 2: Ensure a UNIQUE constraint exists on the underlying Postgres column(s). Step 3: Use Drizzle's insert ... onConflict (or raw SQL) to upsert: insert rows and ON CONFLICT (unique_key) DO UPDATE set col = EXCLUDED.col,... Step 4: Execute the upsert inside a transaction for each batch and commit. Step 5: Persist checkpoint only after commit so retries are safe and idempotent."
    },
    {
      "question": "How should I choose between payload.db and raw Drizzle for imports?",
      "answer": "If you need the simplest route to bypass Payload lifecycle and perform straightforward CRUD inserts/updates, use payload.db direct methods—less setup and taps into Payload's DB access. If you require complex SQL, joins, multi-table operations, typed queries, or highly optimized upserts and performance tuning, generate the Drizzle schema and use raw Drizzle with the Postgres adapter. In both cases wrap batched writes in transactions and use checkpoints."
    }
  ]
}
```