Practical PostgreSQL Stress Test with pgbench & PgBouncer

How to Stress Test PostgreSQL with pgbench and PgBouncer Using Production-Like Load
Running a PostgreSQL benchmark is easy. Running one that actually tells you something useful about production behavior is a different job entirely.
This guide walks through stress testing PostgreSQL behind PgBouncer using pgbench, with a focus on realistic workload shapes rather than dramatic synthetic numbers. The approach uses four scenarios — mixed transactional load, read-heavy load, connection churn, and a deliberate ceiling test — and compares the results side by side to find where latency, queueing, and connection pressure start to dominate.
The goal is a benchmark that answers a specific operational question: how will this database path behave when real traffic, cache misses, and connection pressure show up at the same time?
Who this is for
This guide is written for engineers testing:
- PostgreSQL behind PgBouncer
- CMS or content-heavy applications
- read-heavy workloads with occasional write bursts
- systems where caching absorbs most steady-state database traffic
- teams trying to validate realistic load, not win synthetic benchmark numbers
Test setup and environment
The examples below come from exploratory benchmarking runs using pgbench 17.6 on macOS against a remote PostgreSQL 17.9 endpoint behind PgBouncer. The initial runs were 60 seconds each — long enough to compare workload shapes, not long enough to produce final benchmark claims.
For publishable numbers, rerun each scenario at least three times at 3–5 minutes each and report the median. Single 60-second runs are useful for learning. They are thin ground for strong operational conclusions.
If you publish a version of this benchmark, include these environment details near the top so readers can judge the results against their own setup:
- PostgreSQL and pgbench versions
- Client machine, OS, and region
- Database region and approximate network RTT
- PgBouncer pooling mode
- Whether TPS is reported including or excluding connection establishment
- Whether the table shows single-run values or medians
Link to the official pgbench documentation and the PgBouncer documentation somewhere in this section.
Why most PostgreSQL benchmarks mislead people
I've seen teams run something like this, look at the output, and call it a performance audit:
pgbench -c 200 -j 8 -T 60
That command is easy to run. It is also easy to misread.
The result says very little unless production traffic genuinely looks like 200 constantly active clients hitting the database with that exact transaction profile and little help from caching. In most real applications, the bigger problem is unrealistic test design, not raw scale.
Benchmarking becomes useful when you ask the right questions before you start:
- How many users are truly active at the database level at once?
- Is the application read-heavy or write-heavy?
- Does caching absorb most requests before they reach the database?
- Does the app reuse connections, or does it churn through them?
- Are you testing direct PostgreSQL connections or pooled connections through PgBouncer?
- Are you measuring query cost, connection overhead, or built-in benchmark script contention?
Without those answers first, a benchmark is often just noise dressed up as data.
The four workload shapes worth testing first
A practical PostgreSQL benchmark should cover four scenarios in order. Each one exposes something different about the system.
1. Mixed transactional load
This is the default pgbench behavior — a simple mix of reads and writes under low concurrency. It is the right starting point for any benchmark because it establishes a stable baseline before you introduce pressure.
pgbench \
  -h HOST \
  -p PORT \
  -U USER \
  -d DBNAME \
  -c 10 \
  -j 2 \
  -T 60 \
  -P 5
Ten clients, two threads, 60-second duration, with a 5-second progress report. This is a reasonable approximation of modest production load for most small-to-medium applications.
2. Read-heavy load
For content apps, admin panels, API reads, and cached page misses, read-heavy traffic is often a more representative workload shape. The -S flag switches pgbench to select-only mode.
pgbench \
  -h HOST \
  -p PORT \
  -U USER \
  -d DBNAME \
  -c 10 \
  -j 2 \
  -T 60 \
  -S \
  -P 5
This reflects best-case performance for warm-cache, read-heavy traffic. Run it as a ceiling measurement for what steady-state reads look like.
3. Connection churn
When PgBouncer is in the path, reconnect behavior matters. The -C flag forces a new connection for every transaction — this is not a normal production profile, but it is a useful stress scenario to expose pooling limits and connection overhead.
pgbench \
  -h HOST \
  -p PORT \
  -U USER \
  -d DBNAME \
  -c 50 \
  -j 4 \
  -T 60 \
  -C \
  -P 5
Think of this as a probe for how PgBouncer holds up under connection pressure, not a simulation of typical app behavior.
4. Deliberate ceiling test
Only after the realistic runs should you push concurrency high enough to force queueing and saturation. That tells you where the system stops behaving comfortably and what degrades first.
This is where scale factor matters. For built-in pgbench workloads, the scale factor needs to be large enough for the client count you are testing. At scale factor 10 with 200 clients, results will reflect built-in script contention as much as actual pooling or connection behavior.
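As a sketch of sizing initialization to the planned client count — the "scale factor at least as large as the client count" rule used here is a working assumption, not an official pgbench guideline, and HOST/PORT/USER/DBNAME are placeholders:

```shell
# Rule of thumb (assumption): scale factor at least as large as the planned
# client count, so clients are less likely to contend on the same
# pgbench_branches rows. The floor of 100 is likewise an arbitrary choice.
CLIENTS=200
SCALE=$CLIENTS
[ "$SCALE" -lt 100 ] && SCALE=100

# Printed rather than executed so the sketch runs without a server;
# substitute real connection details before use.
echo "pgbench -i -s $SCALE -h HOST -p PORT -U USER -d DBNAME"
```

Reinitializing with `-i` drops and rebuilds the pgbench tables, so run it against a scratch database, never a shared one.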
Benchmark results from these runs
The most useful part of this exercise was not the command syntax. It was what happened when all four workload shapes were compared side by side.
| Scenario | Clients | Threads | Key flags | TPS | Avg latency | Failures | What it showed |
|---|---|---|---|---|---|---|---|
| Mixed transactional load | 10 | 2 | default | 33.45 | 298 ms | 0 | Stable at modest concurrency |
| Mixed transactional load | 20 | 4 | default | 52.90 | 377 ms | 0 | Latency rose but stayed stable |
| Read-heavy load | 10 | 2 | -S | 220.78 | 45.25 ms | 0 | Very healthy warm-path performance |
| Reconnect-heavy upper-bound stress | 200 | 8 | -C | 23.26 | 8598 ms | 0 | Survived, but queueing and contention dominated |
Because the built-in pgbench workload was initialized at scale factor 10, the 200-client reconnect test should be treated as an upper-bound stress signal, not a clean measurement of PgBouncer pooling efficiency. At that client count, the result reflects built-in script contention as well as pooling behavior.
What these results actually mean
The mixed workload at 10 clients averaged 298 ms latency with zero failed transactions. At 20 clients, average latency climbed to 377 ms, and the run stayed stable throughout. That suggests a system that tolerates modest concurrency increases without obvious collapse, while also confirming that latency is not free as concurrency grows.
The read-heavy result was more reassuring. At 10 clients in select-only mode, the system sustained 220 TPS at 45 ms average latency with zero failures. For a content-heavy application with aggressive caching, this is often a more realistic steady-state signal than a write-mixed benchmark.
The reconnect-heavy result told a different story. At 200 clients with -C, throughput dropped sharply and latency moved into multi-second territory. Given the scale factor caveat above, this result is best read as a ceiling test — the path remained available, but it was spending significant time waiting, contending, and reconnecting.
The path stayed alive. It was no longer comfortable.
Why caching changes the interpretation
In many production applications, especially CMS-driven or content-heavy systems, the database does not serve every request directly. A strong cache layer changes the shape of database load dramatically.
With caching in place, the database tends to see:
- cache misses
- admin actions
- writes and updates
- background jobs
- cold-start bursts
- invalidation storms
This means the most important benchmark is often not the absolute maximum TPS under synthetic load. It is whether the database path stays responsive when the cache stops protecting it.
A read-heavy pgbench -S run is useful, but it should not be the only scenario. You also want to know what happens during a cold cache after deploy, a burst of invalidations, several users hitting the same uncached resource simultaneously, or concurrent admin activity alongside regular traffic.
A system that looks strong under warm-cache reads but stalls under cache misses is not actually production-ready.
What pgbench does not simulate
pgbench is a solid database-path benchmark tool, and it has real limits worth understanding.
It does not simulate:
- ORM overhead
- application-layer caching logic
- background workers
- long-lived transactions in application code
- real user think time
- mixed endpoint patterns across routes
- serialization logic
- framework middleware
- auth hooks and permissions
That does not make pgbench less valuable. It means you should read it as a database-path benchmark, not a complete system benchmark. The gap between your pgbench results and real user experience is filled by everything that lives between the user and the database.
What to measure beyond TPS
TPS is a useful operational signal, and it is not sufficient on its own. Two systems can have similar throughput and behave very differently for real users.
For each run, capture at minimum:
- average latency
- p95 latency
- p99 latency
- failed transactions
- connection wait time
- CPU saturation
- disk I/O pressure
- lock waits
- cache hit ratio
- active versus waiting clients in PgBouncer
- active versus idle server connections in PgBouncer
If replication is involved, also watch replication lag during heavy write tests.
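On the latency percentiles specifically: pgbench does not print p95/p99, but its per-transaction log (`-l`) records each transaction's elapsed time in microseconds in the third field. A minimal nearest-rank sketch, run here over a synthetic log so it is self-contained; with real data you would feed it your `pgbench_log.*` files instead:

```shell
# Synthetic per-transaction log standing in for pgbench -l output.
# Field 3 is the transaction latency in microseconds.
seq 1 100 | awk '{print "0", NR, $1 * 100, "0", "0", "0"}' > sample.log

# Nearest-rank p95/p99 over the latency column.
sort -n -k3,3 sample.log | awk '
  { lat[NR] = $3 }
  END {
    i95 = int(NR * 0.95); if (i95 < NR * 0.95) i95++
    i99 = int(NR * 0.99); if (i99 < NR * 0.99) i99++
    printf "p95=%d us  p99=%d us\n", lat[i95], lat[i99]
  }' > percentiles.txt
cat percentiles.txt
```

For the 100 synthetic samples above this prints p95=9500 and p99=9900 microseconds, which is just the 95th and 99th ranked value.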
For PgBouncer specifically, these admin commands give you the pooling picture:
SHOW POOLS;
SHOW STATS;
SHOW CLIENTS;
SHOW SERVERS;
Rising latency that does not show up in database-side metrics usually points to pool exhaustion and waiting clients — not the database engine itself. These commands help you tell the difference.
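These run against PgBouncer's virtual `pgbouncer` admin database, not your application database. A sketch of a snapshot loop — port 6432 is PgBouncer's default `listen_port`, and ADMIN_USER stands in for an account listed in `admin_users`; the commands are echoed rather than executed so the sketch runs anywhere:

```shell
# Print the psql invocations for a PgBouncer pool snapshot. Echoed instead of
# executed so this runs without a live PgBouncer; paste the output to run it.
PGB_HOST=HOST          # placeholder
PGB_PORT=6432          # PgBouncer's default listen_port
PGB_USER=ADMIN_USER    # must be listed in admin_users in pgbouncer.ini

for view in POOLS STATS CLIENTS SERVERS; do
  echo "psql -h $PGB_HOST -p $PGB_PORT -U $PGB_USER -d pgbouncer -c 'SHOW $view;'"
done > pgb_snapshot_cmds.txt
cat pgb_snapshot_cmds.txt
```

Capturing these four views before, during, and after each run gives you the waiting-client picture to set alongside the pgbench numbers.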
A repeatable testing sequence for production-like load
A benchmarking routine does not need to be complicated. It needs to be disciplined and run in the right order.
Step 1: Baseline mixed workload
Start small and realistic.
pgbench \
  -h HOST \
  -p PORT \
  -U USER \
  -d DBNAME \
  -c 10 \
  -j 2 \
  -T 60 \
  -P 5
Step 2: Read-heavy workload
Test the warm path.
pgbench \
  -h HOST \
  -p PORT \
  -U USER \
  -d DBNAME \
  -c 10 \
  -j 2 \
  -T 60 \
  -S \
  -P 5
Step 3: Modest concurrency increase
See whether latency climbs gently or sharply as you double client count.
pgbench \
  -h HOST \
  -p PORT \
  -U USER \
  -d DBNAME \
  -c 20 \
  -j 4 \
  -T 60 \
  -P 5
Step 4: Connection churn
Introduce reconnect overhead after the realistic scenarios are established.
pgbench \
  -h HOST \
  -p PORT \
  -U USER \
  -d DBNAME \
  -c 50 \
  -j 4 \
  -T 60 \
  -C \
  -P 5
Step 5: Ceiling test
Push concurrency past your likely production range and study what degrades first. Make sure the scale factor is appropriate for the client count — otherwise contention artifacts dominate the result.
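A sketch of the ceiling-test invocation, with HOST/PORT/USER/DBNAME as placeholders and 200 clients chosen only because it sits well past this setup's realistic range; echo it first, then run it once the scale factor has been sized to match:

```shell
# Ceiling-test command, echoed rather than executed (placeholders throughout).
# Reinitialize at a matching scale first (e.g. pgbench -i -s 200) so built-in
# script contention does not dominate the result.
CONN="-h HOST -p PORT -U USER -d DBNAME"
CEILING="pgbench $CONN -c 200 -j 8 -T 60 -P 5"
echo "$CEILING"
```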
Step 6: Publishable reruns
Rerun each scenario at least three times at 3–5 minutes each and report the median. Exploratory 60-second runs are fine for learning. They are thin evidence for strong operational conclusions.
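Extracting TPS and taking the median is easy to script. The sketch below uses placeholder values so it runs stand-alone; in practice each line in `tps_runs.txt` would come from a real run, e.g. `pgbench ... | awk '/^tps/ {print $3}'`:

```shell
# Placeholder TPS values standing in for three reruns of one scenario.
# With real runs: pgbench ... | awk '/^tps/ {print $3}' >> tps_runs.txt
printf '52.1\n49.8\n53.4\n' > tps_runs.txt

# Median of an odd run count: sort numerically and take the middle element.
sort -n tps_runs.txt | awk '{ v[NR] = $1 } END { print "median TPS: " v[int((NR + 1) / 2)] }'
```

For these three placeholder values the middle element after sorting is 52.1, which is the number you would report.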
How to use the results to optimize
A benchmark matters only if it changes what you do next. Here is a more useful optimization loop than simply chasing a bigger TPS number.
If read-heavy performance is good but mixed workload latency is high, look at slow writes, row locking, index coverage, transaction size, and ORM query generation.
If reconnect-heavy tests collapse first, look at PgBouncer pool sizing, app-side connection pool size, how many application instances are running, and whether the application is reusing connections properly.
If warm-cache performance is good but cold starts hurt, look at cache warming strategy, invalidation design, stampede protection, and whether expensive endpoints can be precomputed or served from a background process.
If throughput looks acceptable but users still report slowness, look at p95 and p99 latency, specific slow queries, app-layer serialization and auth costs, and network path latency.
This is where benchmarking becomes engineering rather than theater.
Common PostgreSQL stress testing mistakes
The most common mistakes are straightforward to avoid once you have seen them:
- Testing with unrealistic concurrency before establishing a realistic baseline
- Trusting average latency without looking at tail latency
- Treating -S as a complete production simulation
- Ignoring PgBouncer wait states when diagnosing rising latency
- Benchmarking a warm cache and assuming cold-cache behavior will match
- Using synthetic database traffic while forgetting the application layer adds its own costs
- Drawing universal conclusions from a single hardware profile
- Running high-client built-in pgbench tests with too small a scale factor
Each one makes the benchmark look more decisive than it really is.
FAQ
How many pgbench clients should I start with?
Start with the number that approximates truly concurrent database-active users, not total signed-in users or total monthly actives. For many small applications, that number is lower than expected. A good rule: instrument your production connection pool first and see what the real peak active connection count looks like.
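One way to get that number from PostgreSQL itself is the `pg_stat_activity` view. The query below is printed rather than executed so the sketch stands alone; HOST/PORT/USER/DBNAME are placeholders for your production endpoint:

```shell
# Count backends by state; the 'active' row is how many connections are
# genuinely doing work right now, which is the number to size pgbench -c from.
SQL="SELECT state, count(*) FROM pg_stat_activity GROUP BY state;"
echo "psql -h HOST -p PORT -U USER -d DBNAME -c \"$SQL\""
```

Sample this at peak traffic over a few days; the high-water mark of the `active` count is a far better starting point for `-c` than total user counts.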
Should I use -S for production testing?
Use it as one scenario, not the whole story. It is useful for measuring read-heavy, warm-cache behavior, but it does not represent write pressure or the many application-side costs that typically accompany real traffic.
Does PgBouncer make slow queries faster?
No. PgBouncer manages connection overhead. It does not fix inefficient queries, missing indexes, bad transaction design, or ORM overhead. A well-configured PgBouncer in front of a poorly optimized database is still a poorly optimized database.
What matters more, TPS or p95 latency?
For user experience, p95 and p99 latency are typically more meaningful than average TPS. A system with good average throughput can still feel broken if tail latency is bad enough. Monitor both, and weight latency more heavily when the question is about user impact.
How do I test cold cache versus warm cache behavior?
Run the same workload after clearing or bypassing your application cache, then compare it against a pre-warmed run. The gap between those two states is often more operationally important than the absolute TPS figure.
Final thoughts
A good PostgreSQL benchmark is not the one that produces the most dramatic graph. It is the one that tells you, with reasonable honesty, what will happen under production-like conditions.
In these exploratory runs, the most important finding was not that the system survived an extreme 200-client reconnect storm. It was that realistic concurrency stayed stable, read-heavy traffic was comfortably fast, and the first serious degradation appeared under connection pressure and high-concurrency contention — not under normal load.
That is a result you can actually use. It tells you where to invest optimization effort: connection management, cache-miss handling, query efficiency, and realistic concurrency planning. It keeps you from solving the wrong problem.
If you have questions about interpreting your own results or configuring PgBouncer for your setup, drop them in the comments below. And if you found this useful, subscribe for more practical PostgreSQL and infrastructure guides.
Thanks, Matija
Last tested: [Date]
PostgreSQL version: 17.9
pgbench version: 17.6
PgBouncer: [Your PgBouncer version and pooling mode]