Practical PostgreSQL Stress Test with pgbench & PgBouncer

How to Stress Test PostgreSQL with pgbench and PgBouncer Using Production-Like Load
Running a PostgreSQL benchmark is easy. Running one that actually tells you something useful about production behavior is a different job entirely.
This guide walks through stress testing PostgreSQL behind PgBouncer using pgbench, with a focus on realistic workload shapes rather than dramatic synthetic numbers. The approach uses four scenarios — mixed transactional load, read-heavy load, connection churn, and a deliberate ceiling test — and compares the results side by side to find where latency, queueing, and connection pressure start to dominate.
The goal is a benchmark that answers a specific operational question: how will this database path behave when real traffic, cache misses, and connection pressure show up at the same time?
Who this is for
This guide is written for engineers testing:
- PostgreSQL behind PgBouncer
- CMS or content-heavy applications
- read-heavy workloads with occasional write bursts
- systems where caching absorbs most steady-state database traffic
- teams trying to validate realistic load, not win synthetic benchmark numbers
Test setup and environment
The examples below come from exploratory benchmarking runs using pgbench 17.6 on macOS against a remote PostgreSQL 17.9 endpoint behind PgBouncer. The initial runs were 60 seconds each — long enough to compare workload shapes, not long enough to produce final benchmark claims.
For publishable numbers, rerun each scenario at least three times at 3–5 minutes each and report the median. Single 60-second runs are useful for learning. They are thin ground for strong operational conclusions.
If you publish a version of this benchmark, include these environment details near the top so readers can judge the results against their own setup:
- PostgreSQL and pgbench versions
- Client machine, OS, and region
- Database region and approximate network RTT
- PgBouncer pooling mode
- Whether TPS is reported including or excluding connection establishment
- Whether the table shows single-run values or medians
Link to the official pgbench documentation and the PgBouncer documentation somewhere in this section.
Why most PostgreSQL benchmarks mislead people
I've seen teams run something like this, look at the output, and call it a performance audit:
pgbench -c 200 -j 8 -T 60
That command is easy to run. It is also easy to misread.
The result says very little unless production traffic genuinely looks like 200 constantly active clients hitting the database with that exact transaction profile and little help from caching. In most real applications, the bigger problem is unrealistic test design, not raw scale.
Benchmarking becomes useful when you ask the right questions before you start:
- How many users are truly active at the database level at once?
- Is the application read-heavy or write-heavy?
- Does caching absorb most requests before they reach the database?
- Does the app reuse connections, or does it churn through them?
- Are you testing direct PostgreSQL connections or pooled connections through PgBouncer?
- Are you measuring query cost, connection overhead, or built-in benchmark script contention?
Without those answers first, a benchmark is often just noise dressed up as data.
The four workload shapes worth testing first
A practical PostgreSQL benchmark should cover four scenarios in order. Each one exposes something different about the system.
1. Mixed transactional load
This is the default pgbench behavior — a simple mix of reads and writes under low concurrency. It is the right starting point for any benchmark because it establishes a stable baseline before you introduce pressure.
pgbench \
  -h HOST \
  -p PORT \
  -U USER \
  -d DBNAME \
  -c 10 \
  -j 2 \
  -T 60 \
  -P 5
Ten clients, two threads, 60-second duration, with a 5-second progress report. This is a reasonable approximation of modest production load for most small-to-medium applications.
2. Read-heavy load
For content apps, admin panels, API reads, and cached page misses, read-heavy traffic is often a more representative workload shape. The -S flag switches pgbench to select-only mode.
pgbench \
  -h HOST \
  -p PORT \
  -U USER \
  -d DBNAME \
  -c 10 \
  -j 2 \
  -T 60 \
  -S \
  -P 5
This reflects best-case performance for warm-cache, read-heavy traffic. Run it as a ceiling measurement for what steady-state reads look like.
3. Connection churn
When PgBouncer is in the path, reconnect behavior matters. The -C flag forces a new connection for every transaction — this is not a normal production profile, but it is a useful stress scenario to expose pooling limits and connection overhead.
pgbench \
  -h HOST \
  -p PORT \
  -U USER \
  -d DBNAME \
  -c 50 \
  -j 4 \
  -T 60 \
  -C \
  -P 5
Think of this as a probe for how PgBouncer holds up under connection pressure, not a simulation of typical app behavior.
4. Deliberate ceiling test
Only after the realistic runs should you push concurrency high enough to force queueing and saturation. That tells you where the system stops behaving comfortably and what degrades first.
This is where scale factor matters. For built-in pgbench workloads, the scale factor needs to be large enough for the client count you are testing. At scale factor 10 with 200 clients, results will reflect built-in script contention as much as actual pooling or connection behavior.
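As a sketch of sizing initialization to the planned client count — the "scale factor at least as large as the client count" rule used here is a working assumption, not an official pgbench guideline, and HOST/PORT/USER/DBNAME are placeholders:

```shell
# Rule of thumb (assumption): scale factor at least as large as the planned
# client count, so clients are less likely to contend on the same
# pgbench_branches rows. The floor of 100 is likewise an arbitrary choice.
CLIENTS=200
SCALE=$CLIENTS
[ "$SCALE" -lt 100 ] && SCALE=100

# Printed rather than executed so the sketch runs without a server;
# substitute real connection details before use.
echo "pgbench -i -s $SCALE -h HOST -p PORT -U USER -d DBNAME"
```

Reinitializing with `-i` drops and rebuilds the pgbench tables, so run it against a scratch database, never a shared one.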
Benchmark results from these runs
The most useful part of this exercise was not the command syntax. It was what happened when all four workload shapes were compared side by side.
| Scenario | Clients | Threads | Key flags | TPS | Avg latency | Failures | What it showed |
|---|---|---|---|---|---|---|---|
| Mixed transactional load | 10 | 2 | default | 33.45 | 298 ms | 0 | Stable at modest concurrency |
| Mixed transactional load | 20 | 4 | default | 52.90 | 377 ms | 0 | Latency rose but stayed stable |
| Read-heavy load | 10 | 2 | -S | 220.78 | 45.25 ms | 0 | Very healthy warm-path performance |
| Reconnect-heavy upper-bound stress | 200 | 8 | -C | 23.26 | 8598 ms | 0 | Survived, but queueing and contention dominated |
Because the built-in pgbench workload was initialized at scale factor 10, the 200-client reconnect test should be treated as an upper-bound stress signal, not a clean measurement of PgBouncer pooling efficiency. At that client count, the result reflects built-in script contention as well as pooling behavior.
What these results actually mean
The mixed workload at 10 clients averaged 298 ms latency with zero failed transactions. At 20 clients, average latency climbed to 377 ms, and the run stayed stable throughout. That suggests a system that tolerates modest concurrency increases without obvious collapse, while also confirming that latency is not free as concurrency grows.
The read-heavy result was more reassuring. At 10 clients in select-only mode, the system sustained 220 TPS at 45 ms average latency with zero failures. For a content-heavy application with aggressive caching, this is often a more realistic steady-state signal than a write-mixed benchmark.
The reconnect-heavy result told a different story. At 200 clients with -C, throughput dropped sharply and latency moved into multi-second territory. Given the scale factor caveat above, this result is best read as a ceiling test — the path remained available, but it was spending significant time waiting, contending, and reconnecting.
The path stayed alive. It was no longer comfortable.
Why caching changes the interpretation
In many production applications, especially CMS-driven or content-heavy systems, the database does not serve every request directly. A strong cache layer changes the shape of database load dramatically.
With caching in place, the database tends to see:
- cache misses
- admin actions
- writes and updates
- background jobs
- cold-start bursts
- invalidation storms
This means the most important benchmark is often not the absolute maximum TPS under synthetic load. It is whether the database path stays responsive when the cache stops protecting it.
A read-heavy pgbench -S run is useful, but it should not be the only scenario. You also want to know what happens during a cold cache after deploy, a burst of invalidations, several users hitting the same uncached resource simultaneously, or concurrent admin activity alongside regular traffic.
A system that looks strong under warm-cache reads but stalls under cache misses is not actually production-ready.
What pgbench does not simulate
pgbench is a solid database-path benchmark tool, and it has real limits worth understanding.
It does not simulate:
- ORM overhead
- application-layer caching logic
- background workers
- long-lived transactions in application code
- real user think time
- mixed endpoint patterns across routes
- serialization logic
- framework middleware
- auth hooks and permissions
That does not make pgbench less valuable. It means you should read it as a database-path benchmark, not a complete system benchmark. The gap between your pgbench results and real user experience is filled by everything that lives between the user and the database.
What to measure beyond TPS
TPS is a useful operational signal, and it is not sufficient on its own. Two systems can have similar throughput and behave very differently for real users.
For each run, capture at minimum:
- average latency
- p95 latency
- p99 latency
- failed transactions
- connection wait time
- CPU saturation
- disk I/O pressure
- lock waits
- cache hit ratio
- active versus waiting clients in PgBouncer
- active versus idle server connections in PgBouncer
If replication is involved, also watch replication lag during heavy write tests.
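On the latency percentiles specifically: pgbench does not print p95/p99, but its per-transaction log (`-l`) records each transaction's elapsed time in microseconds in the third field. A minimal nearest-rank sketch, run here over a synthetic log so it is self-contained; with real data you would feed it your `pgbench_log.*` files instead:

```shell
# Synthetic per-transaction log standing in for pgbench -l output.
# Field 3 is the transaction latency in microseconds.
seq 1 100 | awk '{print "0", NR, $1 * 100, "0", "0", "0"}' > sample.log

# Nearest-rank p95/p99 over the latency column.
sort -n -k3,3 sample.log | awk '
  { lat[NR] = $3 }
  END {
    i95 = int(NR * 0.95); if (i95 < NR * 0.95) i95++
    i99 = int(NR * 0.99); if (i99 < NR * 0.99) i99++
    printf "p95=%d us  p99=%d us\n", lat[i95], lat[i99]
  }' > percentiles.txt
cat percentiles.txt
```

For the 100 synthetic samples above this prints p95=9500 and p99=9900 microseconds, which is just the 95th and 99th ranked value.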
For PgBouncer specifically, these admin commands give you the pooling picture:
SHOW POOLS;
SHOW STATS;
SHOW CLIENTS;
SHOW SERVERS;
Rising latency that does not show up in database-side metrics usually points to pool exhaustion and waiting clients — not the database engine itself. These commands help you tell the difference.
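These run against PgBouncer's virtual `pgbouncer` admin database, not your application database. A sketch of a snapshot loop — port 6432 is PgBouncer's default `listen_port`, and ADMIN_USER stands in for an account listed in `admin_users`; the commands are echoed rather than executed so the sketch runs anywhere:

```shell
# Print the psql invocations for a PgBouncer pool snapshot. Echoed instead of
# executed so this runs without a live PgBouncer; paste the output to run it.
PGB_HOST=HOST          # placeholder
PGB_PORT=6432          # PgBouncer's default listen_port
PGB_USER=ADMIN_USER    # must be listed in admin_users in pgbouncer.ini

for view in POOLS STATS CLIENTS SERVERS; do
  echo "psql -h $PGB_HOST -p $PGB_PORT -U $PGB_USER -d pgbouncer -c 'SHOW $view;'"
done > pgb_snapshot_cmds.txt
cat pgb_snapshot_cmds.txt
```

Capturing these four views before, during, and after each run gives you the waiting-client picture to set alongside the pgbench numbers.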
A repeatable testing sequence for production-like load
A benchmarking routine does not need to be complicated. It needs to be disciplined and run in the right order.
Step 1: Baseline mixed workload
Start small and realistic.
pgbench \
  -h HOST \
  -p PORT \
  -U USER \
  -d DBNAME \
  -c 10 \
  -j 2 \
  -T 60 \
  -P 5
Step 2: Read-heavy workload
Test the warm path.
pgbench \
  -h HOST \
  -p PORT \
  -U USER \
  -d DBNAME \
  -c 10 \
  -j 2 \
  -T 60 \
  -S \
  -P 5
Step 3: Modest concurrency increase
See whether latency climbs gently or sharply as you double client count.
pgbench \
  -h HOST \
  -p PORT \
  -U USER \
  -d DBNAME \
  -c 20 \
  -j 4 \
  -T 60 \
  -P 5
Step 4: Connection churn
Introduce reconnect overhead after the realistic scenarios are established.
pgbench \
  -h HOST \
  -p PORT \
  -U USER \
  -d DBNAME \
  -c 50 \
  -j 4 \
  -T 60 \
  -C \
  -P 5
Step 5: Ceiling test
Push concurrency past your likely production range and study what degrades first. Make sure the scale factor is appropriate for the client count — otherwise contention artifacts dominate the result.
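A sketch of the ceiling-test invocation, with HOST/PORT/USER/DBNAME as placeholders and 200 clients chosen only because it sits well past this setup's realistic range; echo it first, then run it once the scale factor has been sized to match:

```shell
# Ceiling-test command, echoed rather than executed (placeholders throughout).
# Reinitialize at a matching scale first (e.g. pgbench -i -s 200) so built-in
# script contention does not dominate the result.
CONN="-h HOST -p PORT -U USER -d DBNAME"
CEILING="pgbench $CONN -c 200 -j 8 -T 60 -P 5"
echo "$CEILING"
```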
Step 6: Publishable reruns
Rerun each scenario at least three times at 3–5 minutes each and report the median. Exploratory 60-second runs are fine for learning. They are thin evidence for strong operational conclusions.
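Extracting TPS and taking the median is easy to script. The sketch below uses placeholder values so it runs stand-alone; in practice each line in `tps_runs.txt` would come from a real run, e.g. `pgbench ... | awk '/^tps/ {print $3}'`:

```shell
# Placeholder TPS values standing in for three reruns of one scenario.
# With real runs: pgbench ... | awk '/^tps/ {print $3}' >> tps_runs.txt
printf '52.1\n49.8\n53.4\n' > tps_runs.txt

# Median of an odd run count: sort numerically and take the middle element.
sort -n tps_runs.txt | awk '{ v[NR] = $1 } END { print "median TPS: " v[int((NR + 1) / 2)] }'
```

For these three placeholder values the middle element after sorting is 52.1, which is the number you would report.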
How to use the results to optimize
A benchmark matters only if it changes what you do next. Here is a more useful optimization loop than simply chasing a bigger TPS number.
If read-heavy performance is good but mixed workload latency is high, look at slow writes, row locking, index coverage, transaction size, and ORM query generation.
If reconnect-heavy tests collapse first, look at PgBouncer pool sizing, app-side connection pool size, how many application instances are running, and whether the application is reusing connections properly.
If warm-cache performance is good but cold starts hurt, look at cache warming strategy, invalidation design, stampede protection, and whether expensive endpoints can be precomputed or served from a background process.
If throughput looks acceptable but users still report slowness, look at p95 and p99 latency, specific slow queries, app-layer serialization and auth costs, and network path latency.
This is where benchmarking becomes engineering rather than theater.
Common PostgreSQL stress testing mistakes
The most common mistakes are straightforward to avoid once you have seen them:
- Testing with unrealistic concurrency before establishing a realistic baseline
- Trusting average latency without looking at tail latency
- Treating -S as a complete production simulation
- Ignoring PgBouncer wait states when diagnosing rising latency
- Benchmarking a warm cache and assuming cold-cache behavior will match
- Using synthetic database traffic while forgetting the application layer adds its own costs
- Drawing universal conclusions from a single hardware profile
- Running high-client built-in pgbench tests with too small a scale factor
Each one makes the benchmark look more decisive than it really is.
FAQ
How many pgbench clients should I start with?
Start with the number that approximates truly concurrent database-active users, not total signed-in users or total monthly actives. For many small applications, that number is lower than expected. A good rule: instrument your production connection pool first and see what the real peak active connection count looks like.
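One way to get that number from PostgreSQL itself is the `pg_stat_activity` view. The query below is printed rather than executed so the sketch stands alone; HOST/PORT/USER/DBNAME are placeholders for your production endpoint:

```shell
# Count backends by state; the 'active' row is how many connections are
# genuinely doing work right now, which is the number to size pgbench -c from.
SQL="SELECT state, count(*) FROM pg_stat_activity GROUP BY state;"
echo "psql -h HOST -p PORT -U USER -d DBNAME -c \"$SQL\""
```

Sample this at peak traffic over a few days; the high-water mark of the `active` count is a far better starting point for `-c` than total user counts.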
Should I use -S for production testing?
Use it as one scenario, not the whole story. It is useful for measuring read-heavy, warm-cache behavior, but it does not represent write pressure or the many application-side costs that typically accompany real traffic.
Does PgBouncer make slow queries faster?
No. PgBouncer manages connection overhead. It does not fix inefficient queries, missing indexes, bad transaction design, or ORM overhead. A well-configured PgBouncer in front of a poorly optimized database is still a poorly optimized database.
What matters more, TPS or p95 latency?
For user experience, p95 and p99 latency are typically more meaningful than average TPS. A system with good average throughput can still feel broken if tail latency is bad enough. Monitor both, and weight latency more heavily when the question is about user impact.
How do I test cold cache versus warm cache behavior?
Run the same workload after clearing or bypassing your application cache, then compare it against a pre-warmed run. The gap between those two states is often more operationally important than the absolute TPS figure.
Final thoughts
A good PostgreSQL benchmark is not the one that produces the most dramatic graph. It is the one that tells you, with reasonable honesty, what will happen under production-like conditions.
In these exploratory runs, the most important finding was not that the system survived an extreme 200-client reconnect storm. It was that realistic concurrency stayed stable, read-heavy traffic was comfortably fast, and the first serious degradation appeared under connection pressure and high-concurrency contention — not under normal load.
That is a result you can actually use. It tells you where to invest optimization effort: connection management, cache-miss handling, query efficiency, and realistic concurrency planning. It keeps you from solving the wrong problem.
If you have questions about interpreting your own results or configuring PgBouncer for your setup, drop them in the comments below. And if you found this useful, subscribe for more practical PostgreSQL and infrastructure guides.
Thanks, Matija
Last tested: [Date]
PostgreSQL version: 17.9
pgbench version: 17.6
PgBouncer: [Your PgBouncer version and pooling mode]