Benchmark PostgreSQL Performance Using BenchmarkSQL

By John Doe December 9, 2024

Summary: In this article, we will learn how to benchmark PostgreSQL performance using BenchmarkSQL.

Table of Contents

BenchmarkSQL

Introduction to BenchmarkSQL

BenchmarkSQL is implemented in Java, using JDBC to stress test SQL databases. The overall architecture is a series of data structures, queues and thread groups that handle the simulated terminals, users and application threads.

Its architecture allows BenchmarkSQL to drive TPC-C configurations up to many thousands of warehouses (known as the scaling factor) without overwhelming the job scheduler of the test driver itself. Yet it is capable of doing so without sacrificing one of the most important measurements in a TPC-C, the end-user experienced response time at the terminal.

Create a user and a database

As Unix user postgres use the psql shell to connect to the postgres database and issue the CREATE USER and CREATE DATABASE commands.

CREATE USER benchmarksql WITH ENCRYPTED PASSWORD 'changeme';
CREATE DATABASE benchmarksql OWNER benchmarksql;

Building BenchmarkSQL

BenchmarkSQL V6 is meant to be built into a Docker container and controlled via its Flask based WEB UI and/or API. This allows for easy deployment of the benchmark driver on servers and cloud systems while controlling it through a browser or scripted.

Requirements

The requirements to run BenchmarkSQL are:

  • Java development environment (java-1.8.0-openjdk-devel or newer).
  • Maven build tool for Java.
  • Docker and a user account authorized to use it. This depends on your OS. On RedHat based systems the usual way is to install Docker via sudo yum install -y docker and make the users, who are allowed to use, its members of the group docker by running the command sudo usermod -a -G docker <USERNAME>.

Building process

The Java development environment and Maven are required on the build machine because the Docker container will only have the Java runtime installed. So the BenchmarkSQL.jar file needs to be built outside the container.

After installing the above requirements and cloning the BenchmarkSQL git repository (assuming username wieck and cloned into ~/benchmarksql):

$ git clone https://github.com/wieck/benchmarksql.git
$ cd ~/benchmarksql
$ mvn

This will create a lot of output:

[INFO] Scanning for projects...
[INFO] 
[INFO] ------------------< com.github.pgsql-io:benchmarksql >------------------
[INFO] Building A TPC-C like test tool 6.0.0-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ benchmarksql ---
[INFO] Deleting /Users/wieck/git/benchmarksql-6/target
[INFO] 
[INFO] --- maven-resources-plugin:3.2.0:resources (default-resources) @ benchmarksql ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Using 'UTF-8' encoding to copy filtered properties files.
[INFO] Copying 49 resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.8.0:compile (default-compile) @ benchmarksql ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 18 source files to /Users/wieck/git/benchmarksql-6/target/classes
[INFO] 
[INFO] --- maven-resources-plugin:3.2.0:testResources (default-testResources) @ benchmarksql ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Using 'UTF-8' encoding to copy filtered properties files.
[INFO] skip non existing resourceDirectory /Users/wieck/git/benchmarksql-6/src/test/resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.8.0:testCompile (default-testCompile) @ benchmarksql ---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-surefire-plugin:2.12.4:test (default-test) @ benchmarksql ---
[INFO] No tests to run.
[INFO] 
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ benchmarksql ---
[INFO] Building jar: /Users/wieck/git/benchmarksql-6/target/BenchmarkSQL.jar
[INFO] 
[INFO] --- maven-dependency-plugin:3.0.0:copy-dependencies (copy-dependencies) @ benchmarksql ---
[INFO] Copying postgresql-42.2.19.jar to /Users/wieck/git/benchmarksql-6/target/lib/postgresql-42.2.19.jar
[INFO] Copying jcc-11.5.5.0.jar to /Users/wieck/git/benchmarksql-6/target/lib/jcc-11.5.5.0.jar
[INFO] Copying mysql-connector-java-8.0.23.jar to /Users/wieck/git/benchmarksql-6/target/lib/mysql-connector-java-8.0.23.jar
[INFO] Copying protobuf-java-3.11.4.jar to /Users/wieck/git/benchmarksql-6/target/lib/protobuf-java-3.11.4.jar
[INFO] Copying jaybird-4.0.3.java11.jar to /Users/wieck/git/benchmarksql-6/target/lib/jaybird-4.0.3.java11.jar
[INFO] Copying mssql-jdbc-9.2.1.jre8.jar to /Users/wieck/git/benchmarksql-6/target/lib/mssql-jdbc-9.2.1.jre8.jar
[INFO] Copying antlr4-runtime-4.7.2.jar to /Users/wieck/git/benchmarksql-6/target/lib/antlr4-runtime-4.7.2.jar
[INFO] Copying log4j-api-2.14.1.jar to /Users/wieck/git/benchmarksql-6/target/lib/log4j-api-2.14.1.jar
[INFO] Copying ojdbc8-21.1.0.0.jar to /Users/wieck/git/benchmarksql-6/target/lib/ojdbc8-21.1.0.0.jar
[INFO] Copying connector-api-1.5.jar to /Users/wieck/git/benchmarksql-6/target/lib/connector-api-1.5.jar
[INFO] Copying checker-qual-3.5.0.jar to /Users/wieck/git/benchmarksql-6/target/lib/checker-qual-3.5.0.jar
[INFO] Copying log4j-core-2.14.1.jar to /Users/wieck/git/benchmarksql-6/target/lib/log4j-core-2.14.1.jar
[INFO] Copying mariadb-java-client-2.7.2.jar to /Users/wieck/git/benchmarksql-6/target/lib/mariadb-java-client-2.7.2.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  5.360 s
[INFO] Finished at: 2021-04-13T17:56:05-05:00
[INFO] ------------------------------------------------------------------------

Your “Total time” will most likely not be 5 seconds on the first run.

Expect it to run for a few minutes as the resulting Docker image is about 1.7GB in size and a lot of that will be pulled in over your Internet connection.

Create the benchmark configuration file

Under the target directory created by Maven, change to the run directory, copy the properties file of your RDBMS and edit the copy to match your system setup and desired scaling.

$ cd target/run
$ cp sample.RDBMS.properties my.properties
$ vi my.properties

Note that the provided example configuration is meant to test the functionality of your setup. BenchmarkSQL can connect to the database and execute transactions. That configuration is NOT a benchmark run. To make it into one you need to have a configuration that matches your database server size and workload. Leave the sizing for now and perform a first functional test.

The BenchmarkSQL database has an initial size of approximately 100MB per configured warehouse. A typical setup would be a database of 2-5 times the physical RAM of the server.

Likewise, the number of concurrent database connections (configuration parameter terminals) should be something about 2-6 times the number of CPU threads.

Last but not least, benchmark runs are normally done for hours, if not days. This is because on the database sizes above, it will take that long to reach a steady state and make sure that all performance relevant functionality of the database, like checkpointing and vacuuming is included in the measurement.

So, you can see that with a modern server, that has 32-256 CPU threads and 64-512GBi, of RAM we are talking about thousands of warehouses and hundreds of concurrent database connections.

Build the schema and initial database load

Execute the runDatabaseBuild.sh script with your configuration file.

$ ./runDatabaseBuild.sh my.properties
# ------------------------------------------------------------
# Loading SQL file ./sql.common/tableCreates.sql
# ------------------------------------------------------------
create table bmsql_config (
cfg_namevarchar(30) primary key,
cfg_value   varchar(50)
);
create table bmsql_warehouse (
w_id        integer   not null,
w_ytd       decimal(12,2),
[...]
Starting BenchmarkSQL LoadData

driver=org.postgresql.Driver
conn=jdbc:postgresql://localhost:5432/benchmarksql
user=benchmarksql
password=***********
warehouses=30
loadWorkers=10
fileLocation (not defined)
csvNullValue (not defined - using default 'NULL')

Worker 000: Loading ITEM
Worker 001: Loading Warehouse      1
Worker 002: Loading Warehouse      2
Worker 003: Loading Warehouse      3
[...]
Worker 000: Loading Warehouse     30 done
Worker 008: Loading Warehouse     29 done
# ------------------------------------------------------------
# Loading SQL file ./sql.common/indexCreates.sql
# ------------------------------------------------------------
alter table bmsql_warehouse add constraint bmsql_warehouse_pkey
primary key (w_id);
alter table bmsql_district add constraint bmsql_district_pkey
primary key (d_w_id, d_id);
[...]
vacuum analyze;

Run the configured benchmark

Once the tables have the necessary data, you can run the benchmark.

$ ./runBenchmark.sh my.properties

The benchmark should run for the number of configured concurrent connections (terminals) and the duration or number of transactions.

The end result of the benchmark will be reported like this:

01:58:09,081 [Thread-1] INFO   jTPCC : Term-00,
01:58:09,082 [Thread-1] INFO   jTPCC : Term-00, Measured tpmC (NewOrders) = 179.55
01:58:09,082 [Thread-1] INFO   jTPCC : Term-00, Measured tpmTOTAL = 329.17
01:58:09,082 [Thread-1] INFO   jTPCC : Term-00, Session Start     = 2016-05-25 01:58:07
01:58:09,082 [Thread-1] INFO   jTPCC : Term-00, Session End       = 2016-05-25 01:58:09
01:58:09,082 [Thread-1] INFO   jTPCC : Term-00, Transaction Count = 10

At this point you have a working setup.

Scale the benchmark configuration

Change the my.properties file to the correct scaling (number of warehouses and concurrent connections/terminals). Switch from using a transaction count to time based:

runTxnsPerTerminal=0
runMins=180

Rebuild the database (if needed) by running:

$ ./runDatabaseDestroy.sh my.properties
$ ./runDatabaseBuild.sh my.properties

Then run the benchmark again.

Rebuild and repeat.

Result report

BenchmarkSQL collects detailed performance statistics and (if configured) OS performance data. The example configuration file defaults to a directory starting with my_result_.

Use the generateReport.sh script to create an HTML file with graphs. This requires R to be installed, which is beyond the scope of this article.