PostgreSQL Tutorial: JSON Index

June 3, 2024

Summary: In this tutorial, you will learn how to create a PostgreSQL JSON index for a JSONB column to improve query performance.

Table of Contents

Introduction to PostgreSQL JSON index

JSONB (binary JSON) is a data type that allows you to store JSON data and query it efficiently.

When a JSONB column has a complex JSON structure, utilizing an index can significantly improve query performance.

PostgreSQL uses the GIN index type for indexing a column with JSONB data type. GIN stands for Generalized Inverted Index.

Note that you can utilize the GIN index for tsvector or array columns.

To create a GIN index for a JSONB column, you can use the following CREATE INDEX statement:

CREATE INDEX index_name
ON table_name 
USING GIN(jsonb_column);

This statement creates a GIN index on the jsonb_column. This GIN index is suitable for general-purpose queries on JSONB data.

When creating a GIN index on a JSONB column, you can use a specific GIN operator class.

The operator class determines how PostgreSQL builds the index and how it optimizes the queries on the indexed column.

For example, The following CREATE INDEX statement creates a GIN index on the jsonb_coumn with jsonb_path_ops operator class:

CREATE INDEX index_name
ON table_name 
USING GIN(jsonb_column jsonb_path_ops);

This index is optimized for the queries that use the @> (contains), ? (exists), and @@ JSONB operators. It can be useful for searches involving keys or values within JSONB documents.

The following table displays the GIN operator classes:

Name Indexable Operators
array_ops && (anyarray,anyarray)
@> (anyarray,anyarray)
<@ (anyarray,anyarray)
= (anyarray,anyarray)
jsonb_ops @> (jsonb,jsonb)
@? (jsonb,jsonpath)
@@ (jsonb,jsonpath)
? (jsonb,text)
?| (jsonb,text[])
?& (jsonb,text[])
jsonb_path_ops @> (jsonb,jsonb)
@? (jsonb,jsonpath)
@@ (jsonb,jsonpath)
tsvector_ops @@ (tsvector,tsquery)
@@@ (tsvector,tsquery)

Note that if you don’t explicitly specify a GIN operator class, the statement will use the jsonb_ops operator by default, which is suitable for most cases.

Additionally, PostgreSQL allows you to create a GIN index for a specific field in JSON documents as follows:

CREATE INDEX index_name 
ON table_name 
USING GIN ((data->'field_name') jsonb_path_ops);

This index can improve the queries that involve searching values within the field_name of JSON documents stored in the JSONB column (data).

PostgreSQL JSON index examples

We’ll use the tables in the sample database.

1) Setting up a sample table

First, create a new table called customer_json that stores the customer information in JSON format:

CREATE TABLE customer_json(
   id SERIAL PRIMARY KEY,
   data JSONB NOT NULL
);

Second, insert data from the customer, address, city, and country tables into the customer_json table:

WITH json_cte AS(
  SELECT 
    jsonb_build_object(
      'first_name',  first_name, 
      'last_name',  last_name, 
      'email',  email, 
      'phone',  a.phone, 
      'address', 
      jsonb_build_object(
        'address', a.address, 
        'city', i.city, 
        'postal_code', a.postal_code, 
        'district',  a.district, 
        'country', o.country
      )
    ):: jsonb AS data 
  FROM 
    customer c 
    INNER JOIN address a ON a.address_id = c.address_id 
    INNER JOIN city i ON i.city_id = a.city_id 
    INNER JOIN country o ON o.country_id = i.country_id
) 
INSERT INTO customer_json(data) 
SELECT data FROM json_cte;

Third, retrieve the email of the customer whose first name is John:

SELECT
   data ->> 'first_name' first_name,
   data ->> 'last_name' last_name,
   data ->> 'phone' phone
FROM
   customer_json
WHERE
   data @> '{"first_name": "John"}'; 

Output:

 first_name | last_name  |    phone
------------+------------+-------------
 John       | Farnsworth | 51917807050
(1 row)

Finally, explain and analyze the above query:

EXPLAIN ANALYZE  
SELECT
   data ->> 'first_name' first_name,
   data ->> 'last_name' last_name,
   data ->> 'phone' phone
FROM
   customer_json
WHERE
   data @> '{"first_name": "John"}'; 

Output:

                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Seq Scan on customer_json  (cost=0.00..31.50 rows=1 width=96) (actual time=0.063..0.118 rows=1 loops=1)
   Filter: (data @> '{"first_name": "John"}'::jsonb)
   Rows Removed by Filter: 598
 Planning Time: 1.109 ms
 Execution Time: 0.128 ms
(5 rows)

The output indicates that PostgreSQL has to scan the entire customer_json table to search for the customer.

To improve the performance of the query, you can create a GIN index on the data column of the customer_json table.

2) Creating an index on the JSONB column

First, create an index on the data column of the customer_json table:

CREATE INDEX customer_json_index 
ON customer_json 
USING GIN(data);

Second, execute the query that searches for the customer whose first name is John:

EXPLAIN ANALYZE  
SELECT
   data ->> 'first_name' first_name,
   data ->> 'last_name' last_name,
   data ->> 'phone' phone
FROM
   customer_json
WHERE
   data @> '{"first_name": "John"}'; 

Output:

                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on customer_json  (cost=21.51..25.53 rows=1 width=96) (actual time=0.024..0.024 rows=1 loops=1)
   Recheck Cond: (data @> '{"first_name": "John"}'::jsonb)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on customer_json_index  (cost=0.00..21.51 rows=1 width=0) (actual time=0.014..0.014 rows=1 loops=1)
         Index Cond: (data @> '{"first_name": "John"}'::jsonb)
 Planning Time: 0.164 ms
 Execution Time: 0.045 ms
(7 rows)

The query plan indicates that PostgreSQL uses the customer_json_index to improve the performance.

This time, the execution time is significantly smaller 0.045ms vs. 0.128 ms, about 2 – 3 times faster than a query without using the GIN index.

3) Creating an index on the JSONB column with the GIN operator class

First, drop the customer_json_index index:

DROP INDEX customer_json_index;

Second, create a GIN index on the data column of the customer_json table with a GIN operator class:

CREATE INDEX customer_json_index 
ON customer_json 
USING GIN(data jsonb_path_ops);

Third, explain the query that finds the customer whose first name is John:

EXPLAIN ANALYZE  
SELECT
   data ->> 'first_name' first_name,
   data ->> 'last_name' last_name,
   data ->> 'phone' phone
FROM
   customer_json
WHERE
   data @> '{"first_name": "John"}'; 

Output:

                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on customer_json  (cost=12.82..16.84 rows=1 width=96) (actual time=0.014..0.015 rows=1 loops=1)
   Recheck Cond: (data @> '{"first_name": "John"}'::jsonb)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on customer_json_index  (cost=0.00..12.82 rows=1 width=0) (actual time=0.008..0.008 rows=1 loops=1)
         Index Cond: (data @> '{"first_name": "John"}'::jsonb)
 Planning Time: 0.120 ms
 Execution Time: 0.034 ms
(7 rows)

The query plan shows that the query does use the customer_json_index for improved performance.

Finally, explain the query that searches for the customer where the value in the first_name field within the data column is John:

EXPLAIN ANALYZE 
SELECT * FROM customer_json
WHERE data->>'first_name' = 'John';

Output:

                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Seq Scan on customer_json  (cost=0.00..32.98 rows=3 width=275) (actual time=0.161..0.284 rows=1 loops=1)
   Filter: ((data ->> 'first_name'::text) = 'John'::text)
   Rows Removed by Filter: 598
 Planning Time: 0.085 ms
 Execution Time: 0.298 ms
(5 rows)

In this plan, the query cannot fully utilize the GIN index customer_json_index. The reason is that the query does not use the JSONB operator (@, @?, @@) that the jsonb_path_ops operator class is optimized for.

4) Creating an index on a specific field of a JSONB column

First, drop the customer_json_index index:

DROP INDEX customer_json_index;

Second, create a GIN index on the first_name field of the customer_json table using the GIN operator class:

CREATE INDEX customer_json_index 
ON customer_json 
USING GIN((data->'first_name'));

Third, explain the query that finds the rows where the “first_name” field in the data JSONB column contains the value "John":

EXPLAIN ANALYZE  
SELECT
   data ->> 'first_name' first_name,
   data ->> 'last_name' last_name,
   data ->> 'phone' phone
FROM
   customer_json
WHERE
   data->'first_name' @> '"John"';

Output:

                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on customer_json  (cost=8.58..23.72 rows=6 width=96) (actual time=0.031..0.032 rows=1 loops=1)
   Recheck Cond: ((data -> 'first_name'::text) @> '"John"'::jsonb)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on customer_json_index  (cost=0.00..8.58 rows=6 width=0) (actual time=0.015..0.015 rows=1 loops=1)
         Index Cond: ((data -> 'first_name'::text) @> '"John"'::jsonb)
 Planning Time: 0.167 ms
 Execution Time: 0.133 ms
(7 rows)

The output indicates that the query uses the customer_json_index index.

Summary

Use the GIN index to create an index for a JSONB column of a table to improve query performance.