Introduction
A materialized view is a view that stores results of SELECT query in the database, and this enables you to get the query results quickly comparing to executing the query every time. On the other hand, when a table underlying the view is updated, the data in the materialized view gets out of date, so you need to update the materialized view.
PostgreSQL provides REFRESH MATERIALIZED VIEW
command as a way to update materialized views . This command refreshes the materialized view by re-running the query.
However, this is not very efficient because it recalculates all the data. If only a small part of the table is updated and only a part of the materialized view can be updated, this must be more efficient than recalculation.
This method is called Incremental View Maintenance (IVM). However, this feature is not implemented in the current PostgreSQL. So we are developing this feature in PostgreSQL and proposing it to the development community.
In this article, I will explain the IVM feature under development.
Note that the feature descried in this article is about the “under development” function, and not available in the current PostgreSQL . If you have any comments or requests for this feature, please give us your feedback!
Features
Overview
The proposed IVM feature allows materialized views to be updated automatically and incrementally just after a underlying table is modified. Because this is an incremental update, it is faster than running a regular REFRESH
. And, of course, you don’t have to write trigger functions for your self.
Currently, the SQL queries used to define views are SELECT ... FROM ... WHERE ...
, joins (inner joins, outer joins, self-joins), some built-in aggregate functions (count, sum, avg, min, max), GROUP BY
clause, DISTINCT
clause, simple sub-queries in FROM
clause, simple CTEs ( WITH
clauses), and EXISTS
sub-queries in WHERE
clause, where “simple” means that a sub-query or CTE does not include aggregates, outer joins, or DISTINCT
. The view can contain multiple tuples with the same content (duplicate tuple support).
On the other hand, aggregates of other than the above (such as user-defined aggregate), complex subqueries or CTEs, window functions, ORDER BY
, LIMIT
, OFFSET
, set operations ( UNION
, EXCEPT
, INTERSECT
) are not supported.
Creating a View
Use CREATE INCREMENTAL MATERIALIZED VIEW
command to create a materialized view that is updated automatically and incrementally . This INCREMENTAL
is the new keyword added to CREATE MATERIALIZED VIEW
in this implementation.
You can also check the syntax with the psql
command.
postgres=# \h CREATE MATERIALIZED VIEW
Command: CREATE MATERIALIZED VIEW
Description: define a new materialized view
Syntax:
CREATE [ INCREMENTAL ] MATERIALIZED VIEW [ IF NOT EXISTS ] table_name
[ (column_name [, ...] )
[ USING method ]
[ WITH ( storage_parameter [= value] [, ... ] ) ]
[ TABLESPACE tablespace_name ]
AS query
[ WITH [ NO ] DATA ]
For example, we can create a materialized view using tables of pgbench as follows. Here, the scale factor is 100.
test=# CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm AS
SELECT a.aid, b.bid, abalance, bbalance
FROM pgbench_accounts a JOIN pgbench_branches b ON a.bid = b.bid:
NOTICE: created index "mv_ivm_index" on materialized view "mv_ivm"
SELECT 10000000
This materialized view was created with IVM support because INCREMENTAL
option was specified. In addition, an index was automatically created on this materialized view that is necessary for efficient IVM. These can be confirmed by \d
meta command of psql
command. We can see in the result “Incremental view maintenance: yes” and an unique index defined on aid
and bid
, which are primary key columns of the base tables.
test=# \d+ mv_ivm
Materialized view "public.mv_ivm"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Descrip
tion
----------+---------+-----------+----------+---------+---------+-------------+--------------+--------
-----
aid | integer | | | | plain | | |
bid | integer | | | | plain | | |
abalance | integer | | | | plain | | |
bbalance | integer | | | | plain | | |
Indexes:
"mv_ivm_index" UNIQUE, btree (aid, bid)
View definition:
SELECT a.aid,
b.bid,
a.abalance,
b.bbalance
FROM pgbench_accounts a
JOIN pgbench_branches b ON a.bid = b.bid;
Access method: heap
Incremental view maintenance: yes
Updating a View Incrementally
Let’s see the effect of IVM. First, when I executed REFRESH
command on this view, it took about 33 seconds.
test=# REFRESH MATERIALIZED VIEW mv_ivm ;
REFRESH MATERIALIZED VIEW
Time: 32863.279 ms (00:32.863)
On the other hand, when I updated a row in pgbench_accounts
, it took about 16ms, which was more than 1000 times faster than the REFRESH
command. In addition, of course, the contents of the view was updated correctly.
test=# SELECT * FROM mv_ivm WHERE aid = 1;
aid | bid | abalance | bbalance
-----+-----+----------+----------
1 | 1 | 100 | 1000
(1 row)
Time: 1.583 ms
test=# UPDATE pgbench_accounts SET abalance = 11111 WHERE aid = 1;
UPDATE 1
Time: 16.202 ms
test=# SELECT * FROM mv_ivm WHERE aid = 1;
aid | bid | abalance | bbalance
-----+-----+----------+----------
1 | 1 | 11111 | 1000
(1 row)
Time: 1.465 ms
Aggregate View
Next, let’s create a materialized view on a aggregate query . Here, pgbench
's scale factor is 1000.
test2=# CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm2 AS
SELECT bid, count(abalance), sum(abalance), avg(abalance)
FROM pgbench_accounts GROUP BY bid;
NOTICE: created index "mv_ivm2_index" on materialized view "mv_ivm2"
SELECT 1000
It took more than 40 seconds for REFRESH
of this view .
test2=# REFRESH MATERIALIZED VIEW mv_ivm2;
REFRESH MATERIALIZED VIEW
Time: 42427.666 ms (00:42.428)
On the other hand, when I updated a row in the table, the view was automatically updated due to the IVM function. It took about 18ms, which is more than 2000 times faster than REFRESH
.
test2=# SELECT * FROM mv_ivm2 WHERE bid = 1;
bid | count | sum | avg
-----+--------+------+------------------------
1 | 100000 | 1000 | 0.01000000000000000000
(1 row)
Time: 2.199 ms
test2=# UPDATE pgbench_accounts SET abalance = abalance + 5000 WHERE aid = 1;
UPDATE 1
Time: 17.951 ms
test2=# SELECT * FROM mv_ivm2 WHERE bid = 1;
bid | count | sum | avg
-----+--------+------+------------------------
1 | 100000 | 6000 | 0.06000000000000000000
(1 row)
Time: 2.152 ms
Conclusion
In this article, I showed that the proposed IVM feature enables materialized views to be updated automatically and more rapidly than the REFRESH
command. In the next post, I will explain the performance of IVM in more detailed.