Introduction
A materialized view is a view that stores results of SELECT query in the database, and this enables you to get the query results quickly comparing to executing the query every time. On the other hand, when a table underlying the view is updated, the data in the materialized view gets out of date, so you need to update the materialized view.
PostgreSQL provides REFRESH MATERIALIZED VIEW
command as a way to update materialized views . This command refreshes the materialized view by re-running the query.
However, this is not very efficient because it recalculates all the data. If only a small part of the table is updated and only a part of the materialized view can be updated, this must be more efficient than recalculation.
This method is called Incremental View Maintenance (IVM). However, this feature is not implemented in the current PostgreSQL. So we are developing this feature in PostgreSQL and proposing it to the development community.
In this article, I will explain the IVM feature under development.
Note that the feature descried in this article is about the “under development” function, and not available in the current PostgreSQL . If you have any comments or requests for this feature, please give us your feedback!
Features
Overview
The proposed IVM feature allows materialized views to be updated automatically and incrementally just after a underlying table is modified. Because this is an incremental update, it is faster than running a regular REFRESH
. And, of course, you don’t have to write trigger functions for your self.
Currently, the SQL queries used to define views are SELECT ... FROM ... WHERE ...
, joins (inner joins, outer joins, self-joins), some built-in aggregate functions (count, sum, avg, min, max), GROUP BY
clause, DISTINCT
clause, simple sub-queries in FROM
clause, simple CTEs ( WITH
clauses), and EXISTS
sub-queries in WHERE
clause, where “simple” means that a sub-query or CTE does not include aggregates, outer joins, or DISTINCT
. The view can contain multiple tuples with the same content (duplicate tuple support).
On the other hand, aggregates of other than the above (such as user-defined aggregate), complex subqueries or CTEs, window functions, ORDER BY
, LIMIT
, OFFSET
, set operations ( UNION
, EXCEPT
, INTERSECT
) are not supported.
Creating a View
Use CREATE INCREMENTAL MATERIALIZED VIEW
command to create a materialized view that is updated automatically and incrementally . This INCREMENTAL
is the new keyword added to CREATE MATERIALIZED VIEW
in this implementation.
You can also check the syntax with the psql
command.
postgres=# \h CREATE MATERIALIZED VIEW
Command: CREATE MATERIALIZED VIEW
Description: define a new materialized view
Syntax:
CREATE [ INCREMENTAL ] MATERIALIZED VIEW [ IF NOT EXISTS ] table_name
[ (column_name [, ...] )
[ USING method ]
[ WITH ( storage_parameter [= value] [, ... ] ) ]
[ TABLESPACE tablespace_name ]
AS query
[ WITH [ NO ] DATA ]
For example, we can create a materialized view using tables of pgbench as follows. Here, the scale factor is 100.
test=# CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm AS
SELECT a.aid, b.bid, abalance, bbalance
FROM pgbench_accounts a JOIN pgbench_branches b ON a.bid = b.bid:
NOTICE: created index "mv_ivm_index" on materialized view "mv_ivm"
SELECT 10000000
This materialized view was created with IVM support because INCREMENTAL
option was specified. In addition, an index was automatically created on this materialized view that is necessary for efficient IVM. These can be confirmed by \d
meta command of psql
command. We can see in the result “Incremental view maintenance: yes” and an unique index defined on aid
and bid
, which are primary key columns of the base tables.
test=# \d+ mv_ivm
Materialized view "public.mv_ivm"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Descrip
tion
----------+---------+-----------+----------+---------+---------+-------------+--------------+--------
-----
aid | integer | | | | plain | | |
bid | integer | | | | plain | | |
abalance | integer | | | | plain | | |
bbalance | integer | | | | plain | | |
Indexes:
"mv_ivm_index" UNIQUE, btree (aid, bid)
View definition:
SELECT a.aid,
b.bid,
a.abalance,
b.bbalance
FROM pgbench_accounts a
JOIN pgbench_branches b ON a.bid = b.bid;
Access method: heap
Incremental view maintenance: yes
Updating a View Incrementally
Let’s see the effect of IVM. First, when I executed REFRESH
command on this view, it took about 33 seconds.
test=# REFRESH MATERIALIZED VIEW mv_ivm ;
REFRESH MATERIALIZED VIEW
Time: 32863.279 ms (00:32.863)
On the other hand, when I updated a row in pgbench_accounts
, it took about 16ms, which was more than 1000 times faster than the REFRESH
command. In addition, of course, the contents of the view was updated correctly.
test=# SELECT * FROM mv_ivm WHERE aid = 1;
aid | bid | abalance | bbalance
-----+-----+----------+----------
1 | 1 | 100 | 1000
(1 row)
Time: 1.583 ms
test=# UPDATE pgbench_accounts SET abalance = 11111 WHERE aid = 1;
UPDATE 1
Time: 16.202 ms
test=# SELECT * FROM mv_ivm WHERE aid = 1;
aid | bid | abalance | bbalance
-----+-----+----------+----------
1 | 1 | 11111 | 1000
(1 row)
Time: 1.465 ms
Aggregate View
Next, let’s create a materialized view on a aggregate query . Here, pgbench
's scale factor is 1000.
test2=# CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm2 AS
SELECT bid, count(abalance), sum(abalance), avg(abalance)
FROM pgbench_accounts GROUP BY bid;
NOTICE: created index "mv_ivm2_index" on materialized view "mv_ivm2"
SELECT 1000
It took more than 40 seconds for REFRESH
of this view .
test2=# REFRESH MATERIALIZED VIEW mv_ivm2;
REFRESH MATERIALIZED VIEW
Time: 42427.666 ms (00:42.428)
On the other hand, when I updated a row in the table, the view was automatically updated due to the IVM function. It took about 18ms, which is more than 2000 times faster than REFRESH
.
test2=# SELECT * FROM mv_ivm2 WHERE bid = 1;
bid | count | sum | avg
-----+--------+------+------------------------
1 | 100000 | 1000 | 0.01000000000000000000
(1 row)
Time: 2.199 ms
test2=# UPDATE pgbench_accounts SET abalance = abalance + 5000 WHERE aid = 1;
UPDATE 1
Time: 17.951 ms
test2=# SELECT * FROM mv_ivm2 WHERE bid = 1;
bid | count | sum | avg
-----+--------+------+------------------------
1 | 100000 | 6000 | 0.06000000000000000000
(1 row)
Time: 2.152 ms
Conclusion
In this article, I showed that the proposed IVM feature enables materialized views to be updated automatically and more rapidly than the REFRESH
command. In the next post, I will explain the performance of IVM in more detailed.
Nice work! Looking forward to see this merged to PostgreSQL core. I hope you can get rid of the locking issues somehow. I started working on a general purpose version of IVM a while ago that ought to be independent of the database. The PoC implementation idea of it is to generate DDL and triggers that do the IVM. The analysis parts are implemented with Apache Calcite which offers many nice out of the box utilities. I didn't do much in that area recently though as I am working on other projects as well. In case you are interested in that approach as well, I would love to work together on that: https://github.com/Blazebit/blaze-ivm
ReplyDeleteLooking forward to read your next post!
> I hope you can get rid of the locking issues somehow.
DeleteI also hope so, and I am trying to avoid the strong locking if possible.
>I started working on a general purpose version of IVM a while ago that ought to be independent of the database. The PoC implementation idea of it is to generate DDL and triggers that do the IVM. The analysis parts are implemented with Apache Calcite which offers many nice out of the box utilities. I didn't do much in that area recently though as I am working on other projects as well. In case you are interested in that approach as well, I would love to work together on that: https://github.com/Blazebit/blaze-ivm
Thank you for your information. I think implementing IVM on PostgreSQL as a built-in feature would be beneficial for PostgreSQL users, but I am interested in your project, too. I'll look into your github.
I appreciate the effort. IMHO I'd prefer to see the Sql Server approach where a Materialized View must be expressible as an index, and the engine monitors the index and updates the Materialized View automatically whenever a change to the index is detected.
ReplyDeleteI am not familiar with the Sql Server approach, so I would like learn it. Thanks!
DeleteWe are moving from Oracle to PG and this is one of the pains of transition. In Oracle we had FAST Refresh of MV that was based on 'MATERIALIZED VIEW LOG'. In PG we have FULL refresh, so the idea of using MV to generate BI/BA reports (not complex) is worthless. Now we are back to ugly triggers, due to lack if any alternatives in PG.
DeleteWe're also holding on to MS SQL Server because of the indexed (materialised) views. Having this in PG would mean there's nothing left in MS SQL that we can't do in PG, and this would open the way to migrate our enterprise system to PG.
ReplyDeletePS: Thank you for all your work on this feature!
ReplyDeleteThank you for all your work indeed! In my organization, we are creating materialized view selecting from foreign tables (using a foreign data wrapper). Do you think your patch will work on those views ?
ReplyDeleteI also would like to support foreign tables in IVM, but for this purpose we need a way to extract changes that occurred on a foreign table. I don't have the good idea for now. I'll investigate it more any way for the future support of foreign tables.
Delete